Too Strict?
Steinitz, Dominic J
Dominic.J.Steinitz@BritishAirways.com
15 Jan 2001 09:08:13 Z
Can someone help? The program below works fine with small files but when I try to use it on the one I need to (about 3 million lines of data) it produces no output. The hard disk is hammered - I assume this is the run time system paging. My suspicion is that the program is trying to read in the whole file before processing it. Is this correct? If so, how do I make the program lazy so that it processes a line at time?
By the way, the MD5 function which I use and is included as part of HSLIBS has the type String -> IO String. The MD5 algorithm really is a function and should have type String -> String. Do people agree and if so how do I get it changed?
Dominic.
-- Compile with ghc -o test test.hs -static -package util
-- under Windows.
module Main(main) where
import IO(openFile,
hPutStr,
IOMode(ReadMode,WriteMode,AppendMode))
import MD5
import Char
-- showHex and showHex' convert the hashed values to
-- human-readable hexadecimal strings.
showHex :: Integer -> String
showHex =
map hexDigit .
map (fromInteger . (\x -> mod x 16)) .
takeWhile (/=0) .
iterate (\x -> div x 16) .
toInteger
hexDigit x
| (0 <= x) && (x <= 9) = chr(ord '0' + x)
| (10 <= x) && (x <=16) = chr(ord 'a' + (x-10))
| otherwise = error "Outside hexadecimal range"
powersOf256 = 1 : map (*256) powersOf256
showHex' x =
showHex $
sum (zipWith (*)
(map ((\x -> (mod x 16)*16 + (div x 16)) .
toInteger .
ord) x)
powersOf256)
-- The type Anon and function anonymize hide the anonymisation
-- process. In this case, it's a hash function
-- digest :: String -> IO String which implements MD5.
type Anon a = IO a
class Anonymizable a where
anonymize :: a -> Anon a
-- MyString avoids overlapping instances of Strings
-- with the [Char]
data MyString = MyString String
deriving Show
instance Anonymizable MyString where
anonymize (MyString x)
= do s <- digest x
return ((MyString . showHex') s)
instance Anonymizable a => Anonymizable [a] where
anonymize xs = mapM anonymize xs
filename = "ldif1.txt"
fileout = "ldif.out"
readAndWriteAttrVals =
do h <- openFile fileout WriteMode
s <- readFile filename
a <- anonymize((map MyString) (lines s))
hPutStr h (unlines (map (\(MyString x) -> x) a))
main = readAndWriteAttrVals
-------------------------------------------------------------------------------------------------
21st century air travel http://www.britishairways.com