Too Strict?

Steinitz, Dominic J Dominic.J.Steinitz@BritishAirways.com
15 Jan 2001 09:08:13 Z


Can someone help? The program below works fine with small files but when I try to use it on the one I need to (about 3 million lines of data) it produces no output. The hard disk is hammered - I assume this is the run time system paging. My suspicion is that the program is trying to read in the whole file before processing it. Is this correct? If so, how do I make the program lazy so that it processes a line at time?

By the way, the MD5 function which I use and is included as part of HSLIBS has the type String -> IO String. The MD5 algorithm really is a function and should have type String -> String. Do people agree and if so how do I get it changed?

Dominic.

-- Compile with ghc -o test test.hs -static -package util
-- under Windows.

module Main(main) where

import IO(openFile,
          hPutStr,
          IOMode(ReadMode,WriteMode,AppendMode))
import MD5
import Char

-- showHex and showHex' convert the hashed values to 
-- human-readable hexadecimal strings.

showHex :: Integer -> String

showHex =
    map hexDigit .
    map (fromInteger . (\x -> mod x 16)) .
    takeWhile (/=0) .
    iterate (\x -> div x 16) .
    toInteger

hexDigit x
   | (0 <= x) && (x <= 9)  = chr(ord '0' + x)
   | (10 <= x) && (x <=16) = chr(ord 'a' + (x-10))
   | otherwise             = error "Outside hexadecimal range"

powersOf256 = 1 : map (*256) powersOf256

showHex' x = 
   showHex $ 
   sum (zipWith (*) 
                (map ((\x -> (mod x 16)*16 + (div x 16)) . 
                     toInteger . 
                     ord) x) 
                powersOf256)

-- The type Anon and function anonymize hide the anonymisation
-- process. In this case, it's a hash function
-- digest :: String -> IO String which implements MD5. 

type Anon a = IO a

class Anonymizable a where
   anonymize :: a -> Anon a

-- MyString avoids overlapping instances of Strings 
-- with the [Char]

data MyString = MyString String
   deriving Show

instance Anonymizable MyString where
   anonymize (MyString x)
      = do s <- digest x
           return ((MyString . showHex') s)

instance Anonymizable a => Anonymizable [a] where
   anonymize xs = mapM anonymize xs

filename = "ldif1.txt"
fileout  = "ldif.out"

readAndWriteAttrVals =
   do h <- openFile fileout WriteMode
      s <- readFile filename
      a <- anonymize((map MyString) (lines s))
      hPutStr h (unlines (map (\(MyString x) -> x) a)) 

main = readAndWriteAttrVals

-------------------------------------------------------------------------------------------------
21st century air travel     http://www.britishairways.com