[Haskell] Probably a trivial thing for people knowing Haskell

Paul Johnson paul at cogito.org.uk
Sat Oct 18 13:48:21 EDT 2008


Friedrich wrote:
> I've written just a few programs in Haskell one in a comparison for a
> task I had "nearly daily".
>   
The first thing I notice is that this is clearly a direct translation 
from something like Perl.  Thats understandable, but I'd suggest 
rewriting it with something like this (untested, uncompiled code)

-- Concatenate all the files into one big string.  File reading is lazy, 
so this won't take all the memory.
getAllFiles :: [String] -> IO String
getAllFiles paths = do
   contents <- mapM getFile paths
   return $ concat contents

Then use "lines" to split the result into individual lines and process 
them using "filter", "map" and "foldr".  Because file reading is lazy, 
each line is only read when it is to be processed, and then gets reaped 
by the garbage collector.  So it all runs in constant memory.

(By the way, putting in the top level type declarations helps a lot when 
you make a mistake.)

One thing you are doing right is keeping a (sum, count) pair.  A gotcha 
with Haskell is to compute an average of a list of numbers like this:

   mean :: [Double] -> Double
   mean xs = sum xs / fromIntegral (length xs)

The problem with this is that it has to traverse the list twice, which 
means that the whole list has to be held in memory.  So instead you have 
to write something like:

   mean xs = let (total, count) = foldr (\x (t, c) -> (t + x, c+1)) 
(0.0, 0) xs in total / fromIntegral count

This is a pain, but it does only traverse the list once.

See how you get on.

Paul.


> The code analyzes Apache logs and picks some certain stuff from it and
> after that calculates a bit around with it.
>
> Here's the code
> module Main where
> import System
> import System.IO
> import System.Directory
> import System.IO.Error
> import Text.Regex
> import Control.Monad
>
> regexp = mkRegex ("([0-9]+) Windows ex")
>
> main = do
>        files <- show_dir "[0-9].*"
>        (sum,count) <- run_on_all_files (0,0) files
>        let dd = (fromIntegral (sum::Integer))/ (fromIntegral (count::Int))
>            in
>             putStr("Download = " ++ show sum ++ " in " ++ show count ++ " days are " ++ show dd ++ " downloads/day\n") 
>
>
>
>
> run_on_all_files (a,b) [] = return (a,b)
> run_on_all_files (a,b) (x:xs) = do (s,c) <- run_on(a,b) x
>                                    run_on_all_files (s,c) xs
>
>
> run_on (a,b) file_name = do
>     handle <- openFile file_name ReadMode
>     (sum,count) <- for_each_line (a,b) handle
>     hClose handle
>     return ((sum,count))
>                               
> for_each_line (sum,count) handle = do
>                        l <- try (hGetLine handle)
>                        case l of
>                               Left err 
>                                   | isEOFError err -> return(sum,count)
>                                   | otherwise -> ioError err
>                               Right line  -> do 
>                                              let (nsum, ncount) = check_line line sum count
>                                              for_each_line (nsum,ncount) handle 
>                                                                                 
>                        
>           
> check_line line sum count =  
>     let match = matchRegex regexp line
>         in case match of
>                Just strs -> (sum + read (head strs) :: Integer, count + 1)
>                Nothing -> (sum, count)
>     
>                     
>                             
>     
> show_dir regmatch = do   
>                     files <- getDirectoryContents "."
>                     let reg = mkRegex regmatch in
>                               return(filter (\file_name -> let fm = matchRegex reg file_name
>                                       in case fm of
>                                       Just strs -> True
>                                       Nothing -> False) files)
>
>
> The point is this code works if there are just say a few files
> files to check. But  it trashes my machine with around 1751 files.
>
> It sucks memory as wild and so it does not run as I  think it should.
>
> I think I've overseen something which is bad written. Would you mind
> to  tell me where I did "extraordinarily" bad.
>
> With best regards
> Friedrich
>
>
>
> _______________________________________________
> Haskell mailing list
> Haskell at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell
>
>   



More information about the Haskell mailing list