[Haskell-cafe] help with Haskell performance
Don Stewart
dons at galois.com
Sat Nov 7 16:50:57 EST 2009
gpnair78:
> I really hope I'm missing some obvious optimization that's making it so slow
> compared to the perl version, hence this email soliciting feedback.
Here's my first attempt. 1.5s on a 2M line log file in the format you give.
General notes:
* unpack is almost always wrong.
* list indexing with !! is almost always wrong.
* words/lines are often wrong for parsing large files (they build large list structures).
* toList/fromList probably aren't the best strategy
* sortBy (comparing snd)
* use insertWith'
Spefically, avoid constructing intermediate lists, when you can process the
entire file in a single pass. Use O(1) bytestring substring operations like
take and drop.
Compiling:
$ ghc -O2 /tmp/B.hs --make
Running:
$ time /tmp/B
("GET /url1 HTTP/1.1]",1000000)
("GET /url2 HTTP/1.0]",500000)
/tmp/B 1.38s user 0.21s system 99% cpu 1.595 total
And the code:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString.Char8 as L
import qualified Data.Map as M
main = do
l <- L.readFile "/tmp/x"
mapM_ print . M.toList $ go l M.empty
where
go !s !acc
| L.null s = acc
| " 08:" `L.isPrefixOf` s1 = go s4 acc'
| otherwise = go s4 acc
where
s1 = L.drop 11 s -- drop prefix to timestamp
-- now extract the key
(_,s2) = L.breakSubstring "GET" s1
(k,s3) = L.break ((==) ':') s2
-- drop the rest of the line
s4 = L.tail (L.dropWhile ((/=) '\n') s3)
acc' = M.insertWith' (+) k 1 acc
------------------------------------------------------------------------
More information about the Haskell-Cafe
mailing list