[Haskell-cafe] WideFinder

Berlin Brown berlin.brown at gmail.com
Sat Nov 10 03:36:28 EST 2007


Sterling Clover wrote:
> Um... you do realize that the code is only supposed to match against 
> very specific lines in sample data sets that Bray provides, right? If 
> your access log doesn't have lines exactly like those (and why would 
> it?) then there's no reason to expect a result.
>
> --S
>
> On Nov 9, 2007, at 11:19 PM, Berlin Brown wrote:
>
>> Sterling Clover wrote:
>>> I hacked together a version that I'm pretty happy with today. 
>>> Started off trying an algorithm with channels and forking, then 
>>> realized that in Haskell thanks to referential transparency we can 
>>> get parallelism almost for free, and redid it all in 
>>> Control.Parallel (below). Unfortunately, I don't have a multicore 
>>> processor so I can't put this through any special paces. However, 
>>> its compactness and expressively match or beat the simple Ruby, etc. 
>>> scripts while it gets (theoretically) most of the parallel benefits 
>>> of the enormous and unwieldy Erlang and JOcaml ones.
>>>
>>> --S
>>>
>>> module Main where
>>> import qualified Data.ByteString.Lazy.Char8 as LB
>>> import Data.List (foldl', unfoldr, insertBy)
>>> import qualified Data.Map as M
>>> import System.Environment (getArgs)
>>> import Control.Parallel (par)
>>> import Control.Parallel.Strategies (parMap, rwhnf)
>>>
>>> count :: M.Map LB.ByteString Int -> LB.ByteString -> M.Map 
>>> LB.ByteString Int
>>> count m line = if LB.pack "/ongoing/When" `LB.isPrefixOf` myLn then 
>>> M.insertWith' (+) (LB.drop 14 myLn) 1 m else m
>>> where myLn = (LB.takeWhile (/=' ') . LB.dropWhile (/='/') . 
>>> LB.dropWhile (/='\"')) line
>>>
>>> mapUnionPar :: (Ord k, Num a) => [M.Map k a] -> M.Map k a
>>> mapUnionPar m = head $ until (null . tail) mapUnionPar' m
>>> where a |:| b = par a . par b $ a : b
>>> mapUnionPar' (x:x':xs) = (M.unionWith (+) x x' |:| mapUnionPar' xs)
>>> mapUnionPar' x = x
>>>
>>> newPar :: FilePath -> IO (M.Map LB.ByteString Int)
>>> newPar = ((mapUnionPar . parMap rwhnf (foldl' count M.empty) . 
>>> chunkify . LB.lines) `fmap`) . LB.readFile
>>> where chunkify = unfoldr (\x -> if null x then Nothing else Just 
>>> (splitAt 512 x))
>>>
>>> main = mapM_ ((print . fst . foldl' takeTop ([],[]) . M.toList =<<) 
>>> . newPar) =<< getArgs
>>> where takeTop ac@(bs,low) a = if null low || (snd . head) low < snd 
>>> a then (splitAt 10 . insertBy ((. snd) . flip compare . snd) a) bs 
>>> else ac
>>>
>>>
>>> On Nov 7, 2007, at 9:06 AM, Bayley, Alistair wrote:
>>>
>>>>> From: haskell-cafe-bounces at haskell.org
>>>>> [mailto:haskell-cafe-bounces at haskell.org] On Behalf Of manu
>>>>>
>>>>> Haskell is conspicuously absent from the languages used to tackle Tim
>>>>
>>>>> Bray's Wide Finder problem
>>>>>
>>>> (http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results?updated). 
>>>>
>>>>> So far we have Ocaml, Erlang, Python, Ruby, etc...
>>>>
>>>> Tim Bray mentions that GHC won't build on Solaris, so presumably that
>>>> problem would need to be solved before Haskell appears in his table. I
>>>> see that there are Solaris binary packages:
>>>> http://www.haskell.org/ghc/download_ghc_661.html#sparcsolaris
>>>>
>>>> so perhaps he just needs to be pointed to them?
>>>>
>>>> Alistair
>>>> *****************************************************************
>>>> Confidentiality Note: The information contained in this message,
>>>> and any attachments, may contain confidential and/or privileged
>>>> material. It is intended solely for the person(s) or entity to
>>>> which it is addressed. Any review, retransmission, dissemination,
>>>> or taking of any action in reliance upon this information by
>>>> persons or entities other than the intended recipient(s) is
>>>> prohibited. If you received this in error, please contact the
>>>> sender and delete the material from any computer.
>>>> *****************************************************************
>>>> _______________________________________________
>>>> Haskell-Cafe mailing list
>>>> Haskell-Cafe at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>>>
>>> _______________________________________________
>>> Haskell-Cafe mailing list
>>> Haskell-Cafe at haskell.org
>>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>> You didn't do a whole lot when I tried to run it. I know I am being 
>> mean, but that seems to be what Tim Bray is doing. He takes code and 
>> if it doesnt work, he isn't spending 3 weeks to figure it out.
>>
>> So, I would just like to comment. I ran your code against an 
>> access.log file and it gave me this:
>>
>> []
>>
>> ./a.out access.log
>>
>>
>
Which data set did you test it on?


More information about the Haskell-Cafe mailing list