[Haskell-cafe] strict version of Haskell - does it exist?
Felipe Almeida Lessa
felipe.lessa at gmail.com
Tue Jan 31 10:33:06 CET 2012
On Tue, Jan 31, 2012 at 6:05 AM, Marc Weber <marco-oweber at gmx.de> wrote:
> I didn't say that I tried your code. I gave enumerator package a try
> counting lines which I expected to behave similar to conduits
> because both serve a similar purpose.
> Then I hit the "sourceFile" returns chunked lines issue (reported
> it, got fixed) - ....
>
> Anyway: My log files are a json dictionary on each line:
>
> { id : "foo", ... }
> { id : "bar", ... }
>
> Now how do I use the conduit package to split a "chunked" file into lines?
> Or should I create a new parser "many json >> newline" ?
Currently there are two solutions. The first one is what I wrote
earlier on this thread:
  jsonLines :: C.Resource m => C.Conduit B.ByteString m Value
  jsonLines = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    CB.dropWhile isSpace_w8
    return $ C.Emit () [val]
This conduit runs the json' parser (from aeson) and then drops any
whitespace that follows. Note that it will correctly parse all of your
files, but it will also accept some files that don't conform to your
specification (for example, two values on the same line, or a value
spread over several lines). I assume that's fine.
The other solution is going to be released with conduit 0.2, probably
today. There's a lines conduit that splits the file into lines, so
you could write jsonLines above as:
  mapJson :: C.Resource m => C.Conduit B.ByteString m Value
  mapJson = C.sequenceSink () $ do
    val <- CA.sinkParser json'
    return $ C.Emit () [val]
which doesn't need to care about newlines, and then change main to
  main = do
    ...
    -- forM (not forM_) so that the per-file results are kept in ret
    ret <- forM fileList $ \fp -> do
      C.runResourceT $
        CB.sourceFile fp C.$=
        CB.lines C.$=              -- new line is here
        mapJson C.$=
        CL.mapM processJson C.$$
        CL.consume
    print ret
I don't know which solution would be faster. Either way, both
solutions will probably be faster with the new conduit 0.2.
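To make the chunk boundary issue concrete, here is a minimal sketch of what a line-splitting stage has to do: carry the unfinished tail of one chunk over into the next. Plain lists of String stand in for the conduit stream, and linesFromChunks is a hypothetical name that is not part of conduit; the real CB.lines works on ByteString chunks and may well be implemented differently.

```haskell
-- Conceptual sketch only: a list of String chunks stands in for the stream
-- that CB.sourceFile would produce. Nothing here uses the conduit API.

-- | Re-split arbitrarily chunked input into complete lines.
-- A line may start in one chunk and end in a later one.
linesFromChunks :: [String] -> [String]
linesFromChunks = go ""
  where
    -- 'pending' is the part of the current line seen so far,
    -- i.e. text that has not yet met its terminating newline.
    go pending [] = [pending | not (null pending)]
    go pending (c : cs) =
      case break (== '\n') c of
        -- No newline in this chunk: keep accumulating.
        (piece, [])       -> go (pending ++ piece) cs
        -- Newline found: emit the finished line, then continue
        -- with whatever follows the newline in the same chunk.
        (piece, _ : rest) -> (pending ++ piece) : go "" (rest : cs)
```

For any input this should agree with lines (concat chunks), which is the property a lines stage needs so that each JSON dictionary reaches the parser in one piece.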
> Except that I think my processJson for this test should look like this
> because I want to count how often the clients queried the server.
> Probably I should also be using CL.fold as shown in the test cases of
> conduit. If you tell me how you'd cope with the "one json dict on each
> line" issue I'll try to benchmark this solution as well.
My previous e-mail already dealt with this issue =).
> -- probably existing library functions can be used here ..
> processJson :: (M.Map T.Text Int) -> Value -> (M.Map T.Text Int)
> processJson m value = case value of
>   Ae.Object hash_map ->
>     case HMS.lookup (T.pack "id") hash_map of
>       Just id_o ->
>         case id_o of
>           Ae.String id -> M.insertWith' (+) id 1 m
>           _ -> m
>       _ -> m
>   _ -> m
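Looks like the perfect job for CL.fold. Just change the last three
lines of the pipeline in main from
      ... C.$=
      CL.mapM processJson C.$$
      CL.consume
into
      ... C.$$
      CL.fold processJson M.empty
(CL.fold takes the initial accumulator, here the empty map, as its
second argument) and you should be ready to go.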
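For reference, the same counting fold can be sketched outside of conduit using only base and containers. Entry and countId are hypothetical stand-ins for aeson's Value and for processJson, and insertWith from Data.Map.Strict is used in place of insertWith' (this assumes a containers version that ships Data.Map.Strict).

```haskell
import Data.List (foldl')
import qualified Data.Map.Strict as M

-- | Hypothetical stand-in for one decoded log line: the "id" field if it
-- was present and a string, Nothing otherwise. The real code would
-- pattern match on aeson's Value instead.
newtype Entry = Entry (Maybe String)

-- | Same shape as processJson: accumulator first, then the new element.
countId :: M.Map String Int -> Entry -> M.Map String Int
countId m (Entry (Just i)) = M.insertWith (+) i 1 m  -- strict in the count
countId m (Entry Nothing)  = m                       -- ignore lines without an id

-- | Strict left fold starting from the empty map, which is the role
-- CL.fold plays at the end of the pipeline.
countIds :: [Entry] -> M.Map String Int
countIds = foldl' countId M.empty
```

Keeping both the fold and the map insertion strict is what stops thunks from piling up while the log is processed, which is the concern that started this thread.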
Cheers!
--
Felipe.