[Haskell-cafe] strict version of Haskell - does it exist?
Marc Weber
marco-oweber at gmx.de
Mon Jan 30 17:12:19 CET 2012
Replying to all replies at once:
> Malcolm Wallace
> At work, we have a strict version of Haskell
:-) which proofs that it is worth thinking about it.
> Ertugrul
> If you want to save the time to learn how to write efficient Haskell
> programs, you may want to have a look into the Disciple language.
Yes - I will. Its on my TODO list for at least 12 month :(
Not sure whether there are parser combinator libraries yet.
@ Herbert Valerio Riedel (suggesting aeson)
I gave it yet another try - see below.
It still fails.
@ Felipe Almeida Lessa (suggesting conduits and atto parsec)
I mentioned that I already tried it. Counting lines only was a lot slower than
counting lines and parsing JSON using PHP.
@ Chris Wong
> flag is that strictness isn't just something you turn on and off willy-nilly
You're right. But those features are not required for all tasks :)
Eg have a look at Data.Map. Would a strict implementation look that different?
I came up with this now. Trying strict bytestrings and Aeson.
note the LB.fromChunks usage below to turn a strict into a lazy bytestring.
Result: PHP script doing the same runs in 33secs (using the
AFindCheckoutsAndContacts branch) The haskell program below - I stopped it
after 8 min. (At least it does no longer cause my system to swap ..
You're right: I could start profiling. I could learn about how to optimize it.)
But why? The conclusion is that if I ever use yesod - and if I ever want to
analyse logs - I should call PHP from yesod as external process !? :-(
Even if I had a core i7 (8 threads in parallel) I still would not be sure
whether Haskell would be the right choice. I agree that I assume that all data
fits into memory so that piecewise evaluation doesn't matter too much.
Thanks for all your help - it proofs once again that the Haskell
community is the most helpful I know about. Yet I think me being the
programmer Haskell is the wrong choice for this task.
Thanks Marc Weber
my new attempt - now I should start profiling.. Looks like I haven't
built all libs with profiling support ..
import Data.Aeson.Types
import Data.Aeson
import Data.List
import Control.Applicative
import Debug.Trace
import qualified Data.Map as M
import Action
import Data.Aeson.Parser as AP
import qualified Data.ByteString.Lazy as LB
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as LBC8
data Action = ACountLine | AFindCheckoutsAndContacts
-- lines look like this:
-- {"id":"4ee535f01550c","r":"","ua":"Mozilla\/5.0 (compatible; bingbot\/2.0; +http:\/\/www.bing.com\/bingbot.htm)","t":1323644400,"k":"XXX","v":"YY"}
data Item = Item {
id :: SB.ByteString,
ua :: SB.ByteString,
t :: Int,
k :: SB.ByteString,
v :: SB.ByteString
}
instance FromJSON Item where
parseJSON (Object v) = Item <$>
v .: "id" <*>
v .: "ua" <*>
v .: "t" <*>
v .: "k" <*>
v .: "v"
parseJSON _ = empty
run :: Action -> [FilePath] -> IO ()
run AFindCheckoutsAndContacts files = do
-- find all ids quering the server more than 100 times.
-- do so by building a map counting queries
(lines :: [BS.ByteString]) <- fmap (concat . (map LBC8.lines) ) $ mapM BS.readFile files
let processLine :: (M.Map BS.ByteString Int) -> BS.ByteString -> (M.Map BS.ByteString Int)
processLine st line = case decode' (LB.fromChunks [line]) of
Nothing -> st -- traceShow ("bad line " ++ (LBC8.unpack line)) st
Just (Item id ua t k v) -> M.insertWith (+) k 1 st
let count_by_id = foldl' processLine (M.empty) lines
mapM_ print $ M.toList $ M.filter (> 100) count_by_id
More information about the Haskell-Cafe
mailing list