[Haskell-beginners] Performance problem

Radosław Szymczyszyn lavrin at gmail.com
Wed May 30 11:07:22 CEST 2012


Hello!

I've had a similar problem with text processing discussed on the list
some time ago (a topic about implementing a spellchecker). To keep
things short'n'simple: built-in Haskell Strings are inefficient as
they're simply lists of Chars, i.e. a String is in fact just a [Char].

The usually suggested solution to this problem is using the ByteString
type which comes from bytestring package. It's probably all nice when
you only need ASCII/Latin encodings, but it bite me when processing
Unicode (e.g. Data.ByteString.UTF8 doesn't have a words function,
though Data.ByteString has one). However, the performance is good.

The best solution as far as I have researched is the text package and
type Text. It ought to support Unicode as far as I remember and has
got all the useful list-like functions. As I hadn't yet had an
occasion to play with it before, I took your code and adapted it to
use Text and Text.IO. Let me know what are the results, as I haven't
got any test set to compare the speed before and after the
modifications.

=== CODE

-- Problem id: HASHADQI

import qualified Data.List as List
import qualified Data.IntMap as Map
import Data.Maybe

import Data.Text (Text)
import qualified Data.Text    as T
import qualified Data.Text.IO as T

type Person = (Text,Text,Text,Text)
type IntPersonMap = Map.IntMap Person

main = do
 input <- T.getContents
 seqAction Map.empty $ T.lines input

seqAction :: IntPersonMap -> [Text] -> IO IntPersonMap
seqAction m [] = return m
seqAction m (l:ls) = do
 m' <- doAction m l
 seqAction m' ls

doAction :: IntPersonMap -> Text -> IO IntPersonMap
doAction m cmd = do
 case T.unpack (T.take 1 cmd) of
   "a" -> doInsert m $ T.words cmd
   "d" -> doDelete m $ T.words cmd
   "i" -> doInfo m $ T.words cmd
   "q" -> doQuery m $ T.words cmd
   [] -> return m

doInsert :: IntPersonMap -> [Text] -> IO IntPersonMap
doInsert m [_, idText, fn, ln, bd, pn] = do
 let id = read (T.unpack idText) :: Int
 if Map.member id m
   then do putStrLn $ "ID " ++ show id ++ " ja cadastrado."
           return m
   else return (Map.insert id (fn, ln, bd, pn) m)

doDelete :: IntPersonMap -> [Text] -> IO IntPersonMap
doDelete m [_, idText] = do
 let id = read (T.unpack idText) :: Int
 if Map.member id m
   then return (Map.delete id m)
   else do putStrLn $ "ID " ++ show id ++ " nao existente."
           return m

doInfo :: IntPersonMap -> [Text] -> IO IntPersonMap
doInfo m [_, idText] = do
 let id = read (T.unpack idText) :: Int
 case Map.lookup id m of
   Just (fn, ln, bd, pn) -> do putStrLn . show $ T.unwords [fn, ln, bd, pn]
                               return m
   Nothing -> do putStrLn $ "ID " ++ show id ++ " nao existente."
                 return m

doQuery :: IntPersonMap -> [Text] -> IO IntPersonMap
doQuery m (_:qs) = do
 let test = (\x -> foldl (&&) True $ map ($x) $ makePredicate qs)
     result = Map.filter test m
 putStrLn $ unwords . map show $ Map.keys result
 return m

makePredicate :: [Text] -> [(Person -> Bool)]
makePredicate [] = []
makePredicate (q:qs) =
 case (\(a,b) -> (T.unpack a, b)) (T.break (==':') q) of
   ("fn", x) -> (\(fn,_,_,_) -> fn == (T.drop 1 x)) : (makePredicate qs)
   ("ln", x) -> (\(_,ln,_,_) -> ln == (T.drop 1 x)) : (makePredicate qs)
   ("bd", x) -> (\(_,_,bd,_) -> bd == (T.drop 1 x)) : (makePredicate qs)
   ("pn", x) -> (\(_,_,_,pn) -> pn == (T.drop 1 x)) : (makePredicate qs)

=== END CODE

Regards,
Radek Szymczyszyn



More information about the Beginners mailing list