[Haskell-cafe] performance question

briand at aracnet.com briand at aracnet.com
Wed Feb 13 05:32:01 CET 2013


On Tue, 12 Feb 2013 15:57:37 -0700
Nicolas Bock <nicolasbock at gmail.com> wrote:

> >  Here is haskell version that is faster than python, almost as fast as c++.
> > You need to install bytestring-lexing package for readDouble.


I was hoping Branimir could comment on how the improvements were allocated.

how much is due to text.regex.pcre (which looks to be a wrapper to libpcre) ?

how much can be attributed to using data.bytestring ?

you have to admit, it's amazing how well a byte-compiled, _dynamically typed_ interpreter can do against an actualy native code compiler.  Can't regex be done effectively in haskell ?  Is it something that can't be done, or is it just such minimal effort to link to pcre that it's not worth the trouble ?


Brian

> > import Text.Regex.PCRE
> > import Data.Maybe
> > import Data.Array.IO
> > import Data.Array.Unboxed
> > import qualified Data.ByteString.Char8 as B
> > import Data.ByteString.Lex.Double (readDouble)
> >
> > strataBounds :: UArray Int Double
> > strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5,
> > 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ]
> >
> > newStrataCounts :: IO(IOUArray Int Int)
> > newStrataCounts = newArray (bounds strataBounds) 0
> >
> > main = do
> >     l <- B.getContents
> >     let a = B.lines l
> >     strataCounts <- newStrataCounts
> >     n <- calculate strataCounts a 0
> >     let
> >         printStrataCounts :: IO ()
> >         printStrataCounts = do
> >             let s = round $ sqrt (fromIntegral n::Double) :: Int
> >             printf "read %d matrix elements (%dx%d = %d)\n" n s s n
> >             printStrataCounts' 0 0
> >         printStrataCounts' :: Int -> Int -> IO ()
> >         printStrataCounts' i total
> >             | i < (snd $ bounds strataBounds) = do
> >                 count <- readArray strataCounts i
> >                 let
> >                     p :: Double
> >                     p = (100.0*(fromIntegral count) ::
> > Double)/(fromIntegral n :: Double)
> >                 printf "[%1.2e, %1.2e) = %i (%1.2f%%) %i\n" (strataBounds
> > ! i) (strataBounds ! (i+1))
> >                                                                 count p
> > (total + count)
> >                 printStrataCounts' (i+1) (total+count)
> >             | otherwise = return ()
> >     printStrataCounts
> >
> > calculate :: IOUArray Int Int -> [B.ByteString] -> Int -> IO Int
> > calculate _ [] n = return n
> > calculate counts (l:ls) n = do
> >     let
> >         a = case getAllTextSubmatches $ l =~ B.pack "matrix.*=
> > ([0-9eE.+-]+)$" :: [B.ByteString] of
> >                 [_,v] -> Just (readDouble v) :: Maybe (Maybe
> > (Double,B.ByteString))
> >                 _ -> Nothing
> >         b = (fst.fromJust.fromJust) a
> >         loop :: Int -> IO()
> >         loop i
> >             | i < (snd $ bounds strataBounds) =
> >                 if (b >= (strataBounds ! i)) && (b < (strataBounds !
> > (i+1)))
> >                 then do
> >                     c <- readArray counts i
> >                     writeArray counts i (c+1)
> >                 else
> >                     loop (i+1)
> >             | otherwise = return ()
> >     if isNothing a
> >         then
> >             calculate counts ls n
> >         else do
> >             loop 0
> >             calculate counts ls (n+1)
> >
> >
> > ------------------------------
> > From: nicolasbock at gmail.com
> > Date: Fri, 8 Feb 2013 12:26:09 -0700
> > To: haskell-cafe at haskell.org
> > Subject: [Haskell-cafe] performance question
> >
> > Hi list,
> >
> > I wrote a script that reads matrix elements from standard input, parses
> > the input using a regular expression, and then bins the matrix elements by
> > magnitude. I wrote the same script in python (just to be sure :) ) and find
> > that the python version vastly outperforms the Haskell script.
> >
> > To be concrete:
> >
> > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay
> > real    0m2.655s
> > user    0m2.677s
> > sys     0m0.095s
> >
> > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py -
> > real    0m0.445s
> > user    0m0.615s
> > sys     0m0.032s
> >
> > The Haskell script was compiled with "ghc --make printMatrixDecay.hs".
> >
> > Could you have a look at the script and give me some pointers as to where
> > I could improve it, both in terms of performance and also generally, as I
> > am very new to Haskell.
> >
> > Thanks already,
> >
> > nick
> >
> >
> > _______________________________________________ Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >





More information about the Haskell-Cafe mailing list