[Haskell-cafe] performance question
Bob Ippolito
bob at redivi.com
Wed Feb 13 06:16:49 CET 2013
On Tuesday, February 12, 2013, wrote:
> On Tue, 12 Feb 2013 15:57:37 -0700
> Nicolas Bock <nicolasbock at gmail.com <javascript:;>> wrote:
>
> > > Here is haskell version that is faster than python, almost as fast as
> c++.
> > > You need to install bytestring-lexing package for readDouble.
>
>
> I was hoping Branimir could comment on how the improvements were allocated.
>
> how much is due to text.regex.pcre (which looks to be a wrapper to
> libpcre) ?
>
> how much can be attributed to using data.bytestring ?
>
> you have to admit, it's amazing how well a byte-compiled, _dynamically
> typed_ interpreter can do against an actualy native code compiler. Can't
> regex be done effectively in haskell ? Is it something that can't be done,
> or is it just such minimal effort to link to pcre that it's not worth the
> trouble ?
I think that there are two bottlenecks: the regex engine, and converting a
bytestring to a double. There doesn't appear to be a fast and accurate
strtod implementation for Haskell, and the faster regex implementations
that I could find appear to be unmaintained.
>
>
> Brian
>
> > > import Text.Regex.PCRE
> > > import Data.Maybe
> > > import Data.Array.IO
> > > import Data.Array.Unboxed
> > > import qualified Data.ByteString.Char8 as B
> > > import Data.ByteString.Lex.Double (readDouble)
> > >
> > > strataBounds :: UArray Int Double
> > > strataBounds = listArray (0,10) [ 0.0, 1.0e-8, 1.0e-7, 1.0e-6, 1.0e-5,
> > > 1.0e-4, 1.0e-3, 1.0e-2, 1.0e-1, 1.0, 2.0 ]
> > >
> > > newStrataCounts :: IO(IOUArray Int Int)
> > > newStrataCounts = newArray (bounds strataBounds) 0
> > >
> > > main = do
> > > l <- B.getContents
> > > let a = B.lines l
> > > strataCounts <- newStrataCounts
> > > n <- calculate strataCounts a 0
> > > let
> > > printStrataCounts :: IO ()
> > > printStrataCounts = do
> > > let s = round $ sqrt (fromIntegral n::Double) :: Int
> > > printf "read %d matrix elements (%dx%d = %d)\n" n s s n
> > > printStrataCounts' 0 0
> > > printStrataCounts' :: Int -> Int -> IO ()
> > > printStrataCounts' i total
> > > | i < (snd $ bounds strataBounds) = do
> > > count <- readArray strataCounts i
> > > let
> > > p :: Double
> > > p = (100.0*(fromIntegral count) ::
> > > Double)/(fromIntegral n :: Double)
> > > printf "[%1.2e, %1.2e) = %i (%1.2f%%) %i\n"
> (strataBounds
> > > ! i) (strataBounds ! (i+1))
> > > count p
> > > (total + count)
> > > printStrataCounts' (i+1) (total+count)
> > > | otherwise = return ()
> > > printStrataCounts
> > >
> > > calculate :: IOUArray Int Int -> [B.ByteString] -> Int -> IO Int
> > > calculate _ [] n = return n
> > > calculate counts (l:ls) n = do
> > > let
> > > a = case getAllTextSubmatches $ l =~ B.pack "matrix.*=
> > > ([0-9eE.+-]+)$" :: [B.ByteString] of
> > > [_,v] -> Just (readDouble v) :: Maybe (Maybe
> > > (Double,B.ByteString))
> > > _ -> Nothing
> > > b = (fst.fromJust.fromJust) a
> > > loop :: Int -> IO()
> > > loop i
> > > | i < (snd $ bounds strataBounds) =
> > > if (b >= (strataBounds ! i)) && (b < (strataBounds !
> > > (i+1)))
> > > then do
> > > c <- readArray counts i
> > > writeArray counts i (c+1)
> > > else
> > > loop (i+1)
> > > | otherwise = return ()
> > > if isNothing a
> > > then
> > > calculate counts ls n
> > > else do
> > > loop 0
> > > calculate counts ls (n+1)
> > >
> > >
> > > ------------------------------
> > > From: nicolasbock at gmail.com <javascript:;>
> > > Date: Fri, 8 Feb 2013 12:26:09 -0700
> > > To: haskell-cafe at haskell.org <javascript:;>
> > > Subject: [Haskell-cafe] performance question
> > >
> > > Hi list,
> > >
> > > I wrote a script that reads matrix elements from standard input, parses
> > > the input using a regular expression, and then bins the matrix
> elements by
> > > magnitude. I wrote the same script in python (just to be sure :) ) and
> find
> > > that the python version vastly outperforms the Haskell script.
> > >
> > > To be concrete:
> > >
> > > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay
> > > real 0m2.655s
> > > user 0m2.677s
> > > sys 0m0.095s
> > >
> > > $ time ./createMatrixDump.py -N 128 | ./printMatrixDecay.py -
> > > real 0m0.445s
> > > user 0m0.615s
> > > sys 0m0.032s
> > >
> > > The Haskell script was compiled with "ghc --make printMatrixDecay.hs".
> > >
> > > Could you have a look at the script and give me some pointers as to
> where
> > > I could improve it, both in terms of performance and also generally,
> as I
> > > am very new to Haskell.
> > >
> > > Thanks already,
> > >
> > > nick
> > >
> > >
> > > _______________________________________________ Haskell-Cafe mailing
> list
> > > Haskell-Cafe at haskell.org <javascript:;>
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > >
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20130212/d2756498/attachment.htm>
More information about the Haskell-Cafe
mailing list