[Haskell-cafe] OCaml list sees abysmal Language Shootout results

Tomasz Zielonka t.zielonka at students.mimuw.edu.pl
Tue Sep 28 07:16:15 EDT 2004


On Tue, Sep 28, 2004 at 12:49:52PM +0200, Tomasz Zielonka wrote:
> On Tue, Sep 28, 2004 at 12:01:11PM +0200, Tomasz Zielonka wrote:
> > On Tue, Sep 28, 2004 at 10:46:14AM +0100, Keith Wansbrough wrote:
> > > I just saw this on the OCaml list (in a posting by "Rafael 'Dido' 
> > > Sevilla" <dido at imperium.ph> in the "Observations on OCaml vs. Haskell" 
> > > thread).  I can't believe that a simple "wc" implementation should be 
> > > 570 times slower in Haskell than OCaml - could someone investigate and 
> > > fix the test?
> > 
> > No wonder it is so slow, this program looks as a result of some ,,as
> > slow as possible'' contest ;)
> 
> It took me half an hour to make a version which is 41 times faster
> on a 5MB file. It should be possible to make it even 2-3 times faster
> than this.

Changed readArray to unsafeRead, and it is 47 times faster now.

I must say I am pleasantly surprised that GHC managed to unbox
everything there was to unbox without much annotations. For 5MB file the
program allocated only 192KB in the heap. Especially optimisation of
higher-level constructs like 'fmap (toEnun . fromEnum) ...' is very
nice.

Code attached. Feel free to improve it.

Best regards,
Tom

-- 
.signature: Too many levels of symbolic links
-------------- next part --------------

import System.IO
import Data.Array.IO
import Data.Array.Base (unsafeRead)
import Data.Word
import Char
import List

wc :: Handle -> IO (Int, Int, Int)
wc h = do
    buf <- newArray_ (0, bufSize - 1) :: IO (IOUArray Int Word8)
    let
        wcLoop :: Char -> Int -> Int -> Int -> Int -> Int -> IO (Int, Int, Int)
        wcLoop prev nl nw nc i n 
            | prev `seq` nl `seq` nw `seq` nc `seq` i `seq` n `seq` False =
                undefined
            | i == n =
                do  n' <- hGetArray h buf bufSize
                    if n' == 0
                        then return (nl, nw, nc)
                        else wcLoop prev nl nw nc 0 n'
            | otherwise =
                do  c <- fmap (toEnum . fromEnum) (unsafeRead buf i)
                    wcLoop
                        c
                        (nl + if c == '\n' then 1 else 0)
                        (nw + if not (isSpace c) && isSpace prev then 1 else 0)
                        (nc + 1)
                        (i + 1)
                        n
    wcLoop ' ' 0 0 0 0 0
  where
    bufSize :: Int
    bufSize = 8192

main = do
    (nl, nw, nc) <- wc stdin    
    putStrLn $ concat $ intersperse " " $ map show [nl, nw, nc]



More information about the Haskell-Cafe mailing list