[Haskell-cafe] Is Haskell capable of matching C in string processing performance?

Fri Jan 22 07:23:05 EST 2010

The following snippet of code ran ~ 33% faster than yours on my computer (GHC 6.10.4, OSX):

import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as L
import System.IO

null_str = S.pack "null"

main = withBinaryFile "out2.json" WriteMode $ \h -> do
  hPutStr h "["
  L.hPutStr h . L.fromChunks  . replicate 5000000 $ null_str
  hPutStr h "]"

The following snippet ran in roughly the same speed (which should not be surprising, as it is essentially the same code):

import Control.Monad
import qualified Data.ByteString.Char8 as S
import System.IO

null_str = S.pack "null"

main = withBinaryFile "out3.json" WriteMode $ \h -> do
  hPutStr h "["
  replicateM_ 5000000 (S.hPutStr h null_str)
  hPutStr h "]"

This C snippet ran ~ 6 times faster than the Haskell snippets:

#include <stdio.h>

int main() {
  int i;
  FILE* f = fopen("outc.json","w");
  fprintf(f,"[");
  for (i = 0; i < 5000000; ++i) fprintf(f,"null");
  fprintf(f,"]");
}

It seems to me this indicates that the big expense here is the call into the I/O system.

Cheers,
Greg

On Jan 21, 2010, at 10:09 PM, John Millikin wrote:

> Recently I've been working on a library for generating large JSON[1]
> documents quickly. Originally I started writing it in Haskell, but
> quickly encountered performance problems. After exhausting my (meager)
> supply of optimization ideas, I rewrote some of it in C, with dramatic
> results. Namely, the C solution is
> 
> * 7.5 times faster than the fastest Haskell I could write (both using
> raw pointer arrays)
> * 14 times faster than a somewhat functional version (uses monads, but
> no explicit IO)
> * >30 times faster than fancy functional solutions with iteratees, streams, etc
> 
> I'm wondering if string processing is simply a Haskell weak point,
> performance-wise. The problem involves many millions of very small
> (<10 character, usually) strings -- the C solution can copy directly
> from string literals into a fixed buffer and flush it occasionally,
> while even the fastest Haskell version has a lot of overhead from
> copying around arrays.
> 
> Dons suggested I was "doing it wrong", so I'm posting on -cafe in the
> hopes that somebody can tell me how to get better performance without
> resorting to C.
> 
> Here's the fastest Haskell version I could come up with. It discards
> all error handling, validation, and correctness in the name of
> performance, but still can't get anywhere near C:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
> 
> [1] http://json.org/
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe