[Haskell-cafe] Is Haskell capable of matching C in string
processing performance?
Gregory Crosswhite
gcross at phys.washington.edu
Fri Jan 22 07:23:05 EST 2010
The following snippet of code ran ~ 33% faster than yours on my computer (GHC 6.10.4, OSX):
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as L
import System.IO
null_str = S.pack "null"
main = withBinaryFile "out2.json" WriteMode $ \h -> do
hPutStr h "["
L.hPutStr h . L.fromChunks . replicate 5000000 $ null_str
hPutStr h "]"
The following snippet ran in roughly the same speed (which should not be surprising, as it is essentially the same code):
import Control.Monad
import qualified Data.ByteString.Char8 as S
import System.IO
null_str = S.pack "null"
main = withBinaryFile "out3.json" WriteMode $ \h -> do
hPutStr h "["
replicateM_ 5000000 (S.hPutStr h null_str)
hPutStr h "]"
This C snippet ran ~ 6 times faster than the Haskell snippets:
#include <stdio.h>
int main() {
int i;
FILE* f = fopen("outc.json","w");
fprintf(f,"[");
for (i = 0; i < 5000000; ++i) fprintf(f,"null");
fprintf(f,"]");
}
It seems to me this indicates that the big expense here is the call into the I/O system.
Cheers,
Greg
On Jan 21, 2010, at 10:09 PM, John Millikin wrote:
> Recently I've been working on a library for generating large JSON[1]
> documents quickly. Originally I started writing it in Haskell, but
> quickly encountered performance problems. After exhausting my (meager)
> supply of optimization ideas, I rewrote some of it in C, with dramatic
> results. Namely, the C solution is
>
> * 7.5 times faster than the fastest Haskell I could write (both using
> raw pointer arrays)
> * 14 times faster than a somewhat functional version (uses monads, but
> no explicit IO)
> * >30 times faster than fancy functional solutions with iteratees, streams, etc
>
> I'm wondering if string processing is simply a Haskell weak point,
> performance-wise. The problem involves many millions of very small
> (<10 character, usually) strings -- the C solution can copy directly
> from string literals into a fixed buffer and flush it occasionally,
> while even the fastest Haskell version has a lot of overhead from
> copying around arrays.
>
> Dons suggested I was "doing it wrong", so I'm posting on -cafe in the
> hopes that somebody can tell me how to get better performance without
> resorting to C.
>
> Here's the fastest Haskell version I could come up with. It discards
> all error handling, validation, and correctness in the name of
> performance, but still can't get anywhere near C:
> http://hpaste.org/fastcgi/hpaste.fcgi/view?id=16423
>
> [1] http://json.org/
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list