[Haskell-cafe] Re: String vs ByteString
Daniel Fischer
daniel.is.fischer at web.de
Fri Aug 13 12:55:49 EDT 2010
On Friday 13 August 2010 17:57:36, Bryan O'Sullivan wrote:
> 3. Some commonly used functions, such as substring searching, are
> *way* faster than their ByteString counterparts.
That's an unfortunate example. Using the stringsearch package, substring
searching in ByteStrings was considerably faster than in Data.Text in my
tests.
Replacing substrings was even more lopsided: ByteString beat Text by a
factor of 10-65, with a much smaller memory footprint.
stringsearch (Data.ByteString.Lazy.Search):

$ ./bmLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./bmLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
  92,045,816 bytes allocated in the heap
      31,908 bytes copied during GC
     103,368 bytes maximum residency (1 sample(s))
      39,992 bytes maximum slop
           2 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:   158 collections, 0 parallel, 0.01s, 0.00s elapsed
  Generation 1:     1 collections, 0 parallel, 0.00s, 0.00s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.07s  (  0.17s elapsed)
  GC    time    0.01s  (  0.00s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.08s  (  0.17s elapsed)

  %GC time      10.5%  (2.1% elapsed)

  Alloc rate    1,353,535,321 bytes per MUT second

  Productivity  89.5% of total user, 40.1% of total elapsed

Data.Text.Lazy:

$ ./textLazy +RTS -s -RTS ../../bigfile Gutenberg Hutzenzwerg > /dev/null
./textLazy ../../bigfile Gutenberg Hutzenzwerg +RTS -s
  4,916,133,652 bytes allocated in the heap
      6,721,496 bytes copied during GC
     12,961,776 bytes maximum residency (58 sample(s))
     12,788,968 bytes maximum slop
             39 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:  8774 collections, 0 parallel, 0.70s, 0.73s elapsed
  Generation 1:    58 collections, 0 parallel, 0.03s, 0.03s elapsed

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    9.87s  ( 10.23s elapsed)
  GC    time    0.73s  (  0.75s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time   10.60s  ( 10.99s elapsed)

  %GC time       6.9%  (6.9% elapsed)

  Alloc rate    497,956,181 bytes per MUT second

bigfile is a ~75M file.
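
The sources of bmLazy and textLazy are not part of this message. For readers
who want to try a comparison of this kind, here is a rough sketch of what such
a pair of programs might look like, folded into one executable. It assumes the
benchmark reads a file, replaces every occurrence of a pattern, and writes the
result to stdout. The names replaceBytes and replaceText and the leading
"bytestring"/"text" mode argument are invented for illustration; only
Data.ByteString.Lazy.Search.replace (stringsearch) and Data.Text.Lazy.replace
(text) are actual library functions.

```haskell
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Search as Search -- stringsearch package
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.IO as TLIO
import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- Replace every occurrence of a pattern in a lazy ByteString.
-- stringsearch takes the pattern as a strict ByteString; the substitution
-- may be strict or lazy. BC.pack truncates characters to 8 bits, so this
-- sketch assumes ASCII patterns.
replaceBytes :: String -> String -> BL.ByteString -> BL.ByteString
replaceBytes pat sub = Search.replace (BC.pack pat) (BC.pack sub)

-- The same operation on lazy Text. TL.replace takes needle, replacement
-- and haystack, in that order, and requires a non-empty needle.
replaceText :: String -> String -> TL.Text -> TL.Text
replaceText pat sub = TL.replace (TL.pack pat) (TL.pack sub)

main :: IO ()
main = do
  args <- getArgs
  case args of
    ["bytestring", file, pat, sub] ->
      BL.readFile file >>= BL.putStr . replaceBytes pat sub
    ["text", file, pat, sub] ->
      -- Decodes the file using the locale encoding, unlike the byte version.
      TLIO.readFile file >>= TLIO.putStr . replaceText pat sub
    _ -> hPutStrLn stderr "usage: bench (bytestring|text) FILE PATTERN SUBSTITUTION"
```

Compiled with ghc -O2 and run with +RTS -s as in the commands above, each mode
prints its own allocation and timing statistics. The Text mode also pays for
decoding and re-encoding the input, which the ByteString mode skips.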
The point of the more adequate API for text manipulation stands, of course.
Cheers,
Daniel