[Haskell-cafe] ANNOUNCE: text 0.8.0.0, fast Unicode text support
Daniel Fischer
daniel.is.fischer at web.de
Wed Sep 8 07:46:13 EDT 2010
On Wednesday 08 September 2010 06:44:32, Bryan O'Sullivan wrote:
> I have a Replace.hs benchmark in the main text repo, just to be sure
> we're talking about the same thing.
Grabbed the darcs repo, compiled with the darcs version and also with the
installed text package, both exhibit the same behaviour - what I reported
before.
> Factoring out the time spent on I/O,
> with GHC HEAD, my replace code takes twice the space and time of that in
> the stringsearch package.
That's very good. Twice the space is a given for mostly ASCII text, twice
the time is okay, I think.
My timings are quite different, but that's probably because 6.12.3's
inliner doesn't give the full fusion benefit, so it'll improve
automatically with the next GHC release.
> Given that the space involved is just 121KB
> maximum residency while processing a 124MB file, I'm not concerned about
> it.
I wouldn't either.
> And the time required isn't a bad place to start from, I think.
>
> By the way, as this implies, I can't reproduce your space behaviour at
> all.
>
That's surprising.
Have you made sure to replace a pattern which does not occur in the text?
Can you reproduce the behaviour with a) Data.List.intersperse instead of
the lazier version now used, b) ghc-6.12.* instead of HEAD?
Anyway, I would've thought that with
split pat src
| null pat = emptyError "split"
| isSingleton pat = splitBy (== head pat) src
| otherwise = go 0 (indices pat src) src
where
go _ [] cs = [cs]
go !i (x:xs) cs = let h :*: t = splitAtWord (x-i) cs
in h : go (x+l) xs (dropWords l t)
l = foldlChunks (\a (T.Text _ _ b) -> a + fromIntegral b) 0 pat
you can't start returning chunks before it's known whether the list of
indices is empty, so split would have O(index of pattern) space behaviour.
If HEAD manages to make the chunks available before they are complete
(before it's known how long they will be), it's even awesomer than I'd have
dared to hope.
Okay, so I'll have to try HEAD.
> > I can now say more. Looking at Data.Text.Lazy.replace,
> >
> > replace s d = intercalate d . split s
> >
> > , I also got a space leak with that for BS.Lazy's intercalate and
> >
> > stringsearch's split.
>
> How did you observe the space leak? Looking at -hT charted with hp2ps
> shows me nothing, and the program executes in constant space regardless
> of input size.
top and +RTS -s. I've attached the -hT graph of a run on a 74.3MB file with
a pattern that does not occur. It shows exactly the behaviour I expected
from the code of split, pretty constant growth of the heap until about
twice the file size is reached, then fast and pretty constant shrinking as
the text is output. The graphs are much more interesting if you do
replacements of patterns without large gaps between occurrences.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nbenchText.ps
Type: application/postscript
Size: 7826 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20100908/35ccf69c/nbenchText.ps
More information about the Haskell-Cafe
mailing list