[Haskell-cafe] ANNOUNCE: text 0.8.0.0, fast Unicode text support

Wed Sep 1 15:29:47 EDT 2010

On Wednesday 01 September 2010 18:15:19, Bryan O'Sullivan wrote:
> Hi, Daniel -
>
> Thanks for taking the new code for a test drive!
>
> > The interesting part is the comparison between text and vanilla String
> > I/O, the difference is smaller than I expected for text-0.8.0.0.
>
> Yes. Much of this is due to the new encoding stuff on Handles in GHC
> 6.12, which is slow. Its performance wasn't so noticeable when it was
> only shipping String around, but it's much more visible with Text. It's
> far slower on a Mac than on Linux, in case that's relevant.
>

I'm on Linux. I guess that's another point in favour of it:)
Do you happen to know why it's slower on a Mac?

> > The performance of text-0.8.0.0 has improved significantly over that
> > of text-0.7.2.1 (for the tested features), the improvement of the
> > replacing algorithm is however not as impressive as that of I/O.
>
> I'd bet you that's mostly because the program is I/O bound, in the sense
> that it's spending time going through the layers of buffering and
> translation that now make up a Handle. Any improvement in other code is
> going to be hard to see because of that.

Maybe. As a rough measure, I took (total time - time for just reading and 
outputting) for an approximation of the processing (replacing) time.
For the first example (Lazy), that yields
0.7.2.1: 0.92s - 0.68s = 0.24s
0.8.0.0: 0.44s - 0.26s = 0.18s,
for the fourth
0.7.2.1: 1.23s - 0.68s = 0.55s
0.8.0.0: 0.65s - 0.26s = 0.39s

Yes, I know it's very crude, but TIO.readFile file >>= TIO.putStrLn does no 
less translation than with a replacing in between, or does it?
So I tentatively believe most of the difference is spent doing the 
replacements.

>
> The other major consideration, both for this case and the first one you
> note, is that the inliner in 6.12 chokes on code that uses stream
> fusion: it boxes and unboxes vast quantities of state. That kills
> performance due to both the boxing and unboxing overhead and the
> increased number of nursery GCs.

I haven't looked at the core, so I take your word for it. I know the 
behaviour from other situations.

>
> The marvelous new 6.13 inliner does a *much* better job here - that's
> where I see those 3x performance improvements for free.

That's nice. Maybe I should go get the HEAD before 6.14.1 will be released.

>
> What's *really* bad is the space behaviour.
>
>
> What are you using to measure that?

+RTS -s and top. The figures I gave were the "total memory in use" of
+RTS -s, maximum residency was 20MB (vs. 39 total) for 0.7 and 12MB (vs. 17 
total) for 0.8 in the bad tests.

>
> Also, please don't forget to post your benchmark code when you make
> observations like this,

It's not very interesting,

{-# LANGUAGE BangPatterns #-}
module Main (main) where

import System.Environment (getArgs)
import qualified Data.Text.Lazy as T
import qualified Data.Text.Lazy.IO as TIO

main :: IO ()
main = do
    (file : pat : sub : _) <- getArgs
    let !spat = T.pack pat
        !ssub = T.pack sub
        work = T.replace spat ssub
    TIO.readFile file >>= TIO.putStrLn . work

with the obvious changes of imports for the strict Text and the almost 
obvious changes for the ByteString code.

> as that way I can reproduce your measurements
> and fix problems. I appreciate your help!

I can now say more. Looking at Data.Text.Lazy.replace,

replace s d = intercalate d . split s

, I also got a space leak with that for BS.Lazy's intercalate and 
stringsearch's split. Looking at intercalate,

intercalate t = concat . (Data.List.intersperse t)

, that's where you definitely get a space leak, because

intersperse             :: a -> [a] -> [a]
intersperse _   []      = []
intersperse _   [x]     = [x]
intersperse sep (x:xs)  = x : sep : intersperse sep xs

isn't lazy enough.
Given the context, it's not hard to see what's wrong here. Before 
intersperse produces any output, it checks whether the list contains at 
least two elements (if any). If a is a list-like type, like String, lazy 
ByteStrings or lazy Text, where the elements of the list are lazily 
produced one after the other, as is the case for split, each element must 
be complete before it can be delivered and then consumed.
So indeed, replace needs O(index pat) space :(

I don't think that fixing Data.List.intersperse will fix your space leak, 
though.

split pat src
    | null pat        = emptyError "split"
    | isSingleton pat = splitBy (== head pat) src
    | otherwise       = go 0 (indices pat src) src
  where
    go  _ []     cs = [cs]
    go !i (x:xs) cs = let h :*: t = splitAtWord (x-i) cs
                      in  h : go (x+l) xs (dropWords l t)
    l = foldlChunks (\a (T.Text _ _ b) -> a + fromIntegral b) 0 pat

Using the list of indices of the pattern, split can't deliver anything 
before the first (next) occurrence of the pattern has been located (or the 
end of the string reached). Again, that forces O(index pat) space 
consumption.

I don't see how to avoid that without duplicating a large part of indices' 
logic in a dedicated breaking/splitting function.

On a related note,

break :: Text -> Text -> (Text, Text)
break pat src
    | null pat  = emptyError "break"
    | otherwise = case indices pat src of
                    []    -> (src, empty)
                    (x:_) -> let h :*: t = splitAtWord x src
                             in  (h, t)

has the same problem.

Cheers,
Daniel