[Haskell-cafe] xhtml + bytestring

Eugene Kirpichov ekirpichov at gmail.com
Mon Jan 19 17:59:43 EST 2009


I think that your instance is too specific, although useful for the
particular case of escaping.

I've done my own implementation for fun:

concatMap' :: (Word8 -> L.ByteString) -> L.ByteString -> L.ByteString
concatMap' f s = L.unfoldr p x0
    where x0 = (LI.Empty, s, 0, 0)
          p (LI.Empty, LI.Empty, _, _) = Nothing
          p (LI.Empty, c@(LI.Chunk s s'), _, ns) | ns==S.length s = p
(LI.Empty, s', 0, 0)
                                                 | otherwise      = p
(f (S.index s ns), c, 0, ns+1)
          p (c@(LI.Chunk s s'), bs, nf, ns) | nf==S.length s = p (s', bs, 0, ns)
                                            | otherwise      = Just
(S.index s nf, (c, bs, nf+1, ns))

It turns out to be both a lot slower (don't know why) and with greater
memory residence (because it's not lazy) than the built-in concatMap
for bytestrings in my synthetic tests. However, it produces a
bytestring with few chunks (due to a good implementation of unfoldr)
in an asymptotically optimal time. I also noticed that for a case like
unescaping, where 'f' produces a small string (5) and the source
bytestring is large (10mln), my version spends 24% in GC whereas the
standard version spends 68%.
A drawback is that the resulting bytestring is not lazy at all.

I wonder how one might optimize it. It looks like a function that a
compiler should optimize very well without my help, although I didn't
read the core etc.

2009/1/20 Joachim Breitner <mail at joachim-breitner.de>:
> Hi Bjorn, hi list,
>
> the darcswatch instance I'm running is getting quite big, and it
> periodically slows down my server. I managed to get quite an improvement
> with this simple patch to my parsing:
> http://darcs.nomeata.de/cgi-bin/darcsweb.cgi?r=darcswatch;a=commitdiff;h=20090119181919-23c07-140f8deb91a52a423a2984dce2d22f4a48999aaf.gz
>
> Since the new HTTP library, my code runs completely on ByteStrings, only
> the xhtml library expects me to feed strings. I tried to fix this and
> created a patch against xhtml-3000.2.0.1 to work internally with lazy
> ByteStrings. It's API-compatible to the normal xhtml library, it just
> adds showHtml', renderHtml' and prettyHtml' that output lazy
> ByteStrings, and that has Html instances for strict any lazy
> ByteStrings.
>
> There were some speed and space improvements, but none horrific (at
> least for DarcsWatch, most of the time goes into parsing the
> repositories and mails, and into sorting that data). Unfortunately, I
> can't use it in the live installation until I upgrade the machine from
> Debian etch to lenny, as the bundled bytestring library in ghc-6.6' base
> is too old.
>
> Nevertheless, I'm sharing my patch here, maybe it's useful for some, or
> maybe it can be the base for an official xhtml release with bytestrings
> inside.
>
> To speed things up even more one should probably create a type analogous
> to ShowS, i.e. (L.ByteString -> L.ByteString), that allows you to
> concatenate ByteString chunks cheaply.
>
> I also noted that the current version of bytestring implements
> "concatMap" in a way that is guaranteed to rip apart the string into
> very small chunks, even if the mapped function returns the same
> character most times (as it is the case for the Html escaping function).
> Therefore, I wrote this function:
>
> -- | More efficient variant of 'L.concatMap'
> concatMapL' :: (Char -> String) -> L.ByteString -> L.ByteString
> concatMapL' f s = go s
>  where go s = let (unmodified, modified) = L.span (\c -> f c == [c]) s
>               in case L.uncons modified of
>                 Nothing       -> unmodified
>                 Just (c,rest) -> L.pack (f c) `L.append` go rest
>
> Does this make sense? Should it maybe replace the function in the
> library?
>
> Greetings,
> Joachim
>
> [1] http://darcswatch.nomeata.de/
>
> --
> Joachim "nomeata" Breitner
>  mail: mail at joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C
>  JID: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/
>  Debian Developer: nomeata at debian.org
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>


More information about the Haskell-Cafe mailing list