[Haskell-cafe] xhtml + bytestring

Eugene Kirpichov ekirpichov at gmail.com
Tue Jan 20 04:42:37 EST 2009

I improved the implementation of concatMap:

concatMap' :: (Word8 -> L.ByteString) -> L.ByteString -> L.ByteString
concatMap' f s = L.unfoldr p x0
    where x0 = (LI.Empty, s)
          p (sf, s) = case L.uncons sf of
              Just (c,sf') -> Just (c, (sf',s))
              Nothing -> case L.uncons s of
                  Nothing -> Nothing
                  Just (c,s') -> p (f c, s')

I did a test that performs something like escaping in the following fashion:
main = putStrLn . show . L.foldr (+) 0 $ concatMap' f xs
    where xs = L.pack $ concat $ replicate 100000 [1..10]
          f 1 = esc
          f x = L.pack [x]
          esc = L.pack [1,2,3,4]
So, every 10th character is escaped.

My version works 4x faster than L.concatMap.
However, if 'f' returns larger strings, it quickly becomes dramatically slower.

So, I'd recommend it for escaping and wouldn't recomment it for anything else.

2009/1/20 Joachim Breitner <mail at joachim-breitner.de>:
> Hi Bjorn, hi list,
> the darcswatch instance I'm running is getting quite big, and it
> periodically slows down my server. I managed to get quite an improvement
> with this simple patch to my parsing:
> http://darcs.nomeata.de/cgi-bin/darcsweb.cgi?r=darcswatch;a=commitdiff;h=20090119181919-23c07-140f8deb91a52a423a2984dce2d22f4a48999aaf.gz
> Since the new HTTP library, my code runs completely on ByteStrings, only
> the xhtml library expects me to feed strings. I tried to fix this and
> created a patch against xhtml-3000.2.0.1 to work internally with lazy
> ByteStrings. It's API-compatible to the normal xhtml library, it just
> adds showHtml', renderHtml' and prettyHtml' that output lazy
> ByteStrings, and that has Html instances for strict any lazy
> ByteStrings.
> There were some speed and space improvements, but none horrific (at
> least for DarcsWatch, most of the time goes into parsing the
> repositories and mails, and into sorting that data). Unfortunately, I
> can't use it in the live installation until I upgrade the machine from
> Debian etch to lenny, as the bundled bytestring library in ghc-6.6' base
> is too old.
> Nevertheless, I'm sharing my patch here, maybe it's useful for some, or
> maybe it can be the base for an official xhtml release with bytestrings
> inside.
> To speed things up even more one should probably create a type analogous
> to ShowS, i.e. (L.ByteString -> L.ByteString), that allows you to
> concatenate ByteString chunks cheaply.
> I also noted that the current version of bytestring implements
> "concatMap" in a way that is guaranteed to rip apart the string into
> very small chunks, even if the mapped function returns the same
> character most times (as it is the case for the Html escaping function).
> Therefore, I wrote this function:
> -- | More efficient variant of 'L.concatMap'
> concatMapL' :: (Char -> String) -> L.ByteString -> L.ByteString
> concatMapL' f s = go s
>  where go s = let (unmodified, modified) = L.span (\c -> f c == [c]) s
>               in case L.uncons modified of
>                 Nothing       -> unmodified
>                 Just (c,rest) -> L.pack (f c) `L.append` go rest
> Does this make sense? Should it maybe replace the function in the
> library?
> Greetings,
> Joachim
> [1] http://darcswatch.nomeata.de/
> --
> Joachim "nomeata" Breitner
>  mail: mail at joachim-breitner.de | ICQ# 74513189 | GPG-Key: 4743206C
>  JID: nomeata at joachim-breitner.de | http://www.joachim-breitner.de/
>  Debian Developer: nomeata at debian.org
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe

Евгений Кирпичев
Разработчик Яндекс.Маркета

More information about the Haskell-Cafe mailing list