[Haskell-cafe] Google Summer of Code: BlazeHTML RFC

Thu May 27 04:53:10 EDT 2010

On Thu, May 27, 2010 at 11:40 AM, Ivan Miljenovic <ivan.miljenovic at gmail.com> wrote:

> On 27 May 2010 18:33, Michael Snoyman <michael at snoyman.com> wrote:
> > I don't do any string concatenation (look closely), I was very careful to
> > avoid it. I tried with lazy text as well: it was slower. This isn't
> > surprising, since lazy text- under the surface- is just a list of strict
> > text. And the benchmark itself already has a lazy list of strict text. Using
> > lazy text would just be adding a layer of wrapping.
> > I don't know what you mean by "explicitly using Text values"; you mean
> > calling pack manually? That's really all that OverloadedStrings does.
> > You can try out lots of different variants on that benchmark. I did that
> > already, and found this to be the fastest version.
>
> Fair enough.  Now that I think about it, I recall once trying to have
> pretty generate Text values rather than String for graphviz (by using
> fullRender, so it was still using String under the hood until it came
> time to render) and it too was much slower than String (unfortunately,
> I didn't record a patch with these changes so I can't just go back and
> play with it anymore as I reverted them all :s).
>
> Maybe Bryan can chime in with some best-practices for using Text?
>

Here's my guess at an explanation for what's happening in my benchmark:

Text will clearly beat String in memory usage; that's what it's designed
for. However, the compiler is still generating String values, which are being
encoded to Text at runtime.
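Semantically, under OverloadedStrings a string literal used at type Text is `fromString` applied to a String, and Text's `IsString` instance is `pack`, so a String-to-Text conversion happens at runtime (the text library may rewrite some of this away under optimization). A minimal sketch showing that the literal, an explicit `T.pack`, and `fromString` produce the same value; the `greeting*` names are purely illustrative:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.String (fromString)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- With OverloadedStrings this literal means fromString "Hello, world",
-- and Text's IsString instance converts that String with T.pack.
greetingLiteral :: T.Text
greetingLiteral = "Hello, world"

-- The same value, calling pack manually on a String.
greetingPacked :: T.Text
greetingPacked = T.pack "Hello, world"

-- The same value, spelled the way the literal is desugared.
greetingFromString :: T.Text
greetingFromString = fromString "Hello, world"

main :: IO ()
main = do
  print (greetingLiteral == greetingPacked)
  print (greetingLiteral == greetingFromString)
  TIO.putStrLn greetingLiteral
```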

Now, the same process happens for bytestrings. However, bytestrings never
have to be converted again: the IO routines simply write out the byte buffer.
In the case of text, however, the internal (UTF-16) representation must be
encoded again, into a UTF-8 bytestring, before it can be written.

In other words, here's what I think the three different benchmarks are
really doing:

* String: generates a list of Strings, passes each String to a relatively
inefficient IO routine.
* ByteString: encodes Strings one by one into ByteStrings, generates a list
of these ByteStrings, and passes each ByteString to a very efficient IO
routine.
* Text: encodes Strings one by one into Texts, generates a list of these
Texts, calls a UTF-8 encoding function to encode each Text into a
ByteString, and passes each resulting ByteString to a very efficient IO
routine.
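The three pipelines above can be sketched roughly as follows, assuming the benchmark writes a list of String chunks to stdout. The chunk list and function names are hypothetical and not taken from the actual benchmark; `Data.Text.Encoding.encodeUtf8` is the text package's Text-to-UTF-8-ByteString conversion.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import System.IO (stdout)

-- Hypothetical input: pieces of an HTML page as String literals.
chunks :: [String]
chunks = ["<html>", "<body>", "<p>Hello</p>", "</body>", "</html>\n"]

-- String: each chunk goes straight to the String-based IO routine.
renderString :: [String] -> IO ()
renderString = mapM_ putStr

-- ByteString: each String is packed once into bytes, then written as-is.
-- B8.pack keeps only the low 8 bits of each Char, so this assumes ASCII.
toByteStrings :: [String] -> [B.ByteString]
toByteStrings = map B8.pack

renderByteString :: [String] -> IO ()
renderByteString = mapM_ (B.hPut stdout) . toByteStrings

-- Text: each String is packed into a Text, and each Text must then be
-- encoded again to a UTF-8 ByteString before it can be written.
toUtf8ViaText :: [String] -> [B.ByteString]
toUtf8ViaText = map (TE.encodeUtf8 . T.pack)

renderText :: [String] -> IO ()
renderText = mapM_ (B.hPut stdout) . toUtf8ViaText

main :: IO ()
main = do
  renderString chunks
  renderByteString chunks
  renderText chunks
```

The Text path does strictly more work per chunk than the ByteString path (an extra encoding pass and allocation), which is consistent with the explanation above, though only profiling the real benchmark would confirm where the time goes.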

In the case of ASCII data to be output as UTF-8, using the
Data.ByteString.Char8.pack function will most likely always be the most
efficient choice, and thus it seems like something BlazeHtml should support.
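The ASCII restriction matters: ASCII code points are their own UTF-8 encoding, so `Data.ByteString.Char8.pack` and encoding via Text yield identical bytes, but for other characters `Char8.pack` truncates each Char to its low 8 bits. A small check, with hypothetical helper names:

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Fast path: one byte per Char, truncating anything above U+00FF.
viaChar8 :: String -> B.ByteString
viaChar8 = B8.pack

-- Correct UTF-8 for any input, at the cost of an intermediate Text.
viaText :: String -> B.ByteString
viaText = TE.encodeUtf8 . T.pack

main :: IO ()
main = do
  -- ASCII: both paths agree byte for byte (prints True).
  print (viaChar8 "<p>Hello</p>" == viaText "<p>Hello</p>")
  -- Non-ASCII: U+00E9 (e with acute accent) is one byte via Char8 ...
  print (B.unpack (viaChar8 "\xE9"))  -- [233]
  -- ... but two bytes in UTF-8.
  print (B.unpack (viaText "\xE9"))   -- [195,169]
```

So a Char8-based fast path is only safe where the library can guarantee the content is ASCII (for example, fixed tag and attribute names).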
I'm considering releasing a Hamlet 0.3 based entirely on UTF-8 encoded
ByteStrings, but I'd also like to hear from Bryan about this.

Michael