ByteString I/O Performance

Sun Sep 2 19:40:51 EDT 2007

Donald Bruce Stewart writes:

 > It's been a while since I benchmarked the IO performance, so
 > looks like time to revisit this issue.

Hi Bruce,

my impression is that performance cannot be improved significantly
without providing a modified input API. Reading input in large chunks
is fine for some purposes, but not for others. A network proxy server,
for example, cannot afford to buffer large amounts of data for every
connection. An I/O buffer size of, say, 256KB would be a poor choice
for such a program. Something like 4KB typically works well, but the
small buffer size means that large
amounts of data will be read using a fairly large number of hGet
calls. As it is, hGet performs at least one malloc() per read() call.
That will be slow, no matter how optimized that code is.

One way to get malloc() out of the picture would be to provide a
variant of hGet that takes an existing, pre-allocated buffer as an
argument, so that the user can allocate a ByteString once and re-use
it for every single hGet and hPut.
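
To make that idea concrete, here is a minimal sketch of what such a
variant might look like. The names hGetInto, forEachChunk and
copyFileChunked are hypothetical and not part of the bytestring API;
the sketch assumes that wrapping the caller's buffer with
unsafePackCStringLen (which does not copy) is acceptable.

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr, castPtr)
import System.IO (Handle, IOMode (..), hGetBuf, withBinaryFile)

-- Hypothetical hGet variant: read up to 'size' bytes into a buffer the
-- caller owns and wrap the bytes read as a ByteString without copying.
-- The result aliases 'buf', so it is only valid until the next read.
hGetInto :: Handle -> Ptr Word8 -> Int -> IO B.ByteString
hGetInto h buf size = do
  n <- hGetBuf h buf size
  unsafePackCStringLen (castPtr buf, n)

-- Allocate one buffer and re-use it for every read until end of file.
forEachChunk :: Int -> Handle -> (B.ByteString -> IO ()) -> IO ()
forEachChunk size h consume = allocaBytes size loop
  where
    loop buf = do
      chunk <- hGetInto h buf size
      if B.null chunk
        then return ()
        else consume chunk >> loop buf

-- Usage: copy a file through a single 4KB buffer.
copyFileChunked :: FilePath -> FilePath -> IO ()
copyFileChunked from to =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      forEachChunk 4096 src (B.hPut dst)
```

The price is that each chunk shares memory with the buffer: a consumer
that wants to keep a chunk beyond the next read has to B.copy it first.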

A different approach would be to try to reduce the cost of malloc()
by using some sort of pre-allocated pool of ByteStrings behind the
scenes.
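
As a rough illustration of the pooling idea, the sketch below keeps a
free list of raw buffers in an MVar. BufferPool, newPool and withBuffer
are made-up names, and how the library would actually hand pooled
memory out as ByteStrings is left open; this only shows the
acquire/release bookkeeping.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (bracket)
import Control.Monad (replicateM)
import Data.Word (Word8)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes)

-- Hypothetical pool: a fixed buffer size plus a list of free buffers.
data BufferPool = BufferPool
  { poolBufSize :: Int
  , poolFree    :: MVar [ForeignPtr Word8]
  }

-- Allocate 'count' buffers of 'size' bytes up front.
newPool :: Int -> Int -> IO BufferPool
newPool count size = do
  bufs <- replicateM count (mallocForeignPtrBytes size)
  free <- newMVar bufs
  return (BufferPool size free)

-- Borrow a buffer for the duration of an action and return it to the
-- pool afterwards, even if the action throws. If the pool is empty, a
-- fresh buffer is allocated and joins the pool on release.
withBuffer :: BufferPool -> (ForeignPtr Word8 -> IO a) -> IO a
withBuffer pool = bracket acquire release
  where
    acquire = modifyMVar (poolFree pool) $ \free ->
      case free of
        (b : bs) -> return (bs, b)
        []       -> do
          b <- mallocForeignPtrBytes (poolBufSize pool)
          return ([], b)
    release b = modifyMVar_ (poolFree pool) (return . (b :))
```

With a pool like this, the steady-state cost per read is an MVar
operation rather than a fresh allocation.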

Last but not least, it's also possible to decide and document that
ByteStrings are not supposed to be used for those kinds of purposes
and that users who need very high performance should rely on hGetBuf
instead.
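
For reference, a copy loop written directly against hGetBuf and
hPutBuf might look like the sketch below. copyWithBuf and
copyFileWithBuf are illustrative names; only the System.IO and Foreign
calls are real library functions.

```haskell
import Data.Word (Word8)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import System.IO (Handle, IOMode (..), hGetBuf, hPutBuf, withBinaryFile)

-- Copy everything from 'src' to 'dst' through one buffer of 'size'
-- bytes. No ByteString is involved, so nothing is allocated per read.
copyWithBuf :: Int -> Handle -> Handle -> IO ()
copyWithBuf size src dst = allocaBytes size go
  where
    go :: Ptr Word8 -> IO ()
    go buf = do
      n <- hGetBuf src buf size      -- returns 0 only at end of file
      if n == 0
        then return ()
        else hPutBuf dst buf n >> go buf

-- Usage: copy a file with a 4KB buffer.
copyFileWithBuf :: FilePath -> FilePath -> IO ()
copyFileWithBuf from to =
  withBinaryFile from ReadMode $ \src ->
    withBinaryFile to WriteMode $ \dst ->
      copyWithBuf 4096 src dst
```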

I can't say which is best; those are simply the options I see.

With kind regards,
Peter