[Haskell-cafe] A question about laziness and performance in document serialization.

Roman Cheplyaka roma at ro-che.info
Thu Aug 22 10:40:51 CEST 2013


* Kyle Hanson <hanooter at gmail.com> [2013-08-20 18:23:48-0700]
> So I am not entirely clear on how to optimize for performance for lazy
> bytestrings.
> 
> Currently I have a (Lazy) Map that contains large BSON values (more than
> 1 MB each when serialized). I can serialize BSON documents to Lazy
> ByteStrings using Data.Binary.runPut. I then write this bytestring to a
> socket using Network.Socket.ByteString.Lazy.
> 
> My question is this, if the Map object doesn't change (no updates) when it
> serializes the same document to the socket 2x in a row, does it re-evaluate
> the whole BSON value and convert it to a bytestring each time?

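Yes. Nothing caches the result of runPut for you, so each time you
serialize the document the bytestring is built again from the BSON value.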

> Let's say I wanted a cache of bytestrings, so I have another Map
> object holding the serialized bytestrings, which I repopulate every
> time the original BSON Map changes. Should the map be strict or lazy?

This is the wrong question. The right question is, do you want the
values to be strict (evaluated) or lazy (kept unevaluated until required)?

If you want values to be lazy, then you have to use the lazy Map.

If you want values to be strict, then you may either use the strict Map,
or still use the lazy Map but make sure that the values are evaluated
when you place them in the map. Using the strict Map is probably a
better idea, but the lazy Map lets you have finer control over what is
lazy and what is forced (should you need it).
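
To make the difference concrete, here is a minimal sketch (not taken from
the original thread). The helper names insertForced and throwsWhenForced
are made up for illustration, and undefined stands in for an expensive,
not-yet-evaluated value. It only assumes the containers package.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.Map.Lazy as ML
import qualified Data.Map.Strict as MS

-- Hypothetical helper: insert into a lazy Map, but force the value
-- to weak head normal form (WHNF) before it is stored.
insertForced :: Ord k => k -> v -> ML.Map k v -> ML.Map k v
insertForced k v m = v `seq` ML.insert k v m

-- Hypothetical helper: True if evaluating the argument to WHNF throws.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> const True (e :: SomeException)
    Right _ -> False

main :: IO ()
main = do
  -- Lazy Map: the value stays an unevaluated thunk, so nothing throws.
  lazyThrows   <- throwsWhenForced (ML.insert (1 :: Int) (undefined :: Int) ML.empty)
  -- Strict Map: the value is evaluated as part of the insertion.
  strictThrows <- throwsWhenForced (MS.insert (1 :: Int) (undefined :: Int) MS.empty)
  -- Lazy Map with a manually forced value behaves like the strict one here.
  forcedThrows <- throwsWhenForced (insertForced (1 :: Int) (undefined :: Int) ML.empty)
  print (lazyThrows, strictThrows, forcedThrows)  -- (False,True,True)
```

Both modules operate on the same Map type; only the functions differ in
how much of the value they evaluate.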

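Note that the lazy bytestring is just a lazy list of strict bytestrings.
Even placing it in the strict map wouldn't force its full evaluation: the
strict map evaluates values only to weak head normal form, which for a
lazy bytestring means at most the first chunk.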
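
A small sketch of this point, under the assumption that an undefined tail
is an acceptable stand-in for "chunks that have not been computed yet".
The names partial and throwsWhenForced are invented for the example.

```haskell
import Control.Exception (SomeException, evaluate, try)
import qualified Data.ByteString.Char8 as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Strict as MS

-- A lazy bytestring whose first chunk is fine but whose remaining
-- chunks are bottom (illustration only).
partial :: BL.ByteString
partial = BL.fromChunks (B.pack "head" : undefined)

-- Hypothetical helper: True if evaluating the argument to WHNF throws.
throwsWhenForced :: a -> IO Bool
throwsWhenForced x = do
  r <- try (evaluate x)
  return $ case r of
    Left e  -> const True (e :: SomeException)
    Right _ -> False

main :: IO ()
main = do
  -- The strict Map forces the value to WHNF only: the first chunk.
  insertThrows <- throwsWhenForced (MS.insert (1 :: Int) partial MS.empty)
  -- Asking for the length walks every chunk and hits the bottom.
  lengthThrows <- throwsWhenForced (BL.length partial)
  print (insertThrows, lengthThrows)  -- (False,True)
```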

> Should the bytestrings it stores be strict or lazy?

For a cache, it makes sense to store strict bytestrings (unless they are
so large that it may be hard to allocate that much contiguous space).

Lazy bytestrings are useful for streaming, when you use a chunk and then
discard it.
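
As a rough illustration of the two representations (a sketch, with a list
of Word64 values standing in for a real BSON document): runPut produces a
lazy bytestring made of several chunks, and BL.toStrict copies them into
one contiguous buffer suitable for caching.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import Data.Binary.Put (putWord64le, runPut)
import Data.Word (Word64)

-- Stand-in for a serialized document: 100000 little-endian Word64s.
payload :: BL.ByteString
payload = runPut (mapM_ putWord64le [1 .. 100000 :: Word64])

-- Number of strict chunks that make up a lazy bytestring.
chunkCount :: BL.ByteString -> Int
chunkCount = length . BL.toChunks

-- One contiguous copy of the whole payload.
cached :: B.ByteString
cached = BL.toStrict payload

main :: IO ()
main = do
  print (chunkCount payload > 1)             -- output is built chunk by chunk
  print (B.length cached)                    -- 800000
  print (chunkCount (BL.fromStrict cached))  -- 1
```

Sending the cached value through the lazy socket API is then a matter of
wrapping it with BL.fromStrict, which does not copy the data.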

Using strict bytestrings doesn't imply that you want to store them
evaluated. Depending on your circumstances, it may be a good idea to
store strict bytestrings lazily, so that they do not take space and time
until they are requested for the first time.
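
For instance (a sketch only; Doc, serialize and buildCache are invented
names, and a list of Word32 stands in for a BSON document), a lazy Map of
strict bytestrings serializes each entry on first lookup and then keeps
the result:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Map.Lazy as ML
import Data.Binary.Put (putWord32le, runPut)
import Data.Word (Word32)

-- Stand-in for a BSON document.
type Doc = [Word32]

-- Serialize to a single strict bytestring.
serialize :: Doc -> B.ByteString
serialize = BL.toStrict . runPut . mapM_ putWord32le

-- With the lazy Map, every value starts as a thunk. The first lookup
-- that demands it runs the serialization; later lookups share the result.
buildCache :: ML.Map String Doc -> ML.Map String B.ByteString
buildCache = ML.map serialize

main :: IO ()
main = do
  let docs  = ML.fromList [("small", [1, 2, 3]), ("never-used", undefined)]
      cache = buildCache docs
  -- Only "small" is serialized; "never-used" is never touched.
  print (fmap B.length (ML.lookup "small" cache))  -- Just 12
```

The trade-off is that the cost of serialization moves to the first
request for each entry instead of being paid when the cache is built.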

Simply operating with the words lazy and strict may be very confusing,
since they refer to different things in different contexts. Every time
you read that something is lazy or strict, try to decipher it in terms
of the basic evaluation properties.

HTH,
Roman