[Haskell-cafe] RFC: demanding lazy instances of Data.Binary
nicolas.frisby at gmail.com
Mon Nov 19 21:22:07 EST 2007
On Nov 19, 2007 4:16 PM, Duncan Coutts <duncan.coutts at worc.ox.ac.uk> wrote:
> On Mon, 2007-11-19 at 13:39 -0800, Don Stewart wrote:
> > nicolas.frisby:
> > >
> > > 1) The fact that serialisation is fully strict for 32760 bytes but not for
> > > 32761 makes the direct application of strictCheck intractable. Do you have
> > > any ideas how to circumvent that?
> Test using a much smaller chunk size. I'd test sizes from 1 to something
> like one more than the machine word size.
Let me clarify "circumvent that." strictCheck uses a bounded search,
starting at 1 and proceeding to some limit. The Binary instance used in
the example was the fully lazy one for lists: a get/putWord8 for each
constructor. Even so, it was effectively spine-strict up to 32k bytes
(which was 32k elements, b/c of the use of unit) because, I think, the
first chunk of the lazy bytestring isn't generated by encode until it
is full. If you asked strictCheck to go from 1 to 32k, I don't think
it would finish. So by "circumvent" I mean: how can we apply the
essential ideas of strictCheck when our chunks are so big? Obviously,
the iterative search cannot just proceed one element at a time; but
then we lose the obvious meaning of "add one more _|_". I don't see an
obvious candidate for how to alter the _|_-ridden test vector
generation. Moreover, its "proposed output" is wrong when considered
from the big-chunks perspective--we don't necessarily want Chitil's
least strictness.
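To pin down what "one element at a time" means here, a minimal sketch of
the bounded _|_-search idea (assuming only base; partialList and
forcesPast are hypothetical names for illustration, not Chitil's actual
strictCheck API):

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- A list whose first n spine cells are defined, followed by _|_.
partialList :: Int -> [()]
partialList 0 = error "_|_"
partialList n = () : partialList (n - 1)

-- Does forcing (f xs) demand the spine beyond the n defined cells?
forcesPast :: ([()] -> Int) -> Int -> IO Bool
forcesPast f n = do
  r <- try (evaluate (f (partialList n))) :: IO (Either ErrorCall Int)
  return (either (const True) (const False) r)

main :: IO ()
main = do
  a <- forcesPast (length . take 3) 4  -- enough defined cells
  b <- forcesPast (length . take 3) 2  -- the _|_ at depth 2 is hit
  print (a, b)  -- prints (False,True)
```

With 32k-byte chunks, stepping n by 1 against encode/decode is exactly
the search that never finishes; the open question is what coarser step
still detects unwanted strictness.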
(more new text below)
> > > 2) Also, any suggestions for other datatypes to provide default instances
> > > for? Tree type structures immediately raise the question of which
> > > traversal should be the default. I'm leaning towards providing none since
> > > the goal of constant space usage actually depends on the serialisation
> > > order matching how the deserialised tree will be traversed.
> > Lazy Arrays?
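To make the traversal-order point in (2) concrete, a toy model (token
lists instead of bytestrings; encodePre/decodePre are illustrative names,
not Data.Binary methods):

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a) deriving (Eq, Show)

-- Preorder encoding to a token list: Right x opens a Node, Left ()
-- marks a Leaf.  A real instance would emit Word8 tags instead.
encodePre :: Tree a -> [Either () a]
encodePre Leaf         = [Left ()]
encodePre (Node l x r) = Right x : encodePre l ++ encodePre r

-- The decoder must consume tokens in exactly the order the encoder
-- produced them, which is why the traversal choice leaks into the API.
decodePre :: [Either () a] -> (Tree a, [Either () a])
decodePre (Left ()  : rest) = (Leaf, rest)
decodePre (Right x  : rest) =
  let (l, rest')  = decodePre rest
      (r, rest'') = decodePre rest'
  in (Node l x r, rest'')
decodePre [] = error "truncated input"

main :: IO ()
main = do
  let t = Node (Node Leaf 1 Leaf) 2 (Node Leaf 3 Leaf) :: Tree Int
      (t', leftover) = decodePre (encodePre t)
  print (t' == t && null leftover)  -- prints True
```

Constant-space deserialisation then only happens if the consumer walks
the tree in the same preorder, which is the coupling described above.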
> > > 3) I don't think it is applicable in anyway whatsoever to strict types
> > > like Int, Data.Set, and Data.Sequence? Counter-arguments?
> > Well, atomic types like Int, I don't think it makes sense, but Sets and
> > Sequence are lazy, aren't they?
> Sequences are like spine strict lists. Sets are strict in as much as the
> element type's (==) function is strict.
Let me refine how I posed that question. A predicate: if you enter
"Package.fromList [1..]" at the ghci prompt and you get no
intermediate results, then that was a "strict type." I'm assuming that
if the Show instance doesn't produce intermediate results, then the
serialisation technique can't handle intermediate results (i.e.
chunking) either--at least not in a general enough way to include it
in a library.
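That predicate can be checked without ghci. A minimal sketch: lists pass
because show yields intermediate characters even on an infinite list,
whereas a spine-strict fromList would never produce a single character.

```haskell
main :: IO ()
main =
  -- show on an infinite list streams; taking a prefix terminates.
  putStrLn (take 10 (show [1 :: Int ..]))  -- prints "[1,2,3,4,5"
  -- By contrast, show (Data.Set.fromList [1..]) would diverge before
  -- emitting anything, since fromList must force the whole input.
```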