[Haskell-cafe] the purpose of QuickCheck's size parameter

Sat Jun 16 21:18:03 UTC 2018

Thank you for your extended explanation. Let me reply my thoughts inline
below.

pá 15. 6. 2018 v 0:22 odesílatel Li-yao Xia <lysxia at gmail.com> napsal:

> The purpose of the size parameter is not well-defined formally, but it
> is a very convenient knob to easily tune the test suite in various
> situations, that is definitely worth the negligible cost of having it
> around unconditionally.
>

That's what has been disturbing me. Is it a bound on a data structure size?
The amount of memory it occupies (which can be quite different due to
laziness)? Or a time bound on the tested algorithm running time?

If we take a specific example, a typical recursive data structure is some
kind of a tree. When generating a random binary tree, should the size bound
it's height? Or its average height? Or the number of its elements?

As the size bound it it's not really defined, everyone uses it differently,
so in the end all we now that there is *some* bound and that we can make it
larger or smaller, but that's really all.

>
> Without a size parameter, a fixed distribution means that if we want the
> generator to cover all possible cases, we have to keep a small
> probability of generating humongous examples and thus go OOM. We can
> avoid that by making the size of generated values bounded by the size
> parameter (or a function thereof), which we can increase easily when
> more resources become available.
>

That's true, but I'd assume that if we use something like the exponential
distribution <https://en.wikipedia.org/wiki/Exponential_distribution>, the
probability of an example that'd cause OOM can be really tiny. (Possibly
smaller than the change that there is a human-introduced bug causing OOM,
for example.)

>
> Furthermore, if we really want to generate very large examples, the only
> way with a fixed distribution is to wait longer. Instead, using a size
> parameter, we can make smaller values less likely to target further
> regions in the search space more accurately.
>

That's probably the most convincing argument for me. Without a size bound,
as the size of a structure increases, the likelihood of it's appearance
must approach zero probability. To test larger structures as well, it makes
sense to something like "pick a size between 1 and 1M and then generate a
structure of that size", so we have a naturally occurring bound in such
tests. But still I have doubts if it wouldn't be better to either:

   - Express this limit explicitly for generators of such structures of
   variable size, like in vectorOf, or
   - Define the meaning of 'size' more rigorously to make the behavior more
   consistent among different data structures.

>
> Some properties can take a while to test on larger examples (either
> because of the generators or the actual verification process) so we
> might like to keep the size small during development, and raise it again
> once we're done.
>
> The Arbitrary class assigns a "default" generator for every type. While
> it is not always a good choice, having a parameter to tweak makes
> Arbitrary more generally useful.
>

Here I'd argue that not having a precisely defined meaning of size,
tweaking Arbitrary instances by modifying its size parameter before
produces very vague results. If one needs at least some level of confidence
in the behavior of a test, it's usually necessary to use parametrized
functions that document the distribution of the values depending on
parameters

>
> As for your last point, small examples are faster to generate and check,
> so it seems like a decent strategy to start small by default.
>

I'd say that this applies only during fixing a bug, as you mentioned above.
As long as the test is failing, starting with smaller example indeed makes
is fail faster. But for example for tests in continuous integration, we
expect all of them to succeed and we need to test both small and large, so
here the advantage of starting with small ones vanishes.

Best,
Petr

>
> Li-yao
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20180616/6e8a6b72/attachment.html>