[Haskell-cafe] SmallCheck Depth

Thu Feb 5 14:29:38 UTC 2015

On Thu, Feb 5, 2015 at 6:06 AM, Roman Cheplyaka <roma at ro-che.info> wrote:
> The choice of what belongs to what depth is rather arbitrary.
>
> You say you'd expect all three constructors to be of depth 0. What about
>
>   data A = A1 .. A50
>
> ?
>
> What about Char or Int?
>
> See the problem?

I see the issue, but I think the current SmallCheck documentation does
not acknowledge this issue and the documentation in fact obstructs
understanding of the way the library currently works.

First, Runciman 2008 does not say that the choice of what belongs to
what depth is arbitrary.  Rather, it states that "the depth of a
zero-arity construction is zero, and the depth of a positive-arity
construction is one greater than the maximum depth of a component
argument."  Thus the depth of True would be zero.  In

data A = A1 .. A50

the depth of each value would be zero.

The Runciman 2008 theory holds perfectly for Int if one conceptualizes
Int to be built as follows:

data Sign = Pos | Neg
data Nat = One | Succ Nat
data Int = Zero | NonZero Sign Nat

In that case, the depth of Zero is 0. The depth of 1 (that is, NonZero
Pos One)) is one greater than the maximum depth of a component
argument; since the maximum depth of a component is zero, that gives
us depth 1.  Similarly, the depth of NonZero Pos (Succ One) is 2.

I acknowledge that Runciman 2008 did not explain this as well as it
might have.  I have puzzled over it off and on for months, though the
fact that the current SmallCheck is not consistent with this
explanation did not help me.  Further, my explanation of Int is not
the one Runciman 2008 gave, as it ignores that an Int might be
negative.  In addition, since Int is bounded, you could indeed argue
that every member of Int should indeed have depth 0 because then your
@data A = A1 .. A50@ conceptualization is more appropriate.  However,
the conceptualization given above is the only one that holds for
Integer.

I also acknowledge that the Runciman 2008 theory was not used for all
types in the original SmallCheck; for instance, Char would have been
treated differently.  Even so, the Runciman 2008 theory certainly
holds for types with only a few possible values (such as (), Bool,
Maybe, etc.) and the original SmallCheck was faithful to the Runciman
2008 theory in those cases.

Upon further examination of the current SmallCheck source, I see that
'cons0' decreases the depth, while its counterpart in the original
SmallCheck did not.  What I was wondering is whether this change was
made for any particular reason.

Perhaps the explanation given in Runciman 2008 is not, in fact, a
useful one in practice because of the existence of bounded types with
large numbers of values, such as Char and Int.  In that case, though,
it would be useful if the current SmallCheck documentation abandoned
the Runciman 2008 explanation.  "Test.SmallCheck.Series" still states
that "For data values, [depth] is the depth of nested constructor
applications."  After puzzling over what on earth "nested constructor
applications" means, and then consulting Runciman 2008, I was only
more confused after seeing that the current SmallCheck behaves in a
way that is inconsistent with its own documentation, with Runciman
2008, and with the original SmallCheck.

Perhaps, then, if the current theory underlying SmallCheck is that
depth is just some size-limiting thing, the documentation should say
something like "Depth is a non-negative integer.  You use depth to
make progressively 'larger' values.  What 'larger' means is up to you.
Runciman 2008 said it means the depth of nested constructor
applications, but that is not always practically sensible.  You
should, however, ensure that all the values included at @depth n@ are
also included in @depth n + 1 at . "  Even with an explanation like that,
though, I don't see why @list 0@ should sometimes return values (as it
does for () and Int) and sometimes not (as it does with Bool).  That's
a puzzling behavior that did not exist in the original SmallCheck.

I bring all this up because the implementation of SmallCheck is in
some ways quite subtle.  In the past I have considered using it but
found the depth concept to be bewildering.  In contrast, QuickCheck's
Arbitrary makes no claims as to theoretical consistency--Gen Int might
give a huge range of Int, or it might return 5 every time. Indeed,
that is why there is a whole batch of bindings in Test.QuickCheck for
Gen Int of different ranges.  I doubt I am the only one who found the
inconsistency between the SmallCheck docs and its behavior to be
confusing.

Now that I finally understand the idea of depth, I might actually use
SmallCheck.  Thanks.  --Omari