[Haskell-cafe] Is "take" behaving correctly?

Wed Sep 12 18:27:35 EDT 2007

On 12 Sep 2007, at 8:08 pm, rahn at ira.uka.de wrote:

>
>> take 1000 [1..3] still yields [1,2,3]
>
> You can think about take n as: Take as much as possible, but at  
> most n elements. This behavior has some nice properties as turned  
> out by others, but there are some pitfalls.

One of the very nice properties is this:
   for all n >= 0, m >= 0
      take m . take n = take (m+n)
      drop m . drop n = drop (m+n)
And another is this:
   for all xs, n >= 0
      take n xs ++ drop n xs == xs

> length . take n /= const n
>
> in general, instead only
>
> length . take n `elem` map const [0..n]

What we have is
   for all xs, n >= 0
     n <= length xs => length (take n xs) == n

Suppose we had the "unsafe" take.  Would we then have

    length . take n == const n?

NO!  For some xs, (length . take n) xs would be bottom,
while (const n) xs would be n.  As far as I can see, the
strongest statement about length that unsafeTake would
satisfy is

   for all xs, n >= 0
     n <= length xs => length (unsafeTake n xs) == n

which is, mutatis mutandis, the SAME law that applies to the
`take` we already have.

> Therefore head . length n is unsafe for all n, even strict positive  
> ones.

I'm assuming that should have been (head . take n).
The thing is that (head . take n) fails when and only when
(head . unsafeTake n) fails; the only difference is which
function reports the error.

> Moreover, it is easy to produce tricky code. Assume you want to  
> know, whether the prefixes of length k are equal and write
>
> pref_eq k xs ys = take k xs == take k ys
>
> This seems to be a straightforward implementation with good  
> properties.
Actually, no, at least not if implemented naively.  (I wonder just how
clever GHC is with this code?  I can't check at the moment.  After
heroic efforts by my sysadmin, I was finally able to install the binary
release of GHC 6.6.1, but (a) it assumes that gcc accepts a -fwrapv
flag, which gcc 3.3.2 (TWW) does not accept, and (b) after patching
that, gcc still chokes on the output of GHC.)

The obvious question is what pref_eq k xs ys *should* do when xs
or ys is not at least k long.  Does it mean "I believe xs and ys
are at least k long: if they are, check their prefixes, if they
aren't, raise an error" or does it mean "xs and ys are at least
k long and their prefixes of length k are equal"?

> Uhh, this is not what everybody expects at first glance. In  
> particular, if
>
> pref_eq k xs ys == False
>
> then *not* necessarily
>
> pref_eq k xs (init ys) == False.
>
> As always, quickCheck is your friend to assure (or reject) such a  
> property.

That is certainly so, but the fundamental issue in this example is
not what take does, but "don't leap into coding a function before
you know what it means".  We've now seen three readings of what
pref_eq means.  In particular,

pref_eq k xs ys = unsafeTake k xs == unsafeTake k ys

would NOT satisfy the law that
	pref_eq k xs ys == False
      => pref_eq k xs (init ys) == False
because pref_eq 2 [1,2] [1,3] == False
     but pref_eq 2 [1,2] [1]   => undefined

So "fixing" take would not, in this case, produce the desired behaviour.