Proposal: (breaking change to avoid fragile breakage) change the definition of foldr2, zip, and zipWith

David Feuer david.feuer at gmail.com
Sun Aug 24 19:22:44 UTC 2014


BACKGROUND

TRAC: #9495

GHC's definition of foldr2 violates the spec of the Haskell Report, making
some programs less defined than they should be, and does so in a way that
is quite fragile, dependent on a number of different factors—that is, it is
likely hard to debug.

Details:

The current "baseline" definition of foldr2 (before rewrite rules) does
what the Report requires:

foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
foldr2 k z = go
  where
    go [] _ys = z
    go _xs [] = z
    go (x:xs) (y:ys) = k x y (go xs ys)

There are two rewrite rules, foldr2/left and foldr2/right, designed to
allow foldr2 to fuse with build appearing in either the first or second
list argument. foldr2/left is perfectly safe (as long as the build argument
is legitimate). foldr2/right is not, as comments in the source partially
explain. The problem shows up if you have something shaped like

zip [1,7,42,5] (1:8:4:11:undefined)

but where the second list (ending in undefined) is actually produced using
build, and GHC chooses to fuse foldr2 with it. The Report requires the
above to produce [(1,1),(7,8),(42,4),(5,11)], but GHC will instead produce
(1,1):(7,8):(42,4):(5,11):_|_

This issue can make a program that works with -O0 fail with -O, and can
cause fragility depending on exactly what fuses.

SCOPE

One obvious question is what kinds of build forms can cause problems. Many,
like map and fromEnumTo, don't have enough power to be likely to cause
trouble. Joachim Breitner, however, realized that the current
implementation of filter has enough power to break things. So here's a
complete example of code that will break with -O but not with -O0:

{-# NOINLINE don'tFuse #-}
don'tFuse = id

p x = 100 `mod` x < 20

main = print $ zip (don'tFuse [10,9..1]) (filter p [10,9,..])

Things will only get worse if I succeed in my plan to bring unfoldr into
the framework.

SOLUTIONS

1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC
into compliance with the Report. I really like this idea, but Joachim
thinks it is not worth the potential performance impact on code written
before or without regard for the change. We both agree that a reasonable
alternative is

3. Modify the baseline definition of unfoldr2 as follows:

foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
foldr2 k z = go
  where
    go [] ys = ys `seq` z
    go _xs [] = z
    go (x:xs) (y:ys) = k x y (go xs ys)

This should, we believe, make the baseline definition fail where the one
fused by foldr2/right would fail, giving consistent semantics.

WHAT MIGHT BREAK

Code that currently works but will break with this change has two
properties:

1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running
into bottom.
2. foldr2/right does not fire, either because the second list is not a good
producer or because GHC applies the foldr2/left rule instead.

That is, most of the code that this change will break is fragile under the
current scheme.

DISCUSSION PERIOD

Standard two weeks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/libraries/attachments/20140824/5fae3e57/attachment-0001.html>


More information about the Libraries mailing list