Proposal: (breaking change to avoid fragile breakage) change the definition of foldr2, zip, and zipWith

David Feuer david.feuer at gmail.com
Mon Sep 8 06:58:37 UTC 2014


It's been a couple weeks now, and no one's responded. I'd like to
submit a patch for this one ASAP to update source and/or
documentation, but I would really like to get some guidance from
anyone else who may care.

On Sun, Aug 24, 2014 at 3:22 PM, David Feuer <david.feuer at gmail.com> wrote:
> BACKGROUND
>
> TRAC: #9495
>
> GHC's definition of foldr2 violates the spec of the Haskell Report, making
> some programs less defined than they should be, and does so in a way that is
> quite fragile, dependent on a number of different factors—that is, it is
> likely hard to debug.
>
> Details:
>
> The current "baseline" definition of foldr2 (before rewrite rules) does what
> the Report requires:
>
> foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
> foldr2 k z = go
>   where
>     go [] _ys = z
>     go _xs [] = z
>     go (x:xs) (y:ys) = k x y (go xs ys)
>
> There are two rewrite rules, foldr2/left and foldr2/right, designed to allow
> foldr2 to fuse with build appearing in either the first or second list
> argument. foldr2/left is perfectly safe (as long as the build argument is
> legitimate). foldr2/right is not, as comments in the source partially
> explain. The problem shows up if you have something shaped like
>
> zip [1,7,42,5] (1:8:4:11:undefined)
>
> but where the second list (ending in undefined) is actually produced using
> build, and GHC chooses to fuse foldr2 with it. The Report requires the above
> to produce [(1,1),(7,8),(42,4),(5,11)], but GHC will instead produce
> (1,1):(7,8):(42,4):(5,11):_|_
>
> This issue can make a program that works with -O0 fail with -O, and can
> cause fragility depending on exactly what fuses.
>
> SCOPE
>
> One obvious question is what kinds of build forms can cause problems. Many,
> like map and fromEnumTo, don't have enough power to be likely to cause
> trouble. Joachim Breitner, however, realized that the current implementation
> of filter has enough power to break things. So here's a complete example of
> code that will break with -O but not with -O0:
>
> {-# NOINLINE don'tFuse #-}
> don'tFuse = id
>
> p x = 100 `mod` x < 20
>
> main = print $ zip (don'tFuse [10,9..1]) (filter p [10,9,..])
>
> Things will only get worse if I succeed in my plan to bring unfoldr into the
> framework.
>
> SOLUTIONS
>
> 1. One solution, of course, is to eliminate unfoldr2/right, bringing GHC
> into compliance with the Report. I really like this idea, but Joachim thinks
> it is not worth the potential performance impact on code written before or
> without regard for the change. We both agree that a reasonable alternative
> is
>
> 3. Modify the baseline definition of unfoldr2 as follows:
>
> foldr2 :: (a -> b -> c -> c) -> c -> [a] -> [b] -> c
> foldr2 k z = go
>   where
>     go [] ys = ys `seq` z
>     go _xs [] = z
>     go (x:xs) (y:ys) = k x y (go xs ys)
>
> This should, we believe, make the baseline definition fail where the one
> fused by foldr2/right would fail, giving consistent semantics.
>
> WHAT MIGHT BREAK
>
> Code that currently works but will break with this change has two
> properties:
>
> 1. It relies on the asymmetry in foldr, zipWith, or zip to avoid running
> into bottom.
> 2. foldr2/right does not fire, either because the second list is not a good
> producer or because GHC applies the foldr2/left rule instead.
>
> That is, most of the code that this change will break is fragile under the
> current scheme.
>
> DISCUSSION PERIOD
>
> Standard two weeks.


More information about the Libraries mailing list