[Haskell-cafe] Can Haskell outperform C++?

Wed May 23 05:23:42 CEST 2012

On 5/21/12 10:51 AM, Yves Parès wrote:
>> I do think we have the opposite problem, however, in much Haskell code --
> people are using the clean, obviously correct, but inefficient code even in
> standard library functions that really should be optimized like crazy!
>
> And even before optimizing "like crazy", I think the functions that are
> "more often" good should be emphasized: Prelude should export foldl'
> together with/instead of foldl, sum should have its sum' counterpart (or
> even be rewritten to use foldl'), and so on...
> It baffles me that functions that are said to be more efficient in the
> majority of cases are not the default.

Part of this is due to legacy and the Haskell Report. For example, the 
definition of sum is specified by the Report. And unfortunately there 
exist (rare) cases where the semantics of sum and sum' differ: namely 
when the type supports lazy addition (e.g., Peano numbers) and therefore 
can return a partial answer to an infinite summation.

That said, the GHC philosophy in most of these cases has been to:

(a) use their own definitions with a CPP flag USE_REPORT_PRELUDE in case 
you *really* want the Report's definition[1], and

(b) to "secretly" replace foldl by foldl' in cases where it can 
determine the function argument to be strict (and therefore that the 
change doesn't alter the semantics).

That doesn't fix everything, and it's no justification for not having 
the newer Reports give more sane definitions, but it's better than nothing.

Part of the issue re fixing the Report is that there's a tension between 
correctness and expediency, as well as between different notions of 
"correctness". By and large, Haskell has managed to stay out of 
theoretical arguments about choosing what "correct" means when it is 
still controversial in mathematics. (Whence many of the complaints about 
the non-mathematical nature of the numeric type classes.) But which of 
the foldr vs foldl vs foldl' definitions of summation is the "correct" 
definition? Mathematically, infinite summations should be allowed, and 
summations of infinite quantities should be allowed. However, 
pragmatically, infinite summations are of a different nature than finite 
summations; e.g., when they converge it'd be nice to evaluate them in 
finite time. So on the one hand, a naive view of "correctness" suggests 
that we ought to allow these summations as just more of the same; but on 
the other hand, an alternative notion of "correctness" suggests that 
while infinite sums and sums of infinites should be handled, they should 
be handled by different means than traditional finite sums of finite values.

The foldl' definition of summation only allows finite summations of 
finite[2] values; the foldl definition only allows finite summations, 
but they can be summations of infinite values; the foldr definition 
allows both infinite values and infinitely many summands, though it's 
not especially useful for handling infinite summations[3]. We can 
probably exclude foldr with all fairness, and tell folks to handle 
infinite summations in another way. But between foldl and foldl' there's 
(room for) a lot more debate about which should be blessed by the 
Prelude. Do we want to support lazy numbers out of the box, or do we 
want to support performance for strict numbers out of the box? While 
there's been a growing strictification of Haskell for performance 
reasons, do recall that laziness was one of the initial reasons for 
defining Haskell in the first place. Hence the old Report's definition. 
The role of Haskell as a research engine has moved on since then, but 
the decision to codify that is still non-trivial.

[1] 
http://hackage.haskell.org/packages/archive/base/4.5.0.0/doc/html/src/Data-List.html#sum

[2] For the purpose of discussion, I'm considering the Float and Double 
values for infinities to be "finite", because they are identical to the 
finite values with respect to strictness and size of representation.

[3] It could take infinitely long to return, even for types which allow 
lazily returning partial values. For example,

     data Peano = Z | S Peano
     instance Num Peano where ...

     zeros = Z : zeros

     -- Has to traverse the whole list, just in case there's a (S _) in
     -- there somewhere, before it knows whether to return Z or (S _).
     main = print (foldr (+) Z zeros)

-- 
Live well,
~wren