[Haskell-cafe] Can Haskell outperform C++?
wren ng thornton
wren at freegeek.org
Wed May 23 05:23:42 CEST 2012
On 5/21/12 10:51 AM, Yves Parès wrote:
>> I do think we have the opposite problem, however, in much Haskell code --
> people are using the clean, obviously correct, but inefficient code even in
> standard library functions that really should be optimized like crazy!
>
> And even before optimizing "like crazy", I think the functions that are
> "more often" good should be emphasized: Prelude should export foldl'
> together with/instead of foldl, sum should have its sum' counterpart (or
> even be rewritten to use foldl'), and so on...
> It baffles me that functions that are said to be more efficient in the
> majority of cases are not the default.
Part of this is due to legacy and the Haskell Report. For example, the
definition of sum is specified by the Report. And unfortunately there
exist (rare) cases where the semantics of sum and sum' differ: namely
when the type supports lazy addition (e.g., Peano numbers) and therefore
can return a partial answer to an infinite summation.
That said, the GHC philosophy in most of these cases has been to:
(a) use their own definitions with a CPP flag USE_REPORT_PRELUDE in case
you *really* want the Report's definition[1], and
(b) to "secretly" replace foldl by foldl' in cases where it can
determine the function argument to be strict (and therefore that the
change doesn't alter the semantics).
That doesn't fix everything, and it's no justification for not having
the newer Reports give more sane definitions, but it's better than nothing.
Part of the issue re fixing the Report is that there's a tension between
correctness and expediency, as well as between different notions of
"correctness". By and large, Haskell has managed to stay out of
theoretical arguments about choosing what "correct" means when it is
still controversial in mathematics. (Whence many of the complaints about
the non-mathematical nature of the numeric type classes.) But which of
the foldr vs foldl vs foldl' definitions of summation is the "correct"
definition? Mathematically, infinite summations should be allowed, and
summations of infinite quantities should be allowed. However,
pragmatically, infinite summations are of a different nature than finite
summations; e.g., when they converge it'd be nice to evaluate them in
finite time. So on the one hand, a naive view of "correctness" suggests
that we ought to allow these summations as just more of the same; but on
the other hand, an alternative notion of "correctness" suggests that
while infinite sums and sums of infinites should be handled, they should
be handled by different means than traditional finite sums of finite values.
The foldl' definition of summation only allows finite summations of
finite[2] values; the foldl definition only allows finite summations,
but they can be summations of infinite values; the foldr definition
allows both infinite values and infinitely many summands, though it's
not especially useful for handling infinite summations[3]. We can
probably exclude foldr with all fairness, and tell folks to handle
infinite summations in another way. But between foldl and foldl' there's
(room for) a lot more debate about which should be blessed by the
Prelude. Do we want to support lazy numbers out of the box, or do we
want to support performance for strict numbers out of the box? While
there's been a growing strictification of Haskell for performance
reasons, do recall that laziness was one of the initial reasons for
defining Haskell in the first place. Hence the old Report's definition.
The role of Haskell as a research engine has moved on since then, but
the decision to codify that is still non-trivial.
[1]
http://hackage.haskell.org/packages/archive/base/4.5.0.0/doc/html/src/Data-List.html#sum
[2] For the purpose of discussion, I'm considering the Float and Double
values for infinities to be "finite", because they are identical to the
finite values with respect to strictness and size of representation.
[3] It could take infinitely long to return, even for types which allow
lazily returning partial values. For example,
data Peano = Z | S Peano
instance Num Peano where ...
zeros = Z : zeros
-- Has to traverse the whole list, just in case there's a (S _) in
-- there somewhere, before it knows whether to return Z or (S _).
main = print (foldr (+) Z zeros)
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list