[Haskell-cafe] Unnecessarily strict implementations
Daniel Fischer
daniel.is.fischer at web.de
Thu Sep 2 07:41:47 EDT 2010
On Thursday 02 September 2010 09:25:59, Jan Christiansen wrote:
> Hi,
>
> On 02.09.2010, at 01:35, Daniel Fischer wrote:
> > It's not that it's not as non-strict as possible per se. (Sorry, had
> > to :)
> > It's that intersperse's current definition (in GHC at least) can cause
> > a space leak. In this case, making the function less strict can cure
> > it; in other cases, more strictness might be the solution.
>
> I would be very happy if you would share this example with me. I have
> been looking for an example where the current implementation of
> intersperse or inits causes a space leak for quite a while now.
>
I don't see how the current implementation of inits or tails could cause a
space leak that the lazier versions wouldn't, so you'd have to wait longer
for such an example.
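(For reference, the lazier inits under discussion would be something along
these lines -- a sketch only, with a primed name to keep it apart from
Data.List.inits:

inits' :: [a] -> [[a]]
inits' xs = [] : case xs of
                   []     -> []
                   (y:ys) -> map (y:) (inits' ys)

so inits' _|_ = [] : _|_, while the current inits needs the outermost
constructor of its argument before it produces anything.)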
For intersperse,
$ cabal update && cabal install stringsearch
You need the new version 0.3.1; the Data.ByteString.Lazy.Search[.DFA].splitXXX
functions had a space leak of their own in 0.3.0 [caused by too much
laziness]. Then
===========================================
{-# LANGUAGE BangPatterns #-}
module Main (main) where

import System.Environment (getArgs)
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as L
import qualified Data.ByteString.Char8 as C
import Data.ByteString.Lazy.Search (split)

main :: IO ()
main = do
    (file : pat : sub : _ ) <- getArgs
    let !spat = C.pack pat
        !ssub = L.fromChunks [C.pack sub]
        work  = ical ssub . split spat
    L.readFile file >>= L.putStrLn . L.take 100 . work

ical :: L.ByteString -> [L.ByteString] -> L.ByteString
ical new = L.concat . intersperse new

intersperse :: a -> [a] -> [a]
intersperse sep []     = []
intersperse sep (x:xs) = x : go xs
  where
    go []     = []
    go (y:ys) = sep : y : go ys
============================================
has no space leak. If you replace the local intersperse with
Data.List.intersperse (equivalently, ical = L.intercalate), you get a space
leak.
To expose the leak, take a sufficiently large file (say 10MB or larger) and
replace a pattern that does not occur in the file, or occurs only late in
it. Then
$ ./noleak file pat sub
runs fast and in small memory, while
$ ./leak file pat sub
takes a while to run and keeps in memory the entire file up to the first
occurrence of pat.
Note that the above implementation of intersperse has different semantics
from Data.List.intersperse:
Data.List.intersperse ',' ('a':_|_) = _|_
intersperse ',' ('a':_|_) = 'a':_|_
Data.List.intersperse ',' ('a':'b':_|_) = 'a' : ',' : _|_
intersperse ',' ('a':'b':_|_) = 'a' : ',' : 'b' : _|_
etc.
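The reason is the clause structure: if I remember correctly,
Data.List.intersperse is essentially the report definition,

intersperse :: a -> [a] -> [a]
intersperse _   []     = []
intersperse _   [x]    = [x]
intersperse sep (x:xs) = x : sep : intersperse sep xs

which has to match the tail against [] before it can emit x. With split on
a big file, that means scanning ahead to the next occurrence of pat (or to
the end of the file) while the bytes of x pile up in memory; the local
version above hands x to the consumer immediately.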
> > On the other hand, we currently have
> >
> > intersect [] _|_ = []
> >
> > and one of intersect _|_ [] and intersect [] _|_ must give _|_.
> > Which one is a matter of choice.
>
> I am sorry for not being precise. You are right. But right now we have
> intersect xs [] = _|_ for every list xs terminated by _|_. But it
> suffices to evaluate xs to head normal form to decide that the result
> should be []. That is, we could have
>
> intersect [] _|_ = [] and intersect (_|_:_|_) [] = []
>
> or
>
> intersect [] (_|_:_|_) = [] and intersect _|_ [] = []
>
> and the current implementation satisfies neither.
>
Right. So the question is: does the current implementation have advantages
over either of these? (I don't see any.) If not, which of the two
behaviours is preferable?
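As a sketch of the first behaviour (untested, and ignoring efficiency), one
could write

intersect :: Eq a => [a] -> [a] -> [a]
intersect []  _  = []
intersect _   [] = []
intersect xs  ys = [x | x <- xs, any (== x) ys]

which gives intersect [] _|_ = [] and intersect (_|_:_|_) [] = []; the
second behaviour would match on the second argument first instead.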
> > And before that, the rule intersect [] _ = [] if the current behaviour
> > of intersect [] should be retained.
>
> That's a deal.
>
> >> The implication (<=) :: Bool -> Bool -> Bool is too strict as well.
> >> We have False <= _|_ = _|_ as well as _|_ <= True = _|_ while one
> >> of these cases could yield True.
> >
> > I'm not convinced either should (nor that they shouldn't).
>
> I think this is a matter of elegance rather than a matter of
> efficiency. In the same way as I prefer
>
> False && _|_ = False
>
> over
>
> False && _|_ = _|_
>
> I prefer
>
> False <= _|_ = True
>
> over
>
> False <= _|_ = _|_
>
I have mixed feelings about those. Part of me dislikes breaking the
symmetry between (<=), (==) and compare.
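Spelled out, the version you prefer would be something like the following
sketch (as a standalone function with a made-up name, not the Ord Bool
method):

implies :: Bool -> Bool -> Bool
implies False _ = True    -- False <= _|_ = True
implies True  q = q

The symmetric alternative would inspect the second argument first and give
_|_ <= True = True instead; a sequential definition can't do both.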
> > The last slide lists among the problems
> > "proposes undesirably inefficient functions (reverse)".
> > I wouldn't equate 'not minimally strict' with 'too strict'.
> > Minimal strictness can also have negative effects; one must look at
> > each case individually.
>
> I second this but in my opinion the minimally strict implementation
> should be the default if there is no reason against it.
Agreed - except I have to object to your use of the definite article: some
functions have several minimally strict implementations.
(Ambiguity of minimal strictness *can* be a reason for a stricter choice,
though probably rarely.)
>
> Cheers, Jan
Cheers,
Daniel