[Haskell-cafe] Unnecessarily strict implementations

Jan Christiansen jac at informatik.uni-kiel.de
Thu Sep 2 18:22:14 EDT 2010


Hi,

On 02.09.2010, at 13:41, Daniel Fischer wrote:

> takes a little to run and keeps the entire file until the first  
> occurrence
> of pat in memory.

first of all thanks very much for the detailed instructions.

I have rewritten the example slightly using Strings instead of  
Bytestrings. Replacing all occurrences of 'ä' by "ä" in the  
collected works of Shakespeare ; ) has a maximum memory usage of  
around 65MB with the current implementation of intersperse while it  
has a maximum memory usage of only around 5KB with the less strict  
implementation.

replaceBy :: Eq alpha => alpha -> [alpha] -> [alpha] -> [alpha]
replaceBy x sep = concat . intersperse sep . splitBy (==x)

splitBy :: (alpha -> Bool) -> [alpha] -> [[alpha]]
splitBy _ []  = []
splitBy p xs  =
   case break p xs of
         (l,ys) -> l : case ys of
                             []     -> []
                             (_:zs) -> splitBy p zs

This function only runs in constant space if I use the strict pattern  
matching on the result of break. If I use the following implementation  
I observe a linear memory usage instead.

splitBy' :: (alpha -> Bool) -> [alpha] -> [[alpha]]
splitBy' _ []  = []
splitBy' p xs  =
   l : case  ys of
               []     -> []
               (_:zs) -> splitBy' p zs
   where
     (l,ys) = break p xs

I think this is due to the Wadler tuple space leak. The same would  
apply to the current implementation of lines. I wonder whether an  
implementation of lines analogous to splitBy has any disadvantages.

>> That is, we could have
>>
>>   intersect [] _|_ = []   and   intersect (_|_:_|_) [] = []
>>
>> or
>>
>>   intersect [] (_|_:_|_) = []   and   intersect _|_ [] = []
>>
>> and the current implementation satisfies neither.

> Right. So the question is, has the current implementation advantages  
> over
> either of these? (I don't see any.) If not, which of these two  
> behaviours
> is preferable?

I'd prefer the first one as it is in line with the left to right  
pattern matching of Haskell.

> I have mixed feelings about those. Part of me dislikes breaking the
> symmetry between (<=), (==) and compare.

I think you should not blame (<=) for the existence of a function that  
yields a superset of the information that (<=) yields ; )

Cheers, Jan


More information about the Haskell-Cafe mailing list