~ patterns

John Hughes rjmh at cs.chalmers.se
Mon Jan 30 07:32:24 EST 2006


Simon Marlow wrote:

>On 27 January 2006 10:50, John Hughes wrote:
>
>>I'd be fairly horrified at the removal of ~ patterns. I've used them
>>to fix very serious space-leaks that would have been awkward to fix
>>in any other way.
>
>You use them to *fix* space leaks?  Can you give an example?  I'm
>genuinely surprised and interested - I don't think I've ever seen a case
>of this before, but I've seen the opposite many times (lazy pattern
>matching being the cause of space leaks).

One case that sticks in my mind was a library of parsing combinators
based on the "list of successes" idea, which kept track of a high-water
mark in the input in order to report the position of the first
unparseable symbol. I had a list comprehension in the definition of
(>>=) with a pattern-match in the generator, of the form
(a,xs,hwm) <- ... if I remember rightly. It turned out that matching
that pattern strictly caused a severe space leak. I inserted a ~, and
improved space performance dramatically.
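The shape involved can be sketched in miniature (the combinator library
itself is not shown, and these names are hypothetical stand-ins): a
strict tuple pattern in a generator forces each result triple as soon as
the element is drawn from the list, while a ~ delays the match until a
component is actually demanded.

```haskell
-- Hypothetical stand-in for the (a, xs, hwm) triples produced by (>>=).
type Result = (Int, String, Int)

-- Strict generator pattern: drawing an element forces its triple.
strictHeads :: [Result] -> [Int]
strictHeads rs = [ a | (a, _xs, _hwm) <- rs ]

-- Lazy generator pattern: the triple is matched only when 'a' is used.
lazyHeads :: [Result] -> [Int]
lazyHeads rs = [ a | ~(a, _xs, _hwm) <- rs ]

main :: IO ()
main = do
  -- Only the spine of the result is demanded here, so with ~ the
  -- unevaluated triples are never forced:
  print (length (lazyHeads [undefined, undefined]))   -- prints 2
  -- length (strictHeads [undefined, undefined]) would crash instead.
```

Swapping between the two forms is a one-character edit, which is the
point made below about exploring alternatives quickly.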

>The reasoning behind removing ~ patterns, for me, is that the number of
>times they are used doesn't justify stealing the ~ symbol and the extra
>complexity in the language design, when you can get the same effect with
>let bindings.  With pattern guards there's even less justification for
>keeping ~, e.g.
>
>  f (a, ~(x:xs)) = e
>
>turns into
> 
>  f (a,b) | let (x:xs) = b  = e
>
>you can translate any equation this way.  I would probably write the
>second form in preference to the first, because the strict pattern
>matches are clearly separated from the lazy.

I guess I use them a lot more than you do!

When I look for space leaks, both seq and ~ play an essential role. We
all know that fixing a space leak often means identifying long-lived
data and shortening its lifetime, right? We do that either by consuming
it earlier, using seq, *or by producing it later*. Since values often
get computed by a pattern-match, delaying their computation means making
that pattern match lazy--hence ~ plays an important role. I find
delaying the construction of large values in this way very effective,
and I'm a bit surprised that the technique doesn't seem to be more
widely used!
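A standard illustration of "producing later" (a textbook example, not
taken from the parser above) is partition written with foldr: it is only
productive if the accumulator pair is matched lazily. With a strict
pattern, each step would force the result of processing the entire rest
of the list before emitting anything; the ~ lets each cons cell be
produced first.

```haskell
-- partition via foldr; the ~ on the accumulator pair is essential.
partitionLazy :: (a -> Bool) -> [a] -> ([a], [a])
partitionLazy p = foldr select ([], [])
  where
    -- The ~ delays matching the pair built from the rest of the list,
    -- so each output cell is produced before the recursion is forced.
    select x ~(ts, fs)
      | p x       = (x : ts, fs)
      | otherwise = (ts, x : fs)

main :: IO ()
main = print (take 3 (fst (partitionLazy even [1 ..])))  -- [2,4,6]
```

With a strict pattern `select x (ts, fs)` in place of the ~, the same
call on the infinite list [1..] would loop forever, because every step
demands the step after it before producing its own output.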

But an important caveat is that, at least for me, space-debugging has a
large element of trial and error. I'm using the profiler, trying a
change that I think will help, profiling again, and so on. In order to
track down space leaks fast, I need to be able to explore alternatives
fast. It's then an advantage that inserting ~ doesn't require
restructuring my code (as translating to a let-based form would do), and
neither does removing it again if it transpires that it didn't help.
Haskell's great appeal, for me, is that it makes me "light on my feet",
so I can easily try many alternatives. Making it heavier is a step in
the wrong direction.

I don't really agree that ~ adds complexity to the language design,
given that we need SOME mechanism for lazy pattern matching. If you
write a denotational semantics of pattern matching, then ~ fits in
naturally. It has a simple, compositional semantics. Where I see
complexity is that a pattern means different things in a let or in a
case, so I have to remember, for each construct, whether it's a
strict-matching or lazy-matching construct. Quick, when I write
do (x,y)<-e... is that pair matched strictly, like case, or lazily,
like let? If strictly, then why? There's no choice to be made here, so
why not wait until x or y is used before matching? I expect you know
the answer to this question off-hand, but it's an obstacle to learning
the language--I'll bet there are many quite experienced Haskell
programmers who are uncertain. If only pattern matching were *always*
strict, unless a ~ appeared, then the language would be more regular
and easier to learn.
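(For the record: the pair in do (x,y)<-e is matched strictly, like
case. A small self-contained experiment makes the difference observable;
it uses a home-made identity monad so that no library is assumed.)

```haskell
-- A tiny identity monad, so the example doesn't depend on any package.
newtype Id a = Id { runId :: a }

instance Functor Id where
  fmap f (Id a) = Id (f a)

instance Applicative Id where
  pure = Id
  Id f <*> Id a = Id (f a)

instance Monad Id where
  Id a >>= f = f a

-- In do-notation the tuple pattern is matched strictly: the bind forces
-- the pair to weak head normal form even though neither component is used.
strictBind :: Id Int
strictBind = do
  (_x, _y) <- Id (undefined :: (Int, Int))
  return 1

-- Prefixing ~ restores let-like laziness: the match waits until a
-- component is demanded, so the undefined pair is never forced.
lazyBind :: Id Int
lazyBind = do
  ~(_x, _y) <- Id (undefined :: (Int, Int))
  return 1

main :: IO ()
main = print (runId lazyBind)  -- prints 1; runId strictBind would crash
```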

John


More information about the Haskell-prime mailing list