Lazy minimum

Wed Nov 19 20:06:27 EST 2008

Compare these two lines in ghci (if you have at least 4 GB of memory):

> let a = replicate (2^26) 2 in minimum [[1],a,a]
> let a = replicate (2^26) 2 in minimum [a,a,[1]]

The first finishes much faster than the second.

This comes up for example in isomorphism testing of graphs embedded in  
surfaces, which is much easier than the general case: A choice of a  
directed edge determines a walk of the vertices and edges of the  
graph, from which the graph can be recovered. A minimal walk can be  
used to determine a canonical form for the graph, making for a nice  
Haskell one-liner:

> normal graph = unwalk $ minimum $ map (walk graph) $ edges graph

However, this code suffers from the same issue as the toy ghci lines  
above. Even though the walks are lazy, the minimum function wastefully  
explores them.

It's clear how to fix this, comparing lazy lists: Play rounds of  
"Survivor", peeling off the heads of the lists at each iteration. One  
inevitably evaluates all of each minimal list, if there is more than  
one. This is familiar. For my example, the automorphisms of a graph  
show up in the complexity of any isomorphism algorithm; they manifest  
themselves here as the set of minimal walks.

What I'm wondering, however, is if there is a way to code "minimum"  
efficiently in general,

> minimum :: Ord a => [a] -> a

where one knows absolutely nothing further about the type "a", but one  
believes that lazy evaluation will run afoul of the above issue.

It would seem that this would require compiler support, allowing code  
to access approximations to lazy data generalizing the head of a lazy  
list. I'm reminded of working with power series by working mod x^n for  
various n. Here, I'd like a bounded version of "compare", that  
returned EQ for data that agreed to a specified lazy evaluation depth.

Did I miss that class? Is there a construct in GHC that would allow me  
to write "minimum" efficiently for lazy data?