[Haskell-cafe] Computing a sorted list of products lazily

Fri Apr 17 23:20:56 EDT 2009

Jason Dagit wrote:
> Hello,
> 
> A colleague of mine recently asked if I knew of a lazy way to solve
> the following problem:
> Given two sets of sorted floating point numbers, can we lazily
> generate a sorted list of the products from their Cartesian product?
> 
> The algorithm should return the same result as:
> sortProduct a b = sort [ x * y | x <- a, y <- b ]
> 
> But, my friend is not satisfied with the strictness of sort.  In
> particular, he doesn't want to have to hold the whole list in memory
> to sort it.  He would like to compute the results one product at a
> time and in the correct order.

The algorithm you're looking for is called "(lazy) cube pruning" and is 
quite common in machine translation. If you're willing to slog through 
MT papers, this one[1] gives a decent introduction as I recall. The 
algorithm works for any monotone combination function.

The basic idea is to do a frontier search on the cartesian grid. We know 
a priori that the first elements of each list will give the "best" 
combination value. We return that value as the first of our new list, 
and mark it off on the grid (not really), we then look at the next 
element for each list and compute the combination values for each of the 
new fringe cells, and store them in a heap. Pick the best value out of 
the heap and return it as the next element of the list, 'mark it off' on 
the grid and explore the new fringe.

Because the combination function is monotone, this search strategy 
guarantees you'll encounter every combination values in the right order 
to return them with minimal delay. The only downside is that you need to 
maintain the heap of seen values that aren't ready to be returned yet 
(which can grow quite large in the worst case). For MT systems where 
this can really be a problem, the cube search strategy is often 
integrated with beam search (i.e. if the heap grows larger than some 
size limit then the "worst" values start dropping out of the bottom)--- 
hence the name cube *pruning*.

If you don't feel encumbered by the LGPL, we have a Java implementation 
of the full algorithm already. Your use case is much simpler than the 
variant necessary for MT, but the reference implementation can be quite 
helpful. Feel free to contact me off-list if you're interested, I can 
point you to our repository and where the algorithm is in the morass of 
code.

[1] http://www.cis.upenn.edu/~lhuang3/acl-cube.pdf

-- 
Live well,
~wren