[Haskell-cafe] Computing a sorted list of products lazily
wren ng thornton
wren at freegeek.org
Fri Apr 17 23:20:56 EDT 2009
Jason Dagit wrote:
> Hello,
>
> A colleague of mine recently asked if I knew of a lazy way to solve
> the following problem:
> Given two sets of sorted floating point numbers, can we lazily
> generate a sorted list of the products from their Cartesian product?
>
> The algorithm should return the same result as:
> sortProduct a b = sort [ x * y | x <- a, y <- b ]
>
> But, my friend is not satisfied with the strictness of sort. In
> particular, he doesn't want to have to hold the whole list in memory
> to sort it. He would like to compute the results one product at a
> time and in the correct order.
The algorithm you're looking for is called "(lazy) cube pruning" and is
quite common in machine translation. If you're willing to slog through
MT papers, this one[1] gives a decent introduction as I recall. The
algorithm works for any monotone combination function.
The basic idea is to do a frontier search on the cartesian grid. We know
a priori that the first elements of each list will give the "best"
combination value. We return that value as the first of our new list,
and mark it off on the grid (not really), we then look at the next
element for each list and compute the combination values for each of the
new fringe cells, and store them in a heap. Pick the best value out of
the heap and return it as the next element of the list, 'mark it off' on
the grid and explore the new fringe.
Because the combination function is monotone, this search strategy
guarantees you'll encounter every combination values in the right order
to return them with minimal delay. The only downside is that you need to
maintain the heap of seen values that aren't ready to be returned yet
(which can grow quite large in the worst case). For MT systems where
this can really be a problem, the cube search strategy is often
integrated with beam search (i.e. if the heap grows larger than some
size limit then the "worst" values start dropping out of the bottom)---
hence the name cube *pruning*.
If you don't feel encumbered by the LGPL, we have a Java implementation
of the full algorithm already. Your use case is much simpler than the
variant necessary for MT, but the reference implementation can be quite
helpful. Feel free to contact me off-list if you're interested, I can
point you to our repository and where the algorithm is in the morass of
code.
[1] http://www.cis.upenn.edu/~lhuang3/acl-cube.pdf
--
Live well,
~wren
More information about the Haskell-Cafe
mailing list