[Haskell-beginners] I have created an ugly Haskell program..

Mon Nov 2 11:48:09 EST 2009

On Sun, Nov 01, 2009 at 11:27:42PM +0000, Philip Scott wrote:
> .. and I am positive there must be a way of beautifying it, but I am 
> struggling. I bet there is just some lovely way of making this all shrink to 
> three lines..
> 
> So here's the problem. I have two lists of tuples: (timestamp, value)
> 
> What I would like to do in do a kind of 'zip' on two of these lists to make a 
> list of (timestamp, (value1, value2)) with the following rules:
> 
> - If the timestamps are equal it's easy - make your new element an move on
> - If one of the lists has a timestamp that the other doesn't, repeat an old 
> value from the other list
> - If we don't have an old value yet, then don't create an element in the new 
> list.

Ask yourself: What Would Conal Do (WWCD)?  Conal Elliott is always
trying to get people to think about the semantic essence of their
problems, so let's try it.

What are we REALLY trying to do here?  What are those lists of tuples,
REALLY?  Well, it seems to me that the lists of tuples are really just
representing *functions* on some totally ordered domain.  The
list-of-pairs representation takes advantage of the fact that these
functions tend to be constant on whole intervals of the domain; we
only need a tuple to mark the *beginning* of a constant interval.  The
fact that we want to take a value from the last old timestamp when we
don't have a certain timestamp in the list reflects the fact that the
function takes on that value over the whole *interval* from the
timestamp when it occurred to whenever the next timestamp is.

So, let's try converting these lists of pairs to actual functions:

  asFunc :: (Ord a) => [(a,b)] -> (a -> Maybe b)
  asFunc is a = fmap snd . listToMaybe . reverse . takeWhile ((<=a) . fst) $ is

Simple -- we just scan through the list looking for the right
interval.

Now the combining function is just a matter of converting the lists to
functions, and applying those functions to each index we want in the
output list (discarding any Nothings).

  combine :: (Ord a) => [(a,b)] -> [(a,c)] -> [(a,(b,c))]
  combine is js = catMaybes . flip map ixs $ \a ->
                    fmap ((,) a) (liftA2 (,) (asFunc is a) (asFunc js a))
    where ixs = sort . nub $ map fst is ++ map fst js

Done!  

Now, you might object that this is much more inefficient than the
other solutions put forth.  That is very true.  Converting to a
function with 'asFunc' means that we do a linear-time lookup in the
list every time we call the function, so this is O(n^2) overall
instead of O(n).  Building the list of indices ixs in the code above
is also O(n^2) instead of O(n).  However, I still find it very helpful
to think about the essence of the problem like this: elegant yet
inefficient code is a much better starting place than the other way
around!  From here there are several possibilities: maybe this version
is efficient enough, if you'll only be working with small lists.  Or
you can also try to optimize, taking advantage of the fact that we
always call the functions built by asFunc with a sequence of strictly
increasing inputs.  I might make a sort of "iterator" object which
acts like a function (a -> Maybe b), but keeps some extra state so
that as long as you call it with strictly increasing values of a, you
get back a Maybe b (and a new iterator) in constant time.  Of course,
this is really what the other solutions are doing: but thinking about
it this way has helped to structure the solution in a (hopefully) more
clear and elegant way.

-Brent