idiom for different implementations of same idea

Thu, 1 Aug 2002 14:34:00 -0700 (PDT)

Hi all,

I'm looking for some advice on what's the cleanest way to implement
something.  The basic idea is that I have a task to solve, T.  There are
many steps to solving this task, but they can be broken down into a small
list of elementary steps:

  - prepareData
  - initialize
  - doThingOne
  - doThingTwo
  - getResults

where the main driver does something like:

  prepareData
  initialize
  iterate until converged
    doThingOne
    doThingTwo
  getResults

As is standard in my field (statistical natural language processing), I
have several models defined to perform task T (though only the first, most
basic one is implemented).  Call these Model0, Model1, Model2, and so on.

All of these models, since they're solving the same basic task, have
functions with the same basic types, something like this (a bit
simplified):

  prepareData :: Data () -> Data markup
  initialize  :: Data markup -> ST s table
  doThingOne  :: Data markup -> table -> ST s alignments
  doThingTwo  :: Data markup -> alignments -> ST s table
  getResults  :: Data markup -> table -> alignments -> String
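
To make the driver concrete, here is a minimal sketch of the loop written
against these signatures.  Several things in it are assumptions and not
part of my actual code: the definition of Data, the fixed iteration count
standing in for a real convergence test, and the toy steps used in demo.

```haskell
import Control.Monad.ST (ST, runST)

-- Hypothetical stand-in for the real Data type: a list of items,
-- each paired with a markup value.
newtype Data a = Data [(String, a)]

-- The driver, with the five steps passed in as arguments.
-- Assumption: a fixed number of passes replaces "until converged";
-- at least one pass always runs, so alignments exist for getResults.
go :: Int
   -> (Data () -> Data markup)
   -> (Data markup -> ST s table)
   -> (Data markup -> table -> ST s alignments)
   -> (Data markup -> alignments -> ST s table)
   -> (Data markup -> table -> alignments -> String)
   -> Data ()
   -> ST s String
go iters prepareData initialize doThingOne doThingTwo getResults raw = do
  let d = prepareData raw
      -- One pass: table -> alignments -> new table.
      step t = do
        a  <- doThingOne d t
        t' <- doThingTwo d a
        return (t', a)
      loop n (t, a)
        | n <= 0    = return (t, a)
        | otherwise = step t >>= loop (n - 1)
  t0     <- initialize d
  first  <- step t0
  (t, a) <- loop (iters - 1) first
  return (getResults d t a)

-- Toy steps (all hypothetical) so the sketch can be run:
-- markup = Int, table = Int, alignments = [Int].
demo :: String
demo = runST (go 3 prep ini one two res (Data [("a", ()), ("bb", ())]))
  where
    prep (Data xs)  = Data [(w, length w) | (w, _) <- xs]
    ini _           = return 0
    one (Data xs) t = return [m + t | (_, m) <- xs]
    two _ al        = return (sum al)
    res _ t al      = show (t, al)
```

Keeping go itself in ST and calling runST at the use site avoids needing
rank-2 types for the step arguments.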

Simple enough.  Now, say I have three models (0-2) which implement these
functions with varying complexities (also of varying complexities in terms
of the types of 'markup', 'table' and 'alignments').  Each model has a
different idea of what these three types are.

Now, I want the user of my executable to be able to say "-model=0" (and so
on) on the command line and have it use the appropriate model.  Each of
these models will go in a separate module.

One way to do this would be to import all of the models qualified and then
if they choose Model0, pass to the "go" function Model0.prepareData,
Model0.initialize, etc.  This is fine, simple, good.  But it doesn't
enforce the types of the functions at all.
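
A rough sketch of what this first option might look like.  Everything
here is illustrative: the m0.../m1... names stand in for the qualified
Model0.prepareData, Model1.prepareData and so on (a single block can't
hold several modules), Data is a made-up definition, and go does a single
pass instead of iterating.

```haskell
import Control.Monad.ST (ST, runST)
import Data.List (stripPrefix)

-- Hypothetical stand-in for the real Data type.
newtype Data a = Data [(String, a)]

-- Driver with an explicit signature (one pass only, to keep it short).
go :: (Data () -> Data markup)
   -> (Data markup -> ST s table)
   -> (Data markup -> table -> ST s alignments)
   -> (Data markup -> alignments -> ST s table)
   -> (Data markup -> table -> alignments -> String)
   -> Data () -> ST s String
go prep ini one two res raw = do
  let d = prep raw
  t0 <- ini d
  a  <- one d t0
  t1 <- two d a
  return (res d t1 a)

-- Stand-ins for Model0.*: markup = Int, table = Int, alignments = [Int].
m0Prepare :: Data () -> Data Int
m0Prepare (Data xs) = Data [(w, length w) | (w, _) <- xs]
m0Init :: Data Int -> ST s Int
m0Init _ = return 0
m0One :: Data Int -> Int -> ST s [Int]
m0One (Data xs) t = return [m + t | (_, m) <- xs]
m0Two :: Data Int -> [Int] -> ST s Int
m0Two _ al = return (sum al)
m0Results :: Data Int -> Int -> [Int] -> String
m0Results _ t al = "model0 " ++ show (t, al)

-- Stand-ins for Model1.*: markup = Bool, table = Double,
-- alignments = [Double].
m1Prepare :: Data () -> Data Bool
m1Prepare (Data xs) = Data [(w, length w > 1) | (w, _) <- xs]
m1Init :: Data Bool -> ST s Double
m1Init _ = return 1.0
m1One :: Data Bool -> Double -> ST s [Double]
m1One (Data xs) t = return [if long then t else t / 2 | (_, long) <- xs]
m1Two :: Data Bool -> [Double] -> ST s Double
m1Two _ al = return (sum al)
m1Results :: Data Bool -> Double -> [Double] -> String
m1Results _ t al = "model1 " ++ show (t, al)

-- Pick a model from a flag such as "-model=0".
runModel :: String -> Data () -> Either String String
runModel flag raw = case stripPrefix "-model=" flag of
  Just "0" -> Right (runST (go m0Prepare m0Init m0One m0Two m0Results raw))
  Just "1" -> Right (runST (go m1Prepare m1Init m1One m1Two m1Results raw))
  _        -> Left ("unknown model flag: " ++ flag)

sample :: Data ()
sample = Data [("a", ()), ("bb", ())]
```

With an explicit signature on go, the five functions are at least checked
against each other at each call site (passing m1One in the model-0 call
is rejected), but nothing in a model's own module says that it must
provide all five with compatible types.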

Another way to go would be to make a class, something like:

class Model model markup table alignments
    | model -> markup table alignments where
  prepareData :: model -> Data () -> Data markup
  initialize  :: model -> Data markup -> ST s table
  doThingOne  :: model -> Data markup -> table -> ST s alignments
  doThingTwo  :: model -> Data markup -> alignments -> ST s table
  getResults  :: model -> Data markup -> table -> alignments -> String

where the model type/parameter is essentially a dummy to tie everything
together.  This could be implemented a bit more cleanly with the
definition:

data T a

and then the first parameter for all of those class functions changing
from "model" to "T model".  Each model module would then have its own
datatype, something like:

> (in module Model0:)
> data Model0
> 
> instance Model Model0 Int Table Al where
>   ...
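
For concreteness, a compilable sketch of the class approach with a toy
Model0 instance.  The bodies of the instance methods, the definitions of
Data, Table and Al, and the single-pass run function are all made up for
illustration.  Model0 is also given a constructor here so that there is a
value to pass as the dummy argument.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies #-}
import Control.Monad.ST (ST, runST)

-- Hypothetical stand-in for the real Data type.
newtype Data a = Data [(String, a)]

-- The model type determines the other three types.
class Model model markup table alignments
    | model -> markup table alignments where
  prepareData :: model -> Data () -> Data markup
  initialize  :: model -> Data markup -> ST s table
  doThingOne  :: model -> Data markup -> table -> ST s alignments
  doThingTwo  :: model -> Data markup -> alignments -> ST s table
  getResults  :: model -> Data markup -> table -> alignments -> String

-- Generic driver: one pass only, to keep the sketch short.
run :: Model model markup table alignments => model -> Data () -> String
run m raw = runST $ do
  let d = prepareData m raw
  t0 <- initialize m d
  a  <- doThingOne m d t0
  t1 <- doThingTwo m d a
  return (getResults m d t1 a)

-- What module Model0 might contain (toy definitions).
data Model0 = Model0
newtype Table = Table Int
newtype Al = Al [Int]

instance Model Model0 Int Table Al where
  prepareData _ (Data xs)           = Data [(w, length w) | (w, _) <- xs]
  initialize  _ _                   = return (Table 0)
  doThingOne  _ (Data xs) (Table t) = return (Al [m + t | (_, m) <- xs])
  doThingTwo  _ _ (Al al)           = return (Table (sum al))
  getResults  _ _ (Table t) (Al al) = show (t, al)

demo :: String
demo = run Model0 (Data [("a", ()), ("bb", ())])
```

Here the instance declaration is what forces a model to supply all five
functions at consistent types; the Model0 value passed to run carries no
data and only selects the instance.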

This is another option; however I don't think it's very clean for two
reasons:

  - there are a lot of fundeps; in the actual application there
    are a few more type variables than I presented here, and it
    gets very long :)
  - the model parameter is used only to determine which model
    we're using and doesn't actually do anything other than
    satisfy the typechecker

There are probably a plethora of alternatives I haven't considered, but
I'm sure people have done something similar to this before and I'm curious
how they handled it...

Thanks for reading this far :)

 - Hal

--
Hal Daume III

 "Computer science is no more about computers    | hdaume@isi.edu
  than astronomy is about telescopes." -Dijkstra | www.isi.edu/~hdaume