[Haskell-cafe] Refactoring type-class madness

Fri Jul 16 01:07:00 EDT 2010

Andrew Webb wrote:
> Because, at the basic level all of the experiments share this type of
> data, it seems that I should be able to write analysis functions that
> work for any experiment. However, the experiments differ in the
> stimuli used, and associated with each stimulus set is a set of
> "milestones" that give times at which important things happen in the
> stimuli, and "regions of interest" that give areas of the visual scene
> that are considered important. For a single experiment I would have:
> 
> data Experiment = Exp [Trial]
> data Trial = Trial [Event] (Map MileStone Time)
> data MileStone = M1 | M2 | ...
> 
> [...]
> 
> So, I was wondering whether there was something wrong with my basic
> model which leads to this ugly type class, or whether this is the
> proper way forward. Either is fine, really, it would just be nice to
> know for certain.

It sounds like you're trying to use typeclasses as if they were 
OO-classes, which is a good way to confuse yourself. What you probably 
want is just parametric polymorphism. For example, if every experiment 
is a sequence of trials, and every trial is a sequence of events with 
some milestones, then you can get the generality you want with:

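     import Data.Map (Map)

     -- Experiments and trials are parameterised by their milestone type m.
     data Experiment m = Exp [Trial m]
     data Trial m = Trial [Event] (Map m Time)
     data Event = Fixation {...} | Saccade {...}

     -- One milestone type per stimulus set (Ord requires Eq).
     data A = MS_A1 | MS_A2 | ... deriving (Eq, Ord, Enum)
     data B = MS_B1 | MS_B2 | ... deriving (Eq, Ord, Enum)
     ...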

then you would pass around (Experiment A), (Experiment B), etc. The 
reason for the Ord instances is so you can use them as keys in Map, and 
the reason for Enum is just so you have a generic interface for listing 
all the milestones (though getting the keys of the map may suffice).
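Here is a minimal sketch of an analysis function that is written once and works for every experiment. It assumes that `Time` is a `Double` and gives `Event` some made-up fields. The milestone constructors (`StimulusOn`, `CueOn`, etc.) and the helpers `allMilestones` and `milestoneTimes` are hypothetical names for illustration. Listing milestones with `[toEnum 0 ..]` relies on a derived Enum instance: derived instances start at the first constructor and `enumFrom` stops at the last one.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Assumption: times are plain Doubles (e.g. milliseconds).
type Time = Double

-- Hypothetical, simplified event fields.
data Event = Fixation { start :: Time, duration :: Time }
           | Saccade  { start :: Time, duration :: Time }
  deriving Show

data Trial m = Trial [Event] (Map m Time)
data Experiment m = Exp [Trial m]

-- Hypothetical milestone types for two stimulus sets.
data A = StimulusOn | StimulusOff deriving (Eq, Ord, Enum, Show)
data B = CueOn | TargetOn | Response deriving (Eq, Ord, Enum, Show)

-- Every milestone of a derived Enum type, from the first constructor to the last.
allMilestones :: Enum m => [m]
allMilestones = [toEnum 0 ..]

-- The time of one milestone in every trial that recorded it.
-- The same code serves Experiment A, Experiment B, and so on.
milestoneTimes :: Ord m => m -> Experiment m -> [Time]
milestoneTimes ms (Exp trials) =
  mapMaybe (\(Trial _ times) -> Map.lookup ms times) trials
```

Only the type argument changes between experiments, so no class is needed. The caller simply picks the type, for example `allMilestones :: [B]`.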

The main reason for wanting to use typeclasses is when you have a common 
interface (i.e. set of function names and types), but the 
implementations of that interface are structurally/algorithmically 
different. If the structure of the implementation is the same and only 
the type of some component changes, then parametric polymorphism is the 
way to go.
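For contrast, the regions of interest mentioned in the question look like a case where a class does earn its keep. Every region answers the same question ("is this point inside?"), but a rectangle and a circle answer it with different code. This is a minimal sketch. `Point`, `Region`, `contains`, `Rect`, `Circle` and `fixationsIn` are all hypothetical names, and screen coordinates are assumed to grow downwards.

```haskell
-- Assumption: a screen position in pixels, with y growing downwards.
type Point = (Double, Double)

-- One interface, structurally different implementations: a typeclass fits.
class Region r where
  contains :: r -> Point -> Bool

-- Hypothetical region shapes.
data Rect   = Rect Double Double Double Double  -- left, top, right, bottom
data Circle = Circle Point Double               -- centre, radius

instance Region Rect where
  contains (Rect l t r b) (x, y) = l <= x && x <= r && t <= y && y <= b

instance Region Circle where
  contains (Circle (cx, cy) rad) (x, y) = sq (x - cx) + sq (y - cy) <= sq rad
    where sq d = d * d

-- Written once against the interface, usable with any region shape.
fixationsIn :: Region r => r -> [Point] -> [Point]
fixationsIn roi = filter (contains roi)
```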

Off-topic to your original question, it seems like a better model for 
your data might be to treat milestones as a third kind of event. Thus, a 
trial is just a sequence of events, which could be subject events 
(fixations, saccades) or experimental events (milestones, etc.). This 
assumes that your processing only cares about when subject events occur 
relative to experimental events, and that you don't need to access 
experimental events separately.
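Here is a minimal sketch of the single-stream model. It assumes that `Time` is a `Double` and uses invented positional fields for the events. `Milestone`, `between` and `fixationsBetween` are hypothetical names. Questions about timing relative to milestones then become ordinary list processing.

```haskell
-- Assumption: times are plain Doubles.
type Time = Double

-- Subject events and experimental events share one stream.
data Event m
  = Fixation  Time Time   -- hypothetical fields: start, duration
  | Saccade   Time Time   -- hypothetical fields: start, duration
  | Milestone m Time      -- experimental event: which milestone, and when
  deriving (Show, Eq)

isFixation :: Event m -> Bool
isFixation (Fixation _ _) = True
isFixation _              = False

-- Events strictly after the first occurrence of milestone `from`, up to
-- the next occurrence of milestone `to` (or the end of the trial if `to`
-- never follows).
between :: Eq m => m -> m -> [Event m] -> [Event m]
between from to es =
  takeWhile (not . isMilestone to) (drop 1 (dropWhile (not . isMilestone from) es))
  where
    isMilestone x (Milestone y _) = x == y
    isMilestone _ _               = False

-- Example analysis: how many fixations fall between two milestones.
fixationsBetween :: Eq m => m -> m -> [Event m] -> Int
fixationsBetween from to = length . filter isFixation . between from to
```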

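If you need to be able to jump around to different events, then you 
could use Trial [Event m] (Map m [Event m]) and construct the map after 
reading input by walking over the list of events and storing pointers to 
the subsequence beginning with each milestone:

     import Data.Map (Map)
     import qualified Data.Map as Map

     -- Milestones are now events too, so Event carries the milestone type.
     data Event m = Fixation {...} | Saccade {...} | MS m
     data Trial m = Trial [Event m] (Map m [Event m])

     -- Map each milestone to the shared suffix of events beginning with it.
     computeMilestones :: Ord m => Trial m -> Trial m
     computeMilestones (Trial es0 m0) = Trial es0 (go es0 m0)
         where
         go []           m = m
         go es@(e : es') m = go es' (step e)
             where
             step (MS x) = Map.insert x es m
             step _      = m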
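Here is a self-contained variant of `computeMilestones` that can be loaded and run. It uses made-up event fields, assumes that `Time` is a `Double`, and adds two hypothetical helpers, `mkTrial` and `fromMilestone`. Each `es` stored in the map is a suffix of the original list, so the map shares structure with the event list instead of copying it.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Assumption: times are plain Doubles.
type Time = Double

data Event m
  = Fixation Time Time   -- hypothetical fields: start, duration
  | Saccade  Time Time   -- hypothetical fields: start, duration
  | MS m                 -- a milestone
  deriving (Show, Eq)

data Trial m = Trial [Event m] (Map m [Event m])

-- Walk the events once, mapping each milestone to the suffix starting at it.
computeMilestones :: Ord m => Trial m -> Trial m
computeMilestones (Trial es0 m0) = Trial es0 (go es0 m0)
  where
    go []            acc = acc
    go es@(e : rest) acc = go rest (step e)
      where
        step (MS x) = Map.insert x es acc
        step _      = acc

-- Hypothetical helper: build a trial from raw events and index it.
mkTrial :: Ord m => [Event m] -> Trial m
mkTrial es = computeMilestones (Trial es Map.empty)

-- Hypothetical helper: the events from a milestone to the end of the trial.
fromMilestone :: Ord m => m -> Trial m -> [Event m]
fromMilestone x (Trial _ idx) = Map.findWithDefault [] x idx
```

A milestone can occur more than once per trial. `Map.insert` keeps the last occurrence, whereas `Map.insertWith (\_ old -> old)` would keep the first.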

-- 
Live well,
~wren