[Haskell-cafe] Re: Seeking advice on a style question

Sun Jan 7 07:30:18 EST 2007

Steve Schafer wrote:
> [Apologies for the long delay in replying; I've been traveling, etc.]
[never mind]

>> The other extreme is the one I favor: the whole pipeline is expressible
>> as a chain of function compositions via (.). One should be able to write
>>
>>  process = rectangles2pages . questions2rectangles
>>
>> This means that (rectangles2pages) comes from a (self written) layout
>> library and that (questions2rectangles) comes from a question formatting
>> library and both concern are completely separated from each other. If
>> such a factorization can be achieved, you get clear semantics, bug
>> reduction and code reuse for free.
> 
> I favor that approach, too. ;) The problem is that when there is a
> multi-step process, and various bits of information get propagated
> throughout, as required by the various steps in the process, the overall
> decomposition into a series of steps a . b . c . ... can become brittle
> in the face of changing requirements.
> 
> Let's say, for example, a change request comes in that now requires step
> 13 to access information that had previously been discarded back at step
> 3. The simple approach is to propagate that information in the data
> structures that are passed among the intervening steps. But that means
> that all of the steps are "touched" by the change--because the relevant
> data structures are redefined--even though they're just passing the new
> data along.

Ah, I forgot to point it out, polymorphism is your dear friend, of
course. For example, 'rectangles2pages' should be fully polymorphic in
the stuff that's inside the rectangles, just like for instance 'nub ::
Eq a => [a] -> [a]' is polymorphic in the list elements. One possibility
is something like

   class Rectangle a where
       width, height :: a -> Integer

   type Pagenumber = Integer
   data Rectangle a => Pages a = Pages {
         stickyboxes   :: [(Position, a)]  -- appear on every page
       , pagenumberpos :: Position         -- absolute numbering later
       , pages         :: Data.Map Pagenumber [(Position, a)]
                                           -- actual contents
       }
   data Position = Position { x :: Integer, y :: Integer }

   data Footer a = Footer { content :: a, position :: HAlign }
   data HAlign = Left | Center | Right

   rectangles2pages :: Rectangle a => Footer a -> [a] -> Pages a

but there are many others. The type of 'rectangles2pages' dictates that
it can only rearrange but not alter the data inside 'a' (this is due to
_parametric_ polymorphism). Now, you may use it for normal text
processing via

   instance Rectangle Paragraph where

Or you can abuse the pagination algorithm to align bread, buns and
cookies on several tablets for baking in the stove:

   instance Rectangle Cookie where -- in the sense of bounding box

however you like it :) The point is that you will have minimal trouble
when requirements change if you somehow managed to keep
'rectangles2pages' as general as possible. The code above does not need
to be changed if you need to carry extra information around in 'a'.
That's what I meant with "(self written) layout library": for me, a
"library" is necessarily polymorphic.

Later on, you can specialize the types. So printing will be

   print :: Pages Paragraph -> Graphic

and it is clear that 'Pages Cookie' cannot be printed. A pipeline could
look like

   process = print . rectangles2pages footer .  ...
       where footer = loadFooter item

Note that I chose to still plumb a footer around, which makes sense in
case it is fully specified in the item. Of course, it can also be put in
pair '(Footer a, [a])':

   process = print . uncurry rectangles2pages .  ...

which is the way to go if it is generated on the fly by the previous
step in the pipeline. Given suitable generality, the right choice is
often natural and satisfying. And who says that there isn't an even
better generalization that incorporates the footer more elegantly?

Of course, the difficult thing is to discover the right generalization.
This can be quite an art. Oh, I can spent days of thinking without
writing a single line of code and I have never, ever encountered a
situation where it wasn't worth the effort. The point is: you have to
implement the corresponding general functionality anyway because often,
even the special case needs the full power in some way or another.
Implementing it for the special case only is like coding with a
blindfold. Still, the generalization can turn out to be inadequate, but
because we don't need to worry about the type 'a', things will be
easier. And, at some point, the interface doesn't need to change
anymore: every change makes it more general, there is a maximum
generality and we know that monotone sequences converge :)

>> Btw, the special place "end" suggests that the "question markup
>> language" does not incorporate all of: "conditional questions",
>> "question groups", "group templates"? Otherwise, I'd just let the user
>> insert
>>
>>   <if media="print">
>>      <template-instance ref="endquestions.xml" />
>>   </if>
>>
>> at the end of every questionnaire. If you use such a tiny macro language
>> (preferably with sane and simple semantics), you can actually merge
>> (stripUndisplayedQuestions) and (appendEndQuestions) into a function
>> (evalMacros) without much fuss.
> 
> If only I had the power to impose those kinds of changes....
> 
> Unfortunately, I have little control over the logical organization of
> questions, questionnaires and all of the other little bits and pieces.
> (I assure you I would have done it quite differently if I could.)
> Instead, I have to deal with an ad hoc pseudo-hierarchical
> quasi-relational database structure, and to settle for occasional extra
> columns to be added to the tables in order to specify information that I
> can't synthesize any other way.

Argh. Dealing with badly designed external interfaces is a nightmare.
But there may be a way out: while you unfortunately cannot change their
database design, they fortunately cannot control how your code
represents their database. Who says that you import the database as it
is? Who forbids to rearrange it internally? I think I'd interleave a
processing step

   saneImport :: RawDatabase -> (QuestionHierarchy,PageTemplate)

that fetches all present information and groups and massages the data
until it becomes organized. This is much like your 'questions' and
'pagemaster' combined, but it only depends on the data base content, not
on parameters like 'mediaKind' or 'Item' which determine how and what
content is to be processed. This way, all troubles arising from a badly
designed database are banned into 'saneImport' and don't affect the
actual processing step.

>> Fine, though I don't see exactly why this isn't done before after the
>> questions have been transformed to printable things but before there
>> are distributed across pages. So the references cannot refer to page
>> numbers, yet must be processed after transforming questions to
>> rectangles?
>
> It's not until you get to the "rectangles" level that you can see the
> text and tokens that need to be replaced.

I don't quite understand why, but let's have another example of
polymorphism that may be used before the "rectangle" level but also
after. We tackle macros (the end questions can be transformed to them
when fetching them from the database) and cross references inside some
other data structure. One possibility to transform questions with macros
and references to plain text is

       -- very simple macros
   data Macro a = Return a | If (MediaKind -> Bool) a

   execute1 :: MediaKind -> Macro a -> Maybe a
   execute1 _ (Return x) = Just x
   execute1 media (If b x) = if b media then Just x else Nothing

       -- execute can be formulated with an ApplicativeFunctor
       -- but we'll keep it easy here
   executeMacros :: MediaKind -> [Macro a] -> [a]
   executeMacros media xs = concatMap (maybeToList . execute1 media)

        -- references
   data Ref a = Here a | Ref CrossRef

   resolve :: Data.Map CrossRef a -> Ref a -> a
   resolve = ... -- for simplicity, the lookup always succeeds

        -- questions whose question text is of type a
   data Question a = FreeForm a (Answer String)
                   | SingleChoice (Answer Index) [a]

   instance Functor Question where
        fmap f (FreeForm x a) = FreeForm (f x) a
        fmap f (SingleChoice a xs) = SingleChoice a (map f xs)

        -- now comes the interesting type
   type RawQuestion = Macro (Question (Ref String))
        -- and we want to flatten it to (Question String)
   textonly :: MediaKind -> Data.Map CrossRef String
               -> [RawQuestion] -> [Question String]
   textonly media refs = fmap (resolve refs) . executeMacros media

The basic idea is the following: it is clear how to resolve a single
cross reference, that's what

  resolve :: Data.Map CrossRef a -> Ref a -> a

does. Now, we simply want to lift this into other data structures:

 resolveQs :: Data.Map CrossRef a
              -> Question (Ref String) -> Question String
 resolveQs refs = fmap (resolve refs)

The lifting itself can be made polymorphic in the sourrounding structure:

 resolveInside :: Functor f => Data.Map CrossRef a
             -> f (Ref String) -> f String
 resolveInside refs = fmap (resolve refs)

with the help of type constructors and functors. In more complicated
cases, one may need other type classes like those from 'Data.Foldable',
'Data.Applicative', 'Data.Traversable'. Depending on the functions that
need to be lifted, custom (multi parameter) type classes may quickly
show up. In general, the discipline of lifting functions systematically
to many data types is called "generic programming".

In a sense, not only the pipeline's functionality needs to be assembled
from smaller functions, but their types need to be assembled from
smaller pieces as well!

> As hard as it may be to believe, the people who are responsible for
> approving the questionnaires see it like this: If the system produces
> one 5-page "front" questionnaire and ten 6-page "back" questionnaires,
> then that's 65 pages that they have to inspect. But if the system were
> to produce ten 11-page questionnaires, even though the first five pages
> of each questionnaire are generated from exactly the same data using
> exactly the same software, that's 110 pages that they have to inspect.

X-)

> --------
> Thanks for all of the discussion. I think I have a lot to ponder....

May the λ guide your path ;) And of course, you can always outsource
some pondering to the mailing list.

Regards,
apfelmus