[Haskell-cafe] Re: Seeking advice on a style question

Thu Jan 4 15:15:30 EST 2007

[Apologies for the long delay in replying; I've been traveling, etc.]

On Sun, 31 Dec 2006 20:11:47 +0100, you wrote:

>The other extreme is the one I favor: the whole pipeline is expressible
>as a chain of function compositions via (.). One should be able to write
>
>  process = rectangles2pages . questions2rectangles
>
>This means that (rectangles2pages) comes from a (self written) layout
>library and that (questions2rectangles) comes from a question formatting
>library and both concerns are completely separated from each other. If
>such a factorization can be achieved, you get clear semantics, bug
>reduction and code reuse for free.

I favor that approach, too. ;) The problem is that when there is a
multi-step process, and various bits of information get propagated
throughout, as required by the various steps in the process, the overall
decomposition into a series of steps a . b . c . ... can become brittle
in the face of changing requirements.

Let's say, for example, a change request comes in that now requires step
13 to access information that had previously been discarded back at step
3. The simple approach is to propagate that information in the data
structures that are passed among the intervening steps. But that means
that all of the steps are "touched" by the change--because the relevant
data structures are redefined--even though they're just passing the new
data along.
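
To make the brittleness concrete, here is a minimal sketch with only three steps instead of thirteen. All of the names (stepA, stepB, stepC, Loaded, Formatted, sourceId) are hypothetical and have nothing to do with the real system; the point is only that stepB's types have to change even though stepB never uses the new field.

```haskell
import Data.Char (toUpper)

-- Hypothetical three-step pipeline; none of these names come from the
-- real system.

-- Output of stepA. Originally only the text was kept; the change request
-- forces us to keep 'sourceId' as well, because stepC now needs it.
data Loaded = Loaded
  { loadedText :: String
  , sourceId   :: Int      -- newly added field
  } deriving (Show, Eq)

-- Output of stepB. stepB never inspects the id, yet its result type must
-- grow just to carry the value through to stepC.
data Formatted = Formatted
  { formattedText :: String
  , carriedId     :: Int   -- passed along untouched
  } deriving (Show, Eq)

stepA :: (Int, String) -> Loaded
stepA (i, s) = Loaded { loadedText = s, sourceId = i }

stepB :: Loaded -> Formatted
stepB l = Formatted
  { formattedText = map toUpper (loadedText l)
  , carriedId     = sourceId l
  }

stepC :: Formatted -> String
stepC f = show (carriedId f) ++ ": " ++ formattedText f

-- The composition itself is unchanged, but every step's types were touched.
process :: (Int, String) -> String
process = stepC . stepB . stepA
```

With these definitions, process (7, "hello") yields "7: HELLO". Multiply the single pass-through step by ten and you have the situation I described.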

The less simple (and not always feasible) approach is to essentially
start over again and re-jigger all of the data structures and
subprocesses to handle the new requirement. But this can obviously
become quite a task.

>If there are only the cases of some single question or a full
>questionnaire, you could always do
>
>    blowup :: SingleQuestion -> FullQuestionaire
>    preview = process (blowup a_question) ...
>
>In general, I think that it's the task of (process) to inspect (Item)
>and to plug together the right steps. For instance, a single question
>does not need page breaks or similar. I would avoid overloading the
>(load*) functions and (paginate) on (Item).

A single question can be several pages long, so it does need to be
paginated. The reason for the decomposition as it now stands is that any
item (and there are more kinds of items than just questions and
questionnaires) can be decomposed into a pagemaster and a list of
questions. Once that has occurred, all items acquire essentially the
same "shape." That's why loading the pagemaster and loading the
questions are the first two steps in the process.
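
As a rough sketch of what I mean by "shape" (the types, constructors and the default pagemaster below are all made up for illustration; the real item kinds are more numerous and come out of the database):

```haskell
-- Hypothetical types; the real item kinds and loaders are not shown here.
newtype PageMaster = PageMaster String deriving (Show, Eq)
newtype Question   = Question String   deriving (Show, Eq)

-- Only two kinds of item are modelled; the real system has more.
data Item
  = SingleQuestion Question
  | Questionnaire PageMaster [Question]
  deriving (Show, Eq)

-- Assumption: an item without its own pagemaster falls back to a default.
defaultMaster :: PageMaster
defaultMaster = PageMaster "default"

loadPageMaster :: Item -> PageMaster
loadPageMaster (SingleQuestion _)   = defaultMaster
loadPageMaster (Questionnaire pm _) = pm

loadQuestions :: Item -> [Question]
loadQuestions (SingleQuestion q)   = [q]
loadQuestions (Questionnaire _ qs) = qs

-- After this point every item looks the same to the later steps,
-- including pagination.
decompose :: Item -> (PageMaster, [Question])
decompose item = (loadPageMaster item, loadQuestions item)
```

Everything downstream of decompose can then work on the (PageMaster, [Question]) pair without caring what kind of item it started from.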

>Btw, the special place "end" suggests that the "question markup
>language" does not incorporate all of: "conditional questions",
>"question groups", "group templates"? Otherwise, I'd just let the user
>insert
>
>   <if media="print">
>      <template-instance ref="endquestions.xml" />
>   </if>
>
>at the end of every questionnaire. If you use such a tiny macro language
>(preferably with sane and simple semantics), you can actually merge
>(stripUndisplayedQuestions) and (appendEndQuestions) into a function
>(evalMacros) without much fuss.

If only I had the power to impose those kinds of changes....

Unfortunately, I have little control over the logical organization of
questions, questionnaires and all of the other little bits and pieces.
(I assure you I would have done it quite differently if I could.)
Instead, I have to deal with an ad hoc pseudo-hierarchical
quasi-relational database structure, and to settle for occasional extra
columns to be added to the tables in order to specify information that I
can't synthesize any other way.

>Uh, that doesn't sound good. I assume that the post-processing is not
>implemented in Haskell?

Not even remotely so. ;) In the paper world, post-processing consists of
semi-automated collation and stapling of the actual printed pages. In
the electronic world, during previous survey periods, an analogous
process was used (a "front" questionnaire and a "back" questionnaire
would be figuratively stapled together); we're looking to make the
merging a bit smoother and more automatic this time around.

As is often the case, the motivation for the rather arcane
post-processing is human, rather than technical. Let's say I have ten
different questionnaires, where the first five pages of each
questionnaire are identical, and these are followed by six additional
pages that differ from one questionnaire to another. That's a total of
10 * 11 = 110 pages, but only 5 + 10 * 6 = 65 _distinct_ pages.
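
The same arithmetic as a small sketch, in case the counts change (the function and parameter names are just for illustration):

```haskell
-- n questionnaires, each with 'shared' identical front pages followed by
-- 'unique' pages that differ from one questionnaire to another.

-- Pages produced if every questionnaire is generated in full.
totalPages :: Int -> Int -> Int -> Int
totalPages n shared unique = n * (shared + unique)

-- Pages produced if the shared front is generated only once.
distinctPages :: Int -> Int -> Int -> Int
distinctPages n shared unique = shared + n * unique
```

For the example above, totalPages 10 5 6 is 110 and distinctPages 10 5 6 is 65.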

As hard as it may be to believe, the people who are responsible for
approving the questionnaires see it like this: If the system produces
one 5-page "front" questionnaire and ten 6-page "back" questionnaires,
then that's 65 pages that they have to inspect. But if the system were
to produce ten 11-page questionnaires, even though the first five pages
of each questionnaire are generated from exactly the same data using
exactly the same software, that's 110 pages that they have to inspect.

>Fine, though I don't see exactly why this isn't done after the
>questions have been transformed to printable things but before they are
>distributed across pages. So the references cannot refer to page
>numbers, yet must be processed after transforming questions to rectangles?

It's not until you get to the "rectangles" level that you can see the
text and tokens that need to be replaced.

--------

Thanks for all of the discussion. I think I have a lot to ponder....

Steve Schafer
Fenestra Technologies Corp.
http://www.fenestra.com/