[Haskell-cafe] streaming translation using monads

Wed Nov 19 18:33:14 EST 2008

On Nov 19, 2008, at 12:19 PM, Justin Bailey wrote:

> On Wed, Nov 19, 2008 at 11:50 AM, Warren Harris
> <warrensomebody at gmail.com> wrote:
>> Now perhaps the in-memory list part was a bad conclusion since the  
>> queries
>> can be decorated with translation functions capable of streaming  
>> the results
>> out to another channel. However, the use of a universal type for  
>> the values
>> would still seem to be required since there is no way to implement
>> type-indexed values when the queries themselves are expressed as an  
>> abstract
>> datatype rather than as functions. Am I overlooking something?
>>
>
> If the type of the input determines the type of the output, and the
> type of the input can be determined statically, then I think you are
> overlooking this technique. But if your application requires that each
> input expression be sent to the remote client before type information
> can be determined, then you are right.
>
> In the case of HaskellDB, each operator/term in the Query monad
> enriches the type information available. Of course, that enrichment
> happens at compile time and occurs via type inference. For example,
> assuming two tables T1 and T2, with columns T1col and T2col, this
> expression:
>
>  simpleQuery = do
>     t1 <- table T1
>     t2 <- table T2
>     restrict ( .. some expression ...)
>     project (t1 ! T1col1 # t2 ! T2col1)
>
> Gives a type similar to Query (RecCons T1Col1 Int (RecCons T2 Col2 Int
> RecNil)). RecCons and RecNil are types that allow a list to be built
> at the type level. The list carries the column names and types in the
> resulting projection. The type information is used  to ensure queries
> only refer to columns that exist, datatypes are compared sensibly,
> etc. If in your case you can build that kind of structure based purely
> on the "input language" operators/terms, then it seems you could build
> the entire "output expression" in one shot. But again, if input and
> output have to be interleaved then I think you are stuck.

What you describe is very similar to my current implementation, except  
I'm using a continuation function to receive the results input from  
the remote server rather than creating a list. The continuation's type  
(a curried function) is analogous to a list at the type level, if I  
understand correctly. However, since what I would like to do is stream  
the results in from one server and out to another (translating as I  
go), I really need some way to hook my translation in between each  
primitive field read from the input. So inside the body of the  
'project' call you've given above, I might like to say something like:

   v1 <- t1 ! T1col1;
   v2 <- t2 ! T2col1;
   send client (trans v1 v2);
   return ()

However, since this input-receiving expression is now expressed as  
monad, it doesn't seem possible to use it to simultaneously formulate  
the output expressions that must be sent to the remote server to cause  
T1col1 and T2col1 to be requested.

In my particular application, the remote server not only handles  
simple directives to retrieve sequences of primitives like T1col1,  
T2col1, but also higher-order formulations of them as lists and  
tuples. So the server may be sent a request like "(T1col1, (list  
(T2col1, T2col2)))" meaning "request the tuple of T1col1 followed by a  
(possibly long) sequence of T2col1, T2col2 pairs" and I want to be  
able to translate the incoming result stream without holding the  
entire thing in memory.

Obviously I can do this by having one operation to generate the  
outbound query from a target language expression (implemented with an  
algebraic datatype), and another operation to handle the results based  
on the same expression, but it seems like some sort of formulation  
should be possible that allows both operations to be expressed  
simultaneously. In fact, I get that with the continuation-oriented  
combinators I mentioned earlier, but in practice I've found them to be  
very hard to work with. Just to make it more concrete, the above  
example might be expressed with my continuation-based combinators as:

T1col1 ==> (\ k t1 -> send client (trans1 t1); k) ++
list (T2col1 ++ T2col2
        ==> (\ k t2c1 t2c2 -> send client (trans2 t2c1 t2c2); k))) + 
+ ...

Here, the first function receives the value read from T1col1,  
translates it and send it to the originator of the request. This  
function is substituted for the an overall continuation -- receives  
the overall continuation as its first argument, k, and returns it at  
the end without passing any values to it. This effectively consumes  
the value t1. The second function is similar, and is called for each  
pair received from the list. The combinator '++', primitive directives  
like T1col1, and higher-order directives like 'list' are responsible  
for dealing with the overall expression grammar (i/o of delimiters,  
etc) and folding the overall continuation through the results as they  
are received.

Sorry for being so long-winded here... I really appreciate your input.

Warren