[Haskell-cafe] Patterns for processing large but finite streams

Heinrich Apfelmus apfelmus at quantentunnel.de
Fri Jul 1 11:34:54 CEST 2011


Eugene Kirpichov wrote:
>>> Plain old lazy lists do not allow me to combine multiple concurrent
>>> computations, e.g. I cannot define average from sum and length.
>
> I meant the average of the whole list - given a sumS and lengthS ("S"
> for "Stream"), write meanS as something like liftS2 (/) sumS lengthS.
> 
> Or is that possible with lazy lists too?
> 
> (looks like arrows actually - which arrow is appropriate here?)

That's a very good point. Just to clarify for everyone: Eugene wants to 
write the function  average  almost *literally* as

    average xs = sum xs / length xs

but he wants the functions  sum  and  length  to fuse, so that the input 
stream  xs  is *not* shared as a whole.


I have thought about this problem for a while actually and have observed 
the following:

1) You are not looking for a representation of streams, but for a 
representation of *functions* on streams. The essence of a function on 
streams is its case analysis of the input. Hence, the simplest solution 
is to make the case analysis explicit:

    data StringTo a = CaseOf a (Char -> StringTo a)

    -- function on a stream (here: String)
    interpret :: StringTo a -> (String -> a)
    interpret (CaseOf nil cons) []     = nil
    interpret (CaseOf nil cons) (x:xs) = interpret (cons x) xs

    instance Applicative StringTo where
        pure a = CaseOf a (const $ pure a)
        (CaseOf nil1 cons1) <*> (CaseOf nil2 cons2) =
            CaseOf (nil1 $ nil2) (\c -> cons1 c <*> cons2 c)

    length = go 0 where go n = CaseOf n (\_ -> go $! n+1)

    average = liftA2 (/) sum length

In other words, if you reify  case .. of  expression , you will be able 
to fuse them.

2) If Haskell were to support some kind of evaluation under the lambda 
(partial evaluation, head normal form instead of weak head normal form), 
it would be unnecessary to make the case expressions implicit. Rather, 
the applicative instance could be written as follows

    instance Applicative ((->) String) where
        pure a  = const a
        f <*> x = \cs -> case cs of
           []     -> f [] $ x []
           (c:cs) ->
                let f' cs = f (c:cs) -- partial evaluation on this
                    x' cs = x (c:cs)
                in f' `partialseq` x' `partialseq` (f' <*> x') cs

We could simply write

     average = liftA2 (/) sum length

and everything would magically fuse.

3) John Hughes has already thought about this problem in his PhD thesis. 
:) (but it is not available for download on the internet, unfortunately. 
:( ). His solution was a SYNCHLIST primitive in conjunction with some 
sort of parallelism PAR. Basically, the SYNCHLIST primitive only allows 
simultaneous access to the input stream and the parallelism is used to 
make that simultaneity happen.


Best regards,
Heinrich Apfelmus

--
http://apfelmus.nfshost.com




More information about the Haskell-Cafe mailing list