[Haskell-cafe] Reifying case expressions [was: Re: On stream processing, and a new release of timeplot coming]

Gábor Lehel illissius at gmail.com
Sun Dec 25 23:50:10 CET 2011


On Sun, Dec 25, 2011 at 9:19 PM, Eugene Kirpichov <ekirpichov at gmail.com> wrote:
> Hello Heinrich,
>
> Thanks, that's sure some food for thought!
>
> A few notes:
> * This is indeed analogous to Iteratees. I tried doing the same with
> Iteratees but failed, so I decided to put together something simple of my
> own.
> * The Applicative structure over this stuff is very nice. I was thinking,
> what structure to put on - and Applicative seems the perfect fit. It's also
> possible to implement Arrows - but I once tried and failed; however, I was
> trying that for a more complex stream transformer datatype (a hybrid of
> Iteratee and Enumerator).
> * StreamSummary is trivially a bifunctor. I actually wanted to make it an
> instance of Bifunctor, but it was in the category-extras package and I
> hesitated to reference this giant just for this purpose :) Probably
> bifunctors should be in prelude.

Edward Kmett has been splitting that up into a variety of smaller
packages, for instance:

http://hackage.haskell.org/package/bifunctors

> * Whereas StreamSummary a r abstracts deconstruction of lists, the dual
> datatype (StreamSummary a r ->) abstracts construction; however I just now
> (after looking at your first definition of length) understood that it is
> trivially isomorphic to the regular list datatype - you just need to be
> non-strict in the state - listify :: ListTo a [a] = CaseOf [] (\x -> fmap
> (x:) listify). So you don't need functions of the form (forall r . ListTo a
> r -> ListTo b r) - you just need (ListTo b [a]). This is a revelation for
> me.
>
> On Sun, Dec 25, 2011 at 2:25 PM, Heinrich Apfelmus
> <apfelmus at quantentunnel.de> wrote:
>>
>> Eugene Kirpichov wrote:
>>>
>>> In the last couple of days I completed my quest of making my graphing
>>> utility timeplot ( http://jkff.info/software/timeplotters ) not load the
>>> whole input dataset into memory and consequently be able to deal with
>>> datasets of any size, provided however that the amount of data to *draw*
>>> is
>>> not so large. On the go it also got a huge speedup - previously
>>> visualizing
>>> a cluster activity dataset with a million events took around 15 minutes
>>> and
>>> a gig of memory, now it takes 20 seconds and 6 Mb max residence.
>>> (I haven't yet uploaded to hackage as I have to give it a bit more
>>> testing)
>>>
>>> The refactoring involved a number of interesting programming patterns
>>> that
>>> I'd like to share with you and ask for feedback - perhaps something can
>>> be
>>> simplified.
>>>
>>> The source is at http://github.com/jkff/timeplot
>>>
>>> The datatype of incremental computations is at
>>>
>>> https://github.com/jkff/timeplot/blob/master/Tools/TimePlot/Incremental.hs .
>>> Strictness is extremely important here - the last memory leak I
>>> eliminated
>>> was lack of bang patterns in teeSummary.
>>
>>
>> Your  StreamSummary  type has a really nice interpretation: it's a
>> reification of  case  expressions.
>>
>> For instance, consider the following simple function from lists to
>> integers
>>
>>    length :: [a] -> Int
>>    length xs = case xs of
>>        []     -> 0
>>        (y:ys) -> 1 + length ys
>>
>> We want to reify the case expression as constructor of a data type. What
>> type should it have? Well, a case expression maps a list  xs  to a result,
>> here of type Int, via two cases: the first case gives a result and the other
>> maps a value of type  a  to a function from lists to results again. This
>> explanation was probably confusing, so I'll just go ahead and define a data
>> type that represents functions from lists  [a] to some result of type  r
>>
>>    data ListTo a r = CaseOf r (a -> ListTo a r)
>>
>>    interpret :: ListTo a r -> ([a] -> r)
>>    interpret (CaseOf nil cons) xs =
>>        case xs of
>>            []     -> nil
>>            (y:ys) -> interpret (cons y) ys
>>
>> As you can see, we are just mapping each  CaseOf  constructor back to a
>> built-in case expression.
>>
>> Likewise, each function from lists can be represented in terms of our new
>> data type: simply replace all built-in case expressions with the new
>> constructor
>>
>>    length' :: ListTo a Int
>>    length' = CaseOf
>>        (0)
>>        (\x -> fmap (1+) length')
>>
>>    length = interpret length'
>>
>> The CaseOf may look a bit weird, but it's really just a straightforward
>> translation of the case expression you would use to define the function  go
>>  instead.
>>
>> Ok, this length function is really inefficient because it builds a huge
>> expression of the form  (1+(1+...)). Let's implement a strict variant
>> instead
>>
>>    lengthL :: ListTo a Int
>>    lengthL = go 0
>>        where
>>        go !n = CaseOf (n) (\x -> go (n+1))
>>
>> While we're at it, let's translate two more list functions
>>
>>    foldL' :: (b -> a -> b) -> b -> ListTo a b
>>    foldL' f b = Case b (\a -> foldL' f $! f b a)
>>
>>    sumL    :: ListTo Int Int
>>    sumL    = foldL' (\b a -> a+b) 0
>>
>>
>> And now we can go for the point of this message: unlike ordinary functions
>> from lists, we can compose these in lock-step! In particular, the following
>> applicative instance
>>
>>    instance Applicative (ListTo a) where
>>        pure b = CaseOf b (const $ pure b)
>>        (CaseOf f fs) <*> (CaseOf x xs) =
>>            CaseOf (f x) (\a -> fs a <*> xs a)
>>
>> allows us to write a function
>>
>>    average :: ListTo Int Double
>>    average = divide <$> sumL <*> lengthL
>>        where
>>        divide a b = fromIntegral a / fromIntegral b
>>
>> that runs in constant space! Why does this work? Well, since we can now
>> inspect case expressions, we can choose to evaluate them in lock-step,
>> essentially computing  sum  and  length  with just one pass over the input
>> list. Remember that the original definition
>>
>>    average xs = sum xs / length xs
>>
>> has a space leak because the input list xs is being shared.
>>
>>
>> Remarks:
>> 1. Reified case expressions are, of course, the same thing as Iteratees,
>> modulo chunking and weird naming.
>>
>> 2. My point is topped by scathing irony: if Haskell had a form of *partial
>> evaluation*, we could write applicative combinators for *ordinary* functions
>>  [a] -> r  and express  average  in constant space.  In other words, partial
>> evaluation would make it unnecessary to reify case expressions for the
>> purpose of controlling performance / space leaks.
>>
>>
>> Best regards,
>> Heinrich Apfelmus
>>
>> --
>> http://apfelmus.nfshost.com
>>
>>
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
>
>
> --
> Eugene Kirpichov
> Principal Engineer, Mirantis Inc. http://www.mirantis.com/
> Editor, http://fprog.ru/
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
Work is punishment for failing to procrastinate effectively.



More information about the Haskell-Cafe mailing list