[Haskell-cafe] Sliding windows for streaming

David Feuer david.feuer at gmail.com
Sun Jun 7 01:28:29 UTC 2020


Unfortunately, much of that paper goes way over my head.

On Sat, Jun 6, 2020, 8:07 PM Gershom B <gershomb at gmail.com> wrote:

> On the specific question, the most general and natural thing to do, though
> not necessarily the most efficient, is to let users keep the length of
> items in each chunk as part of the semigroup.
>
> More broadly, not a direct answer, but you may be interested in section 13
> (on windowed algorithms) of http://gbaz.github.io/slides/buildable2014.pdf
>
> The “parallelogram” algorithm there is quite general.
>
> The paper also covers other interesting streaming algorithms worth having
> around such as (generalized) maximum segment sum, and parallel prefix scan.
>
> - <http://gbaz.github.io/slides/buildable2014.pdf>g
> On Jun 6, 2020, 5:33 PM -0400, David Feuer <david.feuer at gmail.com>, wrote:
>
> I'm looking for a bit of help with a library design choice.
>
> The streaming package currently offers a slidingWindow function
> converting a stream into a stream of fixed-size windows of that
> stream[1]:
>
> slidingWindow
> :: Monad m
> => Int -- Window size
> -> Stream (Of a) m b
> -> Stream (Of (Seq a)) m b
>
> This is based directly on a similar function in conduit. Using a rough
> translation into the world of lists, we have
>
> slidingWindow 3 "abcdef" = ["abc","bcd","cde","def"]
>
> The awkward case where the stream is shorter than the window is
> handled by potentially producing a short sequence at the end:
>
> slidingWindow 3 "ab" = ["ab"]
> slidingWindow 3 "" = [""]
>
> I recently merged a pull request that adds variations on sliding
> window maxima and minima using what's apparently a "folklore"
> algorithm. For example
>
> slidingWindowMax 3 "abcbab" = "abcccb"
>
> This is basically like
>
> slidingWindowMax k = map maximum . slidingWindow k
>
> except that an empty stream doesn't yield anything, to avoid undefined
> values.
>
> The big advantage of these specialized functions is that rather than
> having to take a maximum over a sequence of length `k` at each step,
> they only do a constant (amortized) amount of work at each step. Nice!
> But not very general. Suppose we want to take a moving average of some
> sort, like an arithmetic mean, geometric mean, harmonic mean, or
> median? That thought leads quite naturally to a data structure: a
> queue holding elements of some arbitrary *semigroup* that efficiently
> keeps track of the sum of all the elements in the queue[2].
>
> While the choice of *data structure* is moderately obvious, the choice
> of *sliding window function* is less so. The tricky bit is, again,
> what happens when the stream is too short for the window. If you work
> in the Sum semigroup and divide the results by the window size to get
> a moving average, then a too-short stream will give a (single) result
> that's completely wrong! Oof. What would be the most useful way to
> deal with this? The streams in `streaming` give us the option of
> producing a distinguished "return" value that comes after all the
> yields. Would it make sense to *return* the incomplete sum, and the
> number of elements that went into it, instead of *yielding* it into
> the result stream? That seems flexible, but maybe a tad annoying. What
> do y'all think?
>
> [1]
> https://hackage.haskell.org/package/streaming-0.2.3.0/docs/Streaming-Prelude.html#v:slidingWindow
>
> [2] See the AnnotatedQueue in
> https://github.com/haskell-streaming/streaming/pull/99/files which
> basically modifies Okasaki's implicit queues using some of the basic
> ideas that appear in Hinze-Paterson 2–3 trees.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20200606/435036ec/attachment.html>


More information about the Haskell-Cafe mailing list