[Haskell-cafe] Sliding windows for streaming
Gershom B
gershomb at gmail.com
Sun Jun 7 00:06:54 UTC 2020
On the specific question, the most general and natural thing to do, though not necessarily the most efficient, is to let users keep the length of items in each chunk as part of the semigroup.
More broadly, not a direct answer, but you may be interested in section 13 (on windowed algorithms) of http://gbaz.github.io/slides/buildable2014.pdf
The “parallelogram” algorithm there is quite general.
The paper also covers other interesting streaming algorithms worth having around such as (generalized) maximum segment sum, and parallel prefix scan.
-g
On Jun 6, 2020, 5:33 PM -0400, David Feuer <david.feuer at gmail.com>, wrote:
> I'm looking for a bit of help with a library design choice.
>
> The streaming package currently offers a slidingWindow function
> converting a stream into a stream of fixed-size windows of that
> stream[1]:
>
> slidingWindow
> :: Monad m
> => Int -- Window size
> -> Stream (Of a) m b
> -> Stream (Of (Seq a)) m b
>
> This is based directly on a similar function in conduit. Using a rough
> translation into the world of lists, we have
>
> slidingWindow 3 "abcdef" = ["abc","bcd","cde","def"]
>
> The awkward case where the stream is shorter than the window is
> handled by potentially producing a short sequence at the end:
>
> slidingWindow 3 "ab" = ["ab"]
> slidingWindow 3 "" = [""]
>
> I recently merged a pull request that adds variations on sliding
> window maxima and minima using what's apparently a "folklore"
> algorithm. For example
>
> slidingWindowMax 3 "abcbab" = "abcccb"
>
> This is basically like
>
> slidingWindowMax k = map maximum . slidingWindow k
>
> except that an empty stream doesn't yield anything, to avoid undefined values.
>
> The big advantage of these specialized functions is that rather than
> having to take a maximum over a sequence of length `k` at each step,
> they only do a constant (amortized) amount of work at each step. Nice!
> But not very general. Suppose we want to take a moving average of some
> sort, like an arithmetic mean, geometric mean, harmonic mean, or
> median? That thought leads quite naturally to a data structure: a
> queue holding elements of some arbitrary *semigroup* that efficiently
> keeps track of the sum of all the elements in the queue[2].
>
> While the choice of *data structure* is moderately obvious, the choice
> of *sliding window function* is less so. The tricky bit is, again,
> what happens when the stream is too short for the window. If you work
> in the Sum semigroup and divide the results by the window size to get
> a moving average, then a too-short stream will give a (single) result
> that's completely wrong! Oof. What would be the most useful way to
> deal with this? The streams in `streaming` give us the option of
> producing a distinguished "return" value that comes after all the
> yields. Would it make sense to *return* the incomplete sum, and the
> number of elements that went into it, instead of *yielding* it into
> the result stream? That seems flexible, but maybe a tad annoying. What
> do y'all think?
>
> [1] https://hackage.haskell.org/package/streaming-0.2.3.0/docs/Streaming-Prelude.html#v:slidingWindow
>
> [2] See the AnnotatedQueue in
> https://github.com/haskell-streaming/streaming/pull/99/files which
> basically modifies Okasaki's implicit queues using some of the basic
> ideas that appear in Hinze-Paterson 2–3 trees.
> _______________________________________________
> Libraries mailing list
> Libraries at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/libraries
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20200606/357c04d3/attachment.html>
More information about the Haskell-Cafe
mailing list