[Haskell-cafe] A "less" complex median

Tue May 28 16:54:17 UTC 2019

Hello MarLinn,

Thanks for the thorough analysis!
Indeed the code is more readable in your refactoring!
And yes the bug you found needs addressing. I'll get on that :p
I will look into your suggestions as well as the OpenCV solutions to my
problem.
Thanks again, this was very helpful :)

Cheers,
Michel

On Tue, May 28, 2019 at 4:55 PM MarLinn <monkleyon at gmail.com> wrote:

>
> The code works, at least in the simple cases I've tried.
> It seems, however, way too complex and not very composable.
> So I was wondering if anyone could tell me where I went wrong, and how I
> can make the code simpler and smaller.
>
> Of course, it is completely possible to divide my insertion function, but
> the goal is to do all of the steps in one iteration over the list. This
> allows me to later compare what I have with divided functions to discover
> if the compiler would optimize them into one traversal.
>
> But even with its current logic, I believe the code can be better written.
> I just don't know how :p
>
> So I really appreciate any advice on how to make it better (or fix a bug I
> missed, or maybe even the logic of the whole algorithm).
>
> Hi Michael,
>
> first of all, let's see what makes your code look so complex. Part of that
> is not inherent complexity, it's a lack of documentation. And by that I
> don't mean a lack of comments, but rather that the code is not
> self-documenting. Names are too abstract and/or too short. For example the
> name "getTo" means almost nothing on its own. You could also use records
> and types for this kind of documentation. The function-arguments of "
> goThrough", "flt" and "cmp", may be extension points in the future (see
> below), but right now they make the code more complex than necessary as
> well. Then there's monadic syntax for what is essentially pure computation
> and a few other bits and bobs.
>
> All this is not bad for some quick and dirty experiment, but changing that
> is a great first step for a clean-up.
>
> So here's one proposal how you could clean up the core part of the code.
> Everything is (almost) just renamed and reordered parts of the original,
> with obvious environmental stuff left out as an exercise to the reader. I
> also use -XUnicodeSyntax, but that's just my personal preference.
>     data MedianFilter a = MF
>         { kernelSize    ∷ !KernelSize
>         , srcPosition   ∷ !Position
>         , currentKernel ∷ ![(Position, a)]
>         }
>       deriving ( Show )
>
>     stepMedianFilterS ∷ Ord a ⇒ a → State (MedianFilter a) a
>     stepMedianFilterS = state . stepMedianFilter
>
>     stepMedianFilter ∷ Ord v ⇒ v → MedianFilter v → (Median v,MedianFilter
> v)
>     stepMedianFilter _   MF{..} | kernelSize < 1 || srcPosition < 0
>                                 = error "stepMedianFilter: bad argument"
>     stepMedianFilter v s at MF{..} = (median , s{ srcPosition = srcPosition
> + 1, currentKernel = newKernel } )
>       where
>         (newKernel, median) = updateKernelAndGetMedian medianIdx
> currentKernel
>
>         -- medianIdx = kernelSize `div` 2, unless we're at the start of
> the stream
>         medianIdx           = min srcPosition kernelSize `div` 2
>         isTooOld x          = not $ validAt kernelSize srcPosition x
>         v'                  = (srcPosition, v)
>
>              -- kernel is empty: create new kernel
>         updateKernelAndGetMedian _                 []    = ([v'], v)
>              -- clean up old values
>         updateKernelAndGetMedian stepsToMedian   (x:xs)  | isTooOld x =
> updateKernelAndGetMedian stepsToMedian xs
>         updateKernelAndGetMedian stepsToMedian k@(x:xs)  = case (v < snd
> x, stepsToMedian) of
>              -- the new value is also the median
>             (True,0) → (v':k , v)
>              -- the value belongs here, but the median has not been found
> yet
>             (True,_) → (v':k , snd $ nthOrLast (stepsToMedian-1) k)
>              -- the median is here, but the value must still be inserted
>             (_   ,0) → (x :ys, snd x)
>              -- neither median nor position have been found yet
>             (_   ,_) → (x :ys, y)
>           where
>             (ys, y) = updateKernelAndGetMedian (stepsToMedian-1) xs
>
> Of course this does little in the way of making the code more composable
> or simpler from a purely logical perspective. Nesting the functions does
> help in reducing the number of arguments that have to be passed around
> though. And without getting here I would not have been able to think about
> the code in any meaningful way as well.
>
> Now let's look at a few other properties of the code:
>
> There are few avenues that lead to dead ends, I believe.
>
>    - Can the core function "goThrough" (my "updateKernelAndGetMedian") be
>    replaced by a primitive like a fold, a scan, or similar? I don't think so.
>    - Can the kernel be represented with a set, a priority queue, or any
>    other type of tree? Yes, but the cleanup of old values would be even
>    harder. Not worth the effort, unless you plan on using huge kernels.
>
> Bugs: I think I found one. The "goThrough"/"updateKernelAndGetMedian"-function
> does three things at once: inserting the new value into the sorted list,
> cleaning up old values, and searching for the current median. But if the
> place to insert is found before the median is found, you drop the cleaning.
> Which means if there is an old, invalid value between the position of the
> new value and the true median, you should not be computing the right
> median. (Unless it's directly behind the position of the new value.) One
> example is the sequence [3,6,5,4,2,1]. With a kernel size of 5, your
> algorithm computes a median of 3 for the 6th position (the 1), where it
> seems you would want a 4.
>
> On to composability. I agree that there's tension between efficiency and
> extendability, mostly inside "goThrough"/"updateKernelAndGetMedian" In
> particular, I can think of three important ways you might want to reuse
> parts of this algorithm.
>
>    1.
>
>    You might want to have a generic "iterate over a stream with a
>    kernel-based function"-function. This way you could implement things like a
>    Gaussian blur or edge detection. To get there you would have to replace
>    your "cmp" and "idx==0" (my "v < snd x" and "stepsToMedian") with more
>    complex functions and to replace "getTo" (my "nthOrLast") with another
>    recursive call or similar. (This part is probably also the way to correct
>    the bug mentioned above.) So your "flt" and "cmp" where not a bad
>    idea, but not quite there either.
>    2.
>
>    You might want to handle start and end of the stream in more different
>    ways. You could "extend" the first/last value, mirror or extrapolate the
>    bordering values, fill with a constant, append secondary streams, ignore
>    border values, … Right now both ends are handled differently. Also, both
>    parts are hard coded into "goThrough"/"updateKernelAndGetMedian" (and "
>    ret"/"medianIdx"). Again, making it more complex. So pulling this
>    handling out might also make the code simpler. The best way I can see to
>    solve this is to detect start and end outside this function and direct all
>    three to different, exchangeable functions – which, in turn, will probably
>    interact with a simplified version of "goThrough"/"
>    updateKernelAndGetMedian" again. You could also assume that some other
>    function prepends/appends meaningful extensions to both ends and also
>    cleans up the ends afterwards. The same way you cut the end right now, but
>    expect some outside user to clean up the start. But then you should default
>    to squashing the kernel at both ends.
>    3.
>
>    You might want to create a pipeline of different filters for every
>    point of the stream. This gets easier if you separate the state from its
>    monad like I did, because now you can compose states before putting them
>    inside that monad. It also helps write nicer-looking functions – at least
>    in my opinion.
>
> Lastly, most of these extensibility problems should have been solved in
> OpenCV, so you could search for inspiration there.
>
> Just a few idea. Hope they help!
>
> Cheers,
> MarLinn
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20190528/17fb491e/attachment.html>