[Haskell-cafe] A "less" complex median
Michel Haber
michelhaber1994 at gmail.com
Tue May 28 16:54:17 UTC 2019
Hello MarLinn,
Thanks for the thorough analysis!
Indeed the code is more readable in your refactoring!
And yes the bug you found needs addressing. I'll get on that :p
I will look into your suggestions as well as the OpenCV solutions to my
problem.
Thanks again, this was very helpful :)
Cheers,
Michel
On Tue, May 28, 2019 at 4:55 PM MarLinn <monkleyon at gmail.com> wrote:
>
> The code works, at least in the simple cases I've tried.
> It seems, however, way too complex and not very composable.
> So I was wondering if anyone could tell me where I went wrong, and how I
> can make the code simpler and smaller.
>
> Of course, it is completely possible to divide my insertion function, but
> the goal is to do all of the steps in one iteration over the list. This
> allows me to later compare what I have with divided functions to discover
> if the compiler would optimize them into one traversal.
>
> But even with its current logic, I believe the code can be better written.
> I just don't know how :p
>
> So I really appreciate any advice on how to make it better (or fix a bug I
> missed, or maybe even the logic of the whole algorithm).
>
> Hi Michael,
>
> first of all, let's see what makes your code look so complex. Part of that
> is not inherent complexity, it's a lack of documentation. And by that I
> don't mean a lack of comments, but rather that the code is not
> self-documenting. Names are too abstract and/or too short. For example the
> name "getTo" means almost nothing on its own. You could also use records
> and types for this kind of documentation. The function-arguments of "
> goThrough", "flt" and "cmp", may be extension points in the future (see
> below), but right now they make the code more complex than necessary as
> well. Then there's monadic syntax for what is essentially pure computation
> and a few other bits and bobs.
>
> All this is not bad for some quick and dirty experiment, but changing that
> is a great first step for a clean-up.
>
> So here's one proposal how you could clean up the core part of the code.
> Everything is (almost) just renamed and reordered parts of the original,
> with obvious environmental stuff left out as an exercise to the reader. I
> also use -XUnicodeSyntax, but that's just my personal preference.
> data MedianFilter a = MF
> { kernelSize ∷ !KernelSize
> , srcPosition ∷ !Position
> , currentKernel ∷ ![(Position, a)]
> }
> deriving ( Show )
>
> stepMedianFilterS ∷ Ord a ⇒ a → State (MedianFilter a) a
> stepMedianFilterS = state . stepMedianFilter
>
> stepMedianFilter ∷ Ord v ⇒ v → MedianFilter v → (Median v,MedianFilter
> v)
> stepMedianFilter _ MF{..} | kernelSize < 1 || srcPosition < 0
> = error "stepMedianFilter: bad argument"
> stepMedianFilter v s at MF{..} = (median , s{ srcPosition = srcPosition
> + 1, currentKernel = newKernel } )
> where
> (newKernel, median) = updateKernelAndGetMedian medianIdx
> currentKernel
>
> -- medianIdx = kernelSize `div` 2, unless we're at the start of
> the stream
> medianIdx = min srcPosition kernelSize `div` 2
> isTooOld x = not $ validAt kernelSize srcPosition x
> v' = (srcPosition, v)
>
> -- kernel is empty: create new kernel
> updateKernelAndGetMedian _ [] = ([v'], v)
> -- clean up old values
> updateKernelAndGetMedian stepsToMedian (x:xs) | isTooOld x =
> updateKernelAndGetMedian stepsToMedian xs
> updateKernelAndGetMedian stepsToMedian k@(x:xs) = case (v < snd
> x, stepsToMedian) of
> -- the new value is also the median
> (True,0) → (v':k , v)
> -- the value belongs here, but the median has not been found
> yet
> (True,_) → (v':k , snd $ nthOrLast (stepsToMedian-1) k)
> -- the median is here, but the value must still be inserted
> (_ ,0) → (x :ys, snd x)
> -- neither median nor position have been found yet
> (_ ,_) → (x :ys, y)
> where
> (ys, y) = updateKernelAndGetMedian (stepsToMedian-1) xs
>
> Of course this does little in the way of making the code more composable
> or simpler from a purely logical perspective. Nesting the functions does
> help in reducing the number of arguments that have to be passed around
> though. And without getting here I would not have been able to think about
> the code in any meaningful way as well.
>
> Now let's look at a few other properties of the code:
>
> There are few avenues that lead to dead ends, I believe.
>
> - Can the core function "goThrough" (my "updateKernelAndGetMedian") be
> replaced by a primitive like a fold, a scan, or similar? I don't think so.
> - Can the kernel be represented with a set, a priority queue, or any
> other type of tree? Yes, but the cleanup of old values would be even
> harder. Not worth the effort, unless you plan on using huge kernels.
>
> Bugs: I think I found one. The "goThrough"/"updateKernelAndGetMedian"-function
> does three things at once: inserting the new value into the sorted list,
> cleaning up old values, and searching for the current median. But if the
> place to insert is found before the median is found, you drop the cleaning.
> Which means if there is an old, invalid value between the position of the
> new value and the true median, you should not be computing the right
> median. (Unless it's directly behind the position of the new value.) One
> example is the sequence [3,6,5,4,2,1]. With a kernel size of 5, your
> algorithm computes a median of 3 for the 6th position (the 1), where it
> seems you would want a 4.
>
> On to composability. I agree that there's tension between efficiency and
> extendability, mostly inside "goThrough"/"updateKernelAndGetMedian" In
> particular, I can think of three important ways you might want to reuse
> parts of this algorithm.
>
> 1.
>
> You might want to have a generic "iterate over a stream with a
> kernel-based function"-function. This way you could implement things like a
> Gaussian blur or edge detection. To get there you would have to replace
> your "cmp" and "idx==0" (my "v < snd x" and "stepsToMedian") with more
> complex functions and to replace "getTo" (my "nthOrLast") with another
> recursive call or similar. (This part is probably also the way to correct
> the bug mentioned above.) So your "flt" and "cmp" where not a bad
> idea, but not quite there either.
> 2.
>
> You might want to handle start and end of the stream in more different
> ways. You could "extend" the first/last value, mirror or extrapolate the
> bordering values, fill with a constant, append secondary streams, ignore
> border values, … Right now both ends are handled differently. Also, both
> parts are hard coded into "goThrough"/"updateKernelAndGetMedian" (and "
> ret"/"medianIdx"). Again, making it more complex. So pulling this
> handling out might also make the code simpler. The best way I can see to
> solve this is to detect start and end outside this function and direct all
> three to different, exchangeable functions – which, in turn, will probably
> interact with a simplified version of "goThrough"/"
> updateKernelAndGetMedian" again. You could also assume that some other
> function prepends/appends meaningful extensions to both ends and also
> cleans up the ends afterwards. The same way you cut the end right now, but
> expect some outside user to clean up the start. But then you should default
> to squashing the kernel at both ends.
> 3.
>
> You might want to create a pipeline of different filters for every
> point of the stream. This gets easier if you separate the state from its
> monad like I did, because now you can compose states before putting them
> inside that monad. It also helps write nicer-looking functions – at least
> in my opinion.
>
> Lastly, most of these extensibility problems should have been solved in
> OpenCV, so you could search for inspiration there.
>
> Just a few idea. Hope they help!
>
> Cheers,
> MarLinn
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20190528/17fb491e/attachment.html>
More information about the Haskell-Cafe
mailing list