<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <blockquote type="cite"

cite="mid:CAEYibMdWfB7oFQ5EGLfyR8V2AP+1Nr4kp6VNvpX-AadDsAJKqg@mail.gmail.com">

      <div dir="ltr">The code works, at least in the simple cases I've

        tried.

        <div>It seems, however, way too complex and not very composable.</div>

        <div>So I was wondering if anyone could tell me where I went

          wrong, and how I can make the code simpler and smaller.</div>

        <div><br>

        </div>

        <div>Of course, it is completely possible to divide my insertion

          function, but the goal is to do all of the steps in one

          iteration over the list. This allows me to later compare what

          I have with divided functions to discover if the compiler

          would optimize them into one traversal.</div>

        <div><br>

        </div>

        <div>But even with its current logic, I believe the code can be

          better written. I just don't know how :p</div>

        <div><br>

        </div>

        <div>So I really appreciate any advice on how to make it better

          (or fix a bug I missed, or maybe even the logic of the whole

          algorithm).</div>

      </div>

      <br>

    </blockquote>

    <p>Hi Michael,<br>

      <br>

      first of all, let's see what makes your code look so complex. Part

      of that is not inherent complexity, it's a lack of documentation.

      And by that I don't mean a lack of comments, but rather that the

      code is not self-documenting. Names are too abstract and/or too

      short. For example the name "<tt>getTo</tt>" means almost nothing

      on its own. You could also use records and types for this kind of

      documentation. The function-arguments of "<tt>goThrough</tt>", "<tt>flt</tt>"

      and "<tt>cmp</tt>", may be extension points in the future (see

      below), but right now they make the code more complex than

      necessary as well. Then there's monadic syntax for what is

      essentially pure computation and a few other bits and bobs.</p>

    <p>All this is not bad for some quick and dirty experiment, but

      changing that is a great first step for a clean-up.</p>

    <p>So here's one proposal how you could clean up the core part of

      the code. Everything is (almost) just renamed and reordered parts

      of the original, with obvious environmental stuff left out as an

      exercise to the reader. I also use -XUnicodeSyntax, but that's

      just my personal preference.<br>

    </p>

    <tt>    data MedianFilter a = MF<br>

              { kernelSize    ∷ !KernelSize<br>

              , srcPosition   ∷ !Position<br>

              , currentKernel ∷ ![(Position, a)]<br>

              }<br>

            deriving ( Show )<br>

      <br>

          stepMedianFilterS ∷ Ord a ⇒ a → State (MedianFilter a) a<br>

          stepMedianFilterS = state . stepMedianFilter<br>

      <br>

          stepMedianFilter ∷ Ord v ⇒ v → MedianFilter v → (Median

      v,MedianFilter v)<br>

          stepMedianFilter _   MF{..} | kernelSize < 1 || srcPosition

      < 0<br>

                                      = error "stepMedianFilter: bad

      argument"<br>

          stepMedianFilter v s@MF{..} = (median , s{ srcPosition =

      srcPosition + 1, currentKernel = newKernel } )<br>

            where<br>

              (newKernel, median) = updateKernelAndGetMedian medianIdx

      currentKernel<br>

      <br>

              -- medianIdx = kernelSize `div` 2, unless we're at the

      start of the stream<br>

              medianIdx           = min srcPosition kernelSize `div` 2<br>

              isTooOld x          = not $ validAt kernelSize srcPosition

      x<br>

              v'                  = (srcPosition, v)<br>

      <br>

                   -- kernel is empty: create new kernel<br>

              updateKernelAndGetMedian _                 []    = ([v'],

      v)<br>

                   -- clean up old values<br>

              updateKernelAndGetMedian stepsToMedian   (x:xs)  |

      isTooOld x = updateKernelAndGetMedian stepsToMedian xs<br>

              updateKernelAndGetMedian stepsToMedian k@(x:xs)  = case (v

      < snd x, stepsToMedian) of<br>

                   -- the new value is also the median<br>

                  (True,0) → (v':k , v)<br>

                   -- the value belongs here, but the median has not

      been found yet<br>

                  (True,_) → (v':k , snd $ nthOrLast (stepsToMedian-1)

      k)<br>

                   -- the median is here, but the value must still be

      inserted<br>

                  (_   ,0) → (x :ys, snd x)<br>

                   -- neither median nor position have been found yet<br>

                  (_   ,_) → (x :ys, y)<br>

                where<br>

                  (ys, y) = updateKernelAndGetMedian (stepsToMedian-1)

      xs</tt>

    <p>Of course this does little in the way of making the code more

      composable or simpler from a purely logical perspective. Nesting

      the functions does help in reducing the number of arguments that

      have to be passed around though. And without getting here I would

      not have been able to think about the code in any meaningful way

      as well.<br>

    </p>

    <p>Now let's look at a few other properties of the code:</p>

    <p>There are few avenues that lead to dead ends, I believe.</p>

    <ul>

      <li>Can the core function "<tt>goThrough</tt>" (my "<tt>updateKernelAndGetMedian</tt>")

        be replaced by a primitive like a fold, a scan, or similar? I

        don't think so.</li>

      <li>Can the kernel be represented with a set, a priority queue, or

        any other type of tree? Yes, but the cleanup of old values would

        be even harder. Not worth the effort, unless you plan on using

        huge kernels.</li>

    </ul>

    <p>Bugs: I think I found one. The "<tt>goThrough</tt>"/"<tt>updateKernelAndGetMedian</tt>"-function

      does three things at once: inserting the new value into the sorted

      list, cleaning up old values, and searching for the current

      median. But if the place to insert is found before the median is

      found, you drop the cleaning. Which means if there is an old,

      invalid value between the position of the new value and the true

      median, you should not be computing the right median. (Unless it's

      directly behind the position of the new value.) One example is the

      sequence <tt>[3,6,5,4,2,1]</tt>. With a kernel size of 5, your

      algorithm computes a median of 3 for the 6th position (the 1),

      where it seems you would want a 4.<br>

    </p>

    <p>On to composability. I agree that there's tension between

      efficiency and extendability, mostly inside "<tt>goThrough</tt>"/"<tt>updateKernelAndGetMedian</tt>"

      In particular, I can think of three important ways you might want

      to reuse parts of this algorithm.</p>

    <ol>

      <li>

        <p>You might want to have a generic "iterate over a stream with

          a kernel-based function"-function. This way you could

          implement things like a Gaussian blur or edge detection. To

          get there you would have to replace your "<tt>cmp</tt>" and "<tt>idx==0</tt>"

          (my "<tt>v < snd x</tt>" and "<tt>stepsToMedian</tt>") with

          more complex functions and to replace "<tt>getTo</tt>" (my "<tt>nthOrLast</tt>")

          with another recursive call or similar. (This part is probably

          also the way to correct the bug mentioned above.) So your "<tt>flt</tt>"

          and "<tt>cmp</tt>" where not a bad idea, but not quite there

          either.<br>

        </p>

      </li>

      <li>

        <p>You might want to handle start and end of the stream in more

          different ways. You could "extend" the first/last value,

          mirror or extrapolate the bordering values, fill with a

          constant, append secondary streams, ignore border values, …

          Right now both ends are handled differently. Also, both parts

          are hard coded into "<tt>goThrough</tt>"/"<tt>updateKernelAndGetMedian</tt>"

          (and "<tt>ret</tt>"/"<tt>medianIdx</tt>"). Again, making it

          more complex. So pulling this handling out might also make the

          code simpler. The best way I can see to solve this is to

          detect start and end outside this function and direct all

          three to different, exchangeable functions – which, in turn,

          will probably interact with a simplified version of "<tt>goThrough</tt>"/"<tt>updateKernelAndGetMedian</tt>"

          again. You could also assume that some other function

          prepends/appends meaningful extensions to both ends and also

          cleans up the ends afterwards. The same way you cut the end

          right now, but expect some outside user to clean up the start.

          But then you should default to squashing the kernel at both

          ends.<br>

        </p>

      </li>

      <li>

        <p>You might want to create a pipeline of different filters for

          every point of the stream. This gets easier if you separate

          the state from its monad like I did, because now you can

          compose states before putting them inside that monad. It also

          helps write nicer-looking functions – at least in my opinion.</p>

      </li>

    </ol>

    Lastly, most of these extensibility problems should have been solved

    in OpenCV, so you could search for inspiration there.<br>

    <p>Just a few idea. Hope they help!</p>

    <p>Cheers,<br>

      MarLinn<br>

    </p>

  </body>

</html>