[Haskell-beginners] Optimising a combinatorial algorithm?

Tue Sep 29 13:37:43 UTC 2015

Bernhard Herzog <bh at intevation.de> writes:

> On 24.09.2015, Mario Lang wrote:
>> TL;DR: My implementation looks elegant (to me), but is rather slow
>> compared to a well optimized C++ impl. of the same algorithm.  Runtime
>> differences are currently huge.  Looking for input on what I could
>> change to match C++ performance, without loosing too much expressiveness
>> of the code.
> [...]
>> Profiling tells me I am spending 90% of all the time in 'dur', which is my
>> small helper method to calculate the duration of a PartialVoice,
>> PartialMeasure or Voice.
>>
>> The project is setup to support profiling:
>>
>> git clone https://github.com/mlang/hbmc
>> cd hbmc
>> cabal run profiling
>
> Some performance tips:
>
> AFAICT, the main reason the dur methods are so slow is they traverses
> the lists much too often. An easy way to reduce that number is to cache
> the duration of lists of Sign values in PartialVoice by replacing the
> type alias with a new data type like this:
>
>   data PartialVoice = PV (Maybe Rational) [Sign]
>
> To always initialize the duration when constructing a PartialVoice, a
> helper function is useful:
>
>   mkPV signs = PV (sumDuration signs) signs
>
> Here sumDuration is defined in the same way as the current dur method
> for PartialVoice:
>
>   sumDuration :: Duration a => [a] -> Maybe Rational
>   sumDuration = foldl' (liftA2 (+)) (pure 0) . map dur
>
> The dur method for PartialVoice can then be implemented by simply
> accessing the cached value:
>
>   instance Duration PartialVoice where
>
>       dur (PV d _) = d
>
> Since that parameter of PV is lazy the actual sum is only computed when
> the value is needed and for any given PV value it is computed at most
> once.
>
> On my system, that change alone improved the running time by a factor of
> almost 10 according to the profiler output.

Confirmed (committed).  Thanks a lot!  This indeed brought a factor of 10 speedup.
I wonder why the attempts to memoize dur by other people did not help at
all.
After all, this is just another type of memoisation, isn't it?

> Some other changes that can improve the speed:
>
> Use Integer instead of Rational.
>
> This should be possible since all duration values appear to be integer
> multiples of 1/128, so you can just represent them by that factor and
> use a newtype wrapper for increased type safety.

I will eventually have to deal with musical tuplets, so it is not just
128th values. I see your point, but I am going to delay that
optimisation until I know I really need it.

> Parameterize Sign with the type of the duration.
>
> AFAICT the only reason to wrap the duration in a Maybe is that you
> cannot assign a duration during parsing, so the parsers always assign
> Nothing as duration. The pvs function will always assign a Just-value to
> each Sign, however. So after parsing, the Measure has Nothing for the
> duration in every Sign, but the Measures returned by ms always have Just
> values. You you could express this in the types by parameterizing Sign
> and the other types with the type of a duration. The parser would then
> create e.g. Sign () values and pvs would turn those into e.g. Sign
> Rational values.

That is a good idea I was already thinking about.
Actually, what I am trying to figure out is how to do closely related,
but different, datatypes in Haskell.  In C++, I am used to being able to
inherit from an existing class, just adding the new fields I need.  This
works without name clashes obviously because two classes can have the
same field names without clashing.  However, I am missing something like
this from Haskell.
What I actually need in the long run, is a separate data type
for every enhancement step in my post-processing.  I current have
AmbiguousSign and Sign.  However, I will need more types like this, with
every step, I'd like to add yet another field, which is going to be
computed by that step.
AFAIU the parametrisation trick you mention above works for a single field.
How would I go about progressively adding more and more fields to a base
data type?

> This improves type safety, makes the meaning clearer and simplifies the
> code because you don't need to lift operations into the Maybe or do case
> analyses.

I agree.  Doing away with the Maybe was a good thing.

> Also, if you want to ignore parts of a pattern match, use "_" as the
> pattern, not "otherwise". In the way you used it, it introduces a new
> binding in that branch of the case expression and shadows the definition
> from the Prelude. Idiomatic use of "otherwise" is as the condition on
> the catch-all case of a guard.

Oh, thanks for pointing that out, I didn't realize at all that otherwise
doesn't work like expected in "normal" case expressions.

-- 
CYa,
  ⡍⠁⠗⠊⠕