[Haskell-beginners] Optimising a combinatorial algorithm?

Thu Sep 24 21:18:25 UTC 2015

David McBride <toad3k at gmail.com> writes:

> I tried memoizing dur in some places, but it gets called with too many
> different arguments to make it worthwhile.

Likely with all possible combinations from the input :-)

> I tried change StateT to Strict.StateT.
> Last ditch effort, I tried making your data types strict, but that didn't
> help.
>
> The dur in your PartialVoice Duration instance gets called 3.5 million
> times.  I don't know if you are going to find a magic bullet that makes
> this faster without tweaking your algorithm.  I feel like their is a way to
> speed it up, maybe even exploit laziness to eliminate things early, but I'm
> just not familiar with this domain.  Sorry

lazyness is a good topic.  I am not sure how it actually affects this
algorithm.  I wonder if my use of 'filter' is the actual culprit, and if
I instead should try to rewrite the uses of 'traverse' to actually
try and not generate invalid possibilities in the first place.  Right
now, I am sort of hoping lazyness would work together with filter being
applied after all possibilities have been generated.

Thanks for your time and insights.

> On Thu, Sep 24, 2015 at 10:11 AM, Mario Lang <mlang at delysid.org> wrote:
>
>> Hi!
>>
>> TL;DR: My implementation looks elegant (to me), but is rather slow
>> compared to a well optimized C++ impl. of the same algorithm.  Runtime
>> differences are currently huge.  Looking for input on what I could
>> change to match C++ performance, without loosing too much expressiveness
>> of the code.
>>
>> I am writing software to deal with Braille music code.
>> As an exercise for me, and as an attempt to boil the algorithm down
>> to a more readable form, I am in the process of translating my C++
>> program[1] to Haskell.  So far, so good.  Given that I only came back to
>> Haskell (after 10 years of a break) roughly 2 weeks ago, I think I
>> have made quite nice progress.  However, now that I reached a stage
>> where I can actually do a performance comparison, the dreaded thing has
>> happened: The performance of my Haskell implementation is rather
>> horrible to what my C++ implementation can deliver.  While the code size
>> has gone down roughly by a factor of 10, runtime has gone up roughly by
>> the same factor...
>>
>> For the impatient, here is the code I am talking about:
>> https://github.com/mlang/hbmc/blob/master/Data/Braille/Music.hs
>>
>> The first half of the code is concerend with parsing braille.
>> Performance is not critical here, so I opted to use Parsec, mostly
>> because I am interested in good quality error messages.
>>
>> The fun starts with the definition of 'pvs'.
>>
>> The problem:
>> Braille music values are inherently ambiguous.  The sign for a whole
>> note can also mean a 16th note, a sign for a half note can also
>> mean a 32th note, the sign for a quarter note can also mean a 64th, and
>> a 8th note could also mean a 128th note.  This is histoical, and there
>> is no way around this.  The time signature (meter) is used to actually
>> tell which note is which.  This is relatively easy to do for an
>> experienced human, but quite complicated for a computer.
>> A measure (bar) of braille music can contain parallel *and* sequential
>> parts.  The data structure is basically:
>>
>> data Sign = ...
>> type PartialVoice = [Sign]
>> type PartialMeasure = [PartialVoice]
>> type Voice = [PartialMeasure]
>> type Measure = [Voice]
>>
>> As a constraint, all PartialVoices in a PartialMeasure need to have the
>> exact same duration.  Similarily, all Voices inside a Measure also need to
>> have the exact same duration.
>>
>> So 'pvs', 'pms', 'vs' and 'ms' in my Data.Braille.Music module are
>> concerned
>> with calculating all possible interpretations of the input.
>>
>> All in all, the current Haskell implementation is doing what it should.
>> However, profiling tells me I am allocating roughly 2GB of memory
>> to disambiguate a single measure of music in 3/2 time.
>> This takes roughly 1.2s on my rather fast CPU.
>>
>> The amount of allocation and the runtime are unacceptable.
>> If I leave it at that, I might as well ditch the code and terminate the
>> experiment.
>> As a comparison, my C++ impl. takes roughly 1s to disambiguate 32 measures
>> of
>> the same piece, while Haskell already takes 1.2s to disambiguate just
>> one of these measures.
>>
>> Profiling tells me I am spending 90% of all the time in 'dur', which is my
>> small helper method to calculate the duration of a PartialVoice,
>> PartialMeasure or Voice.
>>
>> The project is setup to support profiling:
>>
>> git clone https://github.com/mlang/hbmc
>> cd hbmc
>> cabal run profiling
>>
>> Do you see a way to significantly speed this algorithm up, without
>> loosing a lot of expressiveness/elegancy?
>>
>> Since I am back to Haskell since only 2 weeks, I am basically happy abut
>> every sort of hint/tip you can give me.  Stylistic, as well as
>> performance-related.
>>
>> For now, this is really just a toy project.  However, if I can manage to
>> improve the performance significantly without destroying the readability
>> of the code too much, I might be interested to finish translating my
>> current C++ work into a more-or-less complete Haskell library.
>> After all, BMC is more or less a compiler.
>> And Haskell is supposed to be very good at this stuff :-)
>>
>> Here is a link to the original C++ project:
>> [1] https://github.com/malng/bmc
>>
>> --
>> CYa,
>>   ⡍⠁⠗⠊⠕
>> _______________________________________________
>> Beginners mailing list
>> Beginners at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
>>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners

-- 
CYa,
  ⡍⠁⠗⠊⠕ | Debian Developer <URL:http://debian.org/>
  .''`. | Get my public key via finger mlang/key at db.debian.org
 : :' : | 1024D/7FC1A0854909BCCDBE6C102DDFFC022A6B113E44
 `. `'
   `-      <URL:http://delysid.org/>  <URL:http://www.staff.tugraz.at/mlang/>