[Haskell-cafe] Increasing parallelism in a frequency counter

Sat May 23 15:12:19 UTC 2015

Hi Andreas, all,

On 21/05/15 22:08, Andreas Pauley wrote:
> Hi everyone,
>
> I've started playing with some parallel programming in Haskell, mostly
> by applying the first few examples from Simon Marlow's book to my own
> toy problem:
> https://github.com/apauley/parallel-frequency

Nice.

> I'm quite pleased with the speedup I have so far, the fastest parallel
> implementation is just over twice as fast as its sequential
> counterpart.
>
> But looking at the threadscope images from the generated event logs I
> can see that there are still significant portions of my program that
> runs on only one core.
>
> Is anyone interested to see if we can increase the parallelism in this
> frequency counter?

If the problem is specified as taking the N most frequently occuring 
items in a list:

     f :: Ord a => Int -> [a] -> [(a, Int)]

then I cheated, I assume the input is pre-chunked into a sensible number 
of pieces:

     f' :: Ord a => Int -> [[a]] -> [(a, Int)]

with which it is much easier to get good parallel speedups.

Anyway, some general optimisation points, not all directly related to 
the parallel part of the problem:

0. profile to see what is taking time (and space, it increases GC time)
    you can use +RTS -h without having to recompile for profiling

1. lists are memory inefficient (eg: Map keys): replace String with Text

2. lists and Text are bad at random access (eg: splitting at an index)

3. use Data.Map.Strict to avoid building up (+) thunks

4. (take n . sort) is possibly taking a lot of sequential time (see 0)

5. perhaps try parallel reductions instead of linear folds:
    a * b * c * d * ... -> ((a * b) * (c * d)) * ...
    though the overhead didn't seem worth it in my tests

6. not optimisation related, but consider writing a defaultMain
    then each executable source can be reduce to
       import DefaultMain (defaultMain)
       import FrequencyImplementation (frequency)
       main :: IO ()
       main = defaultMain frequency
    you could even have all the implementations with the same module name
    but in different directories, and use hs-src-dirs: to select, then
    you would be able to use an identical Main module.

more on 2:

A bit of a low-level hack, this: I read the whole input file as a 
ByteString and split it into an appropriate number of chunks, taking 
care not to split UTF-8 sequences, also taking care not to split 
mid-word.  Each output chunk references the original input without 
copying the data.  Then I decodeUtf8 each ByteString chunk to a Text.

This reduced peak memory usage (as reported by +RTS -h graphed with 
hp2ps) from ~400MB of (:) to ~30MB of ARR_WORDS for the 11MB artamene.txt.

more on 4:

So you have a large Map k Int containing frequencies, and you want the 
largest n frequency counts.  There is an O(1) Map.splitRoot, which you 
can use to parallel divide and conquer - the largest n of the whole map 
are going to be among the (n * split count) combined largest n of each 
of the split maps.  This didn't give me much speed up at all, though.

I put my implementation at http://code.mathr.co.uk/word-histogram

Thanks,

Claude
-- 
http://mathr.co.uk