[Haskell-cafe] Increasing parallelism in a frequency counter
Claude Heiland-Allen
claude at mathr.co.uk
Sat May 23 15:12:19 UTC 2015
Hi Andreas, all,
On 21/05/15 22:08, Andreas Pauley wrote:
> Hi everyone,
>
> I've started playing with some parallel programming in Haskell, mostly
> by applying the first few examples from Simon Marlow's book to my own
> toy problem:
> https://github.com/apauley/parallel-frequency
Nice.
> I'm quite pleased with the speedup I have so far, the fastest parallel
> implementation is just over twice as fast as its sequential
> counterpart.
>
> But looking at the threadscope images from the generated event logs I
> can see that there are still significant portions of my program that
> runs on only one core.
>
> Is anyone interested to see if we can increase the parallelism in this
> frequency counter?
If the problem is specified as taking the N most frequently occuring
items in a list:
f :: Ord a => Int -> [a] -> [(a, Int)]
then I cheated, I assume the input is pre-chunked into a sensible number
of pieces:
f' :: Ord a => Int -> [[a]] -> [(a, Int)]
with which it is much easier to get good parallel speedups.
Anyway, some general optimisation points, not all directly related to
the parallel part of the problem:
0. profile to see what is taking time (and space, it increases GC time)
you can use +RTS -h without having to recompile for profiling
1. lists are memory inefficient (eg: Map keys): replace String with Text
2. lists and Text are bad at random access (eg: splitting at an index)
3. use Data.Map.Strict to avoid building up (+) thunks
4. (take n . sort) is possibly taking a lot of sequential time (see 0)
5. perhaps try parallel reductions instead of linear folds:
a * b * c * d * ... -> ((a * b) * (c * d)) * ...
though the overhead didn't seem worth it in my tests
6. not optimisation related, but consider writing a defaultMain
then each executable source can be reduce to
import DefaultMain (defaultMain)
import FrequencyImplementation (frequency)
main :: IO ()
main = defaultMain frequency
you could even have all the implementations with the same module name
but in different directories, and use hs-src-dirs: to select, then
you would be able to use an identical Main module.
more on 2:
A bit of a low-level hack, this: I read the whole input file as a
ByteString and split it into an appropriate number of chunks, taking
care not to split UTF-8 sequences, also taking care not to split
mid-word. Each output chunk references the original input without
copying the data. Then I decodeUtf8 each ByteString chunk to a Text.
This reduced peak memory usage (as reported by +RTS -h graphed with
hp2ps) from ~400MB of (:) to ~30MB of ARR_WORDS for the 11MB artamene.txt.
more on 4:
So you have a large Map k Int containing frequencies, and you want the
largest n frequency counts. There is an O(1) Map.splitRoot, which you
can use to parallel divide and conquer - the largest n of the whole map
are going to be among the (n * split count) combined largest n of each
of the split maps. This didn't give me much speed up at all, though.
I put my implementation at http://code.mathr.co.uk/word-histogram
Thanks,
Claude
--
http://mathr.co.uk
More information about the Haskell-Cafe
mailing list