[Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

Donald Bruce Stewart dons at cse.unsw.edu.au
Wed Nov 1 22:12:37 EST 2006


lemmih:
> On 11/1/06, isto <isto.aho at dnainternet.net> wrote:
> >Hi all,
> >
> >On HaWiki was an announcement of MersenneTwister made by Lennart
> >Augustsson.  On a typical run to find out 10000000th rnd num the output
> >is (code shown below):
> >
> >$ time ./testMTla
> >Testing Mersenne Twister.
> >Result is [3063349438]
> >
> >real    0m4.925s
> >user    0m4.856s
> >
> >
> >I was exercising with the very same algorithm and tried to make it
> >efficient (by using IOUArray): now a typical run looks like (code shown
> >below):
> >
> >$ time ./testMT
> >Testing Mersenne Twister.
> >3063349438
> >
> >real    0m3.032s
> >user    0m3.004s
> >
> >
> >The original C-version (modified so that only the last number is
> >shown) gives typically
> >
> >$ time ./mt19937ar
> >outputs of genrand_int32()
> >3063349438
> >
> >real    0m0.624s
> >user    0m0.616s
> >
> >Results are similar with 64 bit IOUArray against 64 bit C variant.
> >C seems to work about 5 to 10 times faster in this case.
> >
> >I have tried to do different things but now I'm stuck.  unsafeRead
> >and unsafeWrite improved a bit the lazy (STUArray-version) and
> >IOUArray-versions but not very much.  I took a look of Core file but
> >then, I'm not sure where the boxed values are ok. E.g. should  IOUArray
> >Int Word64  be replaced with something else?
> >
> >Any hints and comments on how to improve the efficiency and make
> >everything better will be appreciated a lot!
> >
> >br, Isto
> 
> Greetings,
> 
> Applying a few optimations can make it about 3x faster.
> 
> 1. Hoist the array out of your loops. (See generateNumbers32,
> initialiseGenerator32 and genRNums).
> 2. Don't create too many new MT32 boxes. Most of the time is spent in
> 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
> (Word32, Int)' makes it much faster.
> 3. Demand more inlining. If you're using GHC,
> -funfolding-use-threshold=16 will substantially improve the
> performance.
> 
> Using 'seq' is generally a bad idea. It can worsen the performance if
> not used carefully and GHCs strictness analyser is usually good
> enough.
> I used the profiler and -ddump-simpl to analyse this program.
> 
> Donald suggested manual unboxing. However, in this case it won't help
> much (if at all) since GHC is doing such an excellent job on its own.

I wasn't suggesting manual unboxing, more that you should carefully
inspect the Core, and tune with bang patterns where necessary. 

-funfolding-use-threshold=16 is a good idea though. or =100 ;)

-- Don


More information about the Haskell-Cafe mailing list