[Haskell-cafe] How to improve speed? (MersenneTwister is several
times slower than C version)
Donald Bruce Stewart
dons at cse.unsw.edu.au
Wed Nov 1 22:12:37 EST 2006
lemmih:
> On 11/1/06, isto <isto.aho at dnainternet.net> wrote:
> >Hi all,
> >
> >On HaWiki was an announcement of MersenneTwister made by Lennart
> >Augustsson. On a typical run to find out 10000000th rnd num the output
> >is (code shown below):
> >
> >$ time ./testMTla
> >Testing Mersenne Twister.
> >Result is [3063349438]
> >
> >real 0m4.925s
> >user 0m4.856s
> >
> >
> >I was exercising with the very same algorithm and tried to make it
> >efficient (by using IOUArray): now a typical run looks like (code shown
> >below):
> >
> >$ time ./testMT
> >Testing Mersenne Twister.
> >3063349438
> >
> >real 0m3.032s
> >user 0m3.004s
> >
> >
> >The original C-version (modified so that only the last number is
> >shown) gives typically
> >
> >$ time ./mt19937ar
> >outputs of genrand_int32()
> >3063349438
> >
> >real 0m0.624s
> >user 0m0.616s
> >
> >Results are similar with 64 bit IOUArray against 64 bit C variant.
> >C seems to work about 5 to 10 times faster in this case.
> >
> >I have tried to do different things but now I'm stuck. unsafeRead
> >and unsafeWrite improved a bit the lazy (STUArray-version) and
> >IOUArray-versions but not very much. I took a look of Core file but
> >then, I'm not sure where the boxed values are ok. E.g. should IOUArray
> >Int Word64 be replaced with something else?
> >
> >Any hints and comments on how to improve the efficiency and make
> >everything better will be appreciated a lot!
> >
> >br, Isto
>
> Greetings,
>
> Applying a few optimations can make it about 3x faster.
>
> 1. Hoist the array out of your loops. (See generateNumbers32,
> initialiseGenerator32 and genRNums).
> 2. Don't create too many new MT32 boxes. Most of the time is spent in
> 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
> (Word32, Int)' makes it much faster.
> 3. Demand more inlining. If you're using GHC,
> -funfolding-use-threshold=16 will substantially improve the
> performance.
>
> Using 'seq' is generally a bad idea. It can worsen the performance if
> not used carefully and GHCs strictness analyser is usually good
> enough.
> I used the profiler and -ddump-simpl to analyse this program.
>
> Donald suggested manual unboxing. However, in this case it won't help
> much (if at all) since GHC is doing such an excellent job on its own.
I wasn't suggesting manual unboxing, more that you should carefully
inspect the Core, and tune with bang patterns where necessary.
-funfolding-use-threshold=16 is a good idea though. or =100 ;)
-- Don
More information about the Haskell-Cafe
mailing list