[Haskell-cafe] How to improve speed? (MersenneTwister is several times slower than C version)

Lennart Augustsson lennart at augustsson.net
Wed Nov 1 22:04:39 EST 2006


The whole point of writing the Mersenne Twister was that I wanted to  
show how a stateful computation could be encapsulated in the ST monad  
and none of it showing up outside.  This aspect of the code is  
totally gone now when everything is in the IO monad.  Is there some  
good reason to have it in the IO monad?

	-- Lennart

On Nov 1, 2006, at 20:51 , Lemmih wrote:

> On 11/1/06, isto <isto.aho at dnainternet.net> wrote:
>> Hi all,
>>
>> On HaWiki was an announcement of MersenneTwister made by Lennart
>> Augustsson.  On a typical run to find out 10000000th rnd num the  
>> output
>> is (code shown below):
>>
>> $ time ./testMTla
>> Testing Mersenne Twister.
>> Result is [3063349438]
>>
>> real    0m4.925s
>> user    0m4.856s
>>
>>
>> I was exercising with the very same algorithm and tried to make it
>> efficient (by using IOUArray): now a typical run looks like (code  
>> shown
>> below):
>>
>> $ time ./testMT
>> Testing Mersenne Twister.
>> 3063349438
>>
>> real    0m3.032s
>> user    0m3.004s
>>
>>
>> The original C-version (modified so that only the last number is
>> shown) gives typically
>>
>> $ time ./mt19937ar
>> outputs of genrand_int32()
>> 3063349438
>>
>> real    0m0.624s
>> user    0m0.616s
>>
>> Results are similar with 64 bit IOUArray against 64 bit C variant.
>> C seems to work about 5 to 10 times faster in this case.
>>
>> I have tried to do different things but now I'm stuck.  unsafeRead
>> and unsafeWrite improved a bit the lazy (STUArray-version) and
>> IOUArray-versions but not very much.  I took a look of Core file but
>> then, I'm not sure where the boxed values are ok. E.g. should   
>> IOUArray
>> Int Word64  be replaced with something else?
>>
>> Any hints and comments on how to improve the efficiency and make
>> everything better will be appreciated a lot!
>>
>> br, Isto
>
> Greetings,
>
> Applying a few optimations can make it about 3x faster.
>
> 1. Hoist the array out of your loops. (See generateNumbers32,
> initialiseGenerator32 and genRNums).
> 2. Don't create too many new MT32 boxes. Most of the time is spent in
> 'next32' and changing its type to 'IOUArray Int Word32 -> Int -> IO
> (Word32, Int)' makes it much faster.
> 3. Demand more inlining. If you're using GHC,
> -funfolding-use-threshold=16 will substantially improve the
> performance.
>
> Using 'seq' is generally a bad idea. It can worsen the performance if
> not used carefully and GHCs strictness analyser is usually good
> enough.
> I used the profiler and -ddump-simpl to analyse this program.
>
> Donald suggested manual unboxing. However, in this case it won't help
> much (if at all) since GHC is doing such an excellent job on its own.
>
> -- 
> Cheers,
>  Lemmih
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list