[Haskell-cafe] Performance Issue
Don Stewart
dons at galois.com
Thu Feb 26 12:35:34 EST 2009
james.swaine:
> i'm implementing a benchmark which includes a detailed specification for a
> random number generator. for any of the kernels outlined in the benchmark, i
> might have to generate a set of random numbers R, which has a length n, using
> the following formulas:
>
> R[k] = ((2^-46)(X[k])) mod 2^46, where
>
> X[k] = (a^k)s
>
> where the values of a and s are constant and defined below.
> many of the kernels in the benchmark require a large number of randoms to be
> generated (in the tens of millions). when i invoke the following getRandAt
> function that many times to build up a list, evaluation of the list takes
> forever (somewhere between 5 and 10 minutes). i've tried optimizing this
> several different ways, with no luck. i though i might post my code here and
> see if anyone notices anything i'm doing wrong that might be causing such a
> large bottleneck:
>
> --constants
> a :: Int64
> a = 5^13
>
> divisor :: Int64
> divisor = 2^46
>
> multiplier :: Float
> multiplier = 2**(-46)
>
>
> --gets r[k], which is the value at the kth
> --position in the overall sequence of
> --pseudorandom numbers
> getRandAt :: Int64 -> Int64 -> Float
> getRandAt 0 seed = multiplier * (fromIntegral seed)
> getRandAt k seed = multiplier * (fromIntegral x_next)
> where
> x_prev = (a^k * seed) `mod` divisor
> x_next = (a * x_prev) `mod` divisor
>
> thanks all in advance for your help!
Using ghc -O2 --make
There's nothing wrong with this code, really:
Z.$wgetRandAt :: Int# -> Int# -> Float#
and an inner loop of:
Z.$w$j :: Int# -> Float#
Z.$w$j =
\ (w_sHx :: Int#) ->
case Z.^1 Z.lit1 Z.lvl of w1_XFs { I64# ww_XFv ->
case ww_XFv of wild_aFB {
__DEFAULT ->
case minBound3 of wild1_aFC { I64# b1_aFE ->
case ==# w_sHx b1_aFE of wild2_aFG {
False ->
case modInt# w_sHx wild_aFB of wild3_aFJ { __DEFAULT ->
timesFloat#
(powerFloat# __float 2.0 __float -46.0)
(int2Float# wild3_aFJ)
};
True ->
case wild_aFB of wild3_aFM {
__DEFAULT ->
case modInt# w_sHx wild3_aFM of wild4_aFN { __DEFAULT ->
timesFloat#
(powerFloat# __float 2.0 __float -46.0)
(int2Float# wild4_aFN)
};
(-1) ->
overflowError
`cast` (CoUnsafe (forall a_aFS. a_aFS) Float#
:: forall a_aFS. a_aFS ~ Float#)
}
}
};
0 ->
divZeroError
`cast` (CoUnsafe (forall a_aFT. a_aFT) Float#
:: forall a_aFT. a_aFT ~ Float#)
}
Which is just fine.
Inlining those constants explicitly might be a good idea, then we get an outer loop of:
Z.$wgetRandAt :: Int# -> Int# -> Float#
Z.$wgetRandAt =
\ (ww_sHG :: Int#) (ww1_sHK :: Int#) ->
case ww_sHG of wild_B1 {
__DEFAULT ->
case Z.lvl3 of wild1_aEd { I64# x#_aEf ->
case Z.^ Z.a (I64# wild_B1)
of wild2_XFe { I64# x#1_XFh ->
case Z.lvl1 of w_aE7 { I64# ww2_aE9 ->
case ww2_aE9 of wild3_aFB {
__DEFAULT ->
case minBound3 of wild11_aFC { I64# b1_aFE ->
let {
ww3_aFz [ALWAYS Just L] :: Int#
ww3_aFz = *# x#1_XFh ww1_sHK } in
case ==# ww3_aFz b1_aFE of wild21_aFG {
False ->
case modInt# ww3_aFz wild3_aFB
of wild31_aFJ { __DEFAULT ->
Z.$w$j (*# x#_aEf wild31_aFJ)
};
True ->
case wild3_aFB of wild31_aFM {
__DEFAULT ->
case modInt# ww3_aFz wild31_aFM
of wild4_aFN { __DEFAULT ->
Z.$w$j (*# x#_aEf wild4_aFN)
};
(-1) ->
overflowError
`cast` (CoUnsafe (forall a_aFS. a_aFS) Float#
:: forall a_aFS. a_aFS ~ Float#)
}
0 ->
divZeroError
`cast` (CoUnsafe (forall a_aFT. a_aFT) Float#
:: forall a_aFT. a_aFT ~ Float#)
}
}
}
};
0 ->
timesFloat#
(powerFloat# __float 2.0 __float -46.0)
(int2Float# ww1_sHK)
}
which is the fast path, then error / bounds checking.
this looks perfectly acceptable.
What does sound troublesome is using lazy lists .. that's more likely to be the bottleneck.
-- Don
More information about the Haskell-Cafe
mailing list