[Haskell-beginners] Flying Dutchman sailing blind

Wed Oct 13 09:53:27 EDT 2010

On Wednesday 13 October 2010 14:52:58, Jeroen van Maanen wrote:
> So in fact the culprit turned out to be the function updateRationals in
> the module
>
>  
> http://lexau.svn.sourceforge.net/viewvc/lexau/branches/totem/src/LExAu/D
>istribution/MDL.hs?view=markup
>
> It is still eating more time than the actual optimizer, so suggestions
> for improvement are still welcome.

First,

approximateRational :: Double -> Rational
approximateRational x =
    let (m, e) = decodeFloat x
    in if e >= 0
        then (m * (2 ^ e)) % 1
        else m % (2 ^ (-e))

is exactly toRational, so there's no need for that function. Also, there's 
a much faster implementation of toRational (for Double and Float) underway, 
I think it will be in the first GHC 7 release due in a couple of weeks.
Anyway, using toRational will let you profit from that with no additional 
effort when you get a compiler with that patch.

Second,

data Threshold =
  Threshold
    { theBoundA :: Rational
    , theBoundB :: Rational
    , theCountA :: Integer
    , theCountB :: Integer
    }
  deriving Show

I have not looked much at the code, but it seems likely that you will want 
strict fields there,

data Threshold =
  Threshold
    { theBoundA :: !Rational
    , ...
    }

but that's to be tested later.

Third,

mapToThresholds :: [Threshold] -> [Rational] -> [(Rational, Rational)]
mapToThresholds _ [] = []
mapToThresholds thresholds@((Threshold boundA boundB intA intB) : 
moreThresholds) rationals@(x : moreRationals)
  | x > boundB = mapToThresholds moreThresholds rationals
  | x > boundA =
     let width = boundB - boundA
         count = fromInteger (intB - intA)
         mapped = (((x - boundA) * count) / width) + (fromInteger intA)
     in (mapped, x) : mapToThresholds thresholds moreRationals
  | True = error $ "Rational is too small: " ++ (show x) ++ " < " ++ (show 
boundA)
mapToThresholds [] (x : _) = error $ "Rational is too big: " ++ (show x)

will probably profit from making mapped strict,

    let ...
    in mapped `seq` (mapped, x) : mapToThreholds ... 

Now updateRationals:

updateRationals :: Integer -> [(Integer, Rational)] -> Integer -> 
[(Integer, Rational)]
updateRationals previousWeight previousRationals w

             else let mapped = mapToThresholds thresholds boundaries
                      mappedIntervals = zip ((0, 0) : mapped) mapped
                      ((_, a), (_, b)) = foldl1' maxMappedInterval 
mappedIntervals

That's no good, unfortunately.

maxMappedInterval :: ((Rational, Rational), (Rational, Rational)) -> 
((Rational, Rational), (Rational, Rational)) -> ((Rational, Rational), 
(Rational, Rational))
maxMappedInterval ((ma, a), (mb, b)) ((mc, c), (md, d)) =
  if md - mc > mb - ma
    then ((mc, c), (md, d))
    else ((ma, a), (mb, b))

foldl1' evaluates the result of maxMappedInterval to weak head normal form, 
that is to the outermost constructor.
Depending on what the optimiser does, that may or may not evaluate the 
condition md - mc > mb - ma, but it will *not* look at a, b, c, d, and at 
least the second components of the inner pairs happily build thunks, 
possibly keeping references to the elements of the list already processed, 
so keeping stuff from being garbage collected.

What you need is a strict type to contain your Rationals,

data SRQ = SRQ !Rational !Rational !Rational !Rational

maxMappedInterval :: SRQ -> ((Rational,Rational)) -> SRQ
maxMappedInterval s@(SRQ ma a mb b) ((mc,c),(md,d))
  | mb - ma < md - mc = SRQ mc c md d
  | otherwise = s

Then the foldl1' will evaluate all components and you don't get thunks or 
space leaks.