[Haskell] Edison 1.2 and profiling

Sun Jun 4 13:54:20 EDT 2006

> 
> Enabling profiling can significantly change the space behavior of  
> programs.  There may be some important optimizations that don't occur  
> in the presence of profiling, and the profiling data itself takes  
> some memory.  I have to admit, I don't know a lot about GHC profiling  
> so I'm afraid I can't be of much help.

Me neither. I tried several things: Just turning off the optimizations doesn't cause the
problem. Optimization + profiling (-O -prof) has the same problem as -prof alone. It
doesn't matter if -auto-all option is there or not. And the profiled program gobbles up
all memory with or without the +RTS -p option.

> If you can scale down the  
> problem size to do your profiling, that may keep it from being OOM'd  
> and let you get some useful data.

I'll try, but I doubt it. The program seems to misbehave at the very start of the execution.

> > The
> > non-profiled version of the program runs perfectly well. In fact, the
> > PatriciaLoMap from Edison seems to be about two to three times faster
> > than Data.Map library for my purposes.
> 
> That's good to hear.  What's the usage pattern?

As soon as I get the execution profile, I'll send it to you. But it's basically a lot of
small maps, with few insertions, moderate lookups and folds, and lots of unions.

The one function I'm missing from Data.Map library is mapAssoc, but I can manage without
it. One thing you might find interesting about the usage pattern is that my keys are not
actually of type Int, but they contain an Int field. To get around PatriciaLoMap's
restriction that keys must be Ints, I'm abstracting it as follows:

    import qualified Data.Edison.Assoc.PatriciaLoMap as PLM

    newtype Map term =
       PLM (PLM.FM (Label, term))

    map f (PLM map) = PLM (PLM.map (\ (k, v)-> (k, f k v)) map)
    foldWithKey f x (PLM map) = PLM.fold (uncurry f) x map
    filter f (PLM map) = PLM (PLM.filter (f . snd) map)
    adjust f k (PLM map) = PLM (PLM.adjust (right f) (labelId k) map)
       where right f (x, y) = (x, f y)
    lookupM k (PLM map) = fmap snd (PLM.lookupM (labelId l) map)
    ...

I wonder if PatriciaLoMap would be faster if it applied this pattern internally?
Basically, could a structure like the following one make sense in Edison itself?

    class Countable x where
       toInt :: x -> Int

    data Countable k => GenericPLM k v = PLM.FM (k, v)