[Haskell-cafe] Trying to fix space leak

Ben Challenor ben at challenor.org
Mon Dec 31 10:20:07 EST 2007


Hi

I'm learning Haskell through writing a compiler. I'm seeing huge memory 
use in a function which converts the dataflow graph to the form required 
by Data.Graph. It needs to return a map from dataflow nodes to Vertexs, 
a map in the other direction, and the list of edges (as Vertex pairs).

> 	total time  =       18.78 secs   (939 ticks @ 20 ms)
> 	total alloc = 6,616,297,296 bytes  (excludes profiling overheads)
> 
> COST CENTRE                    MODULE               %time %alloc
> 
> j                              Dataflow              87.6   99.9
> graphvizCompile                Dataflow               6.4    0.0
> f                              Dataflow               2.9    0.0
> con2tag_DFReal#                Representation         2.1    0.0

I assume the allocation is being garbage-collected pretty quickly, 
because a) 6,616,297,296 bytes is stupid (!) and b) Process Explorer 
informs me that the peak private bytes of the program is not more than a 
couple of MB.

The offending code looks like this:

> -- (next Vertex available for use, edges so far, map so far, map so far)
> type DependencyEdgesAcc = (Vertex, [Edge], Map.Map DFNode Vertex, Map.Map Vertex DFNode)
> 
> -- Process a list of DFNodes in the same context.
> dependencyEdges :: DependencyEdgesAcc -> [DFNode] -> DependencyEdgesAcc
> dependencyEdges acc [] = {-# SCC "a" #-} acc
> dependencyEdges acc@(_, _, mnv1, _) (n:ns) =
>   -- Check whether this node has been done before.
>   {-# SCC "j" #-} case {-# SCC "i" #-} Map.lookup n mnv1 of
>     Just _ -> {-# SCC "b" #-} dependencyEdges acc ns
>     Nothing ->
>       -- Find the list of dependencies.
>       let ndeps = {-# SCC "c" #-} nodeDependencies n in
>       -- Process them first.
>       let (v2, es2, mnv2, mvn2) = {-# SCC "d" #-} dependencyEdges acc ndeps in
>       -- Now we can claim v2 as the label for this node n.
>       -- Look up vdep for each dependency ndep and add an edge from the vdep to v2.
>       let {(v4, es4, mnv4, mvn4) = {-# SCC "e" #-} List.foldl' (
>           \(v3, es3, mnv3, mvn3) ndep ->
>             case {-# SCC "f" #-} Map.lookup ndep mnv3 of
>               Just vdep -> (v3, (vdep, v2):es3, mnv3, mvn3)
>               Nothing -> error $ "this node should have been added already: " ++ show ndep
>           ) (v2, es2, mnv2, mvn2) ndeps } in
>       assert (v2 == v4)
>       assert (mvn2 == mvn4)
>       assert (mnv2 == mnv4)
>       -- Finally, add this node to the accumulator and then recurse.
>       dependencyEdges (v2+1, es4, {-# SCC "g" #-} Map.insert n v2 mnv2, {-# SCC "h" #-} Map.insert v2 n mvn2) ns

with the following profile:

>     dependencyGraph      Dataflow                                            4395           1   0.0    0.0    93.4   99.9
>      getRootDFNodes      Dataflow                                            4397           5   0.0    0.0     0.0    0.0
>      dependencyEdges     Dataflow                                            4396         282   0.0    0.0    93.4   99.9
>       a                  Dataflow                                            4407         104   0.0    0.0     0.0    0.0
>       j                  Dataflow                                            4399         178  87.6   99.9    93.4   99.9
>        b                 Dataflow                                            4417          75   0.0    0.0     0.0    0.0
>        g                 Dataflow                                            4405         103   0.1    0.0     0.2    0.0
>         con2tag_DFBool#  Representation                                      4419        1972   0.0    0.0     0.0    0.0
>         con2tag_DFReal#  Representation                                      4414      186890   0.1    0.0     0.1    0.0
>         con2tag_DFNode#  Representation                                      4413        1204   0.0    0.0     0.0    0.0
>        h                 Dataflow                                            4404         103   0.0    0.0     0.0    0.0
>        e                 Dataflow                                            4403         103   0.0    0.0     4.8    0.0
>         f                Dataflow                                            4412         174   2.9    0.0     4.8    0.0
>          con2tag_DFBool# Representation                                      4420      245794   0.0    0.0     0.0    0.0
>          con2tag_DFReal# Representation                                      4416    25772482   1.9    0.0     1.9    0.0
>          con2tag_DFNode# Representation                                      4415        2068   0.0    0.0     0.0    0.0
>        d                 Dataflow                                            4402         103   0.0    0.0     0.0    0.0
>        c                 Dataflow                                            4401         103   0.0    0.0     0.0    0.0
>         nodeDependencies Dataflow                                            4406         103   0.0    0.0     0.0    0.0
>        i                 Dataflow                                            4400         178   0.6    0.0     0.7    0.0
>         con2tag_DFBool#  Representation                                      4418       48230   0.0    0.0     0.0    0.0
>         con2tag_DFReal#  Representation                                      4410     5045646   0.1    0.0     0.1    0.0
>         con2tag_DFNode#  Representation                                      4409        1928   0.0    0.0     0.0    0.0

Is this a space leak, or is my algorithm just silly? (It could well be 
the latter...)

I've tried adding random bangs, to no effect. So, can anyone help me? :)

Cheers
Ben


More information about the Haskell-Cafe mailing list