[Haskell-cafe] Large graphs

Sun May 20 15:37:44 CEST 2012

I have a problem that I'm trying to use Haskell for, and I think I'm running into scalability issues in FGL.  However, I am quite new to practical programming in Haskell, so it's possible that I have some other bone-headed performance bug in my code.  I tried looking around for concrete information about the scalability of Haskell's graph libraries, but didn't find much.  So here are the characteristics of the problem I'm working on:

- Large directed graphs.  Mostly 10k-100k nodes, but some in the low 100ks.
- Sparse graphs.  The number of edges is only 2-3x the number of nodes.
- Immutable structure, mutable labels.  After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges.

I wrote some code that reads in graphs and some some basic flow computations on them.  The first few graphs I tried were around 10k nodes, and the performance was okay (on the order of several seconds).  When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, the CPU utilization went down to single digit percentages and the overall running time was closer to hours than seconds.

Because the graph structure is basically immutable for my problem, I'm tempted to write my own graph representation based on mutable arrays.  Before I embark on that, I wonder if anyone else can share their experience with large graphs in Haskell?  Is there a library (FGL or otherwise) that should be able to scale up to the size of graph I'm interested in, if I write my code correctly?

Thanks,
Ben