[Haskell-cafe] Large graphs

Sun May 20 18:42:54 CEST 2012

I had issues with FGL in the past, too. Although FGL is really nice to
work with, it just uses a ridiculous amount of memory for large
graphs.

In the end, I used Data.Graph from containers [1]. This was a lot more
reasonable, and let me finish my project relatively easily.

Regards,
  - Clark

[1] http://hackage.haskell.org/packages/archive/containers/0.5.0.0/doc/html/Data-Graph.html

On Sun, May 20, 2012 at 10:55 AM, Serguey Zefirov <sergueyz at gmail.com> wrote:
> 2012/5/20 Benjamin Ylvisaker <benjaminy at fastmail.fm>:
>> I have a problem that I'm trying to use Haskell for, and I think I'm running into scalability issues in FGL.  However, I am quite new to practical programming in Haskell, so it's possible that I have some other bone-headed performance bug in my code.  I tried looking around for concrete information about the scalability of Haskell's graph libraries, but didn't find much.  So here are the characteristics of the problem I'm working on:
>>
>> - Large directed graphs.  Mostly 10k-100k nodes, but some in the low 100ks.
>> - Sparse graphs.  The number of edges is only 2-3x the number of nodes.
>> - Immutable structure, mutable labels.  After initially reading in the graphs, their shape doesn't change, but information "flows" around the graph, changing the labels on nodes and edges.
>
> I would like to suggest to you a representation based in 32-bit
> integers as vertex index. I.e., "roll your own"
>
> Use strict IntMap IntSet for neighbor information, it is very efficient.
>
>> I wrote some code that reads in graphs and some some basic flow computations on them.  The first few graphs I tried were around 10k nodes, and the performance was okay (on the order of several seconds).  When I tried some larger graphs (~100k), the memory consumption spiked into multiple GB, the CPU utilization went down to single digit percentages and the overall running time was closer to hours than seconds.
>
> Looks like your code does not force everything. It leaves some thunks
> unevaluated, check for that situation.
>
> It is common pitfall, not only for computations on graphs.
>
>>
>> Because the graph structure is basically immutable for my problem, I'm tempted to write my own graph representation based on mutable arrays.  Before I embark on that, I wonder if anyone else can share their experience with large graphs in Haskell?  Is there a library (FGL or otherwise) that should be able to scale up to the size of graph I'm interested in, if I write my code correctly?
>
> The above structure (IntMap IntSet) allowed for fast computations on
> relatively large arrays, in order of 1M vertices and 16M
> undirected/32M directed edges.
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>