[Haskell-cafe] How to give unique name/id to nodes outside any monad ?

Timothy Goddard tim at goddard.net.nz
Fri Jan 9 01:11:30 EST 2009


On Thu, 08 Jan 2009 21:28:27 minh thu wrote:
> Hi,
>
> I'd like to process some kind of graph data structure,
> say something like
>
> data DS = A [DS] | B DS DS | C.

Graphs in funtional languages aren't usually represented in this sort of 
manner. Trees are fine to represent like that as they're acyclic and have 
exactly one parent for each node but for graphs it's much more difficult. Say 
that you have a graph with directed connections like this:

0 -> 1
1 -> 2
2 -> 3
1 -> 3
3 -> 4

Now you want to alter node 4. Node 3 has to be updated to point to the new 
version of 4, node 1 has to be changed to point to the new version of 3, node 
2 has to be changed to point to the new version of node 3, then node 1 has to 
be changed again to point to the new version of 2, then finally 0 can be 
changed to point to the new version of 1 and returned.

There is no simple way using this representation to handle that double-update 
to node 1, or to handle disconnected or cyclic graphs. Updates are extremely 
difficult since Haskell data structures are not mutable and have no concept 
of identity. The approach of treating nodes as structures with pointers to 
each other cannot be cleanly and efficiently implemented in an immutable 
fashion. It only really makes sense in a stateful, imperative context.

An approach that suits functional languages better is to store a flat 
structure listing the edges leaving each node. This, I believe, is the 
approach taken by Haskell's main graph library, FGL 
(http://hackage.haskell.org/cgi-bin/hackage-scripts/package/fgl). You would 
now have something like:

data MyNode nv = MyNode {nodeId::Int, nodeValue::nv}

data MyEdge ev = MyEdge {edgeDestination::Int, edgeValue::ev}

data MyGraph nv ev = MyGraph {
	maxNode :: Int,
	nodes :: (Map Int nv),
	edges :: (Map Int [MyEdge ev])}

emptyGraph :: MyGraph nv ev
emptyGraph = MyGraph 0 (Data.Map.empty) (Data.Map.empty)

getNode :: MyGraph nv ev -> Int -> Maybe (MyNode nv)
getNode g id = ((nodes g) `lookup` id) >>= (\v -> MyNode id v)

getEdgesLeaving :: MyGraph nv ev -> Int -> [MyEdge ev]
getEdgesLeaving g id = fromMaybe [] ((edges g) `lookup` id)

addNode :: nv -> MyGraph nv ev -> (Int, MyGraph nv ev)
addNode val g = (maxNode newGraph, newGraph)
	where
		newNodeId = (maxNode g) + 1
		newGraph = MyGraph newNodeId (insert newNodeId val (nodes g)) (edges g)

... and so on. (This is all totally untested - use at your own peril.)

Each node in the graph has a unique identifying number, issued in sequence 
using maxNode as a counter. This makes identifying cycles easy. The nodes map 
contains the value for each node based on its id. The edges map contains a 
list of links from each node to others in the graph. Finding links entering a 
node is quite expensive - if you need to do this often then maintaining a 
second list of edges entering each node would speed it up.

Each node and each edge can have a custom data structure attached. New nodes 
and edges can be added without having to modify references elsewhere, nodes 
have a distinct identity given by the associated Int and the graph is 
immutable - operations on it produce modified copies.

Cheers,

Tim


More information about the Haskell-Cafe mailing list