[Haskell-cafe] A composable, local graph representation as an open discussion

Mon Oct 24 12:56:54 UTC 2016

Haskell-Cafe:

   I have been working on the following idea, and would appreciate
any comments on the novelty or usefulness in your own applications.
A scan of the usual Haskell documents turns up lots of clever data
structures, but nothing particularly enlightening for graphs.
Here is my attempt:

   Graphs are difficult to represent in functional languages
because they express arbitrary undirected connectivity between nodes,
whereas functional code naturally expresses directed trees.

   Most functional algorithms for graphs use an edge-list
with global labels.  Although effective, this method
loses compositionality and does not exploit the type system
for enforcing graph invariants such as consistency of the edge list.

   This note presents a functional method for constructing
a local representation for undirected graphs functionally as
compositions of other graphs.  The resulting data structure
does not use unique node labels, but rather allows edge traversal
from any node to its neighbor through a lookup function.
Graph traversal then emerges as a discussion among static
nodes.  I have found this method useful for assembling sets
of molecules in chemical simulations.  It's also an interesting
model for framing philosophical questions about the measurement
problem in quantum physics.

   As a disclaimer, although it is useful for constructing graphs,
it is not obvious how common operations like graph
copying or node deletion could be performed.  This note
does not discuss how to implement any graph algorithms.

> import qualified Prelude
> import Prelude hiding ((.))
> import Data.Semigroup(Semigroup,(<>))
> import Data.Tuple(swap)

   First, I change the meaning of "." to be element access.
I think this is a cleaner way to work with record data,
and suggest that there should be a special way to use this
syntax without making accessor names into global variables.

> infixl 9 .
> a . b = b a -- switch to member access

   Every subgraph has open ends, which we just number
sequentially from zero.  The lookup function
provides the subgraph's window to the outside world.
Its inputs reference outgoing connections.
A subgraph, built as a composite of two
subgraphs, will have the job of providing the correct
lookup environment to both children.

> type Conn = Int
> newtype Lookup l = Lookup ( Conn -> (l, Lookup l) )

   The tricky part is making the connections between
the internal and external worlds.  For the internal nodes to be complete,
they must have access to complete external nodes.  The problem
is reversed for the external nodes.

   A naive idea is to represent a graph using
a reader monad parameterized over label
and result types (l,r).
  -- newtype Grph l r = Reader (Int -> (l, Lookup)) r
Unfortunately, this breaks down
because the outside world also needs to be able to
`look inside' the subgraph.  The above approach runs into trouble
when constructing the lookup function
specific to each child.  That lookup function needs the outside world,
and the outside world can't be completed without the
ability to look inside!

   We capitulate to this symmetry between the graph and its environment
by using a representation of a subgraph that provides
both a top-down mechanism for using the graph
as well as a bottom-up representation of the subgraph
to the outside world.

> data Grph l r = Grph { runGrph :: Lookup l -> r,
>                        self    :: Conn -> Lookup l -> (l, Lookup l),
>                        nopen   :: Int
>                      }

   The default action of `running' a graph is to run a local action
on each node.  That local function has access to the complete
graph topology via the lookup function.
Since we expect this to be a fold, the result type will
probably be a monoid, or at least a semigroup.
Any sub-graph can be run by specifying what to
do with incomplete connections.  At the top-level, there
should not be `open' connections.

>--run g = (g.runGrph) $ Lookup (\ _ -> error "Tried to go out of
top-level.")
> run g = (g.runGrph) $ u
>         where u = Lookup $ \ _ -> ("end", u)

   Individual nodes are themselves subgraphs.
Nodes must specify how many external connections
can be made, as well as an arbitrary label and an action.

> node :: Int -> l -> ((l, Lookup l) -> r) -> Grph l r
> node n l run = Grph (\e -> run (l, e)) (\_ e -> (l, e)) n

   Arbitrary graphs are constructed by joining two subgraphs.
The key here is the construction of separate lookup
environments for the each subgraph.  The left subgraph
can be connected to the first few openings in the environment
or to the right subgraph.  The right subgraph can connect
to the last few openings of the environment, or to the
left subgraph.  Each time an edge is traversed,
a series of "env" calls are made -- sweeping upward
until an internal connection happens.  Then a downward
sweep of "self" calls are made.  This takes at best
O(log|nodes|) operations.

   Connections are specified by (Conn,Conn) pairs,
so we need the ability to lookup from the permutation
or else to return the re-numbering after subtracting
connections used by the permutation.

> type Permut = [(Conn, Conn)]
> find_fst :: Conn -> Permut -> Either Conn Conn
> find_fst = find1 0 where
>     find1 n a ((a',b):tl) | a == a' = Left b -- internal
>     find1 n a ((a',_):tl) | a' < a  = find1 (n+1) a tl
>     find1 n a (_:tl)                = find1 n a tl
>     find1 n a []                    = Right (a-n) -- external
> find_snd b p = find_fst b (map swap p)

>-- append 2 subgraphs
> append :: (Semigroup r) => Permut -> Grph l r -> Grph l r -> Grph l r
> append p x y = Grph { runGrph = \(Lookup env) ->
>                                 (x.runGrph) (e1 env)
>                              <> (y.runGrph) (e2 env),
>                       self  = down,
>                       nopen = (x.nopen) + (y.nopen) - 2*(length p)
>                     }
>             where
>                   down n (Lookup env) | n < ystart  = (x.self) n (e1 env)
>                   down n (Lookup env) = (y.self) (n-ystart) (e2 env)
>                   e1 env = Lookup $ \n -> case find_fst n p of
>                         Right m -> env m
>                         Left  m -> (y.self) m (e2 env)
>                   e2 env = Lookup $ \n -> case find_snd n p of
>                         Right m -> env (m+ystart)
>                         Left m -> (x.self) m (e1 env)
>                   ystart = (x.nopen) - length p -- start of b's env. refs

   This is a helper function for defining linear graphs.

> instance Semigroup r => Semigroup (Grph l r) where
>   (<>) = append [(1,0)]

   A simple action is just to show the node labels and
the labels of each immediate neighbor.

> show_node (l, Lookup env) = " " ++ show l
> show_env  (l, Lookup env) = show l
>             ++ foldl (++) (":") (map (\u -> show_node(env u)) [0, 1])
>             ++ "\n"

   The following example graphs are a list of 4 single nodes,
two incomplete 2-member chains, and a complete 4-member cycle.
The key feature here is that that the graphs are all composable.

> c6 = [ node 2 ("C"++show n) show_env | n <- [1..4] ]
> str  = c6!!0 <> c6!!1
> str' = c6!!2 <> c6!!3
> cyc = append [(1,0), (0,1)] str str' -- Tying the knot.
> main = putStrLn $ run cyc

   The connection to the measurement problem in quantum physics
comes out because the final output of running any graph
is deterministic, but can depend nontrivially on the graph's environment.
Like links in the graph, physical systems communicate through
their mutual interactions, and from those determine a new state
a short time later.  In a closed universe, the outcome is deterministic,
while for any an open system (subgraph), the outcome is probabilistic.
The analogy suggests that understanding how probabilities
emerge in the measurement problem requires a
two-way communication channel between the system and its environment.

~ David M. Rogers