[Haskell-cafe] Graph theory analysis of Haskell code

Wed Dec 5 18:46:11 EST 2007

This isn't strictly Haskell related, but anyway.

Next year I will be doing my honours in mathematics.  One possible
topic for my thesis that I've thought of - and my supervisor is quite
enthused about - is to use graph theory to analyse various textual
sources, starting with source code but leaving the framework open
enough to be able to extend it to other sources (e.g. email address
books).

How I envisage it happening is that a parser would be used to find all
"functions" in the given code, treat these as nodes in the graph and
then use directed edges to indicate which functions call other
functions.  This resultant graph can then be analysed in various ways
suitable to the context (e.g. find that a library module can be split
into two since there are two completely separate trees present in the
graph that don't interact at all, or if a function is only ever called
by one other function then it can be subsumed into it).

So, here is the question I ask of all of you: is this feasible?  Do
you know if anything like this has ever been attempted before?  I know
there are some other usages of graph theory related to source code
(e.g. McCabes complexity metric [1]), but I couldn't seem to find
anything related to what I'm proposing.  I intend to code this up in
Haskell (possibly using FGL: I know of it, but haven't really looked
at it) and use Haskell as my primary target for analysis, so in a
sense the resultant graph could be seen as a Haskell equivalent to
UML.

[1] http://en.wikipedia.org/wiki/Cyclomatic_complexity

-- 
Ivan Lazar Miljenovic