Module extractor
Jan Skibinski
jans@numeric-quest.com
Wed, 31 Jan 2001 20:55:57 -0500 (EST)
Hi All,
During the last few days I've been working on the
ModuleExtractor, a high-level extractor of modules from
Haskell source files. This is not a low-level parser,
such as those used by compilers, since it only cares
about things related to documentation.
I am using Daan Leijen's Parsec library, which seems
to be well designed, well documented and reasonably fast.
The motivation for this work is to replace the home-brewed
parsing of source files in my Haskell Module Browser (really
a sophisticated form of "grepping"), which I currently do in
Smalltalk, with a Haskell version. I do not expect to gain
much speed here (the Hugs implementation will probably be much
slower than the Squeak one), but the idea is to move as much
code as possible from the Squeak side to the Haskell side in
order to create support code that could benefit other people
wishing to interface such browsers to systems other than Squeak.
I think this information is relevant to our discussion:
it could help clarify some issues and provide an
experimental tool.
The parser aims to extract the following information from
the source files:
data Module = Module
  { name       :: String        -- done
  , comment    :: String        -- done
  , exports    :: [Export]      -- chunk for now
  , imports    :: [Import]      -- chunk for now
  , fixities   :: [Fixity]      -- done
  , classes    :: [Class]       -- chunk for now
  , instances  :: [Instance]    -- chunk for now
  , categories :: [String]      -- chunk for now
  , functions  :: [Function]    -- done
  , footnote   :: String        -- done
  }
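The post marks the fixities field as done but does not show how it is filled. As an illustration only, here is a minimal sketch that picks one-line fixity declarations out of a source file, assuming each declaration states its precedence explicitly. The Assoc and Fixity types and the functions parseFixity, splitOps and fixitiesOf are hypothetical; they are not taken from the ModuleExtractor.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical types: the real Fixity type is not shown in the post.
data Assoc = InfixL | InfixR | InfixN
  deriving (Show, Eq)

data Fixity = Fixity
  { fixAssoc      :: Assoc
  , fixPrecedence :: Int
  , fixOperators  :: [String]
  } deriving (Show, Eq)

-- Recognise a one-line fixity declaration such as "infixl 6 +, -".
-- The precedence must be given explicitly; a trailing "--" comment is ignored.
parseFixity :: String -> Maybe Fixity
parseFixity line =
  case words line of
    kw : prec : rest
      | Just assoc <- lookup kw keywords
      , all isDigit prec
      , ops@(_ : _) <- takeWhile (not . isDashes) rest
      -> Just (Fixity assoc (read prec) (splitOps (unwords ops)))
    _ -> Nothing
  where
    keywords = [("infixl", InfixL), ("infixr", InfixR), ("infix", InfixN)]
    isDashes t = length t >= 2 && all (== '-') t

-- Split "+, -" into ["+", "-"], trimming blanks around each operator.
splitOps :: String -> [String]
splitOps s = case break (== ',') s of
  (op, [])       -> [trim op]
  (op, _ : rest) -> trim op : splitOps rest
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- Collect every fixity declaration found in a source file.
fixitiesOf :: String -> [Fixity]
fixitiesOf src = [f | l <- lines src, Just f <- [parseFixity l]]
```

For a file whose only fixity declaration is "infixr 5 ++>", fixitiesOf yields [Fixity InfixR 5 ["++>"]]. A Parsec-based version could replace the words-based tokenising with proper lexing, which would also cope with declarations spread over several lines.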
In the first stage, the parser breaks the source code into
chunks:
type Chunk = (Comment, Code)
and then examines each chunk to convert it into one of the
entities specified above. For example, the Function datatype
is defined as:
data Function = Function
  { funName      :: String
  , funSignature :: Signature
  , funBody      :: String
  }
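A minimal sketch of the first stage, under the simplest reading of the Chunk type: Comment and Code are assumed to be plain strings, a chunk is a run of column-0 "--" comment lines plus the code lines that follow it, and indented comments stay with their code. Block comments are ignored. The names chunkSource and isTopComment are hypothetical, not the extractor's own.

```haskell
import Data.Char (isSpace)

-- Assumed stand-ins: the post does not define Comment or Code.
type Comment = String
type Code    = String
type Chunk   = (Comment, Code)

-- A column-0 line comment: two or more dashes not followed by a symbol
-- character (so an operator such as "-->" is not mistaken for a comment).
isTopComment :: String -> Bool
isTopComment l =
  let (dashes, rest) = span (== '-') l
  in length dashes >= 2 && notSymbol rest
  where
    notSymbol (c : _) = c `notElem` "!#$%&*+./<=>?@\\^|~:"
    notSymbol []      = True

-- Split a source file into (comment, code) chunks: each chunk is a run of
-- column-0 comment lines followed by the code lines after it. Blank lines
-- are dropped; indented "--" lines count as code; {- -} is not handled.
chunkSource :: String -> [Chunk]
chunkSource = go . filter (not . all isSpace) . lines
  where
    go [] = []
    go ls =
      let (cs, rest)  = span isTopComment ls
          (code, ls') = break isTopComment rest
      in (unlines cs, unlines code) : go ls'
```

A file with two commented functions yields two chunks, one per function. A real implementation would also have to recognise "{- .. -}" comments and string literals that contain "--".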
The good news is that the parser can deal with any
positional placement of comments. For example, when handling
functions it considers any or all of the following comment
positions (concatenating the comments found in all of them):
+ Any number of "--" comment lines, or a "{- .. -}" comment, before the signature
x Signature
+ Any number of "--" comment lines, or a "{- .. -}" comment, after the signature
x First line of the function body
+ Any number of indented "--" comment lines
x Indented function body
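The comment-gathering rule in the list above can be sketched for a single function as follows. The sketch assumes the function's lines have already been isolated, that only "--" comments occur, and that the first non-comment line is the signature. FunDoc, commentText and extractFunction are hypothetical names; FunDoc mirrors the Function record with an extra comment field and keeps the signature as raw text, because Signature is not defined in the post.

```haskell
import Data.Char (isSpace)
import Data.List (isInfixOf)
import Data.Maybe (isJust, mapMaybe)

-- Hypothetical record modelled on Function, plus a comment field.
-- The signature stays raw text because Signature is not defined in the post.
data FunDoc = FunDoc
  { fdName      :: String
  , fdSignature :: String
  , fdComment   :: String
  , fdBody      :: String
  } deriving (Show, Eq)

-- If a line is a "--" comment (at any indentation), return its trimmed text.
commentText :: String -> Maybe String
commentText l = case span (== '-') (dropWhile isSpace l) of
  (ds, rest) | length ds >= 2, notSymbol rest -> Just (trim rest)
  _ -> Nothing
  where
    notSymbol (c : _) = c `notElem` "!#$%&*+./<=>?@\\^|~:"
    notSymbol []      = True
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

isComment :: String -> Bool
isComment = isJust . commentText

-- Collect comments from every position: before the signature, after it,
-- and indented inside the body; concatenate them in that order.
extractFunction :: [String] -> Maybe FunDoc
extractFunction ls =
  case rest of
    sig : afterSig | "::" `isInfixOf` sig ->
      let (post, body)  = span isComment afterSig
          bodyComments  = filter isComment body
          bodyCode      = filter (not . isComment) body
      in Just (FunDoc
           { fdName      = takeWhile (not . isSpace) sig
           , fdSignature = sig
           , fdComment   = unwords (mapMaybe commentText (pre ++ post ++ bodyComments))
           , fdBody      = unlines bodyCode
           })
    _ -> Nothing
  where
    (pre, rest) = span isComment (filter (not . all isSpace) ls)
```

For a function with one comment before its signature, one after it and one indented in the body, fdComment holds the three comment texts joined in that order, while fdBody keeps only the code lines.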
A similar pattern applies to other entities. But for this
positional approach to work I had to introduce the concept
of a category (known and cherished in Smalltalk,
Objective-C and Eiffel). In the Haskell case, a special banner
separates groups of functions. If the banner is not marked
somehow, it becomes part of the comment of the entity that
follows it (wrong, but not catastrophic).
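One way to make the category banners explicit is to agree on a marker line and strip it before comments are attached to entities. The marker "-- Category: <name>" below is purely an assumed convention (the post does not say what the banner looks like), and bannerName and splitCategories are hypothetical names.

```haskell
import Data.List (isPrefixOf)
import Data.Maybe (fromMaybe, isJust)

-- Assumed banner convention (hypothetical): "-- Category: <name>".
bannerName :: String -> Maybe String
bannerName l
  | marker `isPrefixOf` l = Just (drop (length marker) l)
  | otherwise             = Nothing
  where
    marker = "-- Category: "

-- Split source lines into (category, lines) groups. Banner lines are
-- consumed here, so they never reach the comment of the next entity.
-- Lines before the first banner go into an unnamed category "".
splitCategories :: [String] -> [(String, [String])]
splitCategories = filter nonEmpty . go ""
  where
    go cat ls = case break (isJust . bannerName) ls of
      (body, [])           -> [(cat, body)]
      (body, banner : ls') -> (cat, body) : go (fromMaybe "" (bannerName banner)) ls'
    -- drop groups with neither a name nor any lines
    -- (typically the leading group when the file starts with a banner)
    nonEmpty (cat, body) = not (null cat && null body)
```

Because banner lines are consumed when the file is split, each group can then be chunked on its own and no banner text leaks into a function's comment.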
It seems, after all, that I was not entirely correct in
one of my previous posts: an intelligent parser can
cope with a purely positional layout, given a little help
in defining category delimiters.
I should have remembered this, because I did similar
parsing for the Xcoral browser for Java.
I thought this information would be helpful
for our discussion. I'll post the code when it's ready.
Jan