[Haskell-cafe] Hierarchical abstraction
Andrew Coppin
andrewcoppin at btinternet.com
Thu Jan 28 14:42:55 EST 2010
OK, this is going to be a long one...
Let's start with parsers. A monadic parser library (e.g., Parsec) gives
you a set of primitive parsers. It also supplies functions to construct
more complex commonly-used parsers. And then it provides functions for
constructing even more complex parsers. (E.g., Parsec's expression
parser builder.)
Let's call the most basic primitive parsers level-0. Then we have
level-1 functions, which construct parsers from level-0 parsers. And
later we have level-2 functions which call the level-1 functions to
construct even more sophisticated parsers. And so on and so forth, to
unlimited depth. (The parser library itself has a limited depth, but the
library can have clients that build functions on top of functions.)
And so, in the end, the user calls a level-64 function, and gets a
parser that does the required task. And that's great.
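To make the levels concrete, here's a tiny sketch using Parsec (number and pair are just names I've made up for the example):

import Text.Parsec
import Text.Parsec.String (Parser)
import Control.Applicative ((<$>), (<*>), (*>), (<*))

-- Level-0: the primitives the library gives you (digit, char, ...).

-- Level-1: a parser built directly from the primitives.
number :: Parser Int
number = read <$> many1 digit

-- Level-2: a parser built from the level-1 parser.
pair :: Parser (Int, Int)
pair = (,) <$> (char '(' *> number <* char ',')
           <*> (number <* char ')')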
The only problem is, the parser you get is a black box. You can't see
what's inside it or what it's going to do. You can only run it. (Or
combine it with other parsers.)
Why is that a problem? Well, suppose you want to pass the finished
parser through some kind of optimiser to rearrange its internal
structure. Ah. You can't. It's a black box.
Veering off onto a tangent for a moment... It seems to me that this is
an inevitable consequence for any monadic parser. After all, the bind
method runs some parse primitive, but it then feeds the result to a
general function, which can undertake some arbitrarily complex
Turing-complete calculation to decide what parse primitive to run next.
Hell, I can write a parser that parses a description of a grammar,
dynamically constructs a parser for that grammar, and then runs it!
There's no way in hell that some static optimiser can hope to optimise
the meta-parser so that it always generates optimal parsers. Even in
theory it's utterly impossible.
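A contrived little example of the problem (counted is another made-up name): the shape of the rest of the parse depends on a value that only exists once part of the input has already been consumed.

import Text.Parsec
import Text.Parsec.String (Parser)
import Control.Applicative ((<$>), (<*))
import Control.Monad (replicateM)

-- The count n is only known at run time, after the first parse has
-- succeeded, so no static analysis can see the shape of what follows.
counted :: Parser [String]
counted = do
  n <- read <$> many1 digit
  _ <- char ':'
  replicateM n (many1 letter <* char ';')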
I wonder if you can make a parser combinator library which is *not*
monadic? And, if you could, in what way would that limit the kinds of
things you can parse?
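One possibility, purely as a toy sketch (Static, describe and so on are names I've invented, not a real library): stop at Applicative, with no bind, and have each parser carry an inspectable description alongside the function that does the work.

-- A non-monadic parser: a static description plus the parsing function.
data Static a = Static
  { describe  :: [String]                     -- structure visible without running it
  , runStatic :: String -> Maybe (a, String)  -- the parser itself
  }

instance Functor Static where
  fmap f (Static d p) = Static d (\s -> fmap (\(a, rest) -> (f a, rest)) (p s))

instance Applicative Static where
  pure x = Static [] (\s -> Just (x, s))
  Static d1 p1 <*> Static d2 p2 = Static (d1 ++ d2) $ \s -> do
    (f, s')  <- p1 s      -- this do-block is in the Maybe monad, not a parser monad
    (a, s'') <- p2 s'
    Just (f a, s'')

-- Level-0 primitive: match one exact character.
charS :: Char -> Static Char
charS c = Static ["char " ++ show c] $ \s -> case s of
  (x:xs) | x == c -> Just (x, xs)
  _               -> Nothing

Because there's no bind, the whole structure is sitting there in describe before anything runs. The price is exactly the thing above: later parts of the parse can no longer depend on earlier results, so the length-prefixed example is out of reach.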
Anyway, coming back on topic... Let's suppose that instead of parsers,
we're interested in queries. Suppose I'm writing some kind of relational
database engine library, and I want to optimise queries before running them.
The obvious solution is to design some kind of algebraic data type which
can represent every possible level-0 query primitive. I can then write
level-1 functions that construct level-0 data structures, and level-2
functions that call the level-1 functions, and so on, and at the end the
caller will get a (presumably complex) level-0 description, which my
optimiser can then go away and chew on.
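Something along these lines (every name here is invented for the sake of the example):

-- Level-0: a plain algebraic data type describing queries.
data Query
  = Table String
  | Select Pred Query          -- keep rows satisfying the predicate
  | Project [String] Query     -- keep only the named columns
  | Join Query Query
  deriving Show

data Pred
  = ColEq String String        -- column = literal
  | And Pred Pred
  deriving Show

-- Level-1: a helper that merely builds level-0 structure.
lookupBy :: String -> String -> Query -> Query
lookupBy col val = Select (ColEq col val)

-- Level-2: built on level-1; the caller still ends up with plain
-- level-0 data for the optimiser to chew on.
findUser :: String -> Query
findUser name = Project ["id", "email"] (lookupBy "name" name (Table "users"))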
Now suppose I want to perform optimisations on the higher-level
structure of the query. Currently, I can't. The hierarchy of abstractions
gets flattened to level-0 as the chain of calls unwinds, so my
optimiser can only see the final level-0 representation.
The obvious solution would be to design an algebraic data type for every
level of the hierarchy you want to analyse, complete with a function for
converting to the next level down. As I understand it, this is how GHC
works. It starts with raw Haskell source code, which it mashes around,
and eventually converts into Core. The Core representation is then
fiddled around with further, and eventually turned into STG. The STG
gets poked and prodded, and converted to C--. And that eventually
becomes raw assembly.
The thing is, these are all radically different representations of the
program. You can't mix (say) Haskell and STG together. And nor would you
want to, given that they're such wildly different abstractions.
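In miniature, the pattern looks something like this (a deliberately tiny, made-up pair of representations with a lowering pass between them; real Core is far richer, and keeps let, for instance):

-- The surface language, with syntactic sugar.
data Surface
  = SVar String
  | SLam String Surface
  | SApp Surface Surface
  | SLet String Surface Surface     -- let x = e1 in e2

-- The lower-level language: no let, it gets compiled away.
data Core
  = CVar String
  | CLam String Core
  | CApp Core Core

lower :: Surface -> Core
lower (SVar x)     = CVar x
lower (SLam x b)   = CLam x (lower b)
lower (SApp f a)   = CApp (lower f) (lower a)
lower (SLet x e b) = CApp (CLam x (lower b)) (lower e)

Because Surface and Core are distinct types, you simply can't write down a term that mixes the two.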
[I notice in passing that, apparently, you can now FFI into C--, which
is interesting...]
However, consider the initial Haskell input. Initially it reflects what
the programmer typed in. But GHC gradually desugars this into some
minimal subset of the full Haskell language, before actually converting
it to Core. That means that [picking an example out of thin air]
initially an expression might contain the if/then/else construct, but at
some point [I presume] this is desugared into a case expression. And
that must mean that there are parts of the GHC codebase where it is
permissible for an expression to contain if/then/else, and other parts
where this is not permissible (and the code expects this construct to
not be present).
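In other words, somewhere there must be a pass of roughly this shape (another made-up miniature):

data Expr
  = Var String
  | If Expr Expr Expr
  | Case Expr [(String, Expr)]
  | App Expr Expr

-- After this pass no If node remains -- but nothing in the type Expr
-- records that fact, so later passes just have to trust it.
desugarIf :: Expr -> Expr
desugarIf (Var x)       = Var x
desugarIf (App f a)     = App (desugarIf f) (desugarIf a)
desugarIf (Case s alts) = Case (desugarIf s) [(k, desugarIf e) | (k, e) <- alts]
desugarIf (If c t e)    = Case (desugarIf c)
                               [("True", desugarIf t), ("False", desugarIf e)]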
Is it possible to encode this kind of expectation into the type system?
Or does GHC just tacitly rely on the fact that source code takes a
predetermined path through the compiler engine? (That works for a
monolithic product like a compiler, but it wouldn't be so hot for a
library that 3rd parties can poke and prod arbitrary data into.)
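One trick that might fit: tag the expression type with a phantom index saying whether the sugar is still allowed, and make the offending constructor a GADT that only exists at the "sugared" index. Reworking the toy Expr from above:

{-# LANGUAGE EmptyDataDecls, GADTs #-}

data WithIf   -- phantom tags: may if/then/else still occur?
data NoIf

data Expr s where
  Var  :: String -> Expr s
  App  :: Expr s -> Expr s -> Expr s
  Case :: Expr s -> [(String, Expr s)] -> Expr s
  If   :: Expr WithIf -> Expr WithIf -> Expr WithIf -> Expr WithIf

-- The type of the pass now guarantees that no If node survives it.
desugarIf :: Expr WithIf -> Expr NoIf
desugarIf (Var x)       = Var x
desugarIf (App f a)     = App (desugarIf f) (desugarIf a)
desugarIf (Case s alts) = Case (desugarIf s) [(k, desugarIf e) | (k, e) <- alts]
desugarIf (If c t e)    = Case (desugarIf c)
                               [("True", desugarIf t), ("False", desugarIf e)]

Scaling that up to dozens of constructs and many passes is presumably where it gets painful.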
I'm basically looking at a problem which is like this. There are
primitive constructs, and there are more sophisticated abstractions
built from these, and I would like to come up with a system where
abstractions are first-class, new abstractions can be added, and for any
particular data structure, you can statically guarantee which
abstractions are or aren't present.
Has anybody ever looked at a general solution to something like that?
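(The nearest thing I can point at is the encoding from Wouter Swierstra's "Data types à la carte" paper: each construct is its own little functor, and a particular data structure is the fixed point of whichever sum of those functors you permit, so the type spells out exactly which constructs can occur. A fragment of the idea, sketch only -- though I'm not sure it covers everything I'm after:

{-# LANGUAGE TypeOperators #-}

newtype Fix f = In (f (Fix f))

-- Combine two sets of constructs into one.
data (f :+: g) e = Inl (f e) | Inr (g e)

-- Two independent "abstractions".
data Val e = Val Int
data Add e = Add e e

-- Types that state which abstractions are present.
type Simple = Fix Val
type Arith  = Fix (Val :+: Add)

onePlusOne :: Arith
onePlusOne = In (Inr (Add (In (Inl (Val 1))) (In (Inl (Val 1)))))
)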
[I'm going to stop talking now...]