[Haskell-cafe] Re: ANNOUNCE: Harpy -- run-time code generation library
apfelmus
apfelmus at quantentunnel.de
Wed May 16 06:54:08 EDT 2007
Dirk Kleeblatt wrote:
> apfelmus wrote:
>> Dirk Kleeblatt wrote:
>>> apfelmus wrote:
>>>> I also think that having liftIO in the CodeGen-monad is plain wrong. I
>>>> mean, CodeGen is a monad that generates code without any execution
>>>
>>> note that runCodeGen runs the code _generation_, executing the
>>> generated code is done _within_ the CodeGen monad via the functions
>>> generated by callDecl (or the predefined functions in the Harpy.Call
>>> module). This is even more intertwined, but intentional.
>>
>> Huh? That means that code gets executed during its own generation? But
>> why do you mix separate concerns? I don't see what use this is besides
>> being an opportunity to mess up.
>
> One of our projects is a just-in-time compiler for a functional
> language. Here, compilation is done lazily: a starting stub is compiled
> and executed; when the execution reaches some point for which no code
> has been generated yet, the next bit of code is compiled, the program
> is resumed, and so on. So execution and code generation are interleaved.
>
> Another project is related to dependent type checking as described by
> Benjamin Grégoire and Xavier Leroy in [1]. Here, functions can occur in
> types, and the cited paper describes how these functions can be executed
> by compiled code. During type checking. So type checking, code
> generation, and execution are interleaved.
>
>>> Of course, again a different design is possible, making runCodeGen
>>> return a binary code object that can be called from the IO monad. But
>>> then, the user has to take care of releasing code buffers, and of not
>>> having unevaluated closures that hold code pointers to already released
>>> run-time generated code.
>>
>> Huh, why does the user have to care? Shouldn't wrapping the raw memory
>> block into a Foreign.ForeignPtr do the job?
>
> Well, both projects manage a separate memory block which is used like a
> heap by generated code. This heap contains closures that have code
> pointers into generated code. And these are subject to our (not yet
> implemented... ;-) ) garbage collectors, not the Haskell collector.
Ah, I think the memory organization issues are the driving force for
liftIO in the CodeGen monad, not the possibility of interleaving code
generation and execution. This is because you can always interleave code
generation and execution with the "binary code object" design as well, as in
    do codeobject  <- runCodeGen code1
       result      <- exec codeobject
       codeobject2 <- runCodeGen (code2 result)
       result2     <- exec codeobject2
       ...
Note that the second piece of generated code may well depend on the result
obtained from executing the first.
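(For concreteness, the "binary code object" design I have in mind would
expose an interface roughly like the following. All names and types here
are placeholders of mine, not Harpy's actual API; the bodies are stubs
because only the types matter.)
    data CodeObject   = CodeObject      -- an executable buffer, e.g. behind a ForeignPtr
    data Result       = Result          -- whatever the generated code computes
    newtype CodeGen a = CodeGen (IO a)  -- code *generation* only, no execution
    runCodeGen :: CodeGen a -> IO CodeObject   -- assemble, but do not call
    runCodeGen _ = return CodeObject
    exec :: CodeObject -> IO Result            -- call previously generated code
    exec _ = return Result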
However, you currently don't want free-floating binary code objects
because you want to manage memory yourself. In other words, you carry
around a single-threaded memory agglomeration that contains code and
data, i.e. you're working in a monad
    StateT Memory IO a
because only that can guarantee that there exists only a single instance
of the memory agglomeration. (In fact, you have to use IORefs but that's
not essential).
Still, I think that emitting opcodes and managing the memory they live in
are separate concerns. I'd separate them by splitting the CodeGen monad
into a monad over IO (I'll call it "Executor") that threads the
aforementioned memory agglomeration, and a data type that represents x86
assembly (here called "Code"). It is convenient to make Code a monad too,
but that's not necessary, and it's conceptually wrong.
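A rough sketch of the types I have in mind, just to pin the idea down
(Memory, Executor and Code are names of my own here, not part of Harpy):
    import Control.Monad.State (StateT)
    import Data.Word (Word8)

    data Memory = Memory                    -- code buffers, the run-time heap, ...

    type Executor a = StateT Memory IO a    -- threads the memory agglomeration

    newtype Code = Code [Word8]             -- a pure description of x86 opcodes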
Given this separation, you can support *both* designs simultaneously:
- calling x86 assembly inside the Executor
      foo :: Code -> Executor ()
      foo code = do
          buf <- newBuffer (size code)
          writeCode buf code
          exec buf
          freeBuffer buf
- calling code from binary code objects
      bar :: Code -> IO ()
      bar code = do
          (f :: IO ()) <- mkFunction code
          f
The first one is for your intended applications. The second one is a
simple way to outsource performance-critical parts of a Haskell program
to assembly, something some people on this list are probably very
interested in.
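For completeness, the helpers used in foo and bar would have signatures
along these lines, continuing the sketch above (again my own invention,
with stub bodies, not Harpy's API):
    data Buffer = Buffer                        -- executable memory owned by the Executor
    newBuffer  :: Int -> Executor Buffer        -- allocate an executable buffer
    newBuffer  = undefined
    writeCode  :: Buffer -> Code -> Executor () -- copy the opcodes into it
    writeCode  = undefined
    exec       :: Buffer -> Executor ()         -- call into the buffer
    exec       = undefined
    freeBuffer :: Buffer -> Executor ()         -- release it again
    freeBuffer = undefined
    size       :: Code -> Int                   -- number of opcode bytes
    size (Code bytes) = length bytes
    mkFunction :: Code -> IO (IO ())            -- wrap the code in a callable
    mkFunction = undefined                      -- IO action (ForeignPtr-style)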
Last but not least, here are more details about Code and why it's not
really a monad.
The basic operation on Code is to glue two pieces of Code together in sequence:
    append :: Code -> Code -> Code
In an assembly language without jumps, that's about all there is, apart
from primitive instructions like
    bite :: Code
    bark :: Code
and so on. Thus, Code is at least a monoid.
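As a toy reading of this jump-free fragment (the instruction set is of
course made up), Code could simply be a list of instructions, with append
as concatenation and the empty sequence as the unit:
    data Instr = Bark | Bite

    newtype Code = Code [Instr]

    instance Semigroup Code where
        Code xs <> Code ys = Code (xs ++ ys)
    instance Monoid Code where
        mempty = Code []

    append :: Code -> Code -> Code
    append = (<>)

    bark, bite :: Code
    bark = Code [Bark]
    bite = Code [Bite]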
Now, jumps are what make Code special: we need a way to reference code
positions, a.k.a. labels. Let's assume a primitive
    jump :: Label -> Code
One way is to base it on the following two operations:
    append' :: Code -> (Label -> Code) -> Code
    ccall   :: (Label -> Code) -> Code
The function append' supplies the position of its first argument to the
second, so that the second can reference it. The function ccall supplies
the position after the end of the code block to the code block itself,
just like "call with current continuation" would do. The robodog
example from last time then becomes
    pluto :: Code
    pluto =
          bark
      `append`  ccall (\loop -> (jump loop) `append` bite)
      `append'` bark                                        $ \loop ->
          bark
      `append`  (jump loop)
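To give append' and ccall a concrete meaning (again purely a toy model of
mine), one can represent a piece of Code by its size together with a
function from its start address to the instructions it emits; labels are
then just addresses:
    type Label = Int
    data Instr = Bark | Bite | Jump Label

    -- a Code knows its size and, given its start address, its instructions
    data Code = Code { size :: Int, emit :: Label -> [Instr] }

    append :: Code -> Code -> Code
    append a b = Code (size a + size b)
                      (\pos -> emit a pos ++ emit b (pos + size a))

    -- append' additionally passes the start address of its first argument
    -- to the second one
    append' :: Code -> (Label -> Code) -> Code
    append' a f = Code (size a + size (f 0))   -- size must not depend on the label
                       (\pos -> let b = f pos
                                in  emit a pos ++ emit b (pos + size a))

    -- ccall hands a block the address just past its own end, like call/cc
    ccall :: (Label -> Code) -> Code
    ccall f = Code n (\pos -> emit (f (pos + n)) pos)
      where n = size (f 0)                     -- same proviso

    jump :: Label -> Code
    jump l = Code 1 (const [Jump l])

    bark, bite :: Code
    bark = Code 1 (const [Bark])
    bite = Code 1 (const [Bite])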
Of course, making Code a MonadFix (so that the old Code becomes Code Label)
is more convenient (and I even think that the above primitives are not
enough for multiple cross-jumps). In fact, append' is very much like >>=
and ccall is very much like mfix. In any case, I wanted to show that Code
is weaker than a monad. From another point of view, jumps make Code a
specification of graphs.
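For the curious, here is a toy version of what I mean by the MonadFix
variant (labels as instruction indices; mdo/mfix ties the knot, so code
can jump forward to a label that is only defined later). None of this is
Harpy, just an illustration:
    {-# LANGUAGE GeneralizedNewtypeDeriving, RecursiveDo #-}
    import Control.Monad.Fix   (MonadFix)
    import Control.Monad.State (State, get, modify, execState)

    type Label = Int
    data Instr = Bark | Bite | Jump Label deriving Show

    -- the assembler just records instructions; a label is an index into
    -- the instruction stream
    newtype Code a = Code (State [Instr] a)
        deriving (Functor, Applicative, Monad, MonadFix)

    emit :: Instr -> Code ()
    emit i = Code (modify (++ [i]))

    label :: Code Label
    label = Code (fmap length get)

    bark, bite :: Code ()
    bark = emit Bark
    bite = emit Bite

    jump :: Label -> Code ()
    jump = emit . Jump

    assemble :: Code a -> [Instr]
    assemble (Code m) = execState m []

    -- pluto again, this time with a forward jump: mdo makes `skipBite`
    -- available before the statement that defines it
    pluto :: Code ()
    pluto = mdo
        bark
        jump skipBite   -- forward reference, resolved by mfix
        bite
        skipBite <- label
        bark
        loop <- label
        bark
        jump loop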
Regards,
apfelmus