[Yhc] Regarding Yhc bytecode versioning

Sun Oct 29 15:51:36 EST 2006

Yhc hackers,

Several weeks ago I received a report that my bytecode library for  
Yhc was not working correctly.  I investigated the matter and  
discovered several reasons for the problem.  I've fixed the bugs and  
released a new version of the library.  The problems were:

* Idiocy on my part.  I somehow managed to get the minor version of  
the Yhc bytecode set wrong.  I thought it was 9, but it actually is 10.
* Compatibility-breaking changes to the bytecode file format.

The second problem is the one I wish to discuss.  Since I began work  
on the Yhc bytecode library in May, there have been at least two  
instances of compatibility-breaking changes.  One relates to Hat  
integration, I believe, and the other has to do with the switch to  
libFFI.  Both of these changes were made without bumping the version  
number that appears in the file header.

I would like to suggest that such changes be avoided in the future.   
It seems to me that Yhc is fairly rapidly approaching a
feature-complete release, and I think we should start thinking pretty
seriously about stability issues.  If it becomes necessary to
modify the file format, then I feel that we should be careful to
document the changes and be sure to bump the version number.  That
way we can rely on the stated version number to identify the
proper parsing procedure for a bytecode file.  Without this basic
guarantee, it becomes very difficult to achieve interoperability.
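
To make the point concrete, here is a rough sketch of what a reader can do
when the version number is trustworthy.  It is written in Python for
brevity.  The header layout, the magic bytes, the major version of 1 and
the parser names are all made up for illustration; this is not the real
Yhc header.

```python
import struct

# Hypothetical header layout (NOT the real Yhc format): a 4-byte magic
# string followed by big-endian 16-bit major and minor version numbers.
HEADER = struct.Struct(">4sHH")
MAGIC = b"DEMO"  # placeholder magic bytes


def parse_v1_9(body):
    # Stand-in for a parser that understands the 1.9 layout.
    return ("layout for 1.9", body)


def parse_v1_10(body):
    # Stand-in for a parser that understands the 1.10 layout.
    return ("layout for 1.10", body)


# One parser per published version.  A format change means a new entry
# here, never a silent change to an existing one.
PARSERS = {(1, 9): parse_v1_9, (1, 10): parse_v1_10}


def parse_bytecode(data):
    """Pick a parser based only on the version stated in the header."""
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, major, minor = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a bytecode file")
    try:
        parser = PARSERS[(major, minor)]
    except KeyError:
        raise ValueError("unsupported bytecode version %d.%d" % (major, minor))
    return parser(data[HEADER.size:])
```

The dispatch only works if every layout change comes with a new version
number; otherwise two different layouts end up behind the same key and the
reader has no way to tell them apart.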

It may also be a good time to think about ways to future-proof the
file format so that future additions can be made without breaking  
compatibility.  Right now the format is quite fragile.  Perhaps we  
could take inspiration from the Java classfile format.  The basic  
idea is that there are named blocks of data with a minimal header  
which gives the name of the block and the size of the data payload.   
The name of the block defines the meaning of the data.  For example, the
'CODE' block contains bytecode instructions.  If any block is
encountered with an unrecognized name, it is ignored.  That way, one  
can have optional blocks, or one can add blocks without breaking  
compatibility.  One can also have optional information (like  
debugging symbols) and things of that nature.
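
A minimal sketch of the idea, again in Python for brevity.  The 4-byte
name, the 4-byte big-endian length and the 'DBUG' and 'XTRA' block names
are assumptions of mine for illustration, not a proposal for the exact
encoding.

```python
import struct

# Hypothetical block header: 4-byte block name, then a big-endian
# unsigned 32-bit payload size.
BLOCK_HEADER = struct.Struct(">4sI")


def write_block(name, payload):
    """Serialize one named block: header followed by the payload."""
    return BLOCK_HEADER.pack(name, len(payload)) + payload


def read_blocks(data, known):
    """Return a dict of the blocks whose names are in `known`.

    Blocks with unrecognized names are skipped using the size field,
    so newer writers can add blocks without breaking older readers.
    """
    blocks = {}
    offset = 0
    while offset < len(data):
        if len(data) - offset < BLOCK_HEADER.size:
            raise ValueError("truncated block header")
        name, size = BLOCK_HEADER.unpack_from(data, offset)
        offset += BLOCK_HEADER.size
        if len(data) - offset < size:
            raise ValueError("truncated block payload")
        if name in known:
            blocks[name] = data[offset:offset + size]
        offset += size  # step over the payload whether or not we kept it
    return blocks
```

For instance, a reader that only knows about 'CODE' can still load a file
that also carries debugging symbols:

```python
data = (write_block(b"CODE", b"\x01\x02\x03")
        + write_block(b"DBUG", b"symbols"))
old_reader = read_blocks(data, {b"CODE"})
```

Here old_reader contains only the 'CODE' payload, and the 'DBUG' block is
skipped without any error.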

What do you think?

Rob Dockins

Speak softly and drive a Sherman tank.
Laugh hard; it's a long way to the bank.
           -- TMBG