extent strings

Mon May 14 17:21:45 EDT 2001

As the extent strings acquire more structure, I'm starting to wonder
if encoding the information in strings is the right approach.  What's
the advantage of putting the information in a string?

1) All Haskell parsers will be able to read any ffi-ified code even
   if they don't support the language binding that the code uses.

   But what does this achieve if the compiler backend doesn't support
   that ffi?

2) Avoids problems trying to force-fit Java/C/C++/... identifiers and
   keywords into Haskell, and avoids clashes with the (ever-growing)
   set of Haskell keywords.

   But this could be done by using "" as an escape mechanism whenever
   there's a conflict.  For example, suppose I wanted to foreign import
   a function called "class" using currently implemented ffi syntax.
   I just put "" round the offending identifier and I'm done:

     foreign import "class" clazz :: <whatever>

3) Might be easier to generate good error messages when the language
   just isn't supported by that compiler.

   As long as the "language/calling convention" keyword appears near
   the start of the foreign decl, I think this is easy to do without
   packing everything into strings.
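
   A minimal sketch of that point, assuming the convention keyword is
   mandatory and comes directly after "import" or "export".  The names
   supportedConventions and checkForeignDecl, and the list of supported
   conventions, are made up for illustration and not taken from any
   compiler:

```haskell
import Data.List (intercalate)

-- Hypothetical: the conventions this imaginary compiler implements.
supportedConventions :: [String]
supportedConventions = ["ccall", "stdcall"]

-- Inspect only the first three tokens of a declaration.  Assumes the
-- calling-convention keyword directly follows "import" or "export".
checkForeignDecl :: String -> Either String String
checkForeignDecl decl =
  case words decl of
    ("foreign" : direction : conv : _)
      | direction `elem` ["import", "export"] ->
          if conv `elem` supportedConventions
            then Right conv
            else Left ("unsupported calling convention `" ++ conv
                       ++ "'; supported: "
                       ++ intercalate ", " supportedConventions)
    _ -> Left "not a foreign declaration"
```

   Given a declaration that starts "foreign import jvm ...", this
   returns a Left that names jvm without ever looking at the rest of
   the declaration, so nothing after the keyword needs to be hidden in
   a string for the error to be reported cleanly.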

What's the disadvantage of having highly structured strings as part of
the ffi spec?  There's no big technical issue that I know of but as
the strings get more complex (i.e., as we start to write a grammar for
them), alarm bells start to go off in my head:

o Having a grammar for parsing strings seems to break the classic
  language structure of:

    lexical structure
    context free syntax
    static semantics
    dynamic semantics

o Reminiscent of C++'s extern "C" declarations.

  In itself this isn't necessarily bad - but every time I see
  one of those decls, the words "ugly hack" spring to mind.
  I suspect that making these strings even more structured would
  make the hack look worse.

o It seems simpler to have just one grammar for Haskell rather than to
  split it into two separate grammars.  By this I mean two logically
  separate grammars (as in the currently proposed syntax) rather than
  two physically separate grammars (as, for example, if someone were
  to provide a grammar for the C++ ffi binding as appendix G to the
  report).
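
To make the "two grammars" concern concrete, here is a rough sketch of
the kind of second-stage parser a structured string forces on every
implementation.  The string layout "[static] [header.h] [&] name", the
Entity type and parseEntity are assumptions made up for illustration;
they are not the proposed syntax:

```haskell
import Data.List (isSuffixOf)

-- Hypothetical result of parsing a string laid out as
--   [static] [header.h] [&] name
data Entity = Entity
  { entStatic  :: Bool
  , entHeader  :: Maybe String
  , entAddress :: Bool
  , entName    :: String
  } deriving (Eq, Show)

-- A second parser, with its own tokenisation (whitespace splitting)
-- and its own error messages, run on what the Haskell lexer has
-- already treated as a single opaque string token.
parseEntity :: String -> Either String Entity
parseEntity s = static (words s)
  where
    static ("static" : ts) = header True ts
    static ts              = header False ts

    header st (t : ts)
      | ".h" `isSuffixOf` t = address st (Just t) ts
    header st ts            = address st Nothing ts

    address st h ("&" : ts) = name st h True ts
    address st h ts         = name st h False ts

    name st h a [n] = Right (Entity st h a n)
    name _  _ _ ts  =
      Left ("expected exactly one name, found: " ++ unwords ts)
```

Nothing here is hard, but it is a separate lexer, grammar and set of
error cases sitting beside the ones in the report.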

Is anyone else concerned about this or am I just pissing in the wind?

-- 
Alastair Reid        reid at cs.utah.edu        http://www.cs.utah.edu/~reid/