[Haskell-cafe] Implementing PADS in Haskell?
David Walker
dpw at CS.Princeton.EDU
Mon Jun 30 11:41:10 EDT 2008
Simon writes:
> It's not clear to me
>
> a) whether or not you want new syntax (requires writing a parser)
we are starting out our implementation without special syntactic
support, but I do anticipate it will be useful primarily because the
implementation I envision will need to generate both datastructures and
type declarations. I don't quite see how a combinator library, for
instance, can do that. For example, it is very nice for the user to
declare a pads datatype like this:
pdata Pairs =
Myints (Pint, /|/, Pint)
| Mychars (Pchar, /|/, Pchar)
I would hope the effect of such a declaration would be to do two
things. First, generate declaration of a Haskell datatype:
data Pairs =
Myint (int, int)
| Mychars (char, char)
and second, generate an instance of the GADT (type Pads a where a is the
type of the parsed data) that we are planning to use to represent the
PADS descriptions. Functions such as parse, print, toxml, etc. perform
walks over the GADT instance. eg:
Pairs :: Pads Pairs
Pairs = ... bunch of constructor applications to build the GADT
representing Pairs ...
If I just wanted to build the GADT data structure, I can see how I could
define a collection of combinators directly in Haskell to do that.
However, I want to build a data structure *and* declare a type. And I
want programmers to be able to refer to and use both. I do not see how
to get around that.
In addition, in order to include arbitrary new datatypes in the GADT
that we are using to represent pads, it seems necessary to me at the
moment to pull a bit of a hack. For each branch of the datatype, I need
to wrap up injection and projection from the datatype in a pair of
functions and pass that to a generic GADT constructor. Doing that by
hand would seem to incur quite a bit of programming overhead and detract
from the pleasure of using datatypes. It may be hard to follow what I
am talking about... if people were interested, I could post the GADT
definition we are working with in a few days when it is a little bit
more settled...
In addition to the argument about generating type declarations and data
structures at the same time, it would seem to me that doing things like
being able to put constant literals and regular expressions inline in a
custom syntax will improve the readability and useability of the
system. However, this argument is not as strong -- it is definitely
true of PADS for C and also, I think for O'Caml, but perhaps if we are
creative we can figure out a way to fit what we want elegantly into
Haskell syntax.
> Concerning (b), if you want to write a program that generates one or
more Haskell programs, then
> Template Haskell is good. But if you can get away with just one
program, then you may not
> need that extra complexity. Many domain-specific languages embedded
in Haskell use just
> Haskell (e.g. Parsec, Fran, Yampa...), not TH nor quasiquotation.
By the way, when you use the terminology "embedded domain-specific
language" is that really just a synonym for "library"? Is there any
difference between an "embedded domain-specific language" and an
ordinary library?
Thanks,
Dave
PS: it does not look like my email is getting to the Haskell-cafe
mailing list, especially from my gmail account. Is the list moderated?
Can that cause a delay? Does this list filter email if the reply-to
field is different from the sender? My gmail account is
princedpw at gmail.com with a reply-to of dpw at cs.princeton.edu ... I signed
up a second time from my princeton account with email address
dpw at cs.princeton.edu and same reply-to ...
--------------
---------------
hi,
I sent the message below to the template Haskell mailing list and it
was suggested that I send it to the more general Haskell mailing list.
After my question, I attached a few responses I have received.
Thanks in advance for any suggestions.
Dave
------------------------
I am new to Haskell and even more so to Haskell templates -- if there
is an obvious answer to my question, please just point me at the
relevant URL. Thanks! (If you think I should email this query to
a broader Haskell mailing list let me know too.)
My friends and I are thinking about adding an extension to Haskell to
implement a variant of the PADS domain-specific language. PADS is a
convenient syntax to describe a file format and generate a bunch of
tools for it such as parser, printer, xml translator, semi-structured
query engine, statistical profiler, etc. (We also have tools to
automatically infer descriptions from example data but that's not
quite the point here.) See www.padsproj.org/ <http://www.padsproj.org/>
<http://www.padsproj.org/> for lots more info and
examples of PADS for C and PADS for O'Caml.
The bottom line question is I'm wondering if I can or should implement
my PADS extension using Template Haskell to support the syntactic
extensions. If not Template Haskell, is there another Haskell toolkit
I should know about? If Template Haskell isn't powerful enough, I'll
probably just write a standalone compiler that parses my special
syntax and spits out a file full of Haskell declarations (this would
be a slight shame because then I would probably not support writing
PADS declarations in the midst of other Haskell code). I've looked at
simple Template Haskell examples from the Haskell wiki, such as the
printf example, but I need to introduce a lot more syntax than that ...
What do I need from Template Haskell? Well what I would like to do is
add a bit of syntax to Haskell that allows users to define what looks
like non-standard type declarations. The "compiler" for these
non-standard type declarations will generate a collection of ordinary
Haskell declarations that can be used by a programmer. The
non-standard typing declarations will be a mixture of what looks like
Haskell type declarations, Haskell code, perl-style syntax for regular
expressions and perhaps some other stuff I'm not thinking of right
now. For example, suppose I want to write a program that converts a
file that contains a list of friends into XML (it's a bit of an
artificial example so I can illustrate several features):
3
Simon;Peyton Jones
John;Launchbury
#Phil;Wadler
The format starts with a number on the first line to indicate the
number of entries
in the file. Each line contains one entry which is a first name then a
';' then
a last name. Some lines begin with a # -- they are commented out
because I'm
no longer friends with them. Naturally, Phil Wadler falls into that
category. :-) !
The Haskell program I want to write to do this would look something like:
-------------------------
{|
pdata Line =
Comment /#.*/
| Name {first::Pstr /;/,
last ::Pstr /\n/}
ptype Friends = (x::Pint, |{ x > 0 }|, /\n/, PlistFW x)
|}
friendsToXML :: String -> XML
friendsToXML s = PADSTools.toXML Friends s
-------------------------
Notice the following syntactic features:
-- {| and |} are delimiters that begin/end pads code (I don't really
care what the delimiters are as long as they are relatively
concise. For instance, if the delimiters were $( ... ), for instance,
that would be fine.
-- inside {| and |}, we can jump back into Haskell by using the delimiters
|{ and |}
-- there are some built-in, pre-defined base types like Pstr (for
parsing and
printing strings) and Pint (for parsing and printing integers), PlistFW
(which
takes an argument x to specify the number of elements in the list)
-- regular expressions /..../ show up in the middle of what would
otherwise be
Haskell type declarations
-- there is syntax for declaring datatype-like things (introduced by
pdata keyword)
-- the declarations are dependent types in that there is
binding (eg: the x::Pint in the "Friends" type -- the x is bound here)
-- variables bound in pads code can then be used in Haskell code nested
inside the declaration (eg: the Haskell expression x > 0 refers to the
variable x, defined in the outer pads code).
-- the outer Haskell code refers to declarations made inside the PADS
code. (eg: the Haskell function "friendsToXML s = PADS.Tools.toXML
Friends s" refers to Friends, where Friends would be bound to a
datastructure generated by the compiler for the pads code. PADSTools
is a module defined a priori that contains a number of functions such
as toXML, parse, print, query, etc.)
Anyway, if you have any comments on how I should handle the syntactic
extensions,
let me know.
Thanks again,
Dave
------------
Ian:
You can't create new syntax with TH; it's just Haskell (plus the TH
syntactic extensions themselves, i.e. $( e ), and a few variants of
[| e |]).
You'd have to encode the info into Haskell syntax somehow, e.g.
something like
$(
pdata "Line" [C "Comment" [], C "Name" [ ("first", ["Pstr", "/;"]),
...
)
or parse them from strings (which isn't nice either, as Haskell doesn't
have nice multi-line strings).
Thanks
Ian
----------------
From Claus Reinke:
well, how about using the new quasiquoting to get HereDocs:-)
{-# LANGUAGE TemplateHaskell #-}
module Here where
import Language.Haskell.TH <http://Language.Haskell.TH>
<http://Language.Haskell.TH>
import Language.Haskell.TH.Quote
here :: QuasiQuoter
here = QuasiQuoter (litE . stringL) (litP . stringL)
then
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TemplateHaskell #-}
import Here
s = [$here| hello
multiline
world
|]
test = lines s
gives us
*Main> test
[" \r"," hello\r"," multiline\r"," world\r"," "]
Claus
------------------
From Simon Peyton Jones:
Dave
It's not clear to me
a) whether or not you want new syntax (requires writing a parser)
b) whether you want a single description to generate multiple
different Haskell programs, or just one
Concerning (a), as Claus says if you want special purpose syntax (ie
not Haskell) then you need quasiquotation. This is only in the HEAD,
not a released GHC, but it works reliably as far as I know.
http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.html#th-quasiquotation
<http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.html#th-quasiquotation>
<http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.html#th-quasiquotation
<http://www.haskell.org/ghc/dist/current/docs/users_guide/template-haskell.html#th-quasiquotation>>
Concerning (b), if you want to write a program that generates one or
more Haskell programs, then Template Haskell is good. But if you can
get away with just one program, then you may not need that extra
complexity. Many domain-specific languages embedded in Haskell use
just Haskell (e.g. Parsec, Fran, Yampa...), not TH nor quasiquotation.
You'd get a 30x wider audience on the Haskell Cafe mailing list, so
yes, I'd try there too. Even jumping to Template Haskell as a
solution may be premature; I'm not sure.
Meanwhile perhaps other TH folk would like to join in?
Simon
______________________________
_________________
Haskell-Cafe mailing list
Haskell-Cafe at haskell.org <mailto:Haskell-Cafe at haskell.org>
http://www.haskell.org/mailman/listinfo/haskell-cafe
<http://www.haskell.org/mailman/listinfo/haskell-cafe>
More information about the Haskell-Cafe
mailing list