[Haskell-cafe] Modelling structurally subtyped and cyclic data in type safe Haskell

Patrick Chilton chpatrick at gmail.com
Mon Dec 12 07:40:00 UTC 2016


Hi Aura,

I recently added a typed module for exactly this data in clang-pure:
https://hackage.haskell.org/package/clang-pure-0.2.0.2/docs/Language-C-Clang-Cursor-Typed.html

As in libclang, the AST is represented as a tree of Cursors, which have a
CursorKind such as StructDecl, CallExpr etc.
In Haskell it's possible to lift this CursorKind to the type level, so you
know at compile-time what kind of Cursor you're dealing with.
This is useful because only certain kinds of Cursors have certain types of
properties, for example source extent, spelling etc.

You can use

matchKind :: forall kind. SingI kind => Cursor -> Maybe (CursorK kind)

to convert an untyped Cursor into a Cursor with a particular known kind.
Then, if you search the AST and only look at cursors where this function
returns Just,
you know that the given cursor has certain properties you expect.

Patrick

On Thu, Dec 8, 2016 at 1:19 PM, Aura Kelloniemi <kaura.dev at sange.fi> wrote:

> Hello
>
> This is my first post to Haskell Café. I am a hobbyist programmer who has
> lots
> of experience with imperative and OO languages ranging from 80286 assembly
> to
> Ada and Ruby. Last four years I've been learning Haskell, which has been a
> painfully educational experience - I have a good understanding about how
> CPUs
> work, but very little understanding about maths. This is a very long post,
> I apologize.
>
> Now I'm trying to create a tool which automatically generates foreign
> function
> interface (FFI) imports from C and C++ to Haskell. I have written a GCC
> plugin
> which extracts all relevant (and lots of irrelevant) information from GCC's
> TREEs and dumps the output as JSON. It also dumps preprocessor macros
> which is
> one of the reasons this tol is being given birth.
>
> The tool is used like Shake - import Development.Familiar and write code
> which
> glues all relevant data processing and FFI generation functions together.
>
> This is also going to be a reusable library which others can use for
> whatever
> they find the data to be useful for. This could include making FFI
> bindings to
> other languages. Some code analysis tools also might find the data useful.
>
> I'm having a hard time importing this data to Haskell in a type safe way.
> GCC's TREEs are structured in an OO way. I have tried to find a more
> functional way of representing them, but so far with no luck.
>
> The JSON data is typed, and the hierarchy is exactly the same as in GCC.
> Below
> I construct a tree of the hierarchy where I list a few of possible types
> and
> very few of the fields the data types contain. For listing fields and their
> types I use a syntax similar to Haskell. This shortened listing should be
> enough for illustration:
>
> * macro: (data fields include: name :: string, tokens :: [token], etc.)
> * tree: all nodes have a unique ID to be able to recover the cycles
>     among the node graph; (id :: integer)
> * * translation_unit: (declarations :: [top_level_declaration])
> * * constant: a compile-time constant value; (type :: type)
> * * * complex_constant: (value :: complex)
> * * * integer_constant: (value :: integer)
> * * namespace_decl: (name :: identifier,
>                      declarations :: [top_level_declaration])
> * * declaration: (name :: identifier, type :: type)
> * * * const_decl: One component of enum type; (value :: type)
> * * * function_decl: (type :: function_type, params :: [param_decl],
>                       result :: result_decl)
> * * * type_decl: (no additional fields)
> * * type: (declaration :: type_decl, name :: identifier,
>            size :: integer_constant, alignment :: integer,
>            qualifiers :: [qualifier], completeness :: bool)
> * * * array_type: (element_type :: type, element_count :: integer)
> * * * numeric_type: (precision :: integer, signed :: bool,
>                      min_value :: integer_constant,
>                      max_value :: integer_constant)
> * * * * enumeral_type: (values :: [const_decl])
> * * * function_type: (param_types :: [type], result_type :: type)
> * * * pointer_type: (referred_type :: type)
> * * * record_type: struct (C or C++)/union;
>         (base_types :: [(type, access_info)],
>          fields :: [field_decl or const_decl or type_decl],
>          methods :: [method_decl])
> * reference: Any node may be replaced by a reference. They can be
>     forward or backward references. (referred_id :: integer)
>
> As you may see, tree nodes form many many cycles together. Subclasses
> sometimes provide stronger invariants for their fields than the parent
> classes. For example all declarations are associated with a type and for
> function_decl nodes this type is always a function_type.
>
> Now to my question. How to represent this in Haskell. Because of the
> enormous
> amount of fields per tree node, I certainly need to use records.
>
> -- Haskell follows:
> data BaseTree = BaseTree { id :: Int, ... ... }
> data BaseDecl = BaseDecl { baseTree :: BaseTree, name :: Identifier,
>                            declType :: Type, ... }
> data ConstDecl = ConstDecl { baseDecl :: BaseDecl, value :: ConstValue }
> data ConstValue = ComplexConst Complex | IntegerConst Integer
> data TypeDecl = TypeDecl { baseDecl :: BaseDecl }
>
> But... What shall I do about the heterogenous lists which many nodes
> contain.
> I wish I don't need to make myriards of ADTs for them, like:
>
> data AnyTopLevelDecl = ATLDConst ConstDecl
>                      | ATLDFunction FunctionDecl
>                      | ATLDNamespace NamespaceDecl
>                      | ATLDType TypeDecl -- ...
> data StructFieldDecl = SFDConst ConstDecl
>                      | SFDField FieldDecl
>                      | SFDType TypeDecl
>
> And so forth, basicly each heterogenous list gets its own ADT, because
> there
> is often variance in which node types can appear in the lists.
>
> I could use one big ADT like AnyDecl. However, to me it would feel like a
> bad
> design, if my types allowed RecordType nodes to reference NamespaceDecls,
> and
> therefore I don't want to pack all nodes into one type, unless I could use
> some type parameters or constraints to restrict the possible values the ADT
> could take.
>
> At first I thought that the following could solve my problem:
>
> data SomeDecl a = SomeDecl where
>     SomeConst :: ConstDecl -> SomeDecl ConstDecl
>     SomeField :: FieldDecl ->SomeDecl FieldDecl
>     SomeFunction :: FunctionDecl -> SomeDecl FunctionDecl
>     SomeType :: TypeDecl -> SomeDecl TypeDecl
>
> -- and then I would use type classes like this
> class TopLevelDecl decl where
>     -- No methods needed
> instance TopLevelDecl ConstDecl
> instance TopLevelDecl FunctionDecl
> instance TopLevelDecl TypeDecl
>
> -- And then I could write functions like this:
> handleTopLevelDecl :: TopLevelDecl d => SomeDecl d -> Whatever
> handleTopLevelDecl (FunctionDecl f) = -- ...
> -- and I'd list all cases which have a TopLevelDecl instance, but not
> -- those which don't have. I also thought that my data types could look
> like
> -- this:
>
> data NamespaceDecl where
>     NamespaceDecl :: {
>         baseTree :: BaseTree,
>         forall a. TopLevelDecl a => declarations :: [a]
>         }
>
> Obviously this does not work, because type classes are open, and even
> though
> GHC prevents me from calling handleTopLevelDecl with (SOmeDecl FieldDecl),
> it
> still complains if I don't define an equation for it, because in some other
> module somebody might make a TopLevelDecl instance for FieldDecl.
>
> Now I would like to know if there is a way to solve my problem in an
> elegant
> way, e.g. by using type families.
>
> My second question is how to resolve the cyclicity of the node graph. I
> don't
> want to tie the knot, because I want to be able to manipulate the graph.
>
> Maps are one option, yes:
>
> data AnyNode = (list all possible nodes)
> type Id = Int
> type NodeIdMap = Map Id AnyNode
>
> -- Then my data types refer to other nodes like this:
> data FunctionDecl = FunctionDecl {
>     parameters :: [Id]
>     }
>
> Now I lose all type safety. I would have to insert a run-time check to
> every
> dereference of an Id if I want to make sure that function parameter
> declarations don't contain StringConstants.
>
> I've also considered using STRefs. During JSON parsing I would need to
> populate the nodes with the node Ids they refer to and then have a separate
> pass which turns all Ids to typed STRefs. At least with this approach I
> would
> get all possible run-time errors straight after parsing, and not when the
> data
> is used.
>
> I'm again sorry for this long elaboration. However I think that if you
> consider answering me you have a pretty good understanding of my problem
> and
> also a good knowledge of my level of understanding Haskell. I would like to
> unlearn OO-style thinking, and I would want to find a data and type
> representation which feels Haskellish. However I don't want to cut down the
> data. I want all information to be accessible in case somebody needs it.
>
> Thank you in advance!
>
> --
> Aura
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20161212/80a5d474/attachment.html>


More information about the Haskell-Cafe mailing list