[Haskell-cafe] static analysis of a C-like language

Thu Nov 22 13:26:38 UTC 2018

What is the canonical way to lift an interpreter to an abstract
interpreter, while ensuring convergence ---figuring out how and when to
widen?

Thanks
Siddharth

On Thu, 22 Nov, 2018, 18:52 Holger Siegel via Haskell-Cafe, <
haskell-cafe at haskell.org> wrote:

> Hello Olaf,
>
> to me that sounds as if you want to do an abstract interpretation with a
> forward collecting semantics that employs non-relational abstract
> domains for the primitive data types and summarizes the dimensions of
> arrays. That is what the Java compiler does when it analyzes whether a
> variable is 'effectively final' (except that it doesn't even have to
> summarize arrays for that), and it should suffice for the kind of bugs
> you want to find.
>
> I would start by writing a simple interpreter for the language to be
> analyzed. That way you fix messy details before they bite you, e.g. the
> order in which submodules are loaded and initialized. You also get a
> clear formalization of the semantics, and you can annotate the syntax
> tree and implement informative error messages for the bugs you want to
> detect. You don't even need to implement every primitive function - it
> suffices that they return some random value, provided that the intended
> return value is one of the possible values. Lifting the interpreter to
> an abstract interpreter can then be done in a canonical way.
>
>
> Regards,
>
> Holger
>
>
> Am 21.11.2018 um 22:01 schrieb Olaf Klinke:
> > Dear Cafe,
> >
> > I want to build a linter for a C-like language. My company uses a UI
> system that comes with a C-like language that is used to provide
> functionality to the UI elements. The language is interpreted at runtime.
> The user, however, has no access to the built-in parser and checking
> functions. Only a basic syntax check comes with their editor. Therefore
> many stupid mistakes only show at runtime and can not be tested for
> statically.
> > The language is in many respects even simpler than C. No structs, no
> enumerations, only a handful of basic data types and arrays thereof, plus
> pointers and the usual control flow structures. On the other hand, there
> are some features that make static analysis near impossible, e.g. the
> possibility of dynamic registration of variables. Higher-order types are
> emulated by passing the name of other functions as strings. There is
> explicit referencing of other sources (#include statements) as well as
> implicit referencing within UI elements and within the project hierarchy.
> >
> > We want to catch basic programming mistakes - misspelled variables,
> missing return statements, usage of variables before assignment, wrong
> number of function arguments/types. A project's code is scattered across
> multiple files and layers - a function declaration can mask a declaration
> from the layer below. Therefore, a per-file-linter will not suffice.
> > As a side-effect this may also yield a documentation generator (boy do I
> miss haddock while working with that), which is also missing in the system.
> >
> > I have never done something like this and will seek help from local CS
> departments (If you are a CS student in Paderborn or Bielefeld, Germany,
> get in touch!). My company would be willing to sponsor a Bachelor or
> Master's thesis project. Ultimately this will be comercialized as an
> add-on, so anyone who helps will also profit from the result financially.
> The user community is not large, but includes big names such as Siemens,
> London Heathrow, the Metros in NYC and Bilbao, and CERN.
> >
> > What kind of expertise shall I look for? Compiler theorists? What
> information shall I request from the language author? I feel that Haskell
> might be a good language to represent the AST in. But I am overwhelmed by
> the numerous AST libraries on hackage.
> >
> > How similar is my problem to what is covered by, say, haskell-src-exts,
> hlint, alex or syntactic? Can I re-use code from these projects?
> >
> > What solutions are out there for dependency resolution between source
> files?
> >
> > I suppose for informative error messages I need to decorate each
> semantic bit with information about
> > - source file and location
> > - what is in scope at that position, plus the source of the symbols in
> scope
> > What data structures can be used to build and represent this information?
> >
> > Any pointers welcome.
> > Olaf
> >
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > To (un)subscribe, modify options or view archives go to:
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> > Only members subscribed via the mailman list are allowed to post.
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.

-- 
Sending this from my phone, please excuse any typos!
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20181122/92a40d62/attachment.html>