[Haskell-cafe] static analysis of a C-like language

Holger Siegel holgersiegel74 at yahoo.de
Thu Nov 22 13:21:38 UTC 2018


Hello Olaf,

to me that sounds as if you want to do an abstract interpretation with a 
forward collecting semantics that employs non-relational abstract 
domains for the primitive data types and summarizes the dimensions of 
arrays. That is what the Java compiler does when it analyzes whether a 
variable is 'effectively final' (except that it doesn't even have to 
summarize arrays for that), and it should suffice for the kind of bugs 
you want to find.

I would start by writing a simple interpreter for the language to be 
analyzed. That way you fix messy details before they bite you, e.g. the 
order in which submodules are loaded and initialized. You also get a 
clear formalization of the semantics, and you can annotate the syntax 
tree and implement informative error messages for the bugs you want to 
detect. You don't even need to implement every primitive function - it 
suffices that they return some random value, provided that the intended 
return value is one of the possible values. Lifting the interpreter to 
an abstract interpreter can then be done in a canonical way.


Regards,

Holger


Am 21.11.2018 um 22:01 schrieb Olaf Klinke:
> Dear Cafe,
>
> I want to build a linter for a C-like language. My company uses a UI system that comes with a C-like language that is used to provide functionality to the UI elements. The language is interpreted at runtime. The user, however, has no access to the built-in parser and checking functions. Only a basic syntax check comes with their editor. Therefore many stupid mistakes only show at runtime and can not be tested for statically.
> The language is in many respects even simpler than C. No structs, no enumerations, only a handful of basic data types and arrays thereof, plus pointers and the usual control flow structures. On the other hand, there are some features that make static analysis near impossible, e.g. the possibility of dynamic registration of variables. Higher-order types are emulated by passing the name of other functions as strings. There is explicit referencing of other sources (#include statements) as well as implicit referencing within UI elements and within the project hierarchy.
>
> We want to catch basic programming mistakes - misspelled variables, missing return statements, usage of variables before assignment, wrong number of function arguments/types. A project's code is scattered across multiple files and layers - a function declaration can mask a declaration from the layer below. Therefore, a per-file-linter will not suffice.
> As a side-effect this may also yield a documentation generator (boy do I miss haddock while working with that), which is also missing in the system.
>
> I have never done something like this and will seek help from local CS departments (If you are a CS student in Paderborn or Bielefeld, Germany, get in touch!). My company would be willing to sponsor a Bachelor or Master's thesis project. Ultimately this will be comercialized as an add-on, so anyone who helps will also profit from the result financially. The user community is not large, but includes big names such as Siemens, London Heathrow, the Metros in NYC and Bilbao, and CERN.
>
> What kind of expertise shall I look for? Compiler theorists? What information shall I request from the language author? I feel that Haskell might be a good language to represent the AST in. But I am overwhelmed by the numerous AST libraries on hackage.
>
> How similar is my problem to what is covered by, say, haskell-src-exts, hlint, alex or syntactic? Can I re-use code from these projects?
>
> What solutions are out there for dependency resolution between source files?
>
> I suppose for informative error messages I need to decorate each semantic bit with information about
> - source file and location
> - what is in scope at that position, plus the source of the symbols in scope
> What data structures can be used to build and represent this information?
>
> Any pointers welcome.
> Olaf
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> To (un)subscribe, modify options or view archives go to:
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> Only members subscribed via the mailman list are allowed to post.



More information about the Haskell-Cafe mailing list