[Haskell-cafe] static analysis of a C-like language

Wed Nov 21 21:01:08 UTC 2018

Dear Cafe, 

I want to build a linter for a C-like language. My company uses a UI system that comes with a C-like language that is used to provide functionality to the UI elements. The language is interpreted at runtime. The user, however, has no access to the built-in parser and checking functions. Only a basic syntax check comes with their editor. Therefore many stupid mistakes only show at runtime and can not be tested for statically. 
The language is in many respects even simpler than C. No structs, no enumerations, only a handful of basic data types and arrays thereof, plus pointers and the usual control flow structures. On the other hand, there are some features that make static analysis near impossible, e.g. the possibility of dynamic registration of variables. Higher-order types are emulated by passing the name of other functions as strings. There is explicit referencing of other sources (#include statements) as well as implicit referencing within UI elements and within the project hierarchy. 

We want to catch basic programming mistakes - misspelled variables, missing return statements, usage of variables before assignment, wrong number of function arguments/types. A project's code is scattered across multiple files and layers - a function declaration can mask a declaration from the layer below. Therefore, a per-file-linter will not suffice. 
As a side-effect this may also yield a documentation generator (boy do I miss haddock while working with that), which is also missing in the system. 

I have never done something like this and will seek help from local CS departments (If you are a CS student in Paderborn or Bielefeld, Germany, get in touch!). My company would be willing to sponsor a Bachelor or Master's thesis project. Ultimately this will be comercialized as an add-on, so anyone who helps will also profit from the result financially. The user community is not large, but includes big names such as Siemens, London Heathrow, the Metros in NYC and Bilbao, and CERN. 

What kind of expertise shall I look for? Compiler theorists? What information shall I request from the language author? I feel that Haskell might be a good language to represent the AST in. But I am overwhelmed by the numerous AST libraries on hackage. 

How similar is my problem to what is covered by, say, haskell-src-exts, hlint, alex or syntactic? Can I re-use code from these projects? 

What solutions are out there for dependency resolution between source files? 

I suppose for informative error messages I need to decorate each semantic bit with information about
- source file and location
- what is in scope at that position, plus the source of the symbols in scope
What data structures can be used to build and represent this information? 

Any pointers welcome.
Olaf