[Haskell-cafe] GSoC proposal: Haskell AST-based refactoring and API upgrading tool

Mon Apr 29 12:09:56 CEST 2013

Hi Niklas,
I haven't read the whole proposal as I'm short of time.  But Alan
Zimmerman is doing a lot of work on integrating HaRe with the GHC API
[1].  He is alanz on freenode and a regular in #hspec.

I haven't looked at the code, but maybe it's of interest to you.

Cheers,
Simon

[1] https://github.com/alanz/HaRe/tree/ghc-api

On Mon, Apr 29, 2013 at 02:00:23PM +0800, Niklas Hambüchen wrote:
> I would like to propose the development of source code refactoring tool
> that operates on Haskell source code ASTs and lets you formulate rewrite
> rules written in Haskell.
> 
> Objective
> ---------
> 
> The goal is to make refactorings easier and allow global code changes
> that might be incredibly tedious to do in a non-automated way.
> By making these transformations convenient, we can make it easier to
> maintain clean code, add new features or clean up leftovers faster, and
> reduce the fear and effort to upgrade to newer versions of packages and
> APIs.
> 
> 
> Transformations
> ---------------
> 
> First, here are a few operations you would use this tool for. Some of
> them are common operations you would also do in other programming
> languages, some are more specific to Haskell.
> 
> * Changing all occurrences of "import Prelude hiding (catch)" to "import
> qualified Control.Exception as E"
> 
> * Replacing all uses of a function with that function being imported
> qualified or the other way around
> 
> * Adding a field to data constructor a record, setting user-supplied
> defaults for construction and destruction:
> 
>     -- Suppose you want to change one of these
>     data User = User { name :: String, age :: Int }
>     data User = User String Int
> 
>     -- into one of these
>     data User = User { name :: String, age :: Int, active :: Bool }
>     data User = User String Int Bool
> 
>     -- the refactoring tool could perform, in all relevant locations:
>     show (User name age) = ...
>     show (User name age _) = ...
> 
>     -- and also this transformation:
>     ... u { name = "deleted" } ...
>     ... u { name = "deleted", active = False } ...
> 
>     -- or equivalently with records.
> 
>     -- Special cases could be taken care of as specified, such as
>     --   "whenever an object of [this User type] has
>     --    of its records passed into some function 'email', do this
>     --    now only if the user is active, so modify all relevant code
>     --        email (name u)
>     --    to
>     --        if (active u) then email (name u) else return ()
> 
>     -- Other examples include adding a position counter to attoparsec.
> 
> * Adding a type parameter to a type
> 
>     -- This happens a lot on monad transformer stacks, e.g.
>     newtype MyMonad a b c = MyMonad (ReaderT a (WriterT b ...
> 
>     -- and as you would probably agree on, this is not the most
>     -- comfortable change to make; in big project this can mean
>     -- hour-long grinding.
> 
>     -- It has also recently happened in the basic underlying types
>     -- of packages like conduit and pipes.
> 
> * Adding a new transformer around a monad
> 
> * Addressing problems like mentioned in
> http://blog.ezyang.com/2012/01/modelling-io/:
>   "There is one last problem with this approach: once the primitives
> have been selected, huge swaths of the standard library have to be
> redefined by “copy pasting” their definitions ..."
> 
> * Extracting a value into a let or where clause
> 
> * Renaming a variable, and all its occurrences that are semantically
> same variable (based on its scope)
> 
> * Changing the way things are done, such as:
> 
>     * Replacing uses of fmap with <$>, also taking care of the
>       corresponding import, and such cases were partial application
>       is involved
> 
>     * Replacing uses of "when (isJust)" to "forM_"
> 
> * Making imports clearer by adding all functions used to the file to the
> import list of the module that gets them in scope
> 
> * Finding all places where an exported function does not have all its
> arguments haddock-documented.
> 
> * Performing whole-project refactorings instead of operating on single
> files only, allowing operations like
> 
>     "Find me all functions of this type, e.g.
>          Maybe a -> (a -> m a) -> m a
>      in the project and extract them into this new module,
>      with the name 'onJust'."
> 
> 
> Some of the problems above can be tried to address using regex-based
> search and replace, but this already fails in the simplest case of
> "import Prelude hiding (catch)" in case there is more than that imported
> from Prelude or newlines involved in the import list.
> 
> Transformation on the AST are much more powerful, and can guarantee that
> the result is, at least syntactically, valid. No text base tool can do that.
> 
> 
> Other uses
> ----------
> 
> In addition to being able to perform transformations as mentioned above,
> the refactoring tool as a library can be leveraged to:
> 
> * Support or be the base of code formatting tools such as
> haskell-stylish, linters, style/convention checkers, static analyzers,
> test coverage tools etc.
> 
> * Implement automatic API upgrades.
> 
>   Imagine the author of a library you use deprecates some functions,
> introduces replacements, adds type parameters. In these cases, it is
> very clear and often well-documented which code has to be replaced by
> what. The library author could, along with the new release, publish
> "upgrade transformations" that you can apply to your code base to save
> most of the manual work.
> 
>   These upgrade transformations could be either parts of the packages
> themselves or be separately maintained and refined on the feedback of
> users applying them to their code bases.
>   This could allow the Haskell community to keep up their fast pace
> while making API breakage a non-problem.
> 
>   Automation is what makes us deal well with test suites, source
> control, error checking. Taking the pain out of this will increase
> people's incentive to upgrade their code an to keep it well-maintained
> by easy refactorings.
> 
>   This concept of automatic code upgrades would, to my knowledge, be
> quite unique in the programming language world, and the possibility to
> have this is, as many things, the result of Haskell's excellent type system.
>   In comparison to dynamic and "unsafe" languages, we actually have a
> lot of information around that we can use to write powerful and
> expressive tools.
> 
> 
> Splitting into GSoC tasks
> -------------------------
> 
> I think that the project is too large for one summer and should be split
> into two parts:
> 
> 1. Implementation of a "full-source" transformation-enabling AST
> 2. The transformation engine and API / DSL
> 
> 
> About 1: Creating a full-source AST
> 
> Existing parsers for Haskell tend to throw away information that is not
> necessary for compiling the code. Of course this is not good for a
> code-to-code transformation tool, stripping comments or whitespace the
> programmer cares about is not an option.
> 
> haskell-src-exts has gained support for comments around version 1.1 (as
> another GSoC project?), but still comments are treated as foreigners,
> given to you in a list detached from the AST; their positions are
> specified with integers, which means that when you modify the AST, you
> have to take care to adjust the comments as well.
> 
> The problem seems to be that haskell-source-exts is made for parsing,
> its AST is made for transformations. I have yet to see a tool that
> actually *modifies* haskell-src-exts's AST - most of them use it for
> parsing, and then apply some form of pretty-printing to get back to code.
> 
> We need an AST that contains *all* information about the original source
> code, which means render . parse = id; this is why I called it "full
> source" AST.
> Also, this AST should have all elements available in a way that
> encourages modification and does not discriminate against operations
> like re-indentation or alignment.
> 
> I like to think of this as something that could eventually be part of
> ghc, in a pipeline like
> 
>     GHC's parser
>  -> full-source AST
>  -> reduced AST that strips parts irrelevant for compilation
>  -> usual compilation pipeline
> 
> I am thinking of this being in a GHC environment because it would
> guarantee that it is up-to-date with the currently supported language
> extensions etc.
> However, I also see that this might slow down development and make
> things harder, so at least for the beginning it might be better to do it
> as as a normal library; I would love to get feedback on this.
> 
> It might be a good idea to start with GHC's or haskell-source-exts
> parser and modify them to obtain the full-source AST. I believe though
> that it would be beneficial if the AST would be to a certain extent be
> decoupled from the parser, such that other people could write other
> parsers (e.g. using other parsing libraries like Parsec, attoparsec,
> uu-parsinglib, trifecta) that produce the same compatible AST type.
> 
> I believe that creating this AST makes a GSoC project for itself, and
> that it would be beneficial for all efforts processing Haskell as a
> source language.
> 
> 
> About 2: Transformation engine and how to write transformations
> 
> Transformations should be Haskell code that operates on the AST.
> 
> They would be similar to how you write TemplateHaskell in a way, yet
> much easier and more intuitive to use in most cases. I suggest the
> creation of a monadic DSL that allows you to select and match those
> parts of the code you are interested in, perform some case analysis and
> computation to determine how you want to transform it, and then give you
> convenient ways to express your changes in that DSL.
> 
> Functions in this DSL would roughly belong to one of the kinds
>     * finding/matching
>     * transforming/rewriting.
> 
> I have not thought about how this DSL would exactly look like, but I
> could imagine myself writing a high-level transformation like this:
> 
>     is           <- getImports
>     let matching = filter (importsFunction "catch")
>                  . filter (hasModuleName "Prelude")
>     mapM_ (rewrite . removeImportFunction "catch") matching
> 
> One of the priority goals of this GSoC project is making this API
> convenient; transformations should read more naturally and be much
> shorter than equivalent TemplateHaskell.
> 
> This extensive, convenience oriented monadic DSL should aim to have a
> reasonably large set of functions, and "do things for you" even if you
> could build them yourself with three or four combinators.
> It should base itself on a minimal set of core transformations.
> I think that the Haskell community has a lot of expertise, especially
> from parsing libraries, in creating this core API + convenience
> combinators duo, that could aid the GSoC student to get this right.
> 
> Along with this way to specify transformations, a transformation engine
> has to be constructed which can perform them somewhat efficiently over
> large code-bases.
> 
> While speed should should not be a main target for this project and
> functionality and expressiveness are the main goals, the transformation
> engine should be designed at least with speed in mind, which can be
> worked on at a later stage outside of the GSoC project.
> 
> This second part of the proposal should result in a library to perform
> transformations and an executable that can apply them to a folder of
> source files.
> 
> 
> Possible extensions and follow-up projects
> ------------------------------------------
> 
> * Interactive refactoring tool
> 
> The refactoring tool could be driven by an interactive application that
> lets you write some transformations, apply them to your code base, view
> the diffs and build the project; on build failures or otherwise
> not-complete transformations it would allow to refine your
> transformations, or write custom special cases for selected files, line
> ranges or functions.
> This way you could quickly upgrade your code base in a fast
> write-check-repeat cycle.
> 
> * An API upgrade infrastructure
> 
> As mentioned above, transformations could be published along with new
> software packages that allow their dependants to easily upgrade their
> code to the latest API version with minimal manual effort.
> An infrastructure or standard way of way of doing this could be established.
> For now, I think that package maintainers would publish their upgrade
> transformations in their projects' or separate code repositories, and
> users would apply them with the refactoring tool as needed.
> I can also imagine a more dedicated infrastructure though, where API
> upgrades are stored on Hackage or a similar database.
> 
> 
> Summary
> -------
> 
> This proposal contains two GSoC projects that I believe to be reasonably
> sized for one summer each.
> 
> I think that they make good projects according to GSoC standards as they
> are focused, feasible, and aimed at creating real-world code that bring
> a clearly visible benefit.
> 
> The entry barrier is not too high since knowledge of compiler or runtime
> internals is not required. However, applicants should probably have a
> fairly good understanding of parsers and how they are dealt with in
> Haskell, and be somewhat familiar with the Haskell community in order to
> find sources of good feedback and for being able to judge whether the
> tools being created would be convenient for the community to use.
> 
> 
> Discussion
> ----------
> 
> Now I would be glad to get some responses on this proposal; I have
> written a bit of text, but it is still a very rough idea and I would
> love to hear your thoughts about it.
> 
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe