[Haskell-cafe] GHC API: parsing as much as I can

Artem Pelenitsyn a.pelenitsyn at gmail.com
Fri Nov 15 16:51:28 UTC 2019


Hello Cafe,

I need advice on how to use the GHC API to parse large collections of
Haskell source files.
Say, I want to collect ASTs of everything that is on Hackage.
I downloaded the whole Hackage (latest versions only) and have it locally
now.
I tried the simple advice found in the Parser module documentation:
https://hackage.haskell.org/package/ghc-8.6.5/docs/Parser.html

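    -- Module names as in the ghc-8.6.5 package.
    import DynFlags (DynFlags)
    import FastString (mkFastString)
    import Lexer (P (..), ParseResult, mkPState)
    import SrcLoc (mkRealSrcLoc)
    import StringBuffer (stringToStringBuffer)

    runParser :: DynFlags -> String -> P a -> ParseResult a
    runParser flags str parser = unP parser parseState
      where
        filename = "<interactive>"
        location = mkRealSrcLoc (mkFastString filename) 1 1
        buffer = stringToStringBuffer str
        parseState = mkPState flags buffer location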
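
For illustration, here is a minimal sketch of how a parser like this could
be applied to a whole module. It is not taken from the GHC documentation:
the helper name parsesAsModule is hypothetical, the imports assume the flat
module names of the ghc-8.6.5 package, and parseModule is the module-level
entry point exported by Parser. It also passes the real file path as the
source location instead of "<interactive>".

```haskell
-- Sketch only: assumes the `ghc` package (8.6.x module layout) is available.
import DynFlags (DynFlags)
import FastString (mkFastString)
import Lexer (P (..), ParseResult (..), mkPState)
import Parser (parseModule)
import SrcLoc (mkRealSrcLoc)
import StringBuffer (stringToStringBuffer)

-- | Hypothetical helper: True if the given source text parses as a module
-- under the supplied DynFlags.
parsesAsModule :: DynFlags -> FilePath -> String -> Bool
parsesAsModule flags path src =
    case unP parseModule (mkPState flags buffer location) of
        POk{}     -> True
        -- Empty-record pattern: the fields of PFailed differ between GHC
        -- versions, so none of them are matched here.
        PFailed{} -> False
  where
    location = mkRealSrcLoc (mkFastString path) 1 1
    buffer   = stringToStringBuffer src
```

A predicate of this kind can be mapped over a directory of .hs files to
count how many of them parse.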

It mostly works: 75% of the .hs files on Hackage seem to parse fine. I
looked into the remaining 25% and noticed that, when given the default
DynFlags, this snippet can't handle files that use GHC extensions such as
RankNTypes, TemplateHaskell, BangPatterns, etc. This leads me to the
question: how should I initialize DynFlags?

Currently, I use this for getting DynFlags:

    -- libdir is assumed to come from GHC.Paths (ghc-paths package).
    initDflags :: IO DynFlags
    initDflags = do
        let ldir = Just libdir
        mySettings <- initSysTools ldir
        myLlvmConfig <- initLlvmConfig ldir
        initDynFlags (defaultDynFlags mySettings myLlvmConfig)

I understand that simple parsing of individual files can't take into
account extensions activated inside .cabal files. But I'd expect that it
should be possible to, at least, consider the extensions mentioned in the
LANGUAGE pragmas. Currently, this isn't happening. Any suggestions on how
to achieve this are welcome. I probably won't get to parsing 100% of
Hackage, but I'd hope for better than the current 75%.

--
Best wishes,
Artem