GHC API: monad and error handling

Mon Jul 14 10:03:48 EDT 2008

Hello librarians,

as some may know, I am currently working on improvements to the GHC  
API.  Many of the exported functions of the GHC module have a type of  
the form

   Session -> ... -> IO (Maybe X)

where 'Session' is mutable and 'X' is the actual result type of the  
function.  In order to enforce (the implicitly assumed) single- 
threaded use of a session and to provide richer error information I  
am restructuring all exported functions of the GHC API to return a  
computation in the 'Ghc' monad.  This currently looks like this(*)

   newtype Ghc a = Ghc { unGhc :: Session -> IO (Either GhcError a }

   newtype Session = Session (IORef HscEnv)

Functions that modify a session are now in this monad but behave  
mostly the same.  A more difficult decision is how to deal with  
errors.  The GhcError type currently looks like this:

   -- | An error annotated with the phase it happened in.
   data GhcError
       = GhcError HscPhase Messages
         -- ^ A "normal" compilation error.
       | ApiError HscPhase String
         -- ^ An error that violated some pre-condition/invariant of  
the API.
       | GhcIOException Exception
         -- some IO exception
         -- XXX: would (forall e. Typeable e => e) be better?

Note that we have to wrap IO exceptions and propagate them separately  
in the Ghc monad.(**)

   -- | Lift an 'IO' action into the 'Ghc' monad.  IO exceptions are  
wrapped and
   --   can be queried by matching on the 'GhcIOException'  
constructor in
   --   'ghcCatch'.  XXX: is there a cleaner way?
   io :: IO a -> Ghc a
   io action = Ghc $ \_ ->
                 Exception.handle (return . Left . GhcIOException) $ do
                   a <- action
                   return (Right a)

Now, what is the proper way to translate a function of type   ... ->  
IO (Maybe a) into this monad?  My suggestion would be to prefer ... - 
 > Ghc a wherever possible.  This would make things nicer with the  
upcoming split-up of more fine-grained control over the executed  
phases.  We could write

   parsed_file <- parse file
   tc_rn_file  <- typecheckRename parsed_file
   simpl_file  <- simplify tc_rn_file
   ...
   `onCompileError` $ \GhcError phase msgs ->
      somehowHandleErrors phase msgs

instead of

   mb_parsed_file <- parse file
   case mb_parsed_file of
     Nothing -> ...
     Just parsed_file -> do
       mb_tc_rn_file <- typecheckRename parsed_file
       case mb_tr_rn_file of
         ... etc ...

where 'onCompileError' would only catch errors with the constructor  
'GhcError', not 'ApiError' or 'GhcIOException'.  For the latter It  
is, however, not always clear what is a compile error and what is  
not.  Consider, for example, the following function:

-- | Takes a 'ModuleName' and possibly a 'PackageId', and consults the
-- filesystem and package database to find the corresponding 'Module', 
-- using the algorithm that is used for an @import@ declaration.
findModule :: ModuleName -> Maybe PackageId -> Ghc Module
findModule mod_name maybe_pkg = withSession $ \hsc_env ->
   let
         dflags = hsc_dflags hsc_env
         hpt    = hsc_HPT hsc_env
         this_pkg = thisPackage dflags
   in
   case lookupUFM hpt mod_name of
     Just mod_info -> return (mi_module (hm_iface mod_info))
     _not_a_home_module -> do
           -- XXX: should we really throw IO exceptions here?
	  res <- io $ findImportedModule hsc_env mod_name maybe_pkg
	  case res of
	    Found _ m | modulePackageId m /= this_pkg -> return m
		      | otherwise -> throwDyn (CmdLineError (showSDoc $
					text "module" <+> pprModule m <+>
					text "is not loaded"))
	    err -> let msg = cannotFindModule dflags mod_name err in
		   throwDyn (CmdLineError (showSDoc msg))

This either returns a Module or throw an exception by looking it up  
in the home package table.  Otherwise, it throws a CmdLineError  
exception, which is intended for reporting failure inside GHCi.  I  
guess the proper way for this function would be to throw an  
'ApiError', i.e., we expect the looked-up module to be existing in  
the home package table and to be loaded.

My questions thus are:

  - Does this sound like a reasonable strategy?
  - Is such a monad a good or a bad idea?
  - Does anyone have an idea of a classification of errors, or  
guidelines/principles for one?
  - Any other comments?

Thanks,

/ Thomas

(*) an alternative implementation that would probably be a bit more  
efficient in case of errors could use continuation-passing style:

   newtype Ghc a
      = Ghc { unGhc :: forall ans.
                       Session
                    -> (GhcError -> IO ans)  -- failure continuation
                    -> (a -> IO ans)         -- success continuation
                    -> IO ans

(**) instances for mtl classes will be provided in a separate package  
to avoid adding mtl as a build dependency.
--
I was wrong.  / This changes everything.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PGP.sig
Type: application/pgp-signature
Size: 194 bytes
Desc: This is a digitally signed message part
Url : http://www.haskell.org/pipermail/libraries/attachments/20080714/b83a132e/PGP.bin