[Haskell-cafe] help w/ improving custom Stream for parsec

Malcolm Wallace malcolm.wallace at me.com
Thu Apr 9 15:22:34 UTC 2015


In cpphs (a Haskell implementation of the C preprocessor), there are two passes: the first pass splits the input into lines, and interprets #if and #include directives, leaving all other lines either untouched, or eliminated when in the wrong branch of a #if conditional.  Its output is a list of lines, each of which is paired with its original source position (i.e. file and line number).  Because #includes are transitive, the state for the first pass contains a little stack of contexts, that is, the file that triggered the inclusion, and its line number, so that once the included file is finished, the annotations can be popped to return to their enclosing position.

The second pass accepts the [(Posn,String)] emitted by the first pass, and interprets/expands #define macros and their usage sites.  It uses the positional annotations to give better warning/error messages in the case that, for instance, a #define is syntactically not-well-formed.  Macro expansion needs to treat the input as words rather than lines, but the macro definitions obviously are lexed and parsed differently, so it is useful to identify them on separate lines before splitting the non-#define lines into words.

Regards,
    Malcolm

On 9 Apr 2015, at 16:01, Maurizio Vitale wrote:

> Thanks Malcolm,
> 
> I did consider the two pass approach (and actually having pass 1 returning a stream of tokens annotated with position information)
> I'm keeping that option open, especially because for speed one might implement the first pass with Attoparsec and the rest with parsec.
> How would you keep track of macro expansions and source positions in order to provide nice error messages?
> Do you know of anything on hackage that does something similar (either the two pass  or the custom stream approach)?
> 
> Again, thanks. I'm still playing with alternatives before implementing the real language (which, for the curious, is SystemVerilog) so my barrier to trying out and benchmark different approaches is at this moment very low.
> And the real goal is to learn Haskell,don't care much if I'll have a full Verilog parser/elaborator; so playing with alternatives is very much useful.
> The language has also interesting features that make compiling separate files in parallel very challenging, so that's another area I want to play with before being too invested.
> 
> 
> On Thu, Apr 9, 2015 at 2:05 AM, Malcolm Wallace <malcolm.wallace at me.com> wrote:
> I think what you really need is a two-pass parser.  The first parser consumes the input stream, and copies it to the output stream with files inserted, and macros expanded.  The second parser consumes the already-preprocessed stream, and does whatever you like with it.
> 
> Regards,
>     Malcolm
> 
> On 7 Apr 2015, at 17:25, Maurizio Vitale wrote:
> 
> > I need a custom stream that supports insertion of include files and expansions of macros.
> > I also want to be able to give nice error messages (think of clang macro-expansion backtrace), so I cannot use the standard trick of concatenating included files and expanded macros to the current input with setInput/getInput (I think I can't maybe there's a way of keeping a more complex "position" and since the use in producing an error backtrac is rare, it migth be worth exploring; if anybody has ideas here, I'm listening)
> >
> > Assuming I need a more compelx stream, this is what I have (Macro and File both have a string argument, but it will be more compicated, a list of expansions for Macro for instance).
> >
> > Is there a better way for doing this?
> > What are the performance implications with backtracking? I'll be benchmarking it, but if people see obvious problems, let me know.
> >
> > Thanks a lot,
> >   Maurizio
> >
> > {-# LANGUAGE FlexibleInstances #-}
> > {-# LANGUAGE FlexibleContexts #-}
> > {-# LANGUAGE InstanceSigs #-}
> > {-# LANGUAGE MultiParamTypeClasses #-}
> >
> > module Parsing where
> >
> > import Text.Parsec
> >
> > type Parser s m = ParsecT s () m
> >
> > data VStream = File String | Macro String deriving Show
> >
> > newtype StreamStack = StreamStack [VStream] deriving Show
> >
> > instance (Monad m) ⇒ Stream VStream m Char where
> >   uncons ∷ VStream -> m (Maybe (Char, VStream))
> >   uncons (File (a:as)) = return $ Just (a, File as)
> >   uncons (File []) = return Nothing
> >   uncons (Macro (a:as)) = return $ Just (a, File as)
> >   uncons (Macro []) = return Nothing
> >
> >
> >
> > instance (Monad m) => Stream StreamStack  m Char where
> >   uncons (StreamStack []) = return Nothing
> >   uncons (StreamStack (s:ss)) =
> >     case uncons s of
> >      Nothing → uncons $ StreamStack ss
> >      Just Nothing → uncons $ StreamStack ss
> >      Just (Just (c, File s')) → return $ Just (c, StreamStack (File s': ss))
> >      Just (Just (c, Macro s')) → return $ Just (c, StreamStack (Macro s':ss))
> >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > Haskell-Cafe at haskell.org
> > http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> 
> 



More information about the Haskell-Cafe mailing list