[C2hs] Profiling c2hs

Sat May 15 19:32:31 EDT 2004

Duncan,

On Wed, 2004-05-12 at 18:54, Duncan Coutts wrote: 
> On Wed, 2004-05-12 at 05:35, Manuel M T Chakravarty wrote:
> > >  At the moment
> > > this is done by after parsing each declaration (which might contain
> > > typedefs) modifying the remaining stream of tokens changing all
> > > identifiers that have been discovered to be typedefs from ordinary
> > > identifier tokens to typedef-identifier tokens. This is expensive.
> > 
> > To be honest, I am not entirely convinced that the basic idea of
> > modifying the token stream to rewrite identifier tokens is that
> > expensive.  I suspect it is the naive way in which morphTypeNames does
> > the job that's the problem.
> 
> The set of new type names passed to morphTypeNames after each
> parseCExtDEclList is quite small. So small that using lists and elem is
> faster than using Sets and elemSet (I tried this).

Sure, I wouldn't propose this for the small sets that we have at the
moment, only if we can combine all morphTypeNames into one.

> > > Perhaps it would be possible to add this set of typedef identifiers to
> > > the parser state monad and thread it through to the parsers for cid and
> > > typedefName which could consult this parser state and match
> > > conditionally on the identifier token being in the typedef-identifier
> > > set or not. (cidOrTN would not need to to a lookup) This would avoid
> > > traversing the token stream after parsing each declaration.
> > 
> > I guess, it would be possible to interleave this with the parser, but
> > what I like about the current setup is that it is more modular.  I don't
> > think that it is a problem to traverse the token stream.  
> 
> True, it looks like it would be difficult to interleave in the manner I
> suggested anyway because as far as I can tell, the parser combinators
> don't make it easy to do a conditional parse - conditional on some
> internal monad state. Swierstra & Duponcheel's combinator isn't a monad
> right? And doesn't allow you to thread state from earlier in the token
> stream to later.

Yes.

> > The problem with the current setup is that morphTypeNames is interleaved
> > with parseCExtDeclList; ie, with each call to parseCExtDEclList that
> > reads so a typedef one more morphTypeNames-filter is added to the token
> > stream.  In other words, after N typedefs, each token passes through N
> > instances of morphTypeNames.  Instead, we want a single instance of
> > morphTypeNames that collects all the typedef'd names in a set and
> > rewrites token accordingly.  If the set of represented as a balanced
> > tree (ie, using the "Set" module in "base/general/Sets.hs") we be able
> > to remove a factor of log N from the asymptotic complexity.  This would
> > be a much more localised change than changing the parser state.
> > 
> > What do you think?
> 
> If it can be done with a single pass of morphTypeNames that simply
> accumulates a set of type names that need to be changed, I imagine that
> would have similar speed improvements to the thing I suggested.
> 
> It's not clear how that might be accomplished in the current scheme
> where we don't know the typedef'ed names until after parsing each
> declaration.
> 
> How difficult is it to pick out typedef'ed names from the token stream
> (ie without full parsing)? Then we could do it in a single lazy pass
> over the token stream between the lexer and the parser.
> 
> Perhaps it could be done in the lexer itself, if the lexer is monadic.

The lexer isn't monadic either.  It uses a self-optimisation technique
similar to that in the parser combinators.

I had a closer look now and we need a bit of cooperation from the parser
library; so, I extended Parsers.execParser to take a token converter as
an additional argument.  In the C declaration parser, this token
converter changes from declaration to declaration, as more and more
typedef'd names are collected.  (They are collected in a Set now.)

I attach a patch that modifies base/syntax/Parsers.hs and
c2hs/c/CParser.hs accordingly.  As the patched functions didn't change
for quite a while, this patch should be quite independent of the version
of c2hs that you have got.  Could you please profile the patched parser
again and see whether its space and runtime behaviour changed?

Cheers,
Manuel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: c2hs-parser.patch
Type: text/x-patch
Size: 7540 bytes
Desc: not available
Url : http://www.haskell.org//pipermail/c2hs/attachments/20040515/6a07d028/c2hs-parser.bin