Optimisation of unpackCString#

Don Stewart dons at galois.com
Mon Apr 28 17:00:38 EDT 2008


dons:
> ndmitchell:
> > Hi
> > 
> > (All these results are from GHC 6.9.20071226, but suspect they hold
> > for 6.9.* and 6.8)
> > 
> > The following code:
> > 
> > test = head "neil"
> > 
> > Produces (with -O2):
> > 
> > Text.HTML.TagSoup.Development.Sample.test =
> >   case GHC.Base.unpackCString# "neil" of wild_aaU {
> >     [] -> GHC.List.badHead @ GHC.Base.Char; : x_aaY ds1_aaZ -> x_aaY
> >   }
> > 
> > However:
> > 
> > test = head ['n','e','i','l']
> > 
> > Produces:
> > 
> > Text.HTML.TagSoup.Development.Sample.test = GHC.Base.C# 'n'
> > 
> > The packing of the string has damaged the optimisation. Is there
> > anyway to mitigate this? I ideally want to do this in RULES to derive
> > an efficient parser from a set of parser combinators, and the
> > unpackCString# really gets in the way of future optimisations.
> > 
> > I could imagine adding two rules to the simplifier:
> > 
> > case unpackCString# "" of  ==> case [] of
> > case unpackCString# "xyz" of ==> case (C# 'x': unpackCString# "yz") of
> > 
> > Then the simplifier could recover the optimised version.
> 
> The first case makes sense, and is just a RULE. Though it seems GHC already
> does this?
> 
>     g = head ""
> 
> goes to:
> 
>     M.g = badHead @ Char
> 
> without prompting.
> 
> Now, ByteString uses a rule relating to unpackCString#:
> 
>     {-# RULES
>     "ByteString pack/packAddress" forall s .
>        pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
>      #-}
> 
> So I wondered if I could write a rule for head/unpackCString
> 
>     {-# OPTIONS -fglasgow-exts #-}
> 
>     module M where
> 
>     import GHC.Base
> 
>     -- ^ enable rewrite rules
>     {-# RULES
>     "head/string literal" forall s .
>        head (unpackCString# s) = case s of
>             ""# -> head []
>             _   -> C# (indexCharArray# (unsafeCoerce# s) 0#)
>      #-}
> 
>     f = head "neil"
>             
> Urgh. not what we want though. GHC won't index the string at compile time,
> and I can't write a RULE for it.
> 
> I was surprised to note we can actually match on unboxed string literals in
> rules, but then the question is how to recover the nice list structure again.
> 
> As the string has already been packed for us, this seems a bit hard.
> Otherwise a unpack (pack s) rule could be used.
> 
> Perhaps you can newtype Char and use string overloading to work around the 
> packing issue?

This goes back to an old gripe of mine actually -- we can't get
at the length of a C string literal at compile time either, which
would be super useful in rules.

If we had some light primitives for this, that GHC new about (head#,
length#), that accessed the internal data about what strings are up to,
that could be useful.

-- Don


More information about the Glasgow-haskell-users mailing list