Optimisation of unpackCString#
Don Stewart
dons at galois.com
Mon Apr 28 17:00:38 EDT 2008
dons:
> ndmitchell:
> > Hi
> >
> > (All these results are from GHC 6.9.20071226, but suspect they hold
> > for 6.9.* and 6.8)
> >
> > The following code:
> >
> > test = head "neil"
> >
> > Produces (with -O2):
> >
> > Text.HTML.TagSoup.Development.Sample.test =
> > case GHC.Base.unpackCString# "neil" of wild_aaU {
> > [] -> GHC.List.badHead @ GHC.Base.Char; : x_aaY ds1_aaZ -> x_aaY
> > }
> >
> > However:
> >
> > test = head ['n','e','i','l']
> >
> > Produces:
> >
> > Text.HTML.TagSoup.Development.Sample.test = GHC.Base.C# 'n'
> >
> > The packing of the string has damaged the optimisation. Is there
> > anyway to mitigate this? I ideally want to do this in RULES to derive
> > an efficient parser from a set of parser combinators, and the
> > unpackCString# really gets in the way of future optimisations.
> >
> > I could imagine adding two rules to the simplifier:
> >
> > case unpackCString# "" of ==> case [] of
> > case unpackCString# "xyz" of ==> case (C# 'x': unpackCString# "yz") of
> >
> > Then the simplifier could recover the optimised version.
>
> The first case makes sense, and is just a RULE. Though it seems GHC already
> does this?
>
> g = head ""
>
> goes to:
>
> M.g = badHead @ Char
>
> without prompting.
>
> Now, ByteString uses a rule relating to unpackCString#:
>
> {-# RULES
> "ByteString pack/packAddress" forall s .
> pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
> #-}
>
> So I wondered if I could write a rule for head/unpackCString
>
> {-# OPTIONS -fglasgow-exts #-}
>
> module M where
>
> import GHC.Base
>
> -- ^ enable rewrite rules
> {-# RULES
> "head/string literal" forall s .
> head (unpackCString# s) = case s of
> ""# -> head []
> _ -> C# (indexCharArray# (unsafeCoerce# s) 0#)
> #-}
>
> f = head "neil"
>
> Urgh. not what we want though. GHC won't index the string at compile time,
> and I can't write a RULE for it.
>
> I was surprised to note we can actually match on unboxed string literals in
> rules, but then the question is how to recover the nice list structure again.
>
> As the string has already been packed for us, this seems a bit hard.
> Otherwise a unpack (pack s) rule could be used.
>
> Perhaps you can newtype Char and use string overloading to work around the
> packing issue?
This goes back to an old gripe of mine actually -- we can't get
at the length of a C string literal at compile time either, which
would be super useful in rules.
If we had some light primitives for this, that GHC new about (head#,
length#), that accessed the internal data about what strings are up to,
that could be useful.
-- Don
More information about the Glasgow-haskell-users
mailing list