Optimisation of unpackCString#
Don Stewart
dons at galois.com
Mon Apr 28 16:54:41 EDT 2008
ndmitchell:
> Hi
>
> (All these results are from GHC 6.9.20071226, but suspect they hold
> for 6.9.* and 6.8)
>
> The following code:
>
> test = head "neil"
>
> Produces (with -O2):
>
> Text.HTML.TagSoup.Development.Sample.test =
> case GHC.Base.unpackCString# "neil" of wild_aaU {
> [] -> GHC.List.badHead @ GHC.Base.Char; : x_aaY ds1_aaZ -> x_aaY
> }
>
> However:
>
> test = head ['n','e','i','l']
>
> Produces:
>
> Text.HTML.TagSoup.Development.Sample.test = GHC.Base.C# 'n'
>
> The packing of the string has damaged the optimisation. Is there
> anyway to mitigate this? I ideally want to do this in RULES to derive
> an efficient parser from a set of parser combinators, and the
> unpackCString# really gets in the way of future optimisations.
>
> I could imagine adding two rules to the simplifier:
>
> case unpackCString# "" of ==> case [] of
> case unpackCString# "xyz" of ==> case (C# 'x': unpackCString# "yz") of
>
> Then the simplifier could recover the optimised version.
The first case makes sense, and is just a RULE. Though it seems GHC already
does this?
g = head ""
goes to:
M.g = badHead @ Char
without prompting.
Now, ByteString uses a rule relating to unpackCString#:
{-# RULES
"ByteString pack/packAddress" forall s .
pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
#-}
So I wondered if I could write a rule for head/unpackCString
{-# OPTIONS -fglasgow-exts #-}
module M where
import GHC.Base
-- ^ enable rewrite rules
{-# RULES
"head/string literal" forall s .
head (unpackCString# s) = case s of
""# -> head []
_ -> C# (indexCharArray# (unsafeCoerce# s) 0#)
#-}
f = head "neil"
Urgh. not what we want though. GHC won't index the string at compile time,
and I can't write a RULE for it.
I was surprised to note we can actually match on unboxed string literals in
rules, but then the question is how to recover the nice list structure again.
As the string has already been packed for us, this seems a bit hard.
Otherwise a unpack (pack s) rule could be used.
Perhaps you can newtype Char and use string overloading to work around the
packing issue?
-- Don
More information about the Glasgow-haskell-users
mailing list