Optimisation of unpackCString#

Don Stewart dons at galois.com
Mon Apr 28 16:54:41 EDT 2008


ndmitchell:
> Hi
> 
> (All these results are from GHC 6.9.20071226, but suspect they hold
> for 6.9.* and 6.8)
> 
> The following code:
> 
> test = head "neil"
> 
> Produces (with -O2):
> 
> Text.HTML.TagSoup.Development.Sample.test =
>   case GHC.Base.unpackCString# "neil" of wild_aaU {
>     [] -> GHC.List.badHead @ GHC.Base.Char; : x_aaY ds1_aaZ -> x_aaY
>   }
> 
> However:
> 
> test = head ['n','e','i','l']
> 
> Produces:
> 
> Text.HTML.TagSoup.Development.Sample.test = GHC.Base.C# 'n'
> 
> The packing of the string has damaged the optimisation. Is there
> anyway to mitigate this? I ideally want to do this in RULES to derive
> an efficient parser from a set of parser combinators, and the
> unpackCString# really gets in the way of future optimisations.
> 
> I could imagine adding two rules to the simplifier:
> 
> case unpackCString# "" of  ==> case [] of
> case unpackCString# "xyz" of ==> case (C# 'x': unpackCString# "yz") of
> 
> Then the simplifier could recover the optimised version.

The first case makes sense, and is just a RULE. Though it seems GHC already
does this?

    g = head ""

goes to:

    M.g = badHead @ Char

without prompting.

Now, ByteString uses a rule relating to unpackCString#:

    {-# RULES
    "ByteString pack/packAddress" forall s .
       pack (unpackCString# s) = inlinePerformIO (B.unsafePackAddress s)
     #-}

So I wondered if I could write a rule for head/unpackCString

    {-# OPTIONS -fglasgow-exts #-}

    module M where

    import GHC.Base

    -- ^ enable rewrite rules
    {-# RULES
    "head/string literal" forall s .
       head (unpackCString# s) = case s of
            ""# -> head []
            _   -> C# (indexCharArray# (unsafeCoerce# s) 0#)
     #-}

    f = head "neil"
            
Urgh. not what we want though. GHC won't index the string at compile time,
and I can't write a RULE for it.

I was surprised to note we can actually match on unboxed string literals in
rules, but then the question is how to recover the nice list structure again.

As the string has already been packed for us, this seems a bit hard.
Otherwise a unpack (pack s) rule could be used.

Perhaps you can newtype Char and use string overloading to work around the 
packing issue?

-- Don


More information about the Glasgow-haskell-users mailing list