[Haskell-cafe] Re[2]: [Haskell] I18N, external strings

Donald Bruce Stewart dons at cse.unsw.edu.au
Fri Nov 17 20:32:20 EST 2006


bulat.ziganshin:
> Hello Axel,
> 
> Friday, November 17, 2006, 11:27:00 AM, you wrote:
> 
> > (l_ "Translate this")
> 
> > is compiled into a C string constant, that GHC then turns lazily into a
> > list of characters, which l_ then turns into an array in C land to pass
> > to the gettext function, which, in turn, returns a new C string array
> > that has to be turned into a Haskell string again. So I'm glad Lennart
> > proposed to turn String into a class which would then make it possible
> > to pass a pointer to a contant C string to gettext.
> 
> i'm not sure that you are right. GHC can perform compile-time computation
> of constant expressions. Don Bruce should know better, once he pointed
> out how a String constant can be turned into ByteString one at compile
> time

Quite so, using (what else?) rewrite rules!

Supposing I want to use a ByteString literal (and lennart hasn't yet
committed overloaded strings into GHC..) I can write:

    import qualified Data.ByteString.Char8 as C
    main = C.putStrLn mystring
    mystring = C.pack "rewrite rules are fun!"

and it is compiled by GHC to:

    mystring_rDR =
      Data.ByteString.Char8.pack (GHC.Base.unpackCString# "rewrite rules are fun!"#)

    Main.main :: GHC.IOBase.IO ()
    Main.main = Data.ByteString.putStrLn mystring_rDR

    :Main.main :: GHC.IOBase.IO ()
    :Main.main = GHC.TopHandler.runMainIO @ () Main.main

Now, the string literals has been packed by GHC for us into a Addr#,
pointing to a proper C string. Now, we'd like to remove that
intermediate unpackCString#, and just pack the string literal straight
into a ByteString, avoiding an intermediate list.

So we add a rewrite rule:

    {-# RULES
        "pack/packAddress" forall s .
              pack (unpackCString# s) = B.packAddress s
      #-}

Now, when compiled with -O -fglasgow-exts, we get:

    1 RuleFired
        1 FPS pack/packAddress

and the code is transformed as follows:

    mystring = Data.ByteString.Base.packAddress "rewrite rules are fun!"

to

    mystring =
      Data.ByteString.Char8.pack (GHC.Base.unpackCString# "rewrite rules are fun!"#)

and then the rewrite rule kicks in, and we construct a new ByteString
directly from the Addr#, via a call to strlen to get the length:

    mystring =
      let addr#_a146 :: GHC.Prim.Addr#
          addr#_a146 = "rewrite rules are fun!"#
      in case Data.ByteString.Base.$wccall addr#_a146 s2#_a1Q8 of
         (# ds3_a1Re, ds4_a1Rf #) ->
            Data.ByteString.Base.PS addr#_a146 
                                    (GHC.ForeignPtr.PlainForeignPtr var#_a1Q9)
                                    0
                                    (GHC.Prim.word2Int# ds4_a1Rf)

*exactly* what we'd like.
So at runtime, this string will require only a call to strlen to build.

If the compiler was able to pack string literals to CStringLen, tagging
them with their length, we could avoid the O(n) strlen call, and directy
build ByteStrings in O(1), using unsafePackAddress#, which takes a
length field. 

I.e. something like:

    mystring =
      let addr#_a146 :: GHC.Prim.Addr#
          addr#_a146 = "rewrite rules are fun!"#

          len        :: GHC.Prim.Int#
          len        = 22

      in Data.ByteString.Base.PS addr#_a146 
                                 (GHC.ForeignPtr.PlainForeignPtr var#_a1Q9)
                                 0
                                 len#

-- Don


More information about the Haskell-Cafe mailing list