Compilation of Overloaded strings and saving constant results

Adam Gundry adam at well-typed.com
Mon Jul 25 19:53:52 UTC 2022


Hi Clinton,


On 25/07/2022 05:15, Clinton Mead wrote:
> I'm presuming (correct me if I'm wrong) that when I have an overloaded 
> string literal in my program, let's say it's overloaded to Text, the 
> conversion from [Char] -> Text happens at runtime, instead of what would 
> be ideal being that the compiler somehow does that conversion at compile 
> time and puts the raw bytes that represent the Text object in the 
> executable?

GHC can actually do a little bit better than this, because a literal is 
stored in the binary as a CString# rather than a list of Char. Thus the 
conversion doesn't need to follow a linked list, but it is still O(n).


> If it does work like that, that's not a big deal. But let's say I have:
> 
> f :: Text -> Text -> Text
> f x y = Text.concat ["Hello", x, "World", y]
> 
> This is desugared like so:
> 
> f :: Text -> Text -> Text
> f x y = Text.concat [fromString "Hello", x, fromString "World", y]
> 
> Would there be a [Char] -> Text conversion from ['H', 'e', 'l', 'l', 
> 'o'] -> Text "Hello" everytime 'f' is called, or is 'f' 
> rewritten something like this:
> 
> f :: Text -> Text -> Text
> f x y = Text.concat [hello, x, world, y]
> 
> hello :: Text
> hello = fromString "Hello"
> 
> world :: Text
> world = fromString "World"
> 
> So the [Char] -> Text conversion is only done once and memomised?

This transformation is known as "let floating" or "full-laziness" and is 
enabled with -O1. The easiest way to see this is to put this definition 
in a module and compile it, like this:

$ ghc-9.2.2 M.hs -ddump-simpl -dno-typeable-binds -dsuppress-all -O0

The Core output takes a bit of decoding, but if you squint you can see 
the definition of f is essentially unchanged:

f = \ x_arM y_arN ->
       concat
         (: (($fIsStringText `cast` <Co:2>) (unpackCString# "Hello"#))
            (: x_arM
               (: (($fIsStringText `cast` <Co:2>) (unpackCString# "World"#))
                  (: y_arN []))))

If you compile with -O1 instead you get:

f4 = "Hello"#
f3 = unpackCString# f4
f2 = "World"#
f1 = unpackCString# f2
f = \ x_arI y_arJ -> concat (: f3 (: x_arI (: f1 (: y_arJ []))))

which is pretty much what you wanted.


> I guess this brings up the more general question, if I have this:
> 
> f x = h x (g (42 :: Int))
> 
> Can I rely on GHC rewriting it like this?
> 
> f x = h x y
> y = g (42 :: Int)
> 
> Or is this a transformation I should do explicitly if I want to rely on it?

In this case, a similar experiment shows that GHC will float out the 
function call (here I've used NOINLINE to prevent the function 
applications being optimised out):

f x = h x (g 42)

{-# NOINLINE g #-}
g :: Int -> Int
g x = x

{-# NOINLINE h #-}
h :: Int -> Int -> Int
h x y = x * y

Compiling the above gives this Core:

f2 = I# 42#
f1 = g f2
f = \ x_atQ -> h x_atQ f1

Whether you can rely on this is a slightly different question though. 
There are lots of heuristics in GHC's optimiser which mean that in more 
complicated situations it will not always apply full-laziness as much as 
possible. Moreover, full-laziness can introduce space leaks. My 
colleague Edsko wrote a nice blog post a few years ago that explores 
this in more detail: 
https://www.well-typed.com/blog/2016/09/sharing-conduit/

Hope this helps,

Adam


-- 
Adam Gundry, Haskell Consultant
Well-Typed LLP, https://www.well-typed.com/

Registered in England & Wales, OC335890
27 Old Gloucester Street, London WC1N 3AX, England


More information about the Glasgow-haskell-users mailing list