Compilation of Overloaded strings and saving constant results
Adam Gundry
adam at well-typed.com
Mon Jul 25 19:53:52 UTC 2022
Hi Clinton,
On 25/07/2022 05:15, Clinton Mead wrote:
> I'm presuming (correct me if I'm wrong) that when I have an overloaded
> string literal in my program, let's say it's overloaded to Text, the
> conversion from [Char] -> Text happens at runtime, instead of what would
> be ideal being that the compiler somehow does that conversion at compile
> time and puts the raw bytes that represent the Text object in the
> executable?
GHC can actually do a little bit better than this, because a literal is
stored in the binary as a CString# rather than a list of Char. Thus the
conversion doesn't need to follow a linked list, but it is still O(n).
> If it does work like that, that's not a big deal. But let's say I have:
>
> f :: Text -> Text -> Text
> f x y = Text.concat ["Hello", x, "World", y]
>
> This is desugared like so:
>
> f :: Text -> Text -> Text
> f x y = Text.concat [fromString "Hello", x, fromString "World", y]
>
> Would there be a [Char] -> Text conversion from ['H', 'e', 'l', 'l',
> 'o'] -> Text "Hello" everytime 'f' is called, or is 'f'
> rewritten something like this:
>
> f :: Text -> Text -> Text
> f x y = Text.concat [hello, x, world, y]
>
> hello :: Text
> hello = fromString "Hello"
>
> world :: Text
> world = fromString "World"
>
> So the [Char] -> Text conversion is only done once and memomised?
This transformation is known as "let floating" or "full-laziness" and is
enabled with -O1. The easiest way to see this is to put this definition
in a module and compile it, like this:
$ ghc-9.2.2 M.hs -ddump-simpl -dno-typeable-binds -dsuppress-all -O0
The Core output takes a bit of decoding, but if you squint you can see
the definition of f is essentially unchanged:
f = \ x_arM y_arN ->
concat
(: (($fIsStringText `cast` <Co:2>) (unpackCString# "Hello"#))
(: x_arM
(: (($fIsStringText `cast` <Co:2>) (unpackCString# "World"#))
(: y_arN []))))
If you compile with -O1 instead you get:
f4 = "Hello"#
f3 = unpackCString# f4
f2 = "World"#
f1 = unpackCString# f2
f = \ x_arI y_arJ -> concat (: f3 (: x_arI (: f1 (: y_arJ []))))
which is pretty much what you wanted.
> I guess this brings up the more general question, if I have this:
>
> f x = h x (g (42 :: Int))
>
> Can I rely on GHC rewriting it like this?
>
> f x = h x y
> y = g (42 :: Int)
>
> Or is this a transformation I should do explicitly if I want to rely on it?
In this case, a similar experiment shows that GHC will float out the
function call (here I've used NOINLINE to prevent the function
applications being optimised out):
f x = h x (g 42)
{-# NOINLINE g #-}
g :: Int -> Int
g x = x
{-# NOINLINE h #-}
h :: Int -> Int -> Int
h x y = x * y
Compiling the above gives this Core:
f2 = I# 42#
f1 = g f2
f = \ x_atQ -> h x_atQ f1
Whether you can rely on this is a slightly different question though.
There are lots of heuristics in GHC's optimiser which mean that in more
complicated situations it will not always apply full-laziness as much as
possible. Moreover, full-laziness can introduce space leaks. My
colleague Edsko wrote a nice blog post a few years ago that explores
this in more detail:
https://www.well-typed.com/blog/2016/09/sharing-conduit/
Hope this helps,
Adam
--
Adam Gundry, Haskell Consultant
Well-Typed LLP, https://www.well-typed.com/
Registered in England & Wales, OC335890
27 Old Gloucester Street, London WC1N 3AX, England
More information about the Glasgow-haskell-users
mailing list