[Haskell-cafe] Time performance with inlining and space performance with newtype?

Fri Oct 9 13:48:06 UTC 2015

On 10/09/2015 01:30 PM, Rob Stewart wrote:
> Hi,
> 
> I'm in search of clarifications and descriptions about code
> optimisations that are recommended in GHC docs, regarding inlining and
> newtype vs data.
> 
> 1. Inlining small and often used functions
> 
> See https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/pragmas.html
> , Section "7.22.6.1. INLINE pragma". It says:
> 
> "GHC tries to inline functions/values that are 'small enough', thus
> avoiding the call overhead and possibly exposing other more-wonderful
> optimisations."
> 
> I understand the reason for function inlining with other languages
> like C, as it avoids the construction of new stack frames and
> coordinating stack pointers for each function call. In Haskell, rather
> than call stack we've got evaluation stacks.

I am not an expert, but I can try to share my understanding of the issue.

For one, you have the possibility for other optimisations. Consider:

addThree :: Int -> Int -> Int
addThree a b c = a + b + c

This is compiled to the following core (simplified):

addThree = \ a b c -> (+) $dNum_Int ((+) $dNum_Int a b) c

(+) is a record selector for the Num dictionary. By inlining (+) and
$dNum_Int we get this core:

add_Int = \ a b -> case a of
	I# a# -> case b of
		I# b# -> I# (a# +# b#)

addThree = \ a b c -> add_Int (add_Int a b) c

where Int is defined as
data Int = I# Int#

and (+#) implements primitive (C-like) addition and has type Int# ->
Int# -> Int#.

add_Int is again a small function and we inline it:

addThree = \ a b c -> add_Int (case a of
	I# a# -> case b of
		I# b# -> I# (a# +# b#)) c

So now we can see the code allocates an Int (the I# constructor) only to
pattern match on it immediately again. If we inline the second
occurrence of add_Int, we get this:

addThree = \ a b c -> case (case a of
	I# a# -> case b of
		I# b# -> I# (a# +# b#)) of
	I# a' -> case c of
		I# c# -> I# (a' +# c#)

Now we can not inline further, but the simplifier can apply
transformations it couldn't do previously. For example it can apply the
case-of-case transformation:

addThree = \ a b c -> case a of
	I# a# -> case b of
		I# b# -> case I# (a# +# b#) of
			I# a' -> case c of
				I# c# -> I# (a' +# c#)

Now it sees it can eliminate the I# constructor:

addThree = \ a b c -> case a of
	I# a# -> case b of
		I# b# -> case a# +# b# of
			a' -> case c of
				I# c# -> I# (a' +# c#)

I think GHC now floats the addition to its usage site, so we get:

addThree = \ a b c -> case a of
	I# a# -> case b of
		I# b# -> case c of
			I# c# -> I# ((a# +# b#) +# c#)

So inlining did not only save some function calls, but we also avoided
to allocate a boxed Int value.

> 
> Here are my questions: what exactly is the call overhead of a
> non-inlned Haskell function? What additional runtime work needs to
> happen when calling to a non-inlined function? Specifically what other
> GHC optimisations are possible only if function inlining happens
> first? Can anyone provide a simple example where 1) a function is not
> inlined, versus 2) when that function is inlined, which demonstrates
> two things: 1) a GHC report of the additional optimisations that were
> applied because of inlining and 2) shorter program runtime as a result
> of inlining.

As I tried to show in the above example, its not about the call
overhead, but additional possible simplifications. Inlining a function
can make polymorphic arguments monomorphic and provides the possibility
to inline other small functions. It can further expose combinations of
functions used for list- or stream-fusion (many fusing functions are
implemented as unstream . streamedOp . stream with a rule stream .
unsteam = id). Without inlining, fusion would not work at all.

> 
> 2. Calculating the memory costs of a data type
> 
> As Don Stewart nicely explains on SO
> http://stackoverflow.com/a/5889784/1526266 , we should always use
> newtype when we have only a single constructor which has a single
> field. His example is this:
> 
>     data Book = Book Int Int
>     newtype Book = Book (Int, Int)
> 
> With newtype, the `Book` construct is guaranteed to be erased at
> compile time, to have the same representation as `(Int,Int)`. How to I
> measure the memory space costs of having `Book Int Int` hanging around
> at runtime, as opposed to just `(Int,Int)`? How many bytes does it
> cost to store `Book 4 5`? How many bytes does it cost to store `(4,5)`
> ? Can I ask GHC to tell me how many bytes the storage of each object
> would require? Or more generally, would looking at GHC Core for all my
> data structures inform me of the memory space costs for each of them?

There is no difference in your example. Consider instead

data BookD = BookD Int
newtype BookN = BookN Int

BookD consumes 2 pointers (info pointer and pointer to the boxed Int)
while BookN has only the size of a boxed Int (a pointer and a machine
word for the Int# stored in the boxed Int). So by using a newtype you
save 2 pointers and an additional indirection. It also changes the
semantics of the code: Because a newtype is "free", it has to be strict
in its one and only field. So if you evaluate BookD undefined to WHNF,
you will get no exception while evaluating BookN undefined will throw an
exception.

Back to your example: Here we have either a constructor with two fields,
so we need three pointers to store the object (one for the info
pointer), or we have a tuple, which also contains two pointers, so it is
again three pointers large. The only difference is the newtype can reuse
the already existing info table for the (,) data type while the Book
data type needs its own info table, so we save a few bytes of generated
code (if at all, I do not know if GHC reuses info tables in such a
situation).

> 
> Thanks
> 

You're welcome.

> --
> Rob
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 473 bytes
Desc: OpenPGP digital signature
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20151009/9cbb4da7/attachment.sig>