[Haskell-cafe] Time performance with inlining and space performance with newtype?

Fri Oct 9 14:16:04 UTC 2015

Thank you Jonas, both answers were very nicely explained!

On 9 October 2015 at 14:48, Jonas Scholl <anselm.scholl at tu-harburg.de> wrote:
> On 10/09/2015 01:30 PM, Rob Stewart wrote:
>> Hi,
>>
>> I'm in search of clarifications and descriptions about code
>> optimisations that are recommended in GHC docs, regarding inlining and
>> newtype vs data.
>>
>> 1. Inlining small and often used functions
>>
>> See https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/pragmas.html
>> , Section "7.22.6.1. INLINE pragma". It says:
>>
>> "GHC tries to inline functions/values that are 'small enough', thus
>> avoiding the call overhead and possibly exposing other more-wonderful
>> optimisations."
>>
>> I understand the reason for function inlining with other languages
>> like C, as it avoids the construction of new stack frames and
>> coordinating stack pointers for each function call. In Haskell, rather
>> than call stack we've got evaluation stacks.
>
> I am not an expert, but I can try to share my understanding of the issue.
>
> For one, you have the possibility for other optimisations. Consider:
>
> addThree :: Int -> Int -> Int
> addThree a b c = a + b + c
>
> This is compiled to the following core (simplified):
>
> addThree = \ a b c -> (+) $dNum_Int ((+) $dNum_Int a b) c
>
> (+) is a record selector for the Num dictionary. By inlining (+) and
> $dNum_Int we get this core:
>
> add_Int = \ a b -> case a of
>         I# a# -> case b of
>                 I# b# -> I# (a# +# b#)
>
> addThree = \ a b c -> add_Int (add_Int a b) c
>
> where Int is defined as
> data Int = I# Int#
>
> and (+#) implements primitive (C-like) addition and has type Int# ->
> Int# -> Int#.
>
> add_Int is again a small function and we inline it:
>
> addThree = \ a b c -> add_Int (case a of
>         I# a# -> case b of
>                 I# b# -> I# (a# +# b#)) c
>
> So now we can see the code allocates an Int (the I# constructor) only to
> pattern match on it immediately again. If we inline the second
> occurrence of add_Int, we get this:
>
> addThree = \ a b c -> case (case a of
>         I# a# -> case b of
>                 I# b# -> I# (a# +# b#)) of
>         I# a' -> case c of
>                 I# c# -> I# (a' +# c#)
>
> Now we can not inline further, but the simplifier can apply
> transformations it couldn't do previously. For example it can apply the
> case-of-case transformation:
>
> addThree = \ a b c -> case a of
>         I# a# -> case b of
>                 I# b# -> case I# (a# +# b#) of
>                         I# a' -> case c of
>                                 I# c# -> I# (a' +# c#)
>
> Now it sees it can eliminate the I# constructor:
>
> addThree = \ a b c -> case a of
>         I# a# -> case b of
>                 I# b# -> case a# +# b# of
>                         a' -> case c of
>                                 I# c# -> I# (a' +# c#)
>
> I think GHC now floats the addition to its usage site, so we get:
>
> addThree = \ a b c -> case a of
>         I# a# -> case b of
>                 I# b# -> case c of
>                         I# c# -> I# ((a# +# b#) +# c#)
>
> So inlining did not only save some function calls, but we also avoided
> to allocate a boxed Int value.
>
>>
>> Here are my questions: what exactly is the call overhead of a
>> non-inlned Haskell function? What additional runtime work needs to
>> happen when calling to a non-inlined function? Specifically what other
>> GHC optimisations are possible only if function inlining happens
>> first? Can anyone provide a simple example where 1) a function is not
>> inlined, versus 2) when that function is inlined, which demonstrates
>> two things: 1) a GHC report of the additional optimisations that were
>> applied because of inlining and 2) shorter program runtime as a result
>> of inlining.
>
> As I tried to show in the above example, its not about the call
> overhead, but additional possible simplifications. Inlining a function
> can make polymorphic arguments monomorphic and provides the possibility
> to inline other small functions. It can further expose combinations of
> functions used for list- or stream-fusion (many fusing functions are
> implemented as unstream . streamedOp . stream with a rule stream .
> unsteam = id). Without inlining, fusion would not work at all.
>
>>
>> 2. Calculating the memory costs of a data type
>>
>> As Don Stewart nicely explains on SO
>> http://stackoverflow.com/a/5889784/1526266 , we should always use
>> newtype when we have only a single constructor which has a single
>> field. His example is this:
>>
>>     data Book = Book Int Int
>>     newtype Book = Book (Int, Int)
>>
>> With newtype, the `Book` construct is guaranteed to be erased at
>> compile time, to have the same representation as `(Int,Int)`. How to I
>> measure the memory space costs of having `Book Int Int` hanging around
>> at runtime, as opposed to just `(Int,Int)`? How many bytes does it
>> cost to store `Book 4 5`? How many bytes does it cost to store `(4,5)`
>> ? Can I ask GHC to tell me how many bytes the storage of each object
>> would require? Or more generally, would looking at GHC Core for all my
>> data structures inform me of the memory space costs for each of them?
>
> There is no difference in your example. Consider instead
>
> data BookD = BookD Int
> newtype BookN = BookN Int
>
> BookD consumes 2 pointers (info pointer and pointer to the boxed Int)
> while BookN has only the size of a boxed Int (a pointer and a machine
> word for the Int# stored in the boxed Int). So by using a newtype you
> save 2 pointers and an additional indirection. It also changes the
> semantics of the code: Because a newtype is "free", it has to be strict
> in its one and only field. So if you evaluate BookD undefined to WHNF,
> you will get no exception while evaluating BookN undefined will throw an
> exception.
>
> Back to your example: Here we have either a constructor with two fields,
> so we need three pointers to store the object (one for the info
> pointer), or we have a tuple, which also contains two pointers, so it is
> again three pointers large. The only difference is the newtype can reuse
> the already existing info table for the (,) data type while the Book
> data type needs its own info table, so we save a few bytes of generated
> code (if at all, I do not know if GHC reuses info tables in such a
> situation).
>
>>
>> Thanks
>>
>
> You're welcome.
>
>> --
>> Rob
>> _______________________________________________
>> Haskell-Cafe mailing list
>> Haskell-Cafe at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>>
>
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe
>