From george.colpitts at gmail.com  Sun Apr  2 11:41:07 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Sun, 2 Apr 2023 08:41:07 -0300
Subject: on my mac 9.6.1 can't compile with ghc -prof -fprof-auto, gets error Could not find module ‘Prelude’
In-Reply-To: <87edpupvhh.fsf@smart-cactus.org>
References: <87edpupvhh.fsf@smart-cactus.org>
Message-ID:

Hello

On my mac, with ghc 9.6.1, I can't compile with ghc -prof -fprof-auto; I get the error:

    Could not find module ‘Prelude’
    Perhaps you haven't installed the "p_dyn" libraries for package ‘base-4.18.0.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.

Is anybody else experiencing this? In more detail:

> ghc -prof -fprof-auto hello.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.6.1/environments/default
[1 of 2] Compiling Main             ( hello.hs, hello.o )

hello.hs:1:1: error:
    Could not find module ‘Prelude’
    Perhaps you haven't installed the "p_dyn" libraries for package ‘base-4.18.0.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
1 | main = print "hello"
  | ^

hello.hs consists of:

    main = print "hello"

I reported this in 9.2.3, in #21709. At that time there was a workaround of adding -static, but that no longer works. It gives a slightly different error message:

ghc -prof -fprof-auto -static hello.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.6.1/environments/default
[2 of 2] Linking hello
ld: warning: directory not found for option '-L/opt/local/lib/'
ld: library not found for -lHStyp-qlty-1-186ccc78_p
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ghc-9.6.1: `gcc' failed in phase `Linker'. (Exit code: 1)

I have updated #21709 with the details of the problem on 9.6.1.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Wed Apr  5 01:50:30 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 5 Apr 2023 07:20:30 +0530
Subject: Performance of small allocations via prim ops
Message-ID:

I was looking at the RTS code for allocating small objects via prim ops, e.g. newByteArray#. The code looks like:

stg_newByteArrayzh ( W_ n )
{
    MAYBE_GC_N(stg_newByteArrayzh, n);

    payload_words = ROUNDUP_BYTES_TO_WDS(n);
    words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
    ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);

We are making a foreign call here (ccall). I am wondering how much overhead a ccall adds? I guess it may have to save and restore registers. Would it be better to do the fast-path case of allocating small objects from the nursery using Cmm code, as in stg_gc_noregs?

-harendra
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carter.schonwald at gmail.com  Thu Apr  6 20:47:51 2023
From: carter.schonwald at gmail.com (Carter Schonwald)
Date: Thu, 6 Apr 2023 16:47:51 -0400
Subject: Performance of small allocations via prim ops
In-Reply-To:
References:
Message-ID:

That sounds like a worthy experiment!

I guess that would look like having an inline, macro'd-up path that checks whether it can get the job done, and falls back to the general code otherwise?

Last I checked, the overhead for this sort of C call was on the order of 10 nanoseconds or less, which seems very unlikely to be a bottleneck. But do you have any natural or artificial benchmark programs that would showcase this? For this sort of code, extra branching for that optimization could easily have a larger performance impact than the known function call on modern hardware. (Though take my intuitions about these things with a grain of salt.)

On Tue, Apr 4, 2023 at 9:50 PM Harendra Kumar wrote:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# .
> The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much
> overhead a ccall adds? I guess it may have to save and restore registers.
> Would it be better to do the fast path case of allocating small objects
> from the nursery using cmm code like in stg_gc_noregs?
>
> -harendra
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben at smart-cactus.org  Thu Apr  6 22:02:08 2023
From: ben at smart-cactus.org (Ben Gamari)
Date: Thu, 06 Apr 2023 18:02:08 -0400
Subject: Performance of small allocations via prim ops
In-Reply-To:
References:
Message-ID: <87fs9cllg1.fsf@smart-cactus.org>

Harendra Kumar writes:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# . The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much overhead
> a ccall adds? I guess it may have to save and restore registers. Would it
> be better to do the fast path case of allocating small objects from the
> nursery using cmm code like in stg_gc_noregs?
>
GHC's operational model is designed in such a way that foreign calls are fairly cheap (e.g. we don't need to switch stacks, which can be quite costly).
Judging by the assembler produced for newByteArray# in one random x86-64 tree that I have lying around, it's only a couple of data-movement instructions, an %eax clear, and a stack pop:

  36: 48 89 ce             mov    %rcx,%rsi
  39: 48 89 c7             mov    %rax,%rdi
  3c: 31 c0                xor    %eax,%eax
  3e: e8 00 00 00 00       call   43
  43: 48 83 c4 08          add    $0x8,%rsp

The data movement operations in particular are quite cheap on most microarchitectures where GHC would run, due to register renaming. I doubt that this overhead would be noticeable in anything but a synthetic benchmark. However, it never hurts to measure.

Cheers,

- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 905 bytes
Desc: not available
URL:

From harendra.kumar at gmail.com  Fri Apr  7 05:19:59 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 10:49:59 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To: <87fs9cllg1.fsf@smart-cactus.org>
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

Thanks Ben and Carter.

I compiled the following to Cmm:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.IO
import GHC.Exts

data M = M (MutableByteArray# RealWorld)

main = do
    _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
    return ()

It produced the following Cmm:

 {offset
   c1k3: // global
       Hp = Hp + 24;
       if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
   c1k7: // global
       HpAlloc = 24;
       R1 = Main.main1_closure;
       call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
   c1k6: // global
       I64[Hp - 16] = stg_ARR_WORDS_info;
       I64[Hp - 8] = 1;
       R1 = GHC.Tuple.()_closure+1;
       call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }

It seems to be as good as it gets. There is absolutely no scope for improvement in this.

-harendra

On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote:

> Harendra Kumar writes:
>
> > I was looking at the RTS code for allocating small objects via prim ops
> > e.g. newByteArray# .
> > The code looks like:
> >
> > stg_newByteArrayzh ( W_ n )
> > {
> >     MAYBE_GC_N(stg_newByteArrayzh, n);
> >
> >     payload_words = ROUNDUP_BYTES_TO_WDS(n);
> >     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
> >     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
> >
> > We are making a foreign call here (ccall). I am wondering how much
> overhead
> > a ccall adds? I guess it may have to save and restore registers. Would it
> > be better to do the fast path case of allocating small objects from the
> > nursery using cmm code like in stg_gc_noregs?
> >
> GHC's operational model is designed in such a way that foreign calls are
> fairly cheap (e.g. we don't need to switch stacks, which can be quite
> costly). Judging by the assembler produced for newByteArray# in one
> random x86-64 tree that I have lying around, it's only a couple of
> data-movement instructions, an %eax clear, and a stack pop:
>
>   36: 48 89 ce             mov    %rcx,%rsi
>   39: 48 89 c7             mov    %rax,%rdi
>   3c: 31 c0                xor    %eax,%eax
>   3e: e8 00 00 00 00       call   43
>   43: 48 83 c4 08          add    $0x8,%rsp
>
> The data movement operations in particular are quite cheap on most
> microarchitectures where GHC would run due to register renaming. I doubt
> that this overhead would be noticeable in anything but a synthetic
> benchmark. However, it never hurts to measure.
>
> Cheers,
>
> - Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Fri Apr  7 05:38:16 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 11:08:16 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To:
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

Ah, some other optimization seems to be kicking in here.
When I increase the size of the array to > 128, I see a call to stg_newByteArray# being emitted:

 {offset
   c1kb: // global
       if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto c1kd;
   c1kc: // global
       R1 = Main.main1_closure;
       call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
   c1kd: // global
       I64[Sp - 8] = c1k9;
       R1 = 129;
       Sp = Sp - 8;
       call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, upd: 8;

-harendra

On Fri, 7 Apr 2023 at 10:49, Harendra Kumar wrote:

> Thanks Ben and Carter.
>
> I compiled the following to Cmm:
>
> {-# LANGUAGE MagicHash #-}
> {-# LANGUAGE UnboxedTuples #-}
>
> import GHC.IO
> import GHC.Exts
>
> data M = M (MutableByteArray# RealWorld)
>
> main = do
>     _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
>     return ()
>
> It produced the following Cmm:
>
>  {offset
>    c1k3: // global
>        Hp = Hp + 24;
>        if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
>    c1k7: // global
>        HpAlloc = 24;
>        R1 = Main.main1_closure;
>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>    c1k6: // global
>        I64[Hp - 16] = stg_ARR_WORDS_info;
>        I64[Hp - 8] = 1;
>        R1 = GHC.Tuple.()_closure+1;
>        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
>  }
>
> It seems to be as good as it gets. There is absolutely no scope for
> improvement in this.
>
> -harendra
>
> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote:
>
>> Harendra Kumar writes:
>>
>> > I was looking at the RTS code for allocating small objects via prim ops
>> > e.g. newByteArray# . The code looks like:
>> >
>> > stg_newByteArrayzh ( W_ n )
>> > {
>> >     MAYBE_GC_N(stg_newByteArrayzh, n);
>> >
>> >     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>> >     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>> >     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>> >
>> > We are making a foreign call here (ccall). I am wondering how much
>> overhead
>> > a ccall adds? I guess it may have to save and restore registers.
>> > Would it be better to do the fast path case of allocating small objects
>> > from the nursery using cmm code like in stg_gc_noregs?
>> >
>> GHC's operational model is designed in such a way that foreign calls are
>> fairly cheap (e.g. we don't need to switch stacks, which can be quite
>> costly). Judging by the assembler produced for newByteArray# in one
>> random x86-64 tree that I have lying around, it's only a couple of
>> data-movement instructions, an %eax clear, and a stack pop:
>>
>>   36: 48 89 ce             mov    %rcx,%rsi
>>   39: 48 89 c7             mov    %rax,%rdi
>>   3c: 31 c0                xor    %eax,%eax
>>   3e: e8 00 00 00 00       call   43
>>   43: 48 83 c4 08          add    $0x8,%rsp
>>
>> The data movement operations in particular are quite cheap on most
>> microarchitectures where GHC would run due to register renaming. I doubt
>> that this overhead would be noticeable in anything but a synthetic
>> benchmark. However, it never hurts to measure.
>>
>> Cheers,
>>
>> - Ben
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Fri Apr  7 06:07:05 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 11:37:05 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To:
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

A little bit of grepping in the code gave me this:

emitPrimOp cfg primop =
  let max_inl_alloc_size = fromIntegral (stgToCmmMaxInlAllocSize cfg)
  in case primop of
  NewByteArrayOp_Char -> \case
    [(CmmLit (CmmInt n w))]
      | asUnsigned w n <= max_inl_alloc_size  -- <--- see this line
      -> opIntoRegs $ \ [res] -> doNewByteArrayOp res (fromInteger n)
    _ -> PrimopCmmEmit_External

We are emitting more efficient code when the size of the array is smaller.
And the threshold is governed by a compiler flag:

    , make_ord_flag defGhcFlag "fmax-inline-alloc-size"
        (intSuffix (\n d -> d { maxInlineAllocSize = n }))

This means allocation of smaller arrays is extremely efficient and we can control it using `-fmax-inline-alloc-size`; the default is 128. That's a new thing I learnt today.

Given this new finding, my original question now applies only to the case when the array size is bigger than this configurable threshold, which is a little less motivating. And Ben says that the call is not expensive, so we can leave it there.

-harendra

On Fri, 7 Apr 2023 at 11:08, Harendra Kumar wrote:

> Ah, some other optimization seems to be kicking in here. When I increase
> the size of the array to > 128 then I see a call to stg_newByteArray# being
> emitted:
>
>  {offset
>    c1kb: // global
>        if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto c1kd;
>    c1kc: // global
>        R1 = Main.main1_closure;
>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>    c1kd: // global
>        I64[Sp - 8] = c1k9;
>        R1 = 129;
>        Sp = Sp - 8;
>        call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, upd: 8;
>
> -harendra
>
> On Fri, 7 Apr 2023 at 10:49, Harendra Kumar wrote:
>
>> Thanks Ben and Carter.
>>
>> I compiled the following to Cmm:
>>
>> {-# LANGUAGE MagicHash #-}
>> {-# LANGUAGE UnboxedTuples #-}
>>
>> import GHC.IO
>> import GHC.Exts
>>
>> data M = M (MutableByteArray# RealWorld)
>>
>> main = do
>>     _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
>>     return ()
>>
>> It produced the following Cmm:
>>
>>  {offset
>>    c1k3: // global
>>        Hp = Hp + 24;
>>        if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
>>    c1k7: // global
>>        HpAlloc = 24;
>>        R1 = Main.main1_closure;
>>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>>    c1k6: // global
>>        I64[Hp - 16] = stg_ARR_WORDS_info;
>>        I64[Hp - 8] = 1;
>>        R1 = GHC.Tuple.()_closure+1;
>>        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
>>  }
>>
>> It seems to be as good as it gets.
There is absolutely no scope for >> improvement in this. >> >> -harendra >> >> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote: >> >>> Harendra Kumar writes: >>> >>> > I was looking at the RTS code for allocating small objects via prim ops >>> > e.g. newByteArray# . The code looks like: >>> > >>> > stg_newByteArrayzh ( W_ n ) >>> > { >>> > MAYBE_GC_N(stg_newByteArrayzh, n); >>> > >>> > payload_words = ROUNDUP_BYTES_TO_WDS(n); >>> > words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words; >>> > ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words); >>> > >>> > We are making a foreign call here (ccall). I am wondering how much >>> overhead >>> > a ccall adds? I guess it may have to save and restore registers. Would >>> it >>> > be better to do the fast path case of allocating small objects from the >>> > nursery using cmm code like in stg_gc_noregs? >>> > >>> GHC's operational model is designed in such a way that foreign calls are >>> fairly cheap (e.g. we don't need to switch stacks, which can be quite >>> costly). Judging by the assembler produced for newByteArray# in one >>> random x86-64 tree that I have lying around, it's only a couple of >>> data-movement instructions, an %eax clear, and a stack pop: >>> >>> 36: 48 89 ce mov %rcx,%rsi >>> 39: 48 89 c7 mov %rax,%rdi >>> 3c: 31 c0 xor %eax,%eax >>> 3e: e8 00 00 00 00 call 43 >>> >>> 43: 48 83 c4 08 add $0x8,%rsp >>> >>> The data movement operations in particular are quite cheap on most >>> microarchitectures where GHC would run due to register renaming. I doubt >>> that this overhead would be noticable in anything but a synthetic >>> benchmark. However, it never hurts to measure. >>> >>> Cheers, >>> >>> - Ben >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
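The two allocation paths discussed in this thread can be seen side by side with a small program in the spirit of Harendra's test; this is a reconstructed sketch, not code from the thread. `allocStatic` uses a statically known size (eligible for inline nursery allocation when it is at most `-fmax-inline-alloc-size`), while `allocDynamic` takes the size as a runtime value, which compiles to an out-of-line call to stg_newByteArray#:

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

-- Sketch: allocate a MutableByteArray# with a statically known size and
-- with a size only known at run time, mirroring the two code paths
-- examined above. Names here are illustrative, not from the thread.

import GHC.Exts
import GHC.IO (IO (..))

data M = M (MutableByteArray# RealWorld)

-- Size is a literal: the compiler can emit the inline fast path.
allocStatic :: IO M
allocStatic =
  IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))

-- Size is a runtime value: compiled as a call to stg_newByteArray#.
allocDynamic :: Int -> IO M
allocDynamic (I# n) =
  IO (\s -> case newByteArray# n s of (# s1, arr #) -> (# s1, M arr #))

main :: IO ()
main = do
  mapM_ (const allocStatic) [1 :: Int .. 100000]
  mapM_ (allocDynamic . const 1) [1 :: Int .. 100000]
  putStrLn "done"
```

Compiling with `ghc -ddump-cmm` and varying the literal size around the `-fmax-inline-alloc-size` threshold shows which of the two paths each allocation takes.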
URL: From simon.peytonjones at gmail.com Fri Apr 7 07:28:56 2023 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Fri, 7 Apr 2023 08:28:56 +0100 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: > We are emitting a more efficient code when the size of the array is smaller. And the threshold is governed by a compiler flag: It would be good if this was documented. Perhaps in the Haddock for `newByteArray#`? Or where? S On Fri, 7 Apr 2023 at 07:07, Harendra Kumar wrote: > Little bit of grepping in the code gave me this: > > emitPrimOp cfg primop = > let max_inl_alloc_size = fromIntegral (stgToCmmMaxInlAllocSize cfg) > in case primop of > NewByteArrayOp_Char -> \case > [(CmmLit (CmmInt n w))] > | asUnsigned w n <= max_inl_alloc_size -- > <------------------------------- see this line > -> opIntoRegs $ \ [res] -> doNewByteArrayOp res (fromInteger n) > _ -> PrimopCmmEmit_External > > We are emitting a more efficient code when the size of the array is > smaller. And the threshold is governed by a compiler flag: > > , make_ord_flag defGhcFlag "fmax-inline-alloc-size" > (intSuffix (\n d -> d { maxInlineAllocSize = n })) > > This means allocation of smaller arrays is extremely efficient and we can > control it using `-fmax-inline-alloc-size`, the default is 128. That's a > new thing I learnt today. > > Given this new finding, my original question now applies only to the case > when the array size is bigger than this configurable threshold, which is a > little less motivating. And Ben says that the call is not expensive, so we > can leave it there. > > -harendra > > On Fri, 7 Apr 2023 at 11:08, Harendra Kumar > wrote: > >> Ah, some other optimization seems to be kicking in here. 
When I increase >> the size of the array to > 128 then I see a call to stg_newByteArray# being >> emitted: >> >> {offset >> c1kb: // global >> if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto >> c1kd; >> c1kc: // global >> R1 = Main.main1_closure; >> call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8; >> c1kd: // global >> I64[Sp - 8] = c1k9; >> R1 = 129; >> Sp = Sp - 8; >> call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, >> upd: 8; >> >> -harendra >> >> On Fri, 7 Apr 2023 at 10:49, Harendra Kumar >> wrote: >> >>> Thanks Ben and Carter. >>> >>> I compiled the following to Cmm: >>> >>> {-# LANGUAGE MagicHash #-} >>> {-# LANGUAGE UnboxedTuples #-} >>> >>> import GHC.IO >>> import GHC.Exts >>> >>> data M = M (MutableByteArray# RealWorld) >>> >>> main = do >>> _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M >>> arr #)) >>> return () >>> >>> It produced the following Cmm: >>> >>> {offset >>> c1k3: // global >>> Hp = Hp + 24; >>> if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6; >>> c1k7: // global >>> HpAlloc = 24; >>> R1 = Main.main1_closure; >>> call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8; >>> c1k6: // global >>> I64[Hp - 16] = stg_ARR_WORDS_info; >>> I64[Hp - 8] = 1; >>> R1 = GHC.Tuple.()_closure+1; >>> call (P64[Sp])(R1) args: 8, res: 0, upd: 8; >>> } >>> >>> It seems to be as good as it gets. There is absolutely no scope for >>> improvement in this. >>> >>> -harendra >>> >>> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote: >>> >>>> Harendra Kumar writes: >>>> >>>> > I was looking at the RTS code for allocating small objects via prim >>>> ops >>>> > e.g. newByteArray# . 
The code looks like: >>>> > >>>> > stg_newByteArrayzh ( W_ n ) >>>> > { >>>> > MAYBE_GC_N(stg_newByteArrayzh, n); >>>> > >>>> > payload_words = ROUNDUP_BYTES_TO_WDS(n); >>>> > words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words; >>>> > ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words); >>>> > >>>> > We are making a foreign call here (ccall). I am wondering how much >>>> overhead >>>> > a ccall adds? I guess it may have to save and restore registers. >>>> Would it >>>> > be better to do the fast path case of allocating small objects from >>>> the >>>> > nursery using cmm code like in stg_gc_noregs? >>>> > >>>> GHC's operational model is designed in such a way that foreign calls are >>>> fairly cheap (e.g. we don't need to switch stacks, which can be quite >>>> costly). Judging by the assembler produced for newByteArray# in one >>>> random x86-64 tree that I have lying around, it's only a couple of >>>> data-movement instructions, an %eax clear, and a stack pop: >>>> >>>> 36: 48 89 ce mov %rcx,%rsi >>>> 39: 48 89 c7 mov %rax,%rdi >>>> 3c: 31 c0 xor %eax,%eax >>>> 3e: e8 00 00 00 00 call 43 >>>> >>>> 43: 48 83 c4 08 add $0x8,%rsp >>>> >>>> The data movement operations in particular are quite cheap on most >>>> microarchitectures where GHC would run due to register renaming. I doubt >>>> that this overhead would be noticable in anything but a synthetic >>>> benchmark. However, it never hurts to measure. >>>> >>>> Cheers, >>>> >>>> - Ben >>>> >>> _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From harendra.kumar at gmail.com Fri Apr 7 10:15:41 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 15:45:41 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: On Fri, 7 Apr 2023 at 12:57, Simon Peyton Jones wrote: > > We are emitting a more efficient code when the size of the array is > smaller. And the threshold is governed by a compiler flag: > > It would be good if this was documented. Perhaps in the Haddock for > `newByteArray#`? Or where? > The flag is documented in the GHC user guide but the behavior would be better discoverable if `newByteArray#` mentions it. -harendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From harendra.kumar at gmail.com Fri Apr 7 10:28:57 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 15:58:57 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: I am confused by this flag. This flag allows us to allocate statically known arrays sizes of <= n to be allocated from the current nursery block. But looking at the code in allocateMightFail, as I interpret it, any size array up to LARGE_OBJECT_THRESHOLD is anyway allocated from the current nursery block. So why have this option? Why not fix this to LARGE_OBJECT_THRESHOLD? Maybe I am missing something. -harendra On Fri, 7 Apr 2023 at 15:45, Harendra Kumar wrote: > > > On Fri, 7 Apr 2023 at 12:57, Simon Peyton Jones < > simon.peytonjones at gmail.com> wrote: > >> > We are emitting a more efficient code when the size of the array is >> smaller. And the threshold is governed by a compiler flag: >> >> It would be good if this was documented. Perhaps in the Haddock for >> `newByteArray#`? Or where? >> > > The flag is documented in the GHC user guide but the behavior would be > better discoverable if `newByteArray#` mentions it. 
> > -harendra > -------------- next part -------------- An HTML attachment was scrubbed... URL: From harendra.kumar at gmail.com Fri Apr 7 11:34:51 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 17:04:51 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 at 02:18, Carter Schonwald wrote: > That sounds like a worthy experiment! > > I guess that would look like having an inline macro’d up path that checks > if it can get the job done that falls back to the general code? > > Last I checked, the overhead for this sort of c call was on the order of > 10nanoseconds or less which seems like it’d be very unlikely to be a > bottleneck, but do you have any natural or artificial benchmark programs > that would show case this? > I converted my example code into a loop and ran it a million times with a 1 byte array size (would be 8 bytes after alignment). So roughly 3 words would be allocated per array, including the header and length. It took 5 ms using the statically known size optimization which inlines the alloc completely, and 10 ms using an unknown size (from program arg) which makes a call to newByteArray# . That turns out to be of the order of 5ns more per allocation. It does not sound like a big deal. -harendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From carter.schonwald at gmail.com Fri Apr 7 12:41:20 2023 From: carter.schonwald at gmail.com (Carter Schonwald) Date: Fri, 7 Apr 2023 08:41:20 -0400 Subject: Performance of small allocations via prim ops In-Reply-To: References: Message-ID: Great /fast experimentation! I will admit I’m pleased that my dated intuition is still correct, but more importantly we have more current data! Thanks for the exploration and sharing what you found! 
On Fri, Apr 7, 2023 at 7:35 AM Harendra Kumar wrote: > > > On Fri, 7 Apr 2023 at 02:18, Carter Schonwald > wrote: > >> That sounds like a worthy experiment! >> >> I guess that would look like having an inline macro’d up path that >> checks if it can get the job done that falls back to the general code? >> >> Last I checked, the overhead for this sort of c call was on the order of >> 10nanoseconds or less which seems like it’d be very unlikely to be a >> bottleneck, but do you have any natural or artificial benchmark programs >> that would show case this? >> > > I converted my example code into a loop and ran it a million times with a > 1 byte array size (would be 8 bytes after alignment). So roughly 3 words > would be allocated per array, including the header and length. It took 5 ms > using the statically known size optimization which inlines the alloc > completely, and 10 ms using an unknown size (from program arg) which makes > a call to newByteArray# . That turns out to be of the order of 5ns more per > allocation. It does not sound like a big deal. > > -harendra > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at smart-cactus.org Sun Apr 9 22:02:57 2023 From: ben at smart-cactus.org (Ben Gamari) Date: Sun, 09 Apr 2023 18:02:57 -0400 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: <87fs9890ke.fsf@smart-cactus.org> Harendra Kumar writes: > I am confused by this flag. This flag allows us to allocate statically > known arrays sizes of <= n to be allocated from the current nursery block. > But looking at the code in allocateMightFail, as I interpret it, any size > array up to LARGE_OBJECT_THRESHOLD is anyway allocated from the current > nursery block. So why have this option? Why not fix this to > LARGE_OBJECT_THRESHOLD? Maybe I am missing something. > In principle we could do so. 
The motivation for making this a flag isn't immediately clear from the commit implementing this optimisation (1eece45692fb5d1a5f4ec60c1537f8068237e9c1).

One complication is that currently GHC has no way to know the value of LARGE_OBJECT_THRESHOLD (which is a runtime system macro). Typically, to handle this sort of thing we use utils/deriveConstants to generate a Haskell binding mirroring the value of the C declaration. However, as GHC becomes runtime-retargetable we may need to revisit this design.

Cheers,

- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 487 bytes
Desc: not available
URL:

From harendra.kumar at gmail.com  Wed Apr 12 09:02:43 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 12 Apr 2023 14:32:43 +0530
Subject: GHC 9.6.1 rejects previously working code
Message-ID:

The following code compiles with older compilers but does not compile with GHC 9.6.1:

{-# LANGUAGE KindSignatures #-}

module A () where

import Control.Monad.IO.Class
import Control.Monad.Trans.Class

data T (m :: * -> *) a = T

instance Functor (T m) where
    fmap f T = undefined

instance Applicative (T m) where
    pure = undefined
    (<*>) = undefined

instance MonadIO m => Monad (T m) where
    return = pure
    (>>=) = undefined

instance MonadTrans T where
    lift = undefined

It fails with the following error:

xx.hs:20:10: error: [GHC-39999]
    • Could not deduce ‘MonadIO m’
        arising from the head of a quantified constraint
        arising from the superclasses of an instance declaration
      from the context: Monad m
        bound by a quantified context at xx.hs:20:10-21
      Possible fix: add (MonadIO m) to the context of a quantified context
    • In the instance declaration for ‘MonadTrans T’
   |
20 | instance MonadTrans T where
   |          ^^^^^^^^^^^^

What is the correct resolution for this?

-harendra
-------------- next part --------------
An HTML attachment was scrubbed...
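The rejection above comes from transformers 0.6 requiring `forall m. Monad m => Monad (t m)` as a quantified superclass of MonadTrans, which `T` cannot satisfy when `Monad (T m)` needs `MonadIO m`. The replies that follow discuss workarounds; one of them (dropping the MonadTrans instance and relying on MonadIO instead) can be sketched as below. This is a reconstruction with illustrative names, not code from the thread:

```haskell
-- Sketch of one workaround: give up the MonadTrans instance entirely and
-- let liftIO play the role of lift. The MonadIO constraint stays on the
-- instances, so T is usable whenever the base monad can do IO.

import Control.Monad.IO.Class (MonadIO (..))

newtype T m a = T { runT :: m a }

instance MonadIO m => Functor (T m) where
  fmap f (T m) = T (fmap f m)

instance MonadIO m => Applicative (T m) where
  pure = T . pure
  T f <*> T x = T (f <*> x)

instance MonadIO m => Monad (T m) where
  T m >>= k = T (m >>= runT . k)

-- liftIO replaces lift for IO-capable base monads; no MonadTrans needed.
instance MonadIO m => MonadIO (T m) where
  liftIO = T . liftIO

main :: IO ()
main = do
  n <- runT (liftIO (pure (41 :: Int)) >>= \x -> pure (x + 1))
  print n
```

The trade-off is that `T` can no longer be stacked over an arbitrary monad with `lift`, only over MonadIO monads via `liftIO`.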
URL:

From tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk  Wed Apr 12 09:10:10 2023
From: tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk (Tom Ellis)
Date: Wed, 12 Apr 2023 10:10:10 +0100
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To:
References:
Message-ID:

On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
> instance MonadIO m => Monad (T m) where
>     return = pure
>     (>>=) = undefined
>
> instance MonadTrans T where
>     lift = undefined

I guess it's nothing to do with 9.6 per se, but rather the difference between

* https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans

* https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans

I'm not sure I can see any solution for this. A monad transformer `T` must give rise to a monad `T m` regardless of what `m` is. If `T m` is only a monad when `MonadIO m`, then `T` can't be a monad transformer (under transformers 0.6).

Tom

From rodrigo.m.mesquita at gmail.com  Wed Apr 12 09:30:50 2023
From: rodrigo.m.mesquita at gmail.com (Rodrigo Mesquita)
Date: Wed, 12 Apr 2023 10:30:50 +0100
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To:
References:
Message-ID: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>

Indeed, this is included in the GHC 9.6.x Migration Guide.

Unfortunately, I'm also not sure there is a solution for this particular case, where (T m) is only a Monad if m instances MonadIO. As Tom explained, under transformers 0.6 `T` is no longer a monad transformer.

A few workarounds I can think of:

- No longer instance `MonadTrans T`, and use an instance `MonadIO m => MonadIO (T m)` instead.
  Rationale: if you always require `m` to be `MonadIO`, perhaps the ability to always lift an `m` to `T m` with `liftIO` is sufficient.
- Add the `MonadIO` instance to the `m` field of `T`, GADT style: `data T m a where T :: MonadIO m => m -> T m a`.
  Rationale: you would no longer need `MonadIO` in the `Monad` instance, which will make it possible to instance `MonadTrans`.

- Redefine your own `lift` regardless of `MonadTrans`.

Good luck!
Rodrigo

> On 12 Apr 2023, at 10:10, Tom Ellis wrote:
>
> On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
>> instance MonadIO m => Monad (T m) where
>>     return = pure
>>     (>>=) = undefined
>>
>> instance MonadTrans T where
>>     lift = undefined
>
> I guess it's nothing to do with 9.6 per se, but rather the difference
> between
>
> * https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>
> * https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>
> I'm not sure I can see any solution for this. A monad transformer `T`
> must give rise to a monad `T m` regardless of what `m` is. If `T m`
> is only a monad when `MonadIO m` then `T` can't be a monad transformer
> (under transformers 0.6).
>
> Tom
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Wed Apr 12 09:42:26 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 12 Apr 2023 15:12:26 +0530
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>
References: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>
Message-ID:

Thanks Tom and Rodrigo. That clarifies the problem. We will need to think which solution makes better sense.

On Wed, 12 Apr 2023 at 15:01, Rodrigo Mesquita wrote:

> Indeed, this is included in the GHC 9.6.x Migration Guide.
>
> Unfortunately, I'm also not sure there is a solution for this particular
> case, where (T m) is only a Monad if m instances MonadIO.
> As Tom explained, under transformers 0.6 `T` is no longer a monad
> transformer.
>
> A few workarounds I can think of:
>
> - No longer instance `MonadTrans T`, and use an instance `MonadIO m =>
> MonadIO (T m)` instead.
>   Rationale: if you always require `m` to be `MonadIO`, perhaps the
>   ability to always lift an `m` to `T m` with `liftIO` is sufficient.
> > Unfortunately, I’m also not sure there is a solution for this particular > where (T m) is only a Monad if m instances MonadIO. > As Tom explained, under transformers 0.6 `T` no longer is a monad > transformer. > > A few workarounds I can think of: > > - No longer instance `MonadTrans T`, and use a instance `MonadIO m => > MonadIO (T m)` instead. > Rationale: if you always require `m` to be `MonadIO`, perhaps the > ability to always lift an `m` to `T m` with `liftIO` is sufficient. > > - Add the `MonadIO` instance to the `m` field of `T`, GADT style, `data T > m a where T :: MonadIO m => m -> T m a` > Rational: You would no longer need `MonadIO` in the `Monad` instance, > which will make it possible to instance `MonadTrans`. > > - Redefine your own `lift` regardless of `MonadTrans` > > Good luck! > Rodrigo > > On 12 Apr 2023, at 10:10, Tom Ellis < > tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk> wrote: > > On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote: > > instance MonadIO m => Monad (T m) where > return = pure > (>>=) = undefined > > instance MonadTrans T where > lift = undefined > > > I guess it's nothing to do with 9.6 per se, but rather the difference > between > > * > https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans > > * > https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans > > I'm not sure I can see any solution for this. A monad transformer `T` > must give rise to a monad `T m` regardless of what `m` is. If `T m` > is only a monad when `MonadIO m` then `T` can't be a monad transformer > (under transformers 0.6). 
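
The constraint Tom describes, and the first and third workarounds Rodrigo
lists, can be sketched concretely. The following is an illustrative
stand-in, not code from the thread; the newtype, its instances, and
`liftT` are hypothetical names:

```haskell
import Control.Monad.IO.Class (MonadIO (liftIO))

-- Hypothetical stand-in for the transformer discussed in the thread.
newtype T m a = T { runT :: m a }

-- Every instance below needs MonadIO m, so `Monad (T m)` does not hold
-- for an arbitrary Monad m.  transformers 0.6 gives MonadTrans the
-- quantified superclass
--     class (forall m. Monad m => Monad (t m)) => MonadTrans t
-- which is exactly what such a `T` cannot provide.
instance MonadIO m => Functor (T m) where
  fmap f (T x) = T (fmap f x)

instance MonadIO m => Applicative (T m) where
  pure = T . pure
  T f <*> T x = T (f <*> x)

instance MonadIO m => Monad (T m) where
  return = pure
  T x >>= f = T (x >>= runT . f)

-- Workaround 1: drop MonadTrans and embed base-monad actions via MonadIO.
instance MonadIO m => MonadIO (T m) where
  liftIO = T . liftIO

-- Workaround 3: a bespoke lift that bypasses the MonadTrans class.
liftT :: m a -> T m a
liftT = T

main :: IO ()
main = runT (liftT (putStrLn "lifted"))  -- prints "lifted"
```

Under transformers 0.5 the `MonadTrans T` instance type-checked because
the class had no superclass; it is the 0.6 quantified superclass that
rules it out.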
> > Tom > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylvain at haskus.fr Wed Apr 12 10:06:12 2023 From: sylvain at haskus.fr (Sylvain Henry) Date: Wed, 12 Apr 2023 12:06:12 +0200 Subject: Performance of small allocations via prim ops In-Reply-To: <87fs9890ke.fsf@smart-cactus.org> References: <87fs9cllg1.fsf@smart-cactus.org> <87fs9890ke.fsf@smart-cactus.org> Message-ID: <5c68b310-b980-e8fd-bb90-5fd3d450fb04@haskus.fr> > One complication is that currently GHC has no way to know the value of > LARGE_OBJECT_THRESHOLD (which is a runtime system macro). Typically to > handle this sort of thing we use utils/deriveConstants to generate a > Haskell binding mirroring the value of the C declaration. However, > as GHC becomes runtime-retargetable we may need to revisit this design. Since https://gitlab.haskell.org/ghc/ghc/-/commit/085983e63bfe6af23f8b85fbfcca8db4872d2f60 (2021-03) we don't do this. We only read constants from the header file provided by the RTS unit. Adding one more constant for LARGE_OBJECT_THRESHOLD shouldn't be an issue. Cheers Sylvain From zubin at well-typed.com Tue Apr 18 13:56:48 2023 From: zubin at well-typed.com (Zubin Duggal) Date: Tue, 18 Apr 2023 19:26:48 +0530 Subject: [Haskell] [ANNOUNCE] GHC 9.4.5 released Message-ID: The GHC developers are happy to announce the availability of GHC 9.4.5. Binary distributions, source distributions, and documentation are available at [downloads.haskell.org](https://downloads.haskell.org/ghc/9.4.5). 
Download Page: https://www.haskell.org/ghc/download_ghc_9.4.5.html
Blog Post: https://www.haskell.org/ghc/blog/20230418-ghc-9.4.5-released.html

This release is primarily a bugfix release addressing a few issues
found in 9.4.4. These include:

 * Fixes for a number of bugs in the simplifier (#22623, #22718,
   #22913, #22695, #23184, #22998, #22662, #22725).

 * Many bug fixes to the non-moving and parallel GCs (#22264, #22327,
   #22926, #22927, #22929, #22930, #17574, #21840, #22528)

 * A fix for a bug with the alignment of RTS data structures that could
   result in segfaults when compiled with high optimisation settings on
   certain platforms (#22975, #22965).

 * Bumping `gmp-tarballs` to a version which doesn't use the reserved
   `x18` register on AArch64/Darwin systems, and also has fixes for
   CVE-2021-43618 (#22497, #22789).

 * A number of improvements to recompilation avoidance with multiple home
   units (#22675, #22677, #22669, #22678, #22679, #22680)

 * Fixes for regressions in the typechecker and constraint solver (#22647,
   #23134, #22516, #22743)

 * Easier installation of the binary distribution on MacOS platforms by
   changing the installation Makefile to remove the quarantine attribute
   when installing.

 * ... and many more. See the [release notes] for a full accounting.

As some of the fixed issues do affect correctness, users are encouraged
to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake
pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprises this release.

As always, do give this release a try and open a [ticket][] if you see
anything amiss.
Happy compiling,

- Zubin

[ticket]: https://gitlab.haskell.org/ghc/ghc/-/issues/new
[release notes]: https://downloads.haskell.org/~ghc/9.4.5/docs/html/users_guide/9.4.5-notes.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: 

From george.colpitts at gmail.com  Mon Apr 24 20:49:46 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Mon, 24 Apr 2023 17:49:46 -0300
Subject: does llvm 16 work with ghc 9.6.1 ?
Message-ID: 

Hi

Does anybody know if llvm 16 works with ghc 9.6.1 ?

Thanks
George

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From moritz.angermann at gmail.com  Tue Apr 25 01:01:48 2023
From: moritz.angermann at gmail.com (Moritz Angermann)
Date: Tue, 25 Apr 2023 09:01:48 +0800
Subject: does llvm 16 work with ghc 9.6.1 ?
In-Reply-To: 
References: 
Message-ID: 

Hi George,

While I personally haven’t tried it, I’d encourage you to just try. Unless
they changed their textual IR (they don’t do that often anymore), it could
just work.

Whether or not you run into bugs for the specific target you are looking
at is hard to say without knowing the target.

My suggestion would be to just try building your configuration with the
llvm backend against llvm16, and run validate if you can.

Cheers,
Moritz

On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts 
wrote:

> Hi
>
> Does anybody know if llvm 16 works with ghc 9.6.1 ?
>
> Thanks
> George
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From godzbanebane at gmail.com Tue Apr 25 06:38:59 2023 From: godzbanebane at gmail.com (Georgi Lyubenov) Date: Tue, 25 Apr 2023 09:38:59 +0300 Subject: GHC 9.6.1 rejects previously working code In-Reply-To: References: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com> Message-ID: <51f59bba-c4a3-ef83-bf55-9c11ffaf7534@gmail.com> Out of curiosity, why do you require the `MonadIO` on the `Monad` instance? On 4/12/23 12:42, Harendra Kumar wrote: > Thanks Tom and Rodrigo. > > That clarifies the problem. We will need to think which solution makes > better sense. > > On Wed, 12 Apr 2023 at 15:01, Rodrigo Mesquita > wrote: > > Indeed, this is included in the GHC 9.6.x Migration Guide > . > > > Unfortunately, I’m also not sure there is a solution for this > particular where (T m) is only a Monad if m instances MonadIO. > As Tom explained, under transformers 0.6 `T` no longer is a monad > transformer. > > A few workarounds I can think of: > > - No longer instance `MonadTrans T`, and use a instance `MonadIO m > => MonadIO (T m)` instead. >   Rationale: if you always require `m` to be `MonadIO`, perhaps > the ability to always lift an `m` to `T m` with `liftIO` is > sufficient. > > - Add the `MonadIO` instance to the `m` field of `T`, GADT style, > `data T m a where T :: MonadIO m => m -> T m a` >   Rational: You would no longer need `MonadIO` in the `Monad` > instance, which will make it possible to instance `MonadTrans`. > > - Redefine your own `lift` regardless of `MonadTrans` > > Good luck! 
> Rodrigo
>
>> On 12 Apr 2023, at 10:10, Tom Ellis
>> wrote:
>>
>> On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
>>> instance MonadIO m => Monad (T m) where
>>>    return = pure
>>>    (>>=) = undefined
>>>
>>> instance MonadTrans T where
>>>    lift = undefined
>>
>> I guess it's nothing to do with 9.6 per se, but rather the difference
>> between
>>
>> *
>> https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>>
>> *
>> https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>>
>> I'm not sure I can see any solution for this.  A monad
>> transformer `T`
>> must give rise to a monad `T m` regardless of what `m` is.  If `T m`
>> is only a monad when `MonadIO m` then `T` can't be a monad
>> transformer
>> (under transformers 0.6).
>>
>> Tom
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From george.colpitts at gmail.com  Tue Apr 25 13:54:41 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Tue, 25 Apr 2023 10:54:41 -0300
Subject: does llvm 16 work with ghc 9.6.1 ?
In-Reply-To: 
References: 
Message-ID: 

Thanks Moritz. I went ahead and tried it. On a very simple smoke test I
observed that "-fllvm" works but "-O2 -fllvm" does not. It fails with
"Cannot use -O# with legacy PM.". There is already a bug for changing ghc
to work with the new pass manager. It wasn't clear to me that this would be
needed for llvm 16.
It seems that it is. Cheers George On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < moritz.angermann at gmail.com> wrote: > Hi George, > > while I personally haven’t tried. I’d encourage you to just try. Unless > they changed their textual IR (they don’t do that often anymore), it could > just work. > > Whether or not you run into bugs for the specific target you are looking > at, is hard to say without knowing the target. > > My suggestion would be to just try building your configuration with the > llvm backend against llvm16, and run validate if you can. > > Cheers, > Moritz > > On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts > wrote: > >> Hi >> >> Does anybody know if llvm 16 works with ghc 9.6.1 ? >> >> Thanks >> George >> >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.colpitts at gmail.com Tue Apr 25 14:35:27 2023 From: george.colpitts at gmail.com (George Colpitts) Date: Tue, 25 Apr 2023 11:35:27 -0300 Subject: does llvm 16 work with ghc 9.6.1 ? In-Reply-To: References: Message-ID: the bug for this is https://gitlab.haskell.org/ghc/ghc/-/issues/22954 On Tue, Apr 25, 2023 at 10:54 AM George Colpitts wrote: > Thanks Moritz. I went ahead and tried it. On a very simple smoke test I > observed that "-fllvm works" but "-O2 -fllvm" does not. It fails with > "Cannot use -O# with legacy PM.". There is already a bug for changing ghc > to work with the new pass manager. It wasn't clear to me that this would be > needed for llvm 16. It seems that it is. > > Cheers > George > > > > > On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < > moritz.angermann at gmail.com> wrote: > >> Hi George, >> >> while I personally haven’t tried. I’d encourage you to just try. Unless >> they changed their textual IR (they don’t do that often anymore), it could >> just work. 
>> >> Whether or not you run into bugs for the specific target you are looking >> at, is hard to say without knowing the target. >> >> My suggestion would be to just try building your configuration with the >> llvm backend against llvm16, and run validate if you can. >> >> Cheers, >> Moritz >> >> On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts < >> george.colpitts at gmail.com> wrote: >> >>> Hi >>> >>> Does anybody know if llvm 16 works with ghc 9.6.1 ? >>> >>> Thanks >>> George >>> >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.colpitts at gmail.com Tue Apr 25 19:56:52 2023 From: george.colpitts at gmail.com (George Colpitts) Date: Tue, 25 Apr 2023 16:56:52 -0300 Subject: does llvm 16 work with ghc 9.6.1 ? In-Reply-To: References: Message-ID: @duog has documented workarounds in https://gitlab.haskell.org/ghc/ghc/-/issues/21936: These /ghc/ flags reproduce -O0: -optlo-passes='module(default,function(mem2reg),globalopt,function(lower-expect))' -fno-llvm-tbaa -O0 These /ghc/ flags reproduce -O1: -optlo-passes='module(default,globalopt)' -O1 -fno-llvm-tbaa These /ghc/ flags reproduce -O2: -optlo-passes='module(default)' -O2 -fno-llvm-tbaa On Tue, Apr 25, 2023 at 11:35 AM George Colpitts wrote: > the bug for this is https://gitlab.haskell.org/ghc/ghc/-/issues/22954 > > On Tue, Apr 25, 2023 at 10:54 AM George Colpitts < > george.colpitts at gmail.com> wrote: > >> Thanks Moritz. I went ahead and tried it. On a very simple smoke test I >> observed that "-fllvm works" but "-O2 -fllvm" does not. It fails with >> "Cannot use -O# with legacy PM.". There is already a bug for changing ghc >> to work with the new pass manager. It wasn't clear to me that this would be >> needed for llvm 16. It seems that it is. 
>> >> Cheers >> George >> >> >> >> >> On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < >> moritz.angermann at gmail.com> wrote: >> >>> Hi George, >>> >>> while I personally haven’t tried. I’d encourage you to just try. Unless >>> they changed their textual IR (they don’t do that often anymore), it could >>> just work. >>> >>> Whether or not you run into bugs for the specific target you are looking >>> at, is hard to say without knowing the target. >>> >>> My suggestion would be to just try building your configuration with the >>> llvm backend against llvm16, and run validate if you can. >>> >>> Cheers, >>> Moritz >>> >>> On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts < >>> george.colpitts at gmail.com> wrote: >>> >>>> Hi >>>> >>>> Does anybody know if llvm 16 works with ghc 9.6.1 ? >>>> >>>> Thanks >>>> George >>>> >>>> _______________________________________________ >>>> ghc-devs mailing list >>>> ghc-devs at haskell.org >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ietf-dane at dukhovni.org Sun Apr 30 01:00:55 2023 From: ietf-dane at dukhovni.org (Viktor Dukhovni) Date: Sat, 29 Apr 2023 21:00:55 -0400 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory Message-ID: For some time now I'd been unable to build GHC 9.6 from source. The reason turned out to be that my hadrian command-line selected an explicit build directory that was not an immediate child of the source directory (default it seems is "_build"). 
With the source tree under "$HOME/dev/ghc/", the hadrian command

    $ hadrian/build -V -V -o"$HOME/dev/buildghc" --docs=no-sphinx
    binary-dist-dir

after building stage0, and running "configure" in libraries/base,
reports an error finding HsFFI.h:

    Reading parameters from
    $HOME/dev/buildghc/stage1/libraries/base/build/base.buildinfo
    /usr/bin/cc '-fuse-ld=gold' /tmp/2303653-4.c -o /tmp/2303653-5
    '-D__GLASGOW_HASKELL__=906' \
    '-Dlinux_BUILD_OS=1' \
    '-Dx86_64_BUILD_ARCH=1' \
    '-Dlinux_HOST_OS=1' \
    '-Dx86_64_HOST_ARCH=1' \
    -I$HOME/dev/buildghc/stage1/libraries/base/build/autogen \
    -I$HOME/dev/buildghc/stage1/libraries/base/build/include \
    -Ilibraries/base/include \
    -Ilibraries/base \
    -I/usr/include \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include/ \
    -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include/ \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include \
    -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include \
    -I$HOME/dev/buildghc/stage1/rts/build/include \
    '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@FFIIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/rts/build/@FFIIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@LibdwIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/rts/build/@LibdwIncludeDir@' \
    -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-bignum/build \
    -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-prim/build \
    -L$HOME/dev/buildghc/stage1/inplace/../rts/build -iquote \
    $HOME/dev/ghc/libraries/base \
    '-fuse-ld=gold'

There are two issues to note here:

- "hadrian" fails to substitute @FFIIncludeDir@ and @LibdwIncludeDir@.
  This used to be handled by "configure", but the job of turning
  "rts.cabal.in" into "rts.cabal" seems to have been reassigned to
  "hadrian".

- With the build output directory a sibling rather than a child of
  the source tree, the path to "rts/include" is not constructed
  correctly.
The path: -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include should have been: -I$HOME/dev/buildghc/stage1/inplace/../../../ghc/rts/include Switching to the default path proved to be a viable work-around, but perhaps other choices should also work. -- Viktor. From ghc-devs at schmits.me Sun Apr 30 09:18:07 2023 From: ghc-devs at schmits.me (Torsten Schmits) Date: Sun, 30 Apr 2023 11:18:07 +0200 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory In-Reply-To: References: Message-ID: Hi Viktor, I created an issue for this: https://gitlab.haskell.org/ghc/ghc/-/issues/22741 You can share your insights there! On 4/30/23 03:00, Viktor Dukhovni wrote: > For some time now I'd been unable to build GHC 9.6 from source. The > reason turned out to be that my hadrian command-line selected an > explicit build directory that was not an immediate child of the source > directory (default it seems is "_build"). > > With the source tree under "$HOME/dev/ghc/", the hardrian command > > $ hadrian/build -V -V -o"$HOME/dev/buildghc" --docs=no-sphinx > binary-dist-dir > > after building stage0, and running "configure" in libraries/base, > reports an error finding HsFFI.h: > > Reading parameters from > $HOME/dev/buildghc/stage1/libraries/base/build/base.buildinfo > /usr/bin/cc '-fuse-ld=gold' /tmp/2303653-4.c -o /tmp/2303653-5 > '-D__GLASGOW_HASKELL__=906' \ > '-Dlinux_BUILD_OS=1' \ > '-Dx86_64_BUILD_ARCH=1' \ > '-Dlinux_HOST_OS=1' \ > '-Dx86_64_HOST_ARCH=1' \ > -I$HOME/dev/buildghc/stage1/libraries/base/build/autogen \ > -I$HOME/dev/buildghc/stage1/libraries/base/build/include \ > -Ilibraries/base/include \ > -Ilibraries/base \ > -I/usr/include \ > -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include/ > \ > -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include/ \ > -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include > \ > -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include \ 
> -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include \ > -I$HOME/dev/buildghc/stage1/rts/build/include \ > '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@FFIIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/rts/build/@FFIIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@LibdwIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/rts/build/@LibdwIncludeDir@' \ > -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-bignum/build \ > -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-prim/build \ > -L$HOME/dev/buildghc/stage1/inplace/../rts/build -iquote \ > $HOME/dev/ghc/libraries/base \ > '-fuse-ld=gold' > > There are two issues to note here: > > - "hadrian" fails to substitute @FFIIncludeDir@ and @LibdwIncludeDir at . > This used to be handled by "configure", but the job of turning > "rts.cabal.in" into "rts.cabal" seems to have been reassigned to > "hadrian". > > - With the build output directory a sibling rather than a child of > the source tree, the path to "rts/include" is not constructed > correctly. The path: > > -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include > > should have been: > > -I$HOME/dev/buildghc/stage1/inplace/../../../ghc/rts/include > > Switching to the default path proved to be a viable work-around, but > perhaps other choices should also work. > From ietf-dane at dukhovni.org Sun Apr 30 18:57:57 2023 From: ietf-dane at dukhovni.org (Viktor Dukhovni) Date: Sun, 30 Apr 2023 14:57:57 -0400 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory In-Reply-To: References: Message-ID: On Sun, Apr 30, 2023 at 11:18:07AM +0200, Torsten Schmits via ghc-devs wrote: > I created an issue for this: > https://gitlab.haskell.org/ghc/ghc/-/issues/22741 > > You can share your insights there! Done. It does look like we encountered the same underlying issue.. -- Viktor.