From george.colpitts at gmail.com  Sun Apr  2 11:41:07 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Sun, 2 Apr 2023 08:41:07 -0300
Subject: on my mac 9.6.1 can't compile with ghc -prof -fprof-auto, gets error Could not find module ‘Prelude’
In-Reply-To: <87edpupvhh.fsf@smart-cactus.org>
References: <87edpupvhh.fsf@smart-cactus.org>
Message-ID:

Hello

On my mac, with ghc 9.6.1, I can't compile with ghc -prof -fprof-auto; I get the error:

    Could not find module ‘Prelude’
    Perhaps you haven't installed the "p_dyn" libraries for package ‘base-4.18.0.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.

Is anybody else experiencing this? In more detail:

> ghc -prof -fprof-auto hello.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.6.1/environments/default
[1 of 2] Compiling Main             ( hello.hs, hello.o )

hello.hs:1:1: error:
    Could not find module ‘Prelude’
    Perhaps you haven't installed the "p_dyn" libraries for package ‘base-4.18.0.0’?
    Use -v (or `:set -v` in ghci) to see a list of the files searched for.
  |
1 | main = print "hello"
  | ^

hello.hs consists of:

    main = print "hello"

I reported this in 9.2.3, in #21709. At that time there was a workaround of adding -static, but that no longer works. It gives a slightly different error message:

ghc -prof -fprof-auto -static hello.hs
Loaded package environment from /Users/gcolpitts/.ghc/x86_64-darwin-9.6.1/environments/default
[2 of 2] Linking hello
ld: warning: directory not found for option '-L/opt/local/lib/'
ld: library not found for -lHStyp-qlty-1-186ccc78_p
clang: error: linker command failed with exit code 1 (use -v to see invocation)
ghc-9.6.1: `gcc' failed in phase `Linker'. (Exit code: 1)

I have updated #21709 with the details of the problem on 9.6.1.

Thanks
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Wed Apr  5 01:50:30 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 5 Apr 2023 07:20:30 +0530
Subject: Performance of small allocations via prim ops
Message-ID:

I was looking at the RTS code for allocating small objects via prim ops, e.g. newByteArray#. The code looks like:

stg_newByteArrayzh ( W_ n )
{
    MAYBE_GC_N(stg_newByteArrayzh, n);

    payload_words = ROUNDUP_BYTES_TO_WDS(n);
    words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
    ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);

We are making a foreign call here (ccall). I am wondering how much overhead a ccall adds? I guess it may have to save and restore registers. Would it be better to do the fast-path case of allocating small objects from the nursery using Cmm code, as in stg_gc_noregs?

-harendra
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From carter.schonwald at gmail.com  Thu Apr  6 20:47:51 2023
From: carter.schonwald at gmail.com (Carter Schonwald)
Date: Thu, 6 Apr 2023 16:47:51 -0400
Subject: Performance of small allocations via prim ops
In-Reply-To:
References:
Message-ID:

That sounds like a worthy experiment!

I guess that would look like having an inline, macro'd-up path that checks whether it can get the job done, and falls back to the general code otherwise?

Last I checked, the overhead for this sort of C call was on the order of 10 nanoseconds or less, which seems very unlikely to be a bottleneck. But do you have any natural or artificial benchmark programs that would showcase this? For this sort of code, extra branching for that optimization could easily have a larger performance impact than the known function call on modern hardware. (Though take my intuitions about these things with a grain of salt.)

On Tue, Apr 4, 2023 at 9:50 PM Harendra Kumar wrote:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# .
> The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much
> overhead a ccall adds? I guess it may have to save and restore registers.
> Would it be better to do the fast path case of allocating small objects
> from the nursery using cmm code like in stg_gc_noregs?
>
> -harendra
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben at smart-cactus.org  Thu Apr  6 22:02:08 2023
From: ben at smart-cactus.org (Ben Gamari)
Date: Thu, 06 Apr 2023 18:02:08 -0400
Subject: Performance of small allocations via prim ops
In-Reply-To:
References:
Message-ID: <87fs9cllg1.fsf@smart-cactus.org>

Harendra Kumar writes:

> I was looking at the RTS code for allocating small objects via prim ops
> e.g. newByteArray# . The code looks like:
>
> stg_newByteArrayzh ( W_ n )
> {
>     MAYBE_GC_N(stg_newByteArrayzh, n);
>
>     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>
> We are making a foreign call here (ccall). I am wondering how much overhead
> a ccall adds? I guess it may have to save and restore registers. Would it
> be better to do the fast path case of allocating small objects from the
> nursery using cmm code like in stg_gc_noregs?
>
GHC's operational model is designed in such a way that foreign calls are fairly cheap (e.g. we don't need to switch stacks, which can be quite costly).
Judging by the assembler produced for newByteArray# in one random x86-64 tree that I have lying around, it's only a couple of data-movement instructions, an %eax clear, and a stack pop:

  36: 48 89 ce             mov    %rcx,%rsi
  39: 48 89 c7             mov    %rax,%rdi
  3c: 31 c0                xor    %eax,%eax
  3e: e8 00 00 00 00       call   43
  43: 48 83 c4 08          add    $0x8,%rsp

The data movement operations in particular are quite cheap on most microarchitectures where GHC would run, due to register renaming. I doubt that this overhead would be noticeable in anything but a synthetic benchmark. However, it never hurts to measure.

Cheers,

- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 905 bytes
Desc: not available
URL:

From harendra.kumar at gmail.com  Fri Apr  7 05:19:59 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 10:49:59 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To: <87fs9cllg1.fsf@smart-cactus.org>
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

Thanks Ben and Carter.

I compiled the following to Cmm:

{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

import GHC.IO
import GHC.Exts

data M = M (MutableByteArray# RealWorld)

main = do
    _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
    return ()

It produced the following Cmm:

 {offset
   c1k3: // global
       Hp = Hp + 24;
       if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
   c1k7: // global
       HpAlloc = 24;
       R1 = Main.main1_closure;
       call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
   c1k6: // global
       I64[Hp - 16] = stg_ARR_WORDS_info;
       I64[Hp - 8] = 1;
       R1 = GHC.Tuple.()_closure+1;
       call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
 }

It seems to be as good as it gets. There is absolutely no scope for improvement in this.

-harendra

On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote:

> Harendra Kumar writes:
>
> > I was looking at the RTS code for allocating small objects via prim ops
> > e.g. newByteArray# .
> > The code looks like:
> >
> > stg_newByteArrayzh ( W_ n )
> > {
> >     MAYBE_GC_N(stg_newByteArrayzh, n);
> >
> >     payload_words = ROUNDUP_BYTES_TO_WDS(n);
> >     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
> >     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
> >
> > We are making a foreign call here (ccall). I am wondering how much
> overhead
> > a ccall adds? I guess it may have to save and restore registers. Would it
> > be better to do the fast path case of allocating small objects from the
> > nursery using cmm code like in stg_gc_noregs?
> >
> GHC's operational model is designed in such a way that foreign calls are
> fairly cheap (e.g. we don't need to switch stacks, which can be quite
> costly). Judging by the assembler produced for newByteArray# in one
> random x86-64 tree that I have lying around, it's only a couple of
> data-movement instructions, an %eax clear, and a stack pop:
>
>   36: 48 89 ce             mov    %rcx,%rsi
>   39: 48 89 c7             mov    %rax,%rdi
>   3c: 31 c0                xor    %eax,%eax
>   3e: e8 00 00 00 00       call   43
>   43: 48 83 c4 08          add    $0x8,%rsp
>
> The data movement operations in particular are quite cheap on most
> microarchitectures where GHC would run due to register renaming. I doubt
> that this overhead would be noticeable in anything but a synthetic
> benchmark. However, it never hurts to measure.
>
> Cheers,
>
> - Ben
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Fri Apr  7 05:38:16 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 11:08:16 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To:
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

Ah, some other optimization seems to be kicking in here.
When I increase the size of the array to > 128, I see a call to stg_newByteArray# being emitted:

 {offset
   c1kb: // global
       if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto c1kd;
   c1kc: // global
       R1 = Main.main1_closure;
       call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
   c1kd: // global
       I64[Sp - 8] = c1k9;
       R1 = 129;
       Sp = Sp - 8;
       call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, upd: 8;

-harendra

On Fri, 7 Apr 2023 at 10:49, Harendra Kumar wrote:

> Thanks Ben and Carter.
>
> I compiled the following to Cmm:
>
> {-# LANGUAGE MagicHash #-}
> {-# LANGUAGE UnboxedTuples #-}
>
> import GHC.IO
> import GHC.Exts
>
> data M = M (MutableByteArray# RealWorld)
>
> main = do
>     _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
>     return ()
>
> It produced the following Cmm:
>
>  {offset
>    c1k3: // global
>        Hp = Hp + 24;
>        if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
>    c1k7: // global
>        HpAlloc = 24;
>        R1 = Main.main1_closure;
>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>    c1k6: // global
>        I64[Hp - 16] = stg_ARR_WORDS_info;
>        I64[Hp - 8] = 1;
>        R1 = GHC.Tuple.()_closure+1;
>        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
>  }
>
> It seems to be as good as it gets. There is absolutely no scope for
> improvement in this.
>
> -harendra
>
> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote:
>
>> Harendra Kumar writes:
>>
>> > I was looking at the RTS code for allocating small objects via prim ops
>> > e.g. newByteArray# . The code looks like:
>> >
>> > stg_newByteArrayzh ( W_ n )
>> > {
>> >     MAYBE_GC_N(stg_newByteArrayzh, n);
>> >
>> >     payload_words = ROUNDUP_BYTES_TO_WDS(n);
>> >     words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words;
>> >     ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words);
>> >
>> > We are making a foreign call here (ccall). I am wondering how much
>> overhead
>> > a ccall adds? I guess it may have to save and restore registers.
>> > Would it be better to do the fast path case of allocating small objects
>> > from the nursery using cmm code like in stg_gc_noregs?
>> >
>> GHC's operational model is designed in such a way that foreign calls are
>> fairly cheap (e.g. we don't need to switch stacks, which can be quite
>> costly). Judging by the assembler produced for newByteArray# in one
>> random x86-64 tree that I have lying around, it's only a couple of
>> data-movement instructions, an %eax clear, and a stack pop:
>>
>>   36: 48 89 ce             mov    %rcx,%rsi
>>   39: 48 89 c7             mov    %rax,%rdi
>>   3c: 31 c0                xor    %eax,%eax
>>   3e: e8 00 00 00 00       call   43
>>   43: 48 83 c4 08          add    $0x8,%rsp
>>
>> The data movement operations in particular are quite cheap on most
>> microarchitectures where GHC would run due to register renaming. I doubt
>> that this overhead would be noticeable in anything but a synthetic
>> benchmark. However, it never hurts to measure.
>>
>> Cheers,
>>
>> - Ben
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Fri Apr  7 06:07:05 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Fri, 7 Apr 2023 11:37:05 +0530
Subject: Performance of small allocations via prim ops
In-Reply-To:
References: <87fs9cllg1.fsf@smart-cactus.org>
Message-ID:

A little bit of grepping in the code gave me this:

emitPrimOp cfg primop =
  let max_inl_alloc_size = fromIntegral (stgToCmmMaxInlAllocSize cfg)
  in case primop of
  NewByteArrayOp_Char -> \case
    [(CmmLit (CmmInt n w))]
      | asUnsigned w n <= max_inl_alloc_size  -- <--- see this line
      -> opIntoRegs $ \ [res] -> doNewByteArrayOp res (fromInteger n)
    _ -> PrimopCmmEmit_External

We are emitting more efficient code when the size of the array is smaller.
And the threshold is governed by a compiler flag:

    , make_ord_flag defGhcFlag "fmax-inline-alloc-size"
        (intSuffix (\n d -> d { maxInlineAllocSize = n }))

This means allocation of smaller arrays is extremely efficient and we can control it using `-fmax-inline-alloc-size`; the default is 128. That's a new thing I learnt today.

Given this new finding, my original question now applies only to the case when the array size is bigger than this configurable threshold, which is a little less motivating. And Ben says that the call is not expensive, so we can leave it there.

-harendra

On Fri, 7 Apr 2023 at 11:08, Harendra Kumar wrote:

> Ah, some other optimization seems to be kicking in here. When I increase
> the size of the array to > 128 then I see a call to stg_newByteArray# being
> emitted:
>
>  {offset
>    c1kb: // global
>        if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto c1kd;
>    c1kc: // global
>        R1 = Main.main1_closure;
>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>    c1kd: // global
>        I64[Sp - 8] = c1k9;
>        R1 = 129;
>        Sp = Sp - 8;
>        call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, upd: 8;
>
> -harendra
>
> On Fri, 7 Apr 2023 at 10:49, Harendra Kumar wrote:
>
>> Thanks Ben and Carter.
>>
>> I compiled the following to Cmm:
>>
>> {-# LANGUAGE MagicHash #-}
>> {-# LANGUAGE UnboxedTuples #-}
>>
>> import GHC.IO
>> import GHC.Exts
>>
>> data M = M (MutableByteArray# RealWorld)
>>
>> main = do
>>     _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))
>>     return ()
>>
>> It produced the following Cmm:
>>
>>  {offset
>>    c1k3: // global
>>        Hp = Hp + 24;
>>        if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6;
>>    c1k7: // global
>>        HpAlloc = 24;
>>        R1 = Main.main1_closure;
>>        call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8;
>>    c1k6: // global
>>        I64[Hp - 16] = stg_ARR_WORDS_info;
>>        I64[Hp - 8] = 1;
>>        R1 = GHC.Tuple.()_closure+1;
>>        call (P64[Sp])(R1) args: 8, res: 0, upd: 8;
>>  }
>>
>> It seems to be as good as it gets.
There is absolutely no scope for >> improvement in this. >> >> -harendra >> >> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote: >> >>> Harendra Kumar writes: >>> >>> > I was looking at the RTS code for allocating small objects via prim ops >>> > e.g. newByteArray# . The code looks like: >>> > >>> > stg_newByteArrayzh ( W_ n ) >>> > { >>> > MAYBE_GC_N(stg_newByteArrayzh, n); >>> > >>> > payload_words = ROUNDUP_BYTES_TO_WDS(n); >>> > words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words; >>> > ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words); >>> > >>> > We are making a foreign call here (ccall). I am wondering how much >>> overhead >>> > a ccall adds? I guess it may have to save and restore registers. Would >>> it >>> > be better to do the fast path case of allocating small objects from the >>> > nursery using cmm code like in stg_gc_noregs? >>> > >>> GHC's operational model is designed in such a way that foreign calls are >>> fairly cheap (e.g. we don't need to switch stacks, which can be quite >>> costly). Judging by the assembler produced for newByteArray# in one >>> random x86-64 tree that I have lying around, it's only a couple of >>> data-movement instructions, an %eax clear, and a stack pop: >>> >>> 36: 48 89 ce mov %rcx,%rsi >>> 39: 48 89 c7 mov %rax,%rdi >>> 3c: 31 c0 xor %eax,%eax >>> 3e: e8 00 00 00 00 call 43 >>> >>> 43: 48 83 c4 08 add $0x8,%rsp >>> >>> The data movement operations in particular are quite cheap on most >>> microarchitectures where GHC would run due to register renaming. I doubt >>> that this overhead would be noticable in anything but a synthetic >>> benchmark. However, it never hurts to measure. >>> >>> Cheers, >>> >>> - Ben >>> >> -------------- next part -------------- An HTML attachment was scrubbed... 
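The two allocation paths discussed in this thread can be seen side by side with a small program in the spirit of Harendra's test; this is a reconstructed sketch, not code from the thread. `allocStatic` uses a statically known size (eligible for inline nursery allocation when it is at most `-fmax-inline-alloc-size`), while `allocDynamic` takes the size as a runtime value, which compiles to an out-of-line call to stg_newByteArray#:

```haskell
{-# LANGUAGE MagicHash #-}
{-# LANGUAGE UnboxedTuples #-}

-- Sketch: allocate a MutableByteArray# with a statically known size and
-- with a size only known at run time, mirroring the two code paths
-- examined above. Names here are illustrative, not from the thread.

import GHC.Exts
import GHC.IO (IO (..))

data M = M (MutableByteArray# RealWorld)

-- Size is a literal: the compiler can emit the inline fast path.
allocStatic :: IO M
allocStatic =
  IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M arr #))

-- Size is a runtime value: compiled as a call to stg_newByteArray#.
allocDynamic :: Int -> IO M
allocDynamic (I# n) =
  IO (\s -> case newByteArray# n s of (# s1, arr #) -> (# s1, M arr #))

main :: IO ()
main = do
  mapM_ (const allocStatic) [1 :: Int .. 100000]
  mapM_ (allocDynamic . const 1) [1 :: Int .. 100000]
  putStrLn "done"
```

Compiling with `ghc -ddump-cmm` and varying the literal size around the `-fmax-inline-alloc-size` threshold shows which of the two paths each allocation takes.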
URL: From simon.peytonjones at gmail.com Fri Apr 7 07:28:56 2023 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Fri, 7 Apr 2023 08:28:56 +0100 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: > We are emitting a more efficient code when the size of the array is smaller. And the threshold is governed by a compiler flag: It would be good if this was documented. Perhaps in the Haddock for `newByteArray#`? Or where? S On Fri, 7 Apr 2023 at 07:07, Harendra Kumar wrote: > Little bit of grepping in the code gave me this: > > emitPrimOp cfg primop = > let max_inl_alloc_size = fromIntegral (stgToCmmMaxInlAllocSize cfg) > in case primop of > NewByteArrayOp_Char -> \case > [(CmmLit (CmmInt n w))] > | asUnsigned w n <= max_inl_alloc_size -- > <------------------------------- see this line > -> opIntoRegs $ \ [res] -> doNewByteArrayOp res (fromInteger n) > _ -> PrimopCmmEmit_External > > We are emitting a more efficient code when the size of the array is > smaller. And the threshold is governed by a compiler flag: > > , make_ord_flag defGhcFlag "fmax-inline-alloc-size" > (intSuffix (\n d -> d { maxInlineAllocSize = n })) > > This means allocation of smaller arrays is extremely efficient and we can > control it using `-fmax-inline-alloc-size`, the default is 128. That's a > new thing I learnt today. > > Given this new finding, my original question now applies only to the case > when the array size is bigger than this configurable threshold, which is a > little less motivating. And Ben says that the call is not expensive, so we > can leave it there. > > -harendra > > On Fri, 7 Apr 2023 at 11:08, Harendra Kumar > wrote: > >> Ah, some other optimization seems to be kicking in here. 
When I increase >> the size of the array to > 128 then I see a call to stg_newByteArray# being >> emitted: >> >> {offset >> c1kb: // global >> if ((Sp + -8) < SpLim) (likely: False) goto c1kc; else goto >> c1kd; >> c1kc: // global >> R1 = Main.main1_closure; >> call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8; >> c1kd: // global >> I64[Sp - 8] = c1k9; >> R1 = 129; >> Sp = Sp - 8; >> call stg_newByteArray#(R1) returns to c1k9, args: 8, res: 8, >> upd: 8; >> >> -harendra >> >> On Fri, 7 Apr 2023 at 10:49, Harendra Kumar >> wrote: >> >>> Thanks Ben and Carter. >>> >>> I compiled the following to Cmm: >>> >>> {-# LANGUAGE MagicHash #-} >>> {-# LANGUAGE UnboxedTuples #-} >>> >>> import GHC.IO >>> import GHC.Exts >>> >>> data M = M (MutableByteArray# RealWorld) >>> >>> main = do >>> _ <- IO (\s -> case newByteArray# 1# s of (# s1, arr #) -> (# s1, M >>> arr #)) >>> return () >>> >>> It produced the following Cmm: >>> >>> {offset >>> c1k3: // global >>> Hp = Hp + 24; >>> if (Hp > HpLim) (likely: False) goto c1k7; else goto c1k6; >>> c1k7: // global >>> HpAlloc = 24; >>> R1 = Main.main1_closure; >>> call (stg_gc_fun)(R1) args: 8, res: 0, upd: 8; >>> c1k6: // global >>> I64[Hp - 16] = stg_ARR_WORDS_info; >>> I64[Hp - 8] = 1; >>> R1 = GHC.Tuple.()_closure+1; >>> call (P64[Sp])(R1) args: 8, res: 0, upd: 8; >>> } >>> >>> It seems to be as good as it gets. There is absolutely no scope for >>> improvement in this. >>> >>> -harendra >>> >>> On Fri, 7 Apr 2023 at 03:32, Ben Gamari wrote: >>> >>>> Harendra Kumar writes: >>>> >>>> > I was looking at the RTS code for allocating small objects via prim >>>> ops >>>> > e.g. newByteArray# . 
The code looks like: >>>> > >>>> > stg_newByteArrayzh ( W_ n ) >>>> > { >>>> > MAYBE_GC_N(stg_newByteArrayzh, n); >>>> > >>>> > payload_words = ROUNDUP_BYTES_TO_WDS(n); >>>> > words = BYTES_TO_WDS(SIZEOF_StgArrBytes) + payload_words; >>>> > ("ptr" p) = ccall allocateMightFail(MyCapability() "ptr", words); >>>> > >>>> > We are making a foreign call here (ccall). I am wondering how much >>>> overhead >>>> > a ccall adds? I guess it may have to save and restore registers. >>>> Would it >>>> > be better to do the fast path case of allocating small objects from >>>> the >>>> > nursery using cmm code like in stg_gc_noregs? >>>> > >>>> GHC's operational model is designed in such a way that foreign calls are >>>> fairly cheap (e.g. we don't need to switch stacks, which can be quite >>>> costly). Judging by the assembler produced for newByteArray# in one >>>> random x86-64 tree that I have lying around, it's only a couple of >>>> data-movement instructions, an %eax clear, and a stack pop: >>>> >>>> 36: 48 89 ce mov %rcx,%rsi >>>> 39: 48 89 c7 mov %rax,%rdi >>>> 3c: 31 c0 xor %eax,%eax >>>> 3e: e8 00 00 00 00 call 43 >>>> >>>> 43: 48 83 c4 08 add $0x8,%rsp >>>> >>>> The data movement operations in particular are quite cheap on most >>>> microarchitectures where GHC would run due to register renaming. I doubt >>>> that this overhead would be noticable in anything but a synthetic >>>> benchmark. However, it never hurts to measure. >>>> >>>> Cheers, >>>> >>>> - Ben >>>> >>> _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From harendra.kumar at gmail.com Fri Apr 7 10:15:41 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 15:45:41 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: On Fri, 7 Apr 2023 at 12:57, Simon Peyton Jones wrote: > > We are emitting a more efficient code when the size of the array is > smaller. And the threshold is governed by a compiler flag: > > It would be good if this was documented. Perhaps in the Haddock for > `newByteArray#`? Or where? > The flag is documented in the GHC user guide but the behavior would be better discoverable if `newByteArray#` mentions it. -harendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From harendra.kumar at gmail.com Fri Apr 7 10:28:57 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 15:58:57 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: I am confused by this flag. This flag allows us to allocate statically known arrays sizes of <= n to be allocated from the current nursery block. But looking at the code in allocateMightFail, as I interpret it, any size array up to LARGE_OBJECT_THRESHOLD is anyway allocated from the current nursery block. So why have this option? Why not fix this to LARGE_OBJECT_THRESHOLD? Maybe I am missing something. -harendra On Fri, 7 Apr 2023 at 15:45, Harendra Kumar wrote: > > > On Fri, 7 Apr 2023 at 12:57, Simon Peyton Jones < > simon.peytonjones at gmail.com> wrote: > >> > We are emitting a more efficient code when the size of the array is >> smaller. And the threshold is governed by a compiler flag: >> >> It would be good if this was documented. Perhaps in the Haddock for >> `newByteArray#`? Or where? >> > > The flag is documented in the GHC user guide but the behavior would be > better discoverable if `newByteArray#` mentions it. 
> > -harendra > -------------- next part -------------- An HTML attachment was scrubbed... URL: From harendra.kumar at gmail.com Fri Apr 7 11:34:51 2023 From: harendra.kumar at gmail.com (Harendra Kumar) Date: Fri, 7 Apr 2023 17:04:51 +0530 Subject: Performance of small allocations via prim ops In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 at 02:18, Carter Schonwald wrote: > That sounds like a worthy experiment! > > I guess that would look like having an inline macro’d up path that checks > if it can get the job done that falls back to the general code? > > Last I checked, the overhead for this sort of c call was on the order of > 10nanoseconds or less which seems like it’d be very unlikely to be a > bottleneck, but do you have any natural or artificial benchmark programs > that would show case this? > I converted my example code into a loop and ran it a million times with a 1 byte array size (would be 8 bytes after alignment). So roughly 3 words would be allocated per array, including the header and length. It took 5 ms using the statically known size optimization which inlines the alloc completely, and 10 ms using an unknown size (from program arg) which makes a call to newByteArray# . That turns out to be of the order of 5ns more per allocation. It does not sound like a big deal. -harendra -------------- next part -------------- An HTML attachment was scrubbed... URL: From carter.schonwald at gmail.com Fri Apr 7 12:41:20 2023 From: carter.schonwald at gmail.com (Carter Schonwald) Date: Fri, 7 Apr 2023 08:41:20 -0400 Subject: Performance of small allocations via prim ops In-Reply-To: References: Message-ID: Great /fast experimentation! I will admit I’m pleased that my dated intuition is still correct, but more importantly we have more current data! Thanks for the exploration and sharing what you found! 
On Fri, Apr 7, 2023 at 7:35 AM Harendra Kumar wrote: > > > On Fri, 7 Apr 2023 at 02:18, Carter Schonwald > wrote: > >> That sounds like a worthy experiment! >> >> I guess that would look like having an inline macro’d up path that >> checks if it can get the job done that falls back to the general code? >> >> Last I checked, the overhead for this sort of c call was on the order of >> 10nanoseconds or less which seems like it’d be very unlikely to be a >> bottleneck, but do you have any natural or artificial benchmark programs >> that would show case this? >> > > I converted my example code into a loop and ran it a million times with a > 1 byte array size (would be 8 bytes after alignment). So roughly 3 words > would be allocated per array, including the header and length. It took 5 ms > using the statically known size optimization which inlines the alloc > completely, and 10 ms using an unknown size (from program arg) which makes > a call to newByteArray# . That turns out to be of the order of 5ns more per > allocation. It does not sound like a big deal. > > -harendra > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at smart-cactus.org Sun Apr 9 22:02:57 2023 From: ben at smart-cactus.org (Ben Gamari) Date: Sun, 09 Apr 2023 18:02:57 -0400 Subject: Performance of small allocations via prim ops In-Reply-To: References: <87fs9cllg1.fsf@smart-cactus.org> Message-ID: <87fs9890ke.fsf@smart-cactus.org> Harendra Kumar writes: > I am confused by this flag. This flag allows us to allocate statically > known arrays sizes of <= n to be allocated from the current nursery block. > But looking at the code in allocateMightFail, as I interpret it, any size > array up to LARGE_OBJECT_THRESHOLD is anyway allocated from the current > nursery block. So why have this option? Why not fix this to > LARGE_OBJECT_THRESHOLD? Maybe I am missing something. > In principle we could do so. 
The motivation for making this a flag isn't immediately clear from the commit implementing this optimisation (1eece45692fb5d1a5f4ec60c1537f8068237e9c1).

One complication is that currently GHC has no way to know the value of LARGE_OBJECT_THRESHOLD (which is a runtime system macro). Typically, to handle this sort of thing we use utils/deriveConstants to generate a Haskell binding mirroring the value of the C declaration. However, as GHC becomes runtime-retargetable we may need to revisit this design.

Cheers,

- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 487 bytes
Desc: not available
URL:

From harendra.kumar at gmail.com  Wed Apr 12 09:02:43 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 12 Apr 2023 14:32:43 +0530
Subject: GHC 9.6.1 rejects previously working code
Message-ID:

The following code compiles with older compilers but does not compile with GHC 9.6.1:

{-# LANGUAGE KindSignatures #-}

module A () where

import Control.Monad.IO.Class
import Control.Monad.Trans.Class

data T (m :: * -> *) a = T

instance Functor (T m) where
    fmap f T = undefined

instance Applicative (T m) where
    pure = undefined
    (<*>) = undefined

instance MonadIO m => Monad (T m) where
    return = pure
    (>>=) = undefined

instance MonadTrans T where
    lift = undefined

It fails with the following error:

xx.hs:20:10: error: [GHC-39999]
    • Could not deduce ‘MonadIO m’
        arising from the head of a quantified constraint
        arising from the superclasses of an instance declaration
      from the context: Monad m
        bound by a quantified context at xx.hs:20:10-21
      Possible fix: add (MonadIO m) to the context of a quantified context
    • In the instance declaration for ‘MonadTrans T’
   |
20 | instance MonadTrans T where
   |          ^^^^^^^^^^^^

What is the correct resolution for this?

-harendra
-------------- next part --------------
An HTML attachment was scrubbed...
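The rejection above comes from transformers 0.6 requiring `forall m. Monad m => Monad (t m)` as a quantified superclass of MonadTrans, which `T` cannot satisfy when `Monad (T m)` needs `MonadIO m`. The replies that follow discuss workarounds; one of them (dropping the MonadTrans instance and relying on MonadIO instead) can be sketched as below. This is a reconstruction with illustrative names, not code from the thread:

```haskell
-- Sketch of one workaround: give up the MonadTrans instance entirely and
-- let liftIO play the role of lift. The MonadIO constraint stays on the
-- instances, so T is usable whenever the base monad can do IO.

import Control.Monad.IO.Class (MonadIO (..))

newtype T m a = T { runT :: m a }

instance MonadIO m => Functor (T m) where
  fmap f (T m) = T (fmap f m)

instance MonadIO m => Applicative (T m) where
  pure = T . pure
  T f <*> T x = T (f <*> x)

instance MonadIO m => Monad (T m) where
  T m >>= k = T (m >>= runT . k)

-- liftIO replaces lift for IO-capable base monads; no MonadTrans needed.
instance MonadIO m => MonadIO (T m) where
  liftIO = T . liftIO

main :: IO ()
main = do
  n <- runT (liftIO (pure (41 :: Int)) >>= \x -> pure (x + 1))
  print n
```

The trade-off is that `T` can no longer be stacked over an arbitrary monad with `lift`, only over MonadIO monads via `liftIO`.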
URL:

From tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk  Wed Apr 12 09:10:10 2023
From: tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk (Tom Ellis)
Date: Wed, 12 Apr 2023 10:10:10 +0100
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To:
References:
Message-ID:

On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
> instance MonadIO m => Monad (T m) where
>     return = pure
>     (>>=) = undefined
>
> instance MonadTrans T where
>     lift = undefined

I guess it's nothing to do with 9.6 per se, but rather the difference between

* https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans

* https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans

I'm not sure I can see any solution for this. A monad transformer `T` must give rise to a monad `T m` regardless of what `m` is. If `T m` is only a monad when `MonadIO m`, then `T` can't be a monad transformer (under transformers 0.6).

Tom

From rodrigo.m.mesquita at gmail.com  Wed Apr 12 09:30:50 2023
From: rodrigo.m.mesquita at gmail.com (Rodrigo Mesquita)
Date: Wed, 12 Apr 2023 10:30:50 +0100
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To:
References:
Message-ID: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>

Indeed, this is included in the GHC 9.6.x Migration Guide.

Unfortunately, I'm also not sure there is a solution for this particular case, where (T m) is only a Monad if m instances MonadIO. As Tom explained, under transformers 0.6 `T` is no longer a monad transformer.

A few workarounds I can think of:

- No longer instance `MonadTrans T`, and use an instance `MonadIO m => MonadIO (T m)` instead.
  Rationale: if you always require `m` to be `MonadIO`, perhaps the ability to always lift an `m` to `T m` with `liftIO` is sufficient.
- Add the `MonadIO` instance to the `m` field of `T`, GADT style: `data T m a where T :: MonadIO m => m -> T m a`.
  Rationale: you would no longer need `MonadIO` in the `Monad` instance, which will make it possible to instance `MonadTrans`.

- Redefine your own `lift` regardless of `MonadTrans`.

Good luck!
Rodrigo

> On 12 Apr 2023, at 10:10, Tom Ellis wrote:
>
> On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
>> instance MonadIO m => Monad (T m) where
>>     return = pure
>>     (>>=) = undefined
>>
>> instance MonadTrans T where
>>     lift = undefined
>
> I guess it's nothing to do with 9.6 per se, but rather the difference
> between
>
> * https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>
> * https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>
> I'm not sure I can see any solution for this. A monad transformer `T`
> must give rise to a monad `T m` regardless of what `m` is. If `T m`
> is only a monad when `MonadIO m` then `T` can't be a monad transformer
> (under transformers 0.6).
>
> Tom
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From harendra.kumar at gmail.com  Wed Apr 12 09:42:26 2023
From: harendra.kumar at gmail.com (Harendra Kumar)
Date: Wed, 12 Apr 2023 15:12:26 +0530
Subject: GHC 9.6.1 rejects previously working code
In-Reply-To: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>
References: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com>
Message-ID:

Thanks Tom and Rodrigo. That clarifies the problem. We will need to think which solution makes better sense.

On Wed, 12 Apr 2023 at 15:01, Rodrigo Mesquita wrote:

> Indeed, this is included in the GHC 9.6.x Migration Guide.
>
> Unfortunately, I'm also not sure there is a solution for this particular
> case, where (T m) is only a Monad if m instances MonadIO.
> As Tom explained, under transformers 0.6 `T` is no longer a monad
> transformer.
>
> A few workarounds I can think of:
>
> - No longer instance `MonadTrans T`, and use an instance `MonadIO m =>
> MonadIO (T m)` instead.
>   Rationale: if you always require `m` to be `MonadIO`, perhaps the
>   ability to always lift an `m` to `T m` with `liftIO` is sufficient.
> > Unfortunately, I’m also not sure there is a solution for this particular > where (T m) is only a Monad if m instances MonadIO. > As Tom explained, under transformers 0.6 `T` no longer is a monad > transformer. > > A few workarounds I can think of: > > - No longer instance `MonadTrans T`, and use a instance `MonadIO m => > MonadIO (T m)` instead. > Rationale: if you always require `m` to be `MonadIO`, perhaps the > ability to always lift an `m` to `T m` with `liftIO` is sufficient. > > - Add the `MonadIO` instance to the `m` field of `T`, GADT style, `data T > m a where T :: MonadIO m => m -> T m a` > Rational: You would no longer need `MonadIO` in the `Monad` instance, > which will make it possible to instance `MonadTrans`. > > - Redefine your own `lift` regardless of `MonadTrans` > > Good luck! > Rodrigo > > On 12 Apr 2023, at 10:10, Tom Ellis < > tom-lists-haskell-cafe-2017 at jaguarpaw.co.uk> wrote: > > On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote: > > instance MonadIO m => Monad (T m) where > return = pure > (>>=) = undefined > > instance MonadTrans T where > lift = undefined > > > I guess it's nothing to do with 9.6 per se, but rather the difference > between > > * > https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans > > * > https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans > > I'm not sure I can see any solution for this. A monad transformer `T` > must give rise to a monad `T m` regardless of what `m` is. If `T m` > is only a monad when `MonadIO m` then `T` can't be a monad transformer > (under transformers 0.6). 
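
The constraint Tom describes, and the first and third workarounds Rodrigo
lists, can be sketched concretely. The following is an illustrative
stand-in, not code from the thread; the newtype, its instances, and
`liftT` are hypothetical names:

```haskell
import Control.Monad.IO.Class (MonadIO (liftIO))

-- Hypothetical stand-in for the transformer discussed in the thread.
newtype T m a = T { runT :: m a }

-- Every instance below needs MonadIO m, so `Monad (T m)` does not hold
-- for an arbitrary Monad m.  transformers 0.6 gives MonadTrans the
-- quantified superclass
--     class (forall m. Monad m => Monad (t m)) => MonadTrans t
-- which is exactly what such a `T` cannot provide.
instance MonadIO m => Functor (T m) where
  fmap f (T x) = T (fmap f x)

instance MonadIO m => Applicative (T m) where
  pure = T . pure
  T f <*> T x = T (f <*> x)

instance MonadIO m => Monad (T m) where
  return = pure
  T x >>= f = T (x >>= runT . f)

-- Workaround 1: drop MonadTrans and embed base-monad actions via MonadIO.
instance MonadIO m => MonadIO (T m) where
  liftIO = T . liftIO

-- Workaround 3: a bespoke lift that bypasses the MonadTrans class.
liftT :: m a -> T m a
liftT = T

main :: IO ()
main = runT (liftT (putStrLn "lifted"))  -- prints "lifted"
```

Under transformers 0.5 the `MonadTrans T` instance type-checked because
the class had no superclass; it is the 0.6 quantified superclass that
rules it out.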
> > Tom > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sylvain at haskus.fr Wed Apr 12 10:06:12 2023 From: sylvain at haskus.fr (Sylvain Henry) Date: Wed, 12 Apr 2023 12:06:12 +0200 Subject: Performance of small allocations via prim ops In-Reply-To: <87fs9890ke.fsf@smart-cactus.org> References: <87fs9cllg1.fsf@smart-cactus.org> <87fs9890ke.fsf@smart-cactus.org> Message-ID: <5c68b310-b980-e8fd-bb90-5fd3d450fb04@haskus.fr> > One complication is that currently GHC has no way to know the value of > LARGE_OBJECT_THRESHOLD (which is a runtime system macro). Typically to > handle this sort of thing we use utils/deriveConstants to generate a > Haskell binding mirroring the value of the C declaration. However, > as GHC becomes runtime-retargetable we may need to revisit this design. Since https://gitlab.haskell.org/ghc/ghc/-/commit/085983e63bfe6af23f8b85fbfcca8db4872d2f60 (2021-03) we don't do this. We only read constants from the header file provided by the RTS unit. Adding one more constant for LARGE_OBJECT_THRESHOLD shouldn't be an issue. Cheers Sylvain From zubin at well-typed.com Tue Apr 18 13:56:48 2023 From: zubin at well-typed.com (Zubin Duggal) Date: Tue, 18 Apr 2023 19:26:48 +0530 Subject: [Haskell] [ANNOUNCE] GHC 9.4.5 released Message-ID: The GHC developers are happy to announce the availability of GHC 9.4.5. Binary distributions, source distributions, and documentation are available at [downloads.haskell.org](https://downloads.haskell.org/ghc/9.4.5). 
Download Page: https://www.haskell.org/ghc/download_ghc_9.4.5.html
Blog Post: https://www.haskell.org/ghc/blog/20230418-ghc-9.4.5-released.html

This release is primarily a bugfix release addressing a few issues
found in 9.4.4. These include:

 * Fixes for a number of bugs in the simplifier (#22623, #22718,
   #22913, #22695, #23184, #22998, #22662, #22725).

 * Many bug fixes to the non-moving and parallel GCs (#22264, #22327,
   #22926, #22927, #22929, #22930, #17574, #21840, #22528)

 * A fix for a bug with the alignment of RTS data structures that could
   result in segfaults when compiled with high optimisation settings on
   certain platforms (#22975, #22965).

 * Bumping `gmp-tarballs` to a version which doesn't use the reserved
   `x18` register on AArch64/Darwin systems, and also has fixes for
   CVE-2021-43618 (#22497, #22789).

 * A number of improvements to recompilation avoidance with multiple home
   units (#22675, #22677, #22669, #22678, #22679, #22680)

 * Fixes for regressions in the typechecker and constraint solver (#22647,
   #23134, #22516, #22743)

 * Easier installation of the binary distribution on MacOS platforms by
   changing the installation Makefile to remove the quarantine attribute
   when installing.

 * ... and many more. See the [release notes] for a full accounting.

As some of the fixed issues do affect correctness, users are encouraged
to upgrade promptly.

We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake
pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell
Foundation, and other anonymous contributors whose on-going financial
and in-kind support has facilitated GHC maintenance and release
management over the years. Finally, this release would not have been
possible without the hundreds of open-source contributors whose work
comprises this release.

As always, do give this release a try and open a [ticket][] if you see
anything amiss.
Happy compiling,

- Zubin

[ticket]: https://gitlab.haskell.org/ghc/ghc/-/issues/new
[release notes]: https://downloads.haskell.org/~ghc/9.4.5/docs/html/users_guide/9.4.5-notes.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: 

From george.colpitts at gmail.com  Mon Apr 24 20:49:46 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Mon, 24 Apr 2023 17:49:46 -0300
Subject: does llvm 16 work with ghc 9.6.1 ?
Message-ID: 

Hi

Does anybody know if llvm 16 works with ghc 9.6.1 ?

Thanks
George

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From moritz.angermann at gmail.com  Tue Apr 25 01:01:48 2023
From: moritz.angermann at gmail.com (Moritz Angermann)
Date: Tue, 25 Apr 2023 09:01:48 +0800
Subject: does llvm 16 work with ghc 9.6.1 ?
In-Reply-To: 
References: 
Message-ID: 

Hi George,

While I personally haven’t tried it, I’d encourage you to just try. Unless
they changed their textual IR (they don’t do that often anymore), it could
just work.

Whether or not you run into bugs for the specific target you are looking
at is hard to say without knowing the target.

My suggestion would be to just try building your configuration with the
llvm backend against llvm16, and run validate if you can.

Cheers,
Moritz

On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts 
wrote:

> Hi
>
> Does anybody know if llvm 16 works with ghc 9.6.1 ?
>
> Thanks
> George
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From godzbanebane at gmail.com Tue Apr 25 06:38:59 2023 From: godzbanebane at gmail.com (Georgi Lyubenov) Date: Tue, 25 Apr 2023 09:38:59 +0300 Subject: GHC 9.6.1 rejects previously working code In-Reply-To: References: <550F8331-4BE0-4DF4-B9D8-12CD8EC4ED03@gmail.com> Message-ID: <51f59bba-c4a3-ef83-bf55-9c11ffaf7534@gmail.com> Out of curiosity, why do you require the `MonadIO` on the `Monad` instance? On 4/12/23 12:42, Harendra Kumar wrote: > Thanks Tom and Rodrigo. > > That clarifies the problem. We will need to think which solution makes > better sense. > > On Wed, 12 Apr 2023 at 15:01, Rodrigo Mesquita > wrote: > > Indeed, this is included in the GHC 9.6.x Migration Guide > . > > > Unfortunately, I’m also not sure there is a solution for this > particular where (T m) is only a Monad if m instances MonadIO. > As Tom explained, under transformers 0.6 `T` no longer is a monad > transformer. > > A few workarounds I can think of: > > - No longer instance `MonadTrans T`, and use a instance `MonadIO m > => MonadIO (T m)` instead. >   Rationale: if you always require `m` to be `MonadIO`, perhaps > the ability to always lift an `m` to `T m` with `liftIO` is > sufficient. > > - Add the `MonadIO` instance to the `m` field of `T`, GADT style, > `data T m a where T :: MonadIO m => m -> T m a` >   Rational: You would no longer need `MonadIO` in the `Monad` > instance, which will make it possible to instance `MonadTrans`. > > - Redefine your own `lift` regardless of `MonadTrans` > > Good luck! 
> Rodrigo
>
>> On 12 Apr 2023, at 10:10, Tom Ellis
>> wrote:
>>
>> On Wed, Apr 12, 2023 at 02:32:43PM +0530, Harendra Kumar wrote:
>>> instance MonadIO m => Monad (T m) where
>>>    return = pure
>>>    (>>=) = undefined
>>>
>>> instance MonadTrans T where
>>>    lift = undefined
>>
>> I guess it's nothing to do with 9.6 per se, but rather the difference
>> between
>>
>> *
>> https://hackage.haskell.org/package/transformers-0.5.6.2/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>>
>> *
>> https://hackage.haskell.org/package/transformers-0.6.1.0/docs/Control-Monad-Trans-Class.html#t:MonadTrans
>>
>> I'm not sure I can see any solution for this.  A monad
>> transformer `T`
>> must give rise to a monad `T m` regardless of what `m` is.  If `T m`
>> is only a monad when `MonadIO m` then `T` can't be a monad
>> transformer
>> (under transformers 0.6).
>>
>> Tom
>> _______________________________________________
>> ghc-devs mailing list
>> ghc-devs at haskell.org
>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From george.colpitts at gmail.com  Tue Apr 25 13:54:41 2023
From: george.colpitts at gmail.com (George Colpitts)
Date: Tue, 25 Apr 2023 10:54:41 -0300
Subject: does llvm 16 work with ghc 9.6.1 ?
In-Reply-To: 
References: 
Message-ID: 

Thanks Moritz. I went ahead and tried it. On a very simple smoke test I
observed that "-fllvm" works but "-O2 -fllvm" does not. It fails with
"Cannot use -O# with legacy PM.". There is already a bug for changing ghc
to work with the new pass manager. It wasn't clear to me that this would be
needed for llvm 16.
It seems that it is. Cheers George On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < moritz.angermann at gmail.com> wrote: > Hi George, > > while I personally haven’t tried. I’d encourage you to just try. Unless > they changed their textual IR (they don’t do that often anymore), it could > just work. > > Whether or not you run into bugs for the specific target you are looking > at, is hard to say without knowing the target. > > My suggestion would be to just try building your configuration with the > llvm backend against llvm16, and run validate if you can. > > Cheers, > Moritz > > On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts > wrote: > >> Hi >> >> Does anybody know if llvm 16 works with ghc 9.6.1 ? >> >> Thanks >> George >> >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.colpitts at gmail.com Tue Apr 25 14:35:27 2023 From: george.colpitts at gmail.com (George Colpitts) Date: Tue, 25 Apr 2023 11:35:27 -0300 Subject: does llvm 16 work with ghc 9.6.1 ? In-Reply-To: References: Message-ID: the bug for this is https://gitlab.haskell.org/ghc/ghc/-/issues/22954 On Tue, Apr 25, 2023 at 10:54 AM George Colpitts wrote: > Thanks Moritz. I went ahead and tried it. On a very simple smoke test I > observed that "-fllvm works" but "-O2 -fllvm" does not. It fails with > "Cannot use -O# with legacy PM.". There is already a bug for changing ghc > to work with the new pass manager. It wasn't clear to me that this would be > needed for llvm 16. It seems that it is. > > Cheers > George > > > > > On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < > moritz.angermann at gmail.com> wrote: > >> Hi George, >> >> while I personally haven’t tried. I’d encourage you to just try. Unless >> they changed their textual IR (they don’t do that often anymore), it could >> just work. 
>> >> Whether or not you run into bugs for the specific target you are looking >> at, is hard to say without knowing the target. >> >> My suggestion would be to just try building your configuration with the >> llvm backend against llvm16, and run validate if you can. >> >> Cheers, >> Moritz >> >> On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts < >> george.colpitts at gmail.com> wrote: >> >>> Hi >>> >>> Does anybody know if llvm 16 works with ghc 9.6.1 ? >>> >>> Thanks >>> George >>> >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From george.colpitts at gmail.com Tue Apr 25 19:56:52 2023 From: george.colpitts at gmail.com (George Colpitts) Date: Tue, 25 Apr 2023 16:56:52 -0300 Subject: does llvm 16 work with ghc 9.6.1 ? In-Reply-To: References: Message-ID: @duog has documented workarounds in https://gitlab.haskell.org/ghc/ghc/-/issues/21936: These /ghc/ flags reproduce -O0: -optlo-passes='module(default,function(mem2reg),globalopt,function(lower-expect))' -fno-llvm-tbaa -O0 These /ghc/ flags reproduce -O1: -optlo-passes='module(default,globalopt)' -O1 -fno-llvm-tbaa These /ghc/ flags reproduce -O2: -optlo-passes='module(default)' -O2 -fno-llvm-tbaa On Tue, Apr 25, 2023 at 11:35 AM George Colpitts wrote: > the bug for this is https://gitlab.haskell.org/ghc/ghc/-/issues/22954 > > On Tue, Apr 25, 2023 at 10:54 AM George Colpitts < > george.colpitts at gmail.com> wrote: > >> Thanks Moritz. I went ahead and tried it. On a very simple smoke test I >> observed that "-fllvm works" but "-O2 -fllvm" does not. It fails with >> "Cannot use -O# with legacy PM.". There is already a bug for changing ghc >> to work with the new pass manager. It wasn't clear to me that this would be >> needed for llvm 16. It seems that it is. 
>> >> Cheers >> George >> >> >> >> >> On Mon, Apr 24, 2023 at 10:02 PM Moritz Angermann < >> moritz.angermann at gmail.com> wrote: >> >>> Hi George, >>> >>> while I personally haven’t tried. I’d encourage you to just try. Unless >>> they changed their textual IR (they don’t do that often anymore), it could >>> just work. >>> >>> Whether or not you run into bugs for the specific target you are looking >>> at, is hard to say without knowing the target. >>> >>> My suggestion would be to just try building your configuration with the >>> llvm backend against llvm16, and run validate if you can. >>> >>> Cheers, >>> Moritz >>> >>> On Tue, 25 Apr 2023 at 4:50 AM, George Colpitts < >>> george.colpitts at gmail.com> wrote: >>> >>>> Hi >>>> >>>> Does anybody know if llvm 16 works with ghc 9.6.1 ? >>>> >>>> Thanks >>>> George >>>> >>>> _______________________________________________ >>>> ghc-devs mailing list >>>> ghc-devs at haskell.org >>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From ietf-dane at dukhovni.org Sun Apr 30 01:00:55 2023 From: ietf-dane at dukhovni.org (Viktor Dukhovni) Date: Sat, 29 Apr 2023 21:00:55 -0400 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory Message-ID: For some time now I'd been unable to build GHC 9.6 from source. The reason turned out to be that my hadrian command-line selected an explicit build directory that was not an immediate child of the source directory (default it seems is "_build"). 
With the source tree under "$HOME/dev/ghc/", the hadrian command

    $ hadrian/build -V -V -o"$HOME/dev/buildghc" --docs=no-sphinx
    binary-dist-dir

after building stage0, and running "configure" in libraries/base,
reports an error finding HsFFI.h:

    Reading parameters from
    $HOME/dev/buildghc/stage1/libraries/base/build/base.buildinfo
    /usr/bin/cc '-fuse-ld=gold' /tmp/2303653-4.c -o /tmp/2303653-5
    '-D__GLASGOW_HASKELL__=906' \
    '-Dlinux_BUILD_OS=1' \
    '-Dx86_64_BUILD_ARCH=1' \
    '-Dlinux_HOST_OS=1' \
    '-Dx86_64_HOST_ARCH=1' \
    -I$HOME/dev/buildghc/stage1/libraries/base/build/autogen \
    -I$HOME/dev/buildghc/stage1/libraries/base/build/include \
    -Ilibraries/base/include \
    -Ilibraries/base \
    -I/usr/include \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include/ \
    -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include/ \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include \
    -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include \
    -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include \
    -I$HOME/dev/buildghc/stage1/rts/build/include \
    '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@FFIIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/rts/build/@FFIIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@LibdwIncludeDir@' \
    '-I$HOME/dev/buildghc/stage1/rts/build/@LibdwIncludeDir@' \
    -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-bignum/build \
    -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-prim/build \
    -L$HOME/dev/buildghc/stage1/inplace/../rts/build -iquote \
    $HOME/dev/ghc/libraries/base \
    '-fuse-ld=gold'

There are two issues to note here:

- "hadrian" fails to substitute @FFIIncludeDir@ and @LibdwIncludeDir@.
  This used to be handled by "configure", but the job of turning
  "rts.cabal.in" into "rts.cabal" seems to have been reassigned to
  "hadrian".

- With the build output directory a sibling rather than a child of
  the source tree, the path to "rts/include" is not constructed
  correctly.
The path: -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include should have been: -I$HOME/dev/buildghc/stage1/inplace/../../../ghc/rts/include Switching to the default path proved to be a viable work-around, but perhaps other choices should also work. -- Viktor. From ghc-devs at schmits.me Sun Apr 30 09:18:07 2023 From: ghc-devs at schmits.me (Torsten Schmits) Date: Sun, 30 Apr 2023 11:18:07 +0200 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory In-Reply-To: References: Message-ID: Hi Viktor, I created an issue for this: https://gitlab.haskell.org/ghc/ghc/-/issues/22741 You can share your insights there! On 4/30/23 03:00, Viktor Dukhovni wrote: > For some time now I'd been unable to build GHC 9.6 from source. The > reason turned out to be that my hadrian command-line selected an > explicit build directory that was not an immediate child of the source > directory (default it seems is "_build"). > > With the source tree under "$HOME/dev/ghc/", the hardrian command > > $ hadrian/build -V -V -o"$HOME/dev/buildghc" --docs=no-sphinx > binary-dist-dir > > after building stage0, and running "configure" in libraries/base, > reports an error finding HsFFI.h: > > Reading parameters from > $HOME/dev/buildghc/stage1/libraries/base/build/base.buildinfo > /usr/bin/cc '-fuse-ld=gold' /tmp/2303653-4.c -o /tmp/2303653-5 > '-D__GLASGOW_HASKELL__=906' \ > '-Dlinux_BUILD_OS=1' \ > '-Dx86_64_BUILD_ARCH=1' \ > '-Dlinux_HOST_OS=1' \ > '-Dx86_64_HOST_ARCH=1' \ > -I$HOME/dev/buildghc/stage1/libraries/base/build/autogen \ > -I$HOME/dev/buildghc/stage1/libraries/base/build/include \ > -Ilibraries/base/include \ > -Ilibraries/base \ > -I/usr/include \ > -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include/ > \ > -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include/ \ > -I$HOME/dev/buildghc/stage1/inplace/../../..//libraries/ghc-bignum/include > \ > -I$HOME/dev/buildghc/stage1/libraries/ghc-bignum/build/include \ 
> -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include \ > -I$HOME/dev/buildghc/stage1/rts/build/include \ > '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@FFIIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/rts/build/@FFIIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/inplace/../../..//rts/@LibdwIncludeDir@' \ > '-I$HOME/dev/buildghc/stage1/rts/build/@LibdwIncludeDir@' \ > -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-bignum/build \ > -L$HOME/dev/buildghc/stage1/inplace/../libraries/ghc-prim/build \ > -L$HOME/dev/buildghc/stage1/inplace/../rts/build -iquote \ > $HOME/dev/ghc/libraries/base \ > '-fuse-ld=gold' > > There are two issues to note here: > > - "hadrian" fails to substitute @FFIIncludeDir@ and @LibdwIncludeDir at . > This used to be handled by "configure", but the job of turning > "rts.cabal.in" into "rts.cabal" seems to have been reassigned to > "hadrian". > > - With the build output directory a sibling rather than a child of > the source tree, the path to "rts/include" is not constructed > correctly. The path: > > -I$HOME/dev/buildghc/stage1/inplace/../../..//rts/include > > should have been: > > -I$HOME/dev/buildghc/stage1/inplace/../../../ghc/rts/include > > Switching to the default path proved to be a viable work-around, but > perhaps other choices should also work. > From ietf-dane at dukhovni.org Sun Apr 30 18:57:57 2023 From: ietf-dane at dukhovni.org (Viktor Dukhovni) Date: Sun, 30 Apr 2023 14:57:57 -0400 Subject: Build of GHC 9.6 fails when the build directory is not a child of the source directory In-Reply-To: References: Message-ID: On Sun, Apr 30, 2023 at 11:18:07AM +0200, Torsten Schmits via ghc-devs wrote: > I created an issue for this: > https://gitlab.haskell.org/ghc/ghc/-/issues/22741 > > You can share your insights there! Done. It does look like we encountered the same underlying issue.. -- Viktor.