From lexi.lambda at gmail.com Thu Feb 1 11:15:33 2024
From: lexi.lambda at gmail.com (Alexis King)
Date: Thu, 1 Feb 2024 05:15:33 -0600
Subject: Why is mixing profiling and dynamic linking unsupported?
Message-ID: <2bec12e2-6544-4059-a0c1-c91fd4437cd1@gmail.com>

Hello all,

GHC does not support combining -prof and -dynamic. Various places note that these are incompatible, but I have never managed to find a rationale for this incompatibility. I am sure a good reason exists, but I am not sure what it is.

Does anyone know the reason? And, even better, is the reason written down somewhere? I'd love to understand this better.

Thanks,
Alexis

From juhpetersen at gmail.com Fri Feb 2 09:14:50 2024
From: juhpetersen at gmail.com (Jens Petersen)
Date: Fri, 2 Feb 2024 17:14:50 +0800
Subject: GHC 9.10 release
In-Reply-To: <87msstrtfh.fsf@smart-cactus.org>
References: <87msstrtfh.fsf@smart-cactus.org>
Message-ID:

I really do hope *Or Patterns* will be included. I saw the MR has been slowly progressing. To me at least that is the most compelling new feature.

Jens
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sgraf1337 at gmail.com Fri Feb 2 09:47:56 2024
From: sgraf1337 at gmail.com (Sebastian Graf)
Date: Fri, 2 Feb 2024 09:47:56 +0000
Subject: GHC 9.10 release
In-Reply-To:
References: <87msstrtfh.fsf@smart-cactus.org>
Message-ID:

Hi Jens,

Or Patterns is implemented, but lacks someone to push it over the finish line (documentation, review). Unless someone other than me wants to step in, it won't be merged in the next couple of months; I'm sorry.

Cheers,
Sebastian

________________________________
From: ghc-devs on behalf of Jens Petersen
Sent: Friday, February 2, 2024 10:14:50 AM
To: GHC dev
Subject: Re: GHC 9.10 release

I really do hope Or Patterns will be included. I saw the MR has been slowly progressing. To me at least that is the most compelling new feature.
Jens

From adam at sandbergericsson.se Fri Feb 2 13:44:55 2024
From: adam at sandbergericsson.se (Adam Sandberg Eriksson)
Date: Fri, 02 Feb 2024 13:44:55 +0000
Subject: Why is mixing profiling and dynamic linking unsupported?
In-Reply-To: <2bec12e2-6544-4059-a0c1-c91fd4437cd1@gmail.com>
References: <2bec12e2-6544-4059-a0c1-c91fd4437cd1@gmail.com>
Message-ID:

I gathered some info about this on https://gitlab.haskell.org/ghc/ghc/-/issues/21329, but there is also an issue where Cabal doesn't know how to build or use dyn+prof things (which caused me to give up at the time). With just ghc I managed to get it working (also noted in the ticket), but that is outdated now since it was before Hadrian.

I don't think there's any fundamental reason. I would like it to work!

Cheers,
A

On Thu, 1 Feb 2024, at 11:15, Alexis King wrote:
> Hello all,
>
> GHC does not support combining -prof and -dynamic. Various places note
> that these are incompatible, but I have never managed to find a
> rationale for this incompatibility. I am sure a good reason exists,
> but I am not sure what it is.
>
> Does anyone know the reason? And, even better, is the reason written
> down somewhere? I'd love to understand this better.
>
> Thanks,
> Alexis
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From brandonchinn178 at gmail.com Sun Feb 4 18:24:19 2024
From: brandonchinn178 at gmail.com (Brandon Chinn)
Date: Sun, 4 Feb 2024 10:24:19 -0800
Subject: Help implementing Multiline String Literals
Message-ID:

Hello!

I'm trying to implement #24390, which implements the multiline string literals proposal (existing work done in wip/multiline-strings).
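[Editor's note] For context, the surface syntax under discussion looks roughly like the following sketch. The `MultilineStrings` extension name and the exact post-processing rules are the proposal's, not yet merged into GHC at the time of this thread:

```haskell
{-# LANGUAGE MultilineStrings #-}

-- A literal delimited by triple quotes; the proposal strips the
-- leading newline and the indentation common to all lines.
greeting :: String
greeting =
  """
  Hello,
  World!
  """

-- Roughly equivalent (modulo the proposal's exact rules for the
-- final newline) to the ordinary literal:
greeting' :: String
greeting' = "Hello,\nWorld!"
```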
I originally suggested adding HsMultilineString to HsLit and translating it to HsString in renaming; then Matthew Pickering suggested I translate it in desugaring instead. I tried going down this approach, but I'm running into two main issues: escaped characters and overloaded strings.

Apologies in advance for a long email. *TL;DR* - The best implementation I could think of involves a complete rewrite of how strings are lexed and modifying HsString instead of adding a new HsMultilineString constructor. If this is absolutely crazy talk, please dissuade me from this :)

===== Problem 1: Escaped characters =====
Currently, Lexer.x resolves escaped characters for string literals. In Note [Literal source text], we see that this is intentional; HsString should contain a normalized internal representation. However, multiline string literals have a post-processing step that requires distinguishing between the user typing a newline vs the user typing literally a backslash followed by an `n` (and other things like knowing if a user typed in `\&`, which also currently goes away in lexing).

Fundamentally, the current logic to resolve escaped characters is specific to the Lexer monad and operates on a per-character basis. But the multiline string literals proposal requires post-processing the whole string, then resolving escaped characters all at once.
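[Editor's note] To make the per-character vs. whole-string distinction concrete, here is a minimal sketch of escape resolution as a pure `String -> String` function. The name and coverage are illustrative only — GHC's actual lexer handles many more escapes (numeric forms, ASCII mnemonics, string gaps, ...):

```haskell
-- Illustrative only: resolve a handful of escapes over the whole
-- string at once, after any multiline post-processing has run.
resolveEscapes :: String -> String
resolveEscapes ('\\' : 'n'  : rest) = '\n' : resolveEscapes rest
resolveEscapes ('\\' : '\\' : rest) = '\\' : resolveEscapes rest
resolveEscapes ('\\' : '&'  : rest) = resolveEscapes rest  -- \& is the empty escape
resolveEscapes (c : rest)           = c    : resolveEscapes rest
resolveEscapes []                   = []
```

Because this runs over the already-assembled string, a multiline post-processing pass before it can still see the literal two characters `\` and `n` and treat them differently from a real newline typed by the user.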
Possible solutions:

(1.1) Duplicate the logic for resolving escaped characters
 * Pro: Leaves normal string lexing untouched
 * Con: Two sources of truth, possibly divergent behaviors between multiline and normal strings

(1.2) Stick the post-processed string back into P, then rerun normal string lexing to resolve escaped characters
 * Pro: Leaves normal string lexing untouched
 * Con: Seems roundabout, inefficient, and hacky

(1.3) Refactor the resolve-escaped-characters logic to work both in the P monad and as a pure function `String -> String`
 * Pro: Reuses the same escaped-characters logic for both normal + multiline strings
 * Con: Different overall behavior between the two string types: normal strings still lexed per-character, multiline strings would lex everything
 * Con: Small refactor of lexing normal strings, which could introduce regressions

(1.4) Read the entire string (both normal + multiline) with no preprocessing (including string gaps or anything, except escaped quote delimiters), and define all post-processing steps as pure `String -> String` functions
 * Pro: Gets out of monadic code quickly, turning the bulk of the string logic into pure code
 * Pro: Processes normal + multiline strings exactly the same
 * Pro: Opens the door for future string behaviors, e.g. raw strings could do the same "read entire string" logic and just not do any post-processing
 * Con: Could be less performant
 * Con: Major refactor of lexing normal strings, which could introduce regressions

I like solution 1.4 the best, as it generalizes string processing behavior the best and is more pipeline-style vs the currently more imperative style. But I recognize possible performance or behavior regressions are a real thing, so if anyone has any thoughts here, I'd love to hear them.

===== Problem 2: Overloaded strings =====
Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in the renaming phase.
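[Editor's note] This renaming is the standard OverloadedStrings elaboration; at the source level it corresponds to the following (using `Data.Text` from the text package purely as an example target type):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (fromString)
import Data.Text (Text)

-- With OverloadedStrings, a string literal is elaborated to an
-- application of fromString at whatever IsString type is demanded:
t1 :: Text
t1 = "hello"            -- elaborated as: fromString "hello"

t2 :: Text
t2 = fromString "hello" -- the explicit form of the same thing
```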
Following Matthew's suggestion of resolving multiline string literals in the desugaring step, this would mean that multiline string literals are post-processed after OverloadedStrings has already been applied.

I don't like any of the solutions this approach brings up:
* Do post-processing both when desugaring HsMultilineString AND when renaming HsMultilineString to HsOverLit - seems wrong to process multiline strings in two different phases
* Add HsIsStringMultiline and post-process when desugaring both HsMultilineString and HsIsStringMultiline - would ideally like to avoid adding a variant of HsIsStringMultiline

Instead, I propose we throw away the HsMultilineString idea and reuse HsString. The multiline syntax would still be preserved in the SourceText, and this also leaves the door open for future string features. For example, if we went with HsMultilineString, then adding raw strings would require adding both HsRawString and HsMultilineRawString.

Here are two possible solutions for reusing HsString:

(2.1) Add a HsStringType parameter to HsString
 * HsStringType would define the format of the FastString stored in HsString: Normal => processed, Multiline => stores the raw string, needs post-processing
 * Post-processing could occur in desugaring, with or without OverloadedStrings
 * Pro: Shows the parsed multiline string before processing in -ddump-parsed
 * Con: HsString containing multiline strings would not contain the normalized representation mentioned in Note [Literal source text]
 * Con: Breaking change in the GHC API

(2.2) Post-process multiline strings in the lexer
 * The lexer would do all the post-processing (for example, in conjunction with solution 1.4) and just return a normal HsString
 * Pro: The multiline string is immediately desugared and behaves as expected for OverloadedStrings (and any other behaviors of string literals, existing or future) for free
 * Pro: HsString would still always contain the normalized representation
 * Con: No way of inspecting the raw multiline
parse output before processing, e.g. via -ddump-parsed

I'm leaning towards solution 2.1, but curious what people's thoughts are.

===== Closing remarks =====
Again, sorry for the long email. My head is spinning trying to figure out this feature. Any help would be greatly appreciated.

As an aside, I last worked on GHC back in 2020 or 2021, and my goodness, the Hadrian build is so much smoother (and faster!? Not sure if it's just my new laptop though) than what it was last time I touched the codebase. Huge thanks to the maintainers, both for the tooling and the docs in the wiki. This is a much more enjoyable experience.

Thanks,
Brandon

From sgraf1337 at gmail.com Thu Feb 8 14:45:37 2024
From: sgraf1337 at gmail.com (Sebastian Graf)
Date: Thu, 08 Feb 2024 14:45:37 +0000
Subject: Help implementing Multiline String Literals
In-Reply-To:
References:
Message-ID:

Hi Brandon,

I'm not following all of the details here, but from my naïve understanding, I would definitely tweak the lexer, do the post-processing and then have a canonical string representation rather than waiting until desugaring. If you like 1.4 best, give it a try. You will see soon enough if some performance regression test gets worse. It can't hurt to write a few yourself either. I don't think that post-processing the strings would incur too much of a hit compared to compiling those strings and serialising them into an executable. I also bet that you can get rid of some of the performance problems with list fusion.

Cheers,
Sebastian

------ Original message ------
From: "Brandon Chinn"
To: ghc-devs at haskell.org
Sent: 04.02.2024 19:24:19
Subject: Help implementing Multiline String Literals

> Hello!
>
> I'm trying to implement #24390, which implements the multiline string
> literals proposal (existing work done in wip/multiline-strings).
>I originally suggested adding HsMultilineString to HsLit and >translating it to HsString in renaming, then Matthew Pickering >suggested I translate it in desugaring instead. I tried going down this >approach, but I'm running into two main issues: Escaped characters and >Overloaded strings. > >Apologies in advance for a long email. TL;DR - The best implementation >I could think of involves a complete rewrite of how strings are lexed >and modifying HsString instead of adding a new HsMultilineString >constructor. If this is absolutely crazy talk, please dissuade me from >this :) > >===== Problem 1: Escaped characters ===== >Currently, Lexer.x resolves escaped characters for string literals. In >the Note [Literal source text], we see that this is intentional; >HsString should contain a normalized internal representation. However, >multiline string literals have a post-processing step that requires >distinguishing between the user typing a newline vs the user typing >literally a backslash + an `N` (and other things like knowing if a user >typed in `\&`, which currently goes away in lexing as well). > >Fundamentally, the current logic to resolve escaped characters is >specific to the Lexer monad and operates on a per-character basis. But >the multiline string literals proposal requires post-processing the >whole string, then resolving escaped characters all at once. 
> >Possible solutions: > >(1.1) Duplicate the logic for resolving escaped characters > * Pro: Leaves normal string lexing untouched > * Con: Two sources of truth, possibly divergent behaviors between >multiline and normal strings > >(1.2) Stick the post-processed string back into P, then rerun normal >string lexing to resolve escaped characters > * Pro: Leaves normal string lexing untouched > * Con: Seems roundabout, inefficient, and hacky > >(1.3) Refactor the resolve-escaped-characters logic to work in both the >P monad and as a pure function `String -> String` > * Pro: Reuses same escaped-characters logic for both normal + >multiline strings > * Con: Different overall behavior between the two string types: >Normal string still lexed per-character, Multiline strings would lex >everything > * Con: Small refactor of lexing normal strings, which could >introduce regressions > >(1.4) Read entire string (both normal + multiline) with no >preprocessing (including string gaps or anything, except escaping quote >delimiters), and define all post-processing steps as pure `String -> >String` functions > * Pro: Gets out of monadic code quickly, turn bulk of string logic >into pure code > * Pro: Processes normal + multiline strings exactly the same > * Pro: Opens the door for future string behaviors, e.g. raw string >could do the same "read entire string" logic, and just not do any >post-processing. > * Con: Could be less performant > * Con: Major refactor of lexing normal strings, which could >introduce regressions > >I like solution 1.4 the best, as it generalizes string processing >behavior the best and is more pipeline-style vs the currently more >imperative style. But I recognize possible performance or behavior >regressions are a real thing, so if anyone has any thoughts here, I'd >love to hear them. > >===== Problem 2: Overloaded strings ===== >Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in >the renaming phase. 
Following Matthew's suggestion of resolving >multiline string literals in the desugar step, this would mean that >multiline string literals are post-processed after OverloadedStrings >has already been applied. > >I don't like any of the solutions this approach brings up: >* Do post processing both when Desugaring HsMultilineString AND when >Renaming HsMultilineString to HsOverLit - seems wrong to process >multiline strings in two different phases >* Add HsIsStringMultiline and post process when desugaring both >HsMultilineString and HsIsStringMultiline - would ideally like to avoid >adding a variant of HsIsStringMultiline > >Instead, I propose we throw away the HsMultilineString idea and reuse >HsString. The multiline syntax would still be preserved in the >SourceText, and this also leaves the door open for future string >features. For example, if we went with HsMultilineString, then adding >raw strings would require adding both HsRawString and >HsMultilineRawString. > >Here are two possible solutions for reusing HsString: > >(2.1) Add a HsStringType parameter to HsString > * HsStringType would define the format of the FastString stored in >HsString: Normal => processed, Multiline => stores raw string, needs >post-processing > * Post processing could occur in desugaring, with or without >OverloadedStrings > * Pro: Shows the parsed multiline string before processing in >-ddump-parsed > * Con: HsString containing Multiline strings would not contain the >normalized representation mentioned in Note [Literal source text] > * Con: Breaking change in the GHC API > >(2.2) Post-process multiline strings in lexer > * Lexer would do all the post processing (for example, in >conjunction with solution 1.4) and just return a normal HsString > * Pro: Multiline string is immediately desugared and behaves as >expected for OverloadedStrings (and any other behaviors of string >literals, existing or future) for free > * Pro: HsString would still always contain the normalized 
>representation
> * Con: No way of inspecting the raw multiline parse output before
>processing, e.g. via -ddump-parsed
>
>I'm leaning towards solution 2.1, but curious what people's thoughts
>are.
>
>===== Closing remarks =====
>Again, sorry for the long email. My head is spinning trying to figure
>out this feature. Any help would be greatly appreciated.
>
>As an aside, I last worked on GHC back in 2020 or 2021, and my
>goodness. The Hadrian build is so much smoother (and faster!? Not sure
>if it's just my new laptop though) than what it was last time I touched
>the codebase. Huge thanks to the maintainers, both for the tooling and
>the docs in the wiki. This is a much more enjoyable experience.
>
>Thanks,
>Brandon

From matthewtpickering at gmail.com Thu Feb 8 15:09:37 2024
From: matthewtpickering at gmail.com (Matthew Pickering)
Date: Thu, 8 Feb 2024 15:09:37 +0000
Subject: Help implementing Multiline String Literals
In-Reply-To:
References:
Message-ID:

I would imagine you modify the lexer like you describe, but it's not clear to me you want to use the same constructor `HsString` to represent them all the way through the compiler.

If you reuse HsString then how do you distinguish between a string which contains a newline and a multi-line string, for example? It just seems simpler to me to explicitly represent a multi-line string... perhaps `HsMultiLineString [String]` rather than trying to shoehorn them together and run into subtle bugs like this.

Matt

On Thu, Feb 8, 2024 at 2:45 PM Sebastian Graf wrote:
>
> Hi Brandon,
>
> I'm not following all of the details here, but from my naïve understanding, I would definitely tweak the lexer, do the post-processing and then have a canonical string representation rather than waiting until desugaring.
> If you like 1.4 best, give it a try. You will seen soon enough if some performance regression test gets worse.
It can't hurt to write a few yourself either. > I don't think that post-processing the strings would incur too much a hit compared to compiling those strings and serialise them into an executable. > I also bet that you can get rid some of the performance problems with list fusion. > > Cheers, > Sebastian > > ------ Originalnachricht ------ > Von: "Brandon Chinn" > An: ghc-devs at haskell.org > Gesendet: 04.02.2024 19:24:19 > Betreff: Help implementing Multiline String Literals > > Hello! > > I'm trying to implement #24390, which implements the multiline string literals proposal (existing work done in wip/multiline-strings). I originally suggested adding HsMultilineString to HsLit and translating it to HsString in renaming, then Matthew Pickering suggested I translate it in desugaring instead. I tried going down this approach, but I'm running into two main issues: Escaped characters and Overloaded strings. > > Apologies in advance for a long email. TL;DR - The best implementation I could think of involves a complete rewrite of how strings are lexed and modifying HsString instead of adding a new HsMultilineString constructor. If this is absolutely crazy talk, please dissuade me from this :) > > ===== Problem 1: Escaped characters ===== > Currently, Lexer.x resolves escaped characters for string literals. In the Note [Literal source text], we see that this is intentional; HsString should contain a normalized internal representation. However, multiline string literals have a post-processing step that requires distinguishing between the user typing a newline vs the user typing literally a backslash + an `N` (and other things like knowing if a user typed in `\&`, which currently goes away in lexing as well). > > Fundamentally, the current logic to resolve escaped characters is specific to the Lexer monad and operates on a per-character basis. 
But the multiline string literals proposal requires post-processing the whole string, then resolving escaped characters all at once. > > Possible solutions: > > (1.1) Duplicate the logic for resolving escaped characters > * Pro: Leaves normal string lexing untouched > * Con: Two sources of truth, possibly divergent behaviors between multiline and normal strings > > (1.2) Stick the post-processed string back into P, then rerun normal string lexing to resolve escaped characters > * Pro: Leaves normal string lexing untouched > * Con: Seems roundabout, inefficient, and hacky > > (1.3) Refactor the resolve-escaped-characters logic to work in both the P monad and as a pure function `String -> String` > * Pro: Reuses same escaped-characters logic for both normal + multiline strings > * Con: Different overall behavior between the two string types: Normal string still lexed per-character, Multiline strings would lex everything > * Con: Small refactor of lexing normal strings, which could introduce regressions > > (1.4) Read entire string (both normal + multiline) with no preprocessing (including string gaps or anything, except escaping quote delimiters), and define all post-processing steps as pure `String -> String` functions > * Pro: Gets out of monadic code quickly, turn bulk of string logic into pure code > * Pro: Processes normal + multiline strings exactly the same > * Pro: Opens the door for future string behaviors, e.g. raw string could do the same "read entire string" logic, and just not do any post-processing. > * Con: Could be less performant > * Con: Major refactor of lexing normal strings, which could introduce regressions > > I like solution 1.4 the best, as it generalizes string processing behavior the best and is more pipeline-style vs the currently more imperative style. But I recognize possible performance or behavior regressions are a real thing, so if anyone has any thoughts here, I'd love to hear them. 
> > ===== Problem 2: Overloaded strings ===== > Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in the renaming phase. Following Matthew's suggestion of resolving multiline string literals in the desugar step, this would mean that multiline string literals are post-processed after OverloadedStrings has already been applied. > > I don't like any of the solutions this approach brings up: > * Do post processing both when Desugaring HsMultilineString AND when Renaming HsMultilineString to HsOverLit - seems wrong to process multiline strings in two different phases > * Add HsIsStringMultiline and post process when desugaring both HsMultilineString and HsIsStringMultiline - would ideally like to avoid adding a variant of HsIsStringMultiline > > Instead, I propose we throw away the HsMultilineString idea and reuse HsString. The multiline syntax would still be preserved in the SourceText, and this also leaves the door open for future string features. For example, if we went with HsMultilineString, then adding raw strings would require adding both HsRawString and HsMultilineRawString. 
> > Here are two possible solutions for reusing HsString: > > (2.1) Add a HsStringType parameter to HsString > * HsStringType would define the format of the FastString stored in HsString: Normal => processed, Multiline => stores raw string, needs post-processing > * Post processing could occur in desugaring, with or without OverloadedStrings > * Pro: Shows the parsed multiline string before processing in -ddump-parsed > * Con: HsString containing Multiline strings would not contain the normalized representation mentioned in Note [Literal source text] > * Con: Breaking change in the GHC API > > (2.2) Post-process multiline strings in lexer > * Lexer would do all the post processing (for example, in conjunction with solution 1.4) and just return a normal HsString > * Pro: Multiline string is immediately desugared and behaves as expected for OverloadedStrings (and any other behaviors of string literals, existing or future) for free > * Pro: HsString would still always contain the normalized representation > * Con: No way of inspecting the raw multiline parse output before processing, e.g. via -ddump-parsed > > I'm leaning towards solution 2.1, but curious what people's thoughts are. > > ===== Closing remarks ===== > Again, sorry for the long email. My head is spinning trying to figure out this feature. Any help would be greatly appreciated. > > As an aside, I last worked on GHC back in 2020 or 2021, and my goodness. The Hadrian build is so much smoother (and faster!? Not sure if it's just my new laptop though) than what it was last time I touched the codebase. Huge thanks to the maintainers, both for the tooling and the docs in the wiki. This is a much more enjoyable experience. 
> > Thanks,
> Brandon
>
> _______________________________________________
> ghc-devs mailing list
> ghc-devs at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From ietf-dane at dukhovni.org Thu Feb 8 15:14:23 2024
From: ietf-dane at dukhovni.org (Viktor Dukhovni)
Date: Thu, 8 Feb 2024 10:14:23 -0500
Subject: Help implementing Multiline String Literals
In-Reply-To:
References:
Message-ID:

On Thu, Feb 08, 2024 at 03:09:37PM +0000, Matthew Pickering wrote:
> I would imagine you modify the lexer like you describe, but it's not
> clear to me you want to use the same constructor `HsString` to
> represent them all the way through the compiler.
>
> If you reuse HsString then how to you distinguish between a string
> which contains a newline and a multi-line string for example? It just
> seems simpler to me to explicitly represent a multi-line string..
> perhaps `HsMultiLineString [String]` rather than trying to shoehorn
> them together and run into subtle bugs like this.

Compiler aside, how first-class are multi-line strings expected to be? Should `Read` instances also be able to handle multi-line strings? Arguably, valid source syntax should be valid syntax for `Read`?

-- Viktor.

From brandonchinn178 at gmail.com Thu Feb 8 15:35:10 2024
From: brandonchinn178 at gmail.com (Brandon Chinn)
Date: Thu, 8 Feb 2024 07:35:10 -0800
Subject: Help implementing Multiline String Literals
In-Reply-To:
References:
Message-ID:

Thanks Sebastian and Matt!

Matt - can you elaborate? I don't understand your comment. A multiline string is just syntax sugar for a normal string, so if the lexer does the post-processing, it can be treated as a normal string the rest of the way. Why does anything else in the compiler need to know if the string was written as a multiline string?

Or, to rephrase, a multiline string _should_ be semantically indistinguishable from a normal string with \n characters typed in.
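[Editor's note] Brandon's claim, stated as code. This is a sketch: the `MultilineStrings` extension is the proposal's name and was not yet in GHC at the time of this thread:

```haskell
{-# LANGUAGE MultilineStrings #-}  -- proposed, not yet merged

multi :: String
multi =
  """
  syntax
  sugar
  """

plain :: String
plain = "syntax\nsugar"

-- The claim: once the lexer's post-processing has run, `multi` and
-- `plain` carry the same payload; only the SourceText (used for
-- exact-printing) records which surface syntax was written.
```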
On Thu, Feb 8, 2024, 7:09 AM Matthew Pickering wrote: > I would imagine you modify the lexer like you describe, but it's not > clear to me you want to use the same constructor `HsString` to > represent them all the way through the compiler. > > If you reuse HsString then how to you distinguish between a string > which contains a newline and a multi-line string for example? It just > seems simpler to me to explicitly represent a multi-line string.. > perhaps `HsMultiLineString [String]` rather than trying to shoehorn > them together and run into subtle bugs like this. > > Matt > > On Thu, Feb 8, 2024 at 2:45 PM Sebastian Graf wrote: > > > > Hi Brandon, > > > > I'm not following all of the details here, but from my naïve > understanding, I would definitely tweak the lexer, do the post-processing > and then have a canonical string representation rather than waiting until > desugaring. > > If you like 1.4 best, give it a try. You will seen soon enough if some > performance regression test gets worse. It can't hurt to write a few > yourself either. > > I don't think that post-processing the strings would incur too much a > hit compared to compiling those strings and serialise them into an > executable. > > I also bet that you can get rid some of the performance problems with > list fusion. > > > > Cheers, > > Sebastian > > > > ------ Originalnachricht ------ > > Von: "Brandon Chinn" > > An: ghc-devs at haskell.org > > Gesendet: 04.02.2024 19:24:19 > > Betreff: Help implementing Multiline String Literals > > > > Hello! > > > > I'm trying to implement #24390, which implements the multiline string > literals proposal (existing work done in wip/multiline-strings). I > originally suggested adding HsMultilineString to HsLit and translating it > to HsString in renaming, then Matthew Pickering suggested I translate it in > desugaring instead. I tried going down this approach, but I'm running into > two main issues: Escaped characters and Overloaded strings. 
> > > > Apologies in advance for a long email. TL;DR - The best implementation I > could think of involves a complete rewrite of how strings are lexed and > modifying HsString instead of adding a new HsMultilineString constructor. > If this is absolutely crazy talk, please dissuade me from this :) > > > > ===== Problem 1: Escaped characters ===== > > Currently, Lexer.x resolves escaped characters for string literals. In > the Note [Literal source text], we see that this is intentional; HsString > should contain a normalized internal representation. However, multiline > string literals have a post-processing step that requires distinguishing > between the user typing a newline vs the user typing literally a backslash > + an `N` (and other things like knowing if a user typed in `\&`, which > currently goes away in lexing as well). > > > > Fundamentally, the current logic to resolve escaped characters is > specific to the Lexer monad and operates on a per-character basis. But the > multiline string literals proposal requires post-processing the whole > string, then resolving escaped characters all at once. 
> > > > Possible solutions: > > > > (1.1) Duplicate the logic for resolving escaped characters > > * Pro: Leaves normal string lexing untouched > > * Con: Two sources of truth, possibly divergent behaviors between > multiline and normal strings > > > > (1.2) Stick the post-processed string back into P, then rerun normal > string lexing to resolve escaped characters > > * Pro: Leaves normal string lexing untouched > > * Con: Seems roundabout, inefficient, and hacky > > > > (1.3) Refactor the resolve-escaped-characters logic to work in both the > P monad and as a pure function `String -> String` > > * Pro: Reuses same escaped-characters logic for both normal + > multiline strings > > * Con: Different overall behavior between the two string types: > Normal string still lexed per-character, Multiline strings would lex > everything > > * Con: Small refactor of lexing normal strings, which could > introduce regressions > > > > (1.4) Read entire string (both normal + multiline) with no preprocessing > (including string gaps or anything, except escaping quote delimiters), and > define all post-processing steps as pure `String -> String` functions > > * Pro: Gets out of monadic code quickly, turn bulk of string logic > into pure code > > * Pro: Processes normal + multiline strings exactly the same > > * Pro: Opens the door for future string behaviors, e.g. raw string > could do the same "read entire string" logic, and just not do any > post-processing. > > * Con: Could be less performant > > * Con: Major refactor of lexing normal strings, which could > introduce regressions > > > > I like solution 1.4 the best, as it generalizes string processing > behavior the best and is more pipeline-style vs the currently more > imperative style. But I recognize possible performance or behavior > regressions are a real thing, so if anyone has any thoughts here, I'd love > to hear them. 
> > > > ===== Problem 2: Overloaded strings ===== > > Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in > the renaming phase. Following Matthew's suggestion of resolving multiline > string literals in the desugar step, this would mean that multiline string > literals are post-processed after OverloadedStrings has already been > applied. > > > > I don't like any of the solutions this approach brings up: > > * Do post processing both when Desugaring HsMultilineString AND when > Renaming HsMultilineString to HsOverLit - seems wrong to process multiline > strings in two different phases > > * Add HsIsStringMultiline and post process when desugaring both > HsMultilineString and HsIsStringMultiline - would ideally like to avoid > adding a variant of HsIsStringMultiline > > > > Instead, I propose we throw away the HsMultilineString idea and reuse > HsString. The multiline syntax would still be preserved in the SourceText, > and this also leaves the door open for future string features. For example, > if we went with HsMultilineString, then adding raw strings would require > adding both HsRawString and HsMultilineRawString. 
> > > > Here are two possible solutions for reusing HsString: > > > > (2.1) Add a HsStringType parameter to HsString > > * HsStringType would define the format of the FastString stored in > HsString: Normal => processed, Multiline => stores raw string, needs > post-processing > > * Post processing could occur in desugaring, with or without > OverloadedStrings > > * Pro: Shows the parsed multiline string before processing in > -ddump-parsed > > * Con: HsString containing Multiline strings would not contain the > normalized representation mentioned in Note [Literal source text] > > * Con: Breaking change in the GHC API > > > > (2.2) Post-process multiline strings in lexer > > * Lexer would do all the post processing (for example, in > conjunction with solution 1.4) and just return a normal HsString > > * Pro: Multiline string is immediately desugared and behaves as > expected for OverloadedStrings (and any other behaviors of string literals, > existing or future) for free > > * Pro: HsString would still always contain the normalized > representation > > * Con: No way of inspecting the raw multiline parse output before > processing, e.g. via -ddump-parsed > > > > I'm leaning towards solution 2.1, but curious what people's thoughts are. > > > > ===== Closing remarks ===== > > Again, sorry for the long email. My head is spinning trying to figure > out this feature. Any help would be greatly appreciated. > > > > As an aside, I last worked on GHC back in 2020 or 2021, and my goodness. > The Hadrian build is so much smoother (and faster!? Not sure if it's just > my new laptop though) than what it was last time I touched the codebase. > Huge thanks to the maintainers, both for the tooling and the docs in the > wiki. This is a much more enjoyable experience. 
> > > > Thanks, > > Brandon > > > > _______________________________________________ > > ghc-devs mailing list > > ghc-devs at haskell.org > > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Thu Feb 8 15:54:53 2024 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Thu, 8 Feb 2024 15:54:53 +0000 Subject: Help implementing Multiline String Literals In-Reply-To: References: Message-ID: I don't think that is the right way to go. They are different syntactic forms so they should be distinguished in the syntax tree. If I want to generate HsSyn directly, and print it out, how does the compiler know whether I meant to print a normal string literal or a multi line string literal? What about if the compiler tries to print out an expression containing a string literal in an error message, multi or normal? Matt On Thu, Feb 8, 2024 at 3:35 PM Brandon Chinn wrote: > > Thanks Sebastian and Matt! > > Matt - can you elaborate, I don't understand your comment. A multiline string is just syntax sugar for a normal string, so if the lexer does the post processing, it can be treated as a normal string the rest of the way. Why does anything else in the compiler need to know if the string was written as a multiline string? > > Or, to rephrase, a multiline string _should_ be semantically indistinguishable from a normal string with \n characters typed in. > > On Thu, Feb 8, 2024, 7:09 AM Matthew Pickering wrote: >> >> I would imagine you modify the lexer like you describe, but it's not >> clear to me you want to use the same constructor `HsString` to >> represent them all the way through the compiler. >> >> If you reuse HsString then how do you distinguish between a string >> which contains a newline and a multi-line string for example? It just >> seems simpler to me to explicitly represent a multi-line string..
>> perhaps `HsMultiLineString [String]` rather than trying to shoehorn >> them together and run into subtle bugs like this. >> >> Matt >> >> On Thu, Feb 8, 2024 at 2:45 PM Sebastian Graf wrote: >> > >> > Hi Brandon, >> > >> > I'm not following all of the details here, but from my naïve understanding, I would definitely tweak the lexer, do the post-processing and then have a canonical string representation rather than waiting until desugaring. >> > If you like 1.4 best, give it a try. You will see soon enough if some performance regression test gets worse. It can't hurt to write a few yourself either. >> > I don't think that post-processing the strings would incur too much of a hit compared to compiling those strings and serialising them into an executable. >> > I also bet that you can get rid of some of the performance problems with list fusion. >> > >> > Cheers, >> > Sebastian >> > >> > ------ Original Message ------ >> > From: "Brandon Chinn" >> > To: ghc-devs at haskell.org >> > Sent: 04.02.2024 19:24:19 >> > Subject: Help implementing Multiline String Literals >> > >> > Hello! >> > >> > I'm trying to implement #24390, which implements the multiline string literals proposal (existing work done in wip/multiline-strings). I originally suggested adding HsMultilineString to HsLit and translating it to HsString in renaming, then Matthew Pickering suggested I translate it in desugaring instead. I tried going down this approach, but I'm running into two main issues: Escaped characters and Overloaded strings. >> > >> > Apologies in advance for a long email. TL;DR - The best implementation I could think of involves a complete rewrite of how strings are lexed and modifying HsString instead of adding a new HsMultilineString constructor. If this is absolutely crazy talk, please dissuade me from this :) >> > >> > ===== Problem 1: Escaped characters ===== >> > Currently, Lexer.x resolves escaped characters for string literals.
In the Note [Literal source text], we see that this is intentional; HsString should contain a normalized internal representation. However, multiline string literals have a post-processing step that requires distinguishing between the user typing a newline vs the user typing literally a backslash + an `N` (and other things like knowing if a user typed in `\&`, which currently goes away in lexing as well). >> > >> > Fundamentally, the current logic to resolve escaped characters is specific to the Lexer monad and operates on a per-character basis. But the multiline string literals proposal requires post-processing the whole string, then resolving escaped characters all at once. >> > >> > Possible solutions: >> > >> > (1.1) Duplicate the logic for resolving escaped characters >> > * Pro: Leaves normal string lexing untouched >> > * Con: Two sources of truth, possibly divergent behaviors between multiline and normal strings >> > >> > (1.2) Stick the post-processed string back into P, then rerun normal string lexing to resolve escaped characters >> > * Pro: Leaves normal string lexing untouched >> > * Con: Seems roundabout, inefficient, and hacky >> > >> > (1.3) Refactor the resolve-escaped-characters logic to work in both the P monad and as a pure function `String -> String` >> > * Pro: Reuses same escaped-characters logic for both normal + multiline strings >> > * Con: Different overall behavior between the two string types: Normal string still lexed per-character, Multiline strings would lex everything >> > * Con: Small refactor of lexing normal strings, which could introduce regressions >> > >> > (1.4) Read entire string (both normal + multiline) with no preprocessing (including string gaps or anything, except escaping quote delimiters), and define all post-processing steps as pure `String -> String` functions >> > * Pro: Gets out of monadic code quickly, turn bulk of string logic into pure code >> > * Pro: Processes normal + multiline strings exactly the same >> > * 
Pro: Opens the door for future string behaviors, e.g. raw string could do the same "read entire string" logic, and just not do any post-processing. >> > * Con: Could be less performant >> > * Con: Major refactor of lexing normal strings, which could introduce regressions >> > >> > I like solution 1.4 the best, as it generalizes string processing behavior the best and is more pipeline-style vs the currently more imperative style. But I recognize possible performance or behavior regressions are a real thing, so if anyone has any thoughts here, I'd love to hear them. >> > >> > ===== Problem 2: Overloaded strings ===== >> > Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in the renaming phase. Following Matthew's suggestion of resolving multiline string literals in the desugar step, this would mean that multiline string literals are post-processed after OverloadedStrings has already been applied. >> > >> > I don't like any of the solutions this approach brings up: >> > * Do post processing both when Desugaring HsMultilineString AND when Renaming HsMultilineString to HsOverLit - seems wrong to process multiline strings in two different phases >> > * Add HsIsStringMultiline and post process when desugaring both HsMultilineString and HsIsStringMultiline - would ideally like to avoid adding a variant of HsIsStringMultiline >> > >> > Instead, I propose we throw away the HsMultilineString idea and reuse HsString. The multiline syntax would still be preserved in the SourceText, and this also leaves the door open for future string features. For example, if we went with HsMultilineString, then adding raw strings would require adding both HsRawString and HsMultilineRawString. 
>> > >> > Here are two possible solutions for reusing HsString: >> > >> > (2.1) Add a HsStringType parameter to HsString >> > * HsStringType would define the format of the FastString stored in HsString: Normal => processed, Multiline => stores raw string, needs post-processing >> > * Post processing could occur in desugaring, with or without OverloadedStrings >> > * Pro: Shows the parsed multiline string before processing in -ddump-parsed >> > * Con: HsString containing Multiline strings would not contain the normalized representation mentioned in Note [Literal source text] >> > * Con: Breaking change in the GHC API >> > >> > (2.2) Post-process multiline strings in lexer >> > * Lexer would do all the post processing (for example, in conjunction with solution 1.4) and just return a normal HsString >> > * Pro: Multiline string is immediately desugared and behaves as expected for OverloadedStrings (and any other behaviors of string literals, existing or future) for free >> > * Pro: HsString would still always contain the normalized representation >> > * Con: No way of inspecting the raw multiline parse output before processing, e.g. via -ddump-parsed >> > >> > I'm leaning towards solution 2.1, but curious what people's thoughts are. >> > >> > ===== Closing remarks ===== >> > Again, sorry for the long email. My head is spinning trying to figure out this feature. Any help would be greatly appreciated. >> > >> > As an aside, I last worked on GHC back in 2020 or 2021, and my goodness. The Hadrian build is so much smoother (and faster!? Not sure if it's just my new laptop though) than what it was last time I touched the codebase. Huge thanks to the maintainers, both for the tooling and the docs in the wiki. This is a much more enjoyable experience. 
>> > >> > Thanks, >> > Brandon >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From sgraf1337 at gmail.com Thu Feb 8 16:03:13 2024 From: sgraf1337 at gmail.com (Sebastian Graf) Date: Thu, 8 Feb 2024 16:03:13 +0000 Subject: Help implementing Multiline String Literals In-Reply-To: References: Message-ID: That's a good point. You want distinguishable lexemes and then one CST constructor per string type. I.e., maintain the separation for longer, until desugaring. ________________________________ From: Matthew Pickering Sent: Thursday, February 8, 2024 4:54:53 PM To: Brandon Chinn Cc: Sebastian Graf ; ghc-devs at haskell.org Subject: Re: Help implementing Multiline String Literals I don't think that is the right way to go. They are different syntactic forms so they should be distinguished in the syntax tree. If I want to generate HsSyn directly, and print it out, how does the compiler know whether I meant to print a normal string literal or a multi line string literal? What about if the compiler tries to print out an expression containing a string literal in an error message, multi or normal? Matt On Thu, Feb 8, 2024 at 3:35 PM Brandon Chinn wrote: > > Thanks Sebastian and Matt! > > Matt - can you elaborate, I don't understand your comment. A multiline string is just syntax sugar for a normal string, so if the lexer does the post processing, it can be treated as a normal string the rest of the way. Why does anything else in the compiler need to know if the string was written as a multiline string? > > Or, to rephrase, a multiline string _should_ be semantically indistinguishable from a normal string with \n characters typed in.
> > On Thu, Feb 8, 2024, 7:09 AM Matthew Pickering wrote: >> >> I would imagine you modify the lexer like you describe, but it's not >> clear to me you want to use the same constructor `HsString` to >> represent them all the way through the compiler. >> >> If you reuse HsString then how to you distinguish between a string >> which contains a newline and a multi-line string for example? It just >> seems simpler to me to explicitly represent a multi-line string.. >> perhaps `HsMultiLineString [String]` rather than trying to shoehorn >> them together and run into subtle bugs like this. >> >> Matt >> >> On Thu, Feb 8, 2024 at 2:45 PM Sebastian Graf wrote: >> > >> > Hi Brandon, >> > >> > I'm not following all of the details here, but from my naïve understanding, I would definitely tweak the lexer, do the post-processing and then have a canonical string representation rather than waiting until desugaring. >> > If you like 1.4 best, give it a try. You will seen soon enough if some performance regression test gets worse. It can't hurt to write a few yourself either. >> > I don't think that post-processing the strings would incur too much a hit compared to compiling those strings and serialise them into an executable. >> > I also bet that you can get rid some of the performance problems with list fusion. >> > >> > Cheers, >> > Sebastian >> > >> > ------ Originalnachricht ------ >> > Von: "Brandon Chinn" >> > An: ghc-devs at haskell.org >> > Gesendet: 04.02.2024 19:24:19 >> > Betreff: Help implementing Multiline String Literals >> > >> > Hello! >> > >> > I'm trying to implement #24390, which implements the multiline string literals proposal (existing work done in wip/multiline-strings). I originally suggested adding HsMultilineString to HsLit and translating it to HsString in renaming, then Matthew Pickering suggested I translate it in desugaring instead. I tried going down this approach, but I'm running into two main issues: Escaped characters and Overloaded strings. 
>> > >> > Apologies in advance for a long email. TL;DR - The best implementation I could think of involves a complete rewrite of how strings are lexed and modifying HsString instead of adding a new HsMultilineString constructor. If this is absolutely crazy talk, please dissuade me from this :) >> > >> > ===== Problem 1: Escaped characters ===== >> > Currently, Lexer.x resolves escaped characters for string literals. In the Note [Literal source text], we see that this is intentional; HsString should contain a normalized internal representation. However, multiline string literals have a post-processing step that requires distinguishing between the user typing a newline vs the user typing literally a backslash + an `N` (and other things like knowing if a user typed in `\&`, which currently goes away in lexing as well). >> > >> > Fundamentally, the current logic to resolve escaped characters is specific to the Lexer monad and operates on a per-character basis. But the multiline string literals proposal requires post-processing the whole string, then resolving escaped characters all at once. 
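The "post-process the whole string" step can be pictured as a pure pass over the literal's raw contents. Here is a simplified sketch of the indentation-removal part (the function name is invented, and the proposal's actual algorithm has further steps, such as leading-newline removal and escape handling):

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Simplified sketch of one whole-string multiline step: drop the
-- longest common leading-space prefix of the non-blank lines, then
-- rejoin.  The proposal's real algorithm has more steps, omitted here.
stripCommonIndent :: String -> String
stripCommonIndent s = intercalate "\n" (map (drop n) ls)
  where
    ls      = lines s
    indents = [ length (takeWhile (== ' ') l) | l <- ls, not (all isSpace l) ]
    n       = if null indents then 0 else minimum indents

main :: IO ()
main = putStrLn (stripCommonIndent "  hello\n    world")
```

A pass like this needs the string's raw contents, which is why it has to run before (or instead of) the per-character escape resolution the lexer does today.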
>> > >> > Possible solutions: >> > >> > (1.1) Duplicate the logic for resolving escaped characters >> > * Pro: Leaves normal string lexing untouched >> > * Con: Two sources of truth, possibly divergent behaviors between multiline and normal strings >> > >> > (1.2) Stick the post-processed string back into P, then rerun normal string lexing to resolve escaped characters >> > * Pro: Leaves normal string lexing untouched >> > * Con: Seems roundabout, inefficient, and hacky >> > >> > (1.3) Refactor the resolve-escaped-characters logic to work in both the P monad and as a pure function `String -> String` >> > * Pro: Reuses same escaped-characters logic for both normal + multiline strings >> > * Con: Different overall behavior between the two string types: Normal string still lexed per-character, Multiline strings would lex everything >> > * Con: Small refactor of lexing normal strings, which could introduce regressions >> > >> > (1.4) Read entire string (both normal + multiline) with no preprocessing (including string gaps or anything, except escaping quote delimiters), and define all post-processing steps as pure `String -> String` functions >> > * Pro: Gets out of monadic code quickly, turn bulk of string logic into pure code >> > * Pro: Processes normal + multiline strings exactly the same >> > * Pro: Opens the door for future string behaviors, e.g. raw string could do the same "read entire string" logic, and just not do any post-processing. >> > * Con: Could be less performant >> > * Con: Major refactor of lexing normal strings, which could introduce regressions >> > >> > I like solution 1.4 the best, as it generalizes string processing behavior the best and is more pipeline-style vs the currently more imperative style. But I recognize possible performance or behavior regressions are a real thing, so if anyone has any thoughts here, I'd love to hear them. 
>> > >> > ===== Problem 2: Overloaded strings ===== >> > Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in the renaming phase. Following Matthew's suggestion of resolving multiline string literals in the desugar step, this would mean that multiline string literals are post-processed after OverloadedStrings has already been applied. >> > >> > I don't like any of the solutions this approach brings up: >> > * Do post processing both when Desugaring HsMultilineString AND when Renaming HsMultilineString to HsOverLit - seems wrong to process multiline strings in two different phases >> > * Add HsIsStringMultiline and post process when desugaring both HsMultilineString and HsIsStringMultiline - would ideally like to avoid adding a variant of HsIsStringMultiline >> > >> > Instead, I propose we throw away the HsMultilineString idea and reuse HsString. The multiline syntax would still be preserved in the SourceText, and this also leaves the door open for future string features. For example, if we went with HsMultilineString, then adding raw strings would require adding both HsRawString and HsMultilineRawString. 
>> > >> > Here are two possible solutions for reusing HsString: >> > >> > (2.1) Add a HsStringType parameter to HsString >> > * HsStringType would define the format of the FastString stored in HsString: Normal => processed, Multiline => stores raw string, needs post-processing >> > * Post processing could occur in desugaring, with or without OverloadedStrings >> > * Pro: Shows the parsed multiline string before processing in -ddump-parsed >> > * Con: HsString containing Multiline strings would not contain the normalized representation mentioned in Note [Literal source text] >> > * Con: Breaking change in the GHC API >> > >> > (2.2) Post-process multiline strings in lexer >> > * Lexer would do all the post processing (for example, in conjunction with solution 1.4) and just return a normal HsString >> > * Pro: Multiline string is immediately desugared and behaves as expected for OverloadedStrings (and any other behaviors of string literals, existing or future) for free >> > * Pro: HsString would still always contain the normalized representation >> > * Con: No way of inspecting the raw multiline parse output before processing, e.g. via -ddump-parsed >> > >> > I'm leaning towards solution 2.1, but curious what people's thoughts are. >> > >> > ===== Closing remarks ===== >> > Again, sorry for the long email. My head is spinning trying to figure out this feature. Any help would be greatly appreciated. >> > >> > As an aside, I last worked on GHC back in 2020 or 2021, and my goodness. The Hadrian build is so much smoother (and faster!? Not sure if it's just my new laptop though) than what it was last time I touched the codebase. Huge thanks to the maintainers, both for the tooling and the docs in the wiki. This is a much more enjoyable experience. 
>> > >> > Thanks, >> > Brandon >> > >> > _______________________________________________ >> > ghc-devs mailing list >> > ghc-devs at haskell.org >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Thu Feb 8 16:10:52 2024 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Thu, 8 Feb 2024 16:10:52 +0000 Subject: Help implementing Multiline String Literals In-Reply-To: References: Message-ID: 1. Push a branch to the haddock repo https://gitlab.haskell.org/ghc/haddock 2. Update the submodule on your tree to point to this branch < work on your patch > When it is time to merge. 3. Rebase your haddock branch onto ghc-head 4. Modify your GHC patch to point to this branch in the haddock submodule 5. Make sure CI passes, when it does, assign to marge both AND fast-forward merge your haddock patch into ghc-head (it is important this is atomic) If you do things this way you don't need to rebase your patch after merging your changes into the haddock ghc-head branch. On Thu, Feb 8, 2024 at 4:04 PM Brandon Chinn wrote: > > hm okay, thanks. I'll try combining both approaches: modify the lexer (to handle escape characters correctly), but also put multiline strings in a separate HsLit constructor. > > Followup question: when I did this before, I added ITmultilinestring to the lexer, but then I had to add it to haddock-api as well. How do I merge the change in both the ghc repo and the haddock repo and change them both at the same time? I don't see how I can make either change first without a compiler error. > > On Thu, Feb 8, 2024, 7:55 AM Matthew Pickering wrote: >> >> I don't think that is the right way to go. >> >> They are different syntactic forms so they should be distinguished in >> the syntax tree. 
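Matthew's printing argument can be made concrete with a toy literal type (the constructor names below are invented for illustration, not GHC's actual ones):

```haskell
-- Toy illustration: if the AST records which concrete syntax was used,
-- a printer can round-trip it; with a single constructor it cannot.
-- Names are invented, not GHC's.
data Lit
  = Str String        -- written as "..."
  | MultiStr String   -- written with multiline syntax, contents normalised
  deriving Show

render :: Lit -> String
render (Str s)      = show s
render (MultiStr s) = "\"\"\"\n" ++ s ++ "\n\"\"\""

main :: IO ()
main = do
  putStrLn (render (Str "a\nb"))       -- single-line form, newline escaped
  putStrLn (render (MultiStr "a\nb"))  -- multiline form, literal newline
```

If both forms collapsed to `Str` in the lexer, `render` would have no basis for choosing between the two concrete syntaxes when printing an expression back out.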
>> >> If I want to generate HsSyn directly, and print it out, how does the >> compiler know whether I meant to print a normal string literal or a >> multi line string literal? What about if the compiler tries to print >> out an expression containing a string literal in an error message, >> multi or normal? >> >> Matt >> >> On Thu, Feb 8, 2024 at 3:35 PM Brandon Chinn wrote: >> > >> > Thanks Sebastian and Matt! >> > >> > Matt - can you elaborate, I don't understand your comment. A multiline string is just syntax sugar for a normal string, so if the lexer does the post processing, it can be treated as a normal string the rest of the way. Why does anything else in the compiler need to know if the string was written as a multiline string? >> > >> > Or, to rephrase, a multiline string _should_ be semantically indistinguishable from a normal string with \n characters typed in. >> > >> > On Thu, Feb 8, 2024, 7:09 AM Matthew Pickering wrote: >> >> >> >> I would imagine you modify the lexer like you describe, but it's not >> >> clear to me you want to use the same constructor `HsString` to >> >> represent them all the way through the compiler. >> >> >> >> If you reuse HsString then how to you distinguish between a string >> >> which contains a newline and a multi-line string for example? It just >> >> seems simpler to me to explicitly represent a multi-line string.. >> >> perhaps `HsMultiLineString [String]` rather than trying to shoehorn >> >> them together and run into subtle bugs like this. >> >> >> >> Matt >> >> >> >> On Thu, Feb 8, 2024 at 2:45 PM Sebastian Graf wrote: >> >> > >> >> > Hi Brandon, >> >> > >> >> > I'm not following all of the details here, but from my naïve understanding, I would definitely tweak the lexer, do the post-processing and then have a canonical string representation rather than waiting until desugaring. >> >> > If you like 1.4 best, give it a try. You will seen soon enough if some performance regression test gets worse. 
It can't hurt to write a few yourself either. >> >> > I don't think that post-processing the strings would incur too much a hit compared to compiling those strings and serialise them into an executable. >> >> > I also bet that you can get rid some of the performance problems with list fusion. >> >> > >> >> > Cheers, >> >> > Sebastian >> >> > >> >> > ------ Originalnachricht ------ >> >> > Von: "Brandon Chinn" >> >> > An: ghc-devs at haskell.org >> >> > Gesendet: 04.02.2024 19:24:19 >> >> > Betreff: Help implementing Multiline String Literals >> >> > >> >> > Hello! >> >> > >> >> > I'm trying to implement #24390, which implements the multiline string literals proposal (existing work done in wip/multiline-strings). I originally suggested adding HsMultilineString to HsLit and translating it to HsString in renaming, then Matthew Pickering suggested I translate it in desugaring instead. I tried going down this approach, but I'm running into two main issues: Escaped characters and Overloaded strings. >> >> > >> >> > Apologies in advance for a long email. TL;DR - The best implementation I could think of involves a complete rewrite of how strings are lexed and modifying HsString instead of adding a new HsMultilineString constructor. If this is absolutely crazy talk, please dissuade me from this :) >> >> > >> >> > ===== Problem 1: Escaped characters ===== >> >> > Currently, Lexer.x resolves escaped characters for string literals. In the Note [Literal source text], we see that this is intentional; HsString should contain a normalized internal representation. However, multiline string literals have a post-processing step that requires distinguishing between the user typing a newline vs the user typing literally a backslash + an `N` (and other things like knowing if a user typed in `\&`, which currently goes away in lexing as well). 
>> >> > >> >> > Fundamentally, the current logic to resolve escaped characters is specific to the Lexer monad and operates on a per-character basis. But the multiline string literals proposal requires post-processing the whole string, then resolving escaped characters all at once. >> >> > >> >> > Possible solutions: >> >> > >> >> > (1.1) Duplicate the logic for resolving escaped characters >> >> > * Pro: Leaves normal string lexing untouched >> >> > * Con: Two sources of truth, possibly divergent behaviors between multiline and normal strings >> >> > >> >> > (1.2) Stick the post-processed string back into P, then rerun normal string lexing to resolve escaped characters >> >> > * Pro: Leaves normal string lexing untouched >> >> > * Con: Seems roundabout, inefficient, and hacky >> >> > >> >> > (1.3) Refactor the resolve-escaped-characters logic to work in both the P monad and as a pure function `String -> String` >> >> > * Pro: Reuses same escaped-characters logic for both normal + multiline strings >> >> > * Con: Different overall behavior between the two string types: Normal string still lexed per-character, Multiline strings would lex everything >> >> > * Con: Small refactor of lexing normal strings, which could introduce regressions >> >> > >> >> > (1.4) Read entire string (both normal + multiline) with no preprocessing (including string gaps or anything, except escaping quote delimiters), and define all post-processing steps as pure `String -> String` functions >> >> > * Pro: Gets out of monadic code quickly, turn bulk of string logic into pure code >> >> > * Pro: Processes normal + multiline strings exactly the same >> >> > * Pro: Opens the door for future string behaviors, e.g. raw string could do the same "read entire string" logic, and just not do any post-processing. 
>> >> > * Con: Could be less performant >> >> > * Con: Major refactor of lexing normal strings, which could introduce regressions >> >> > >> >> > I like solution 1.4 the best, as it generalizes string processing behavior the best and is more pipeline-style vs the currently more imperative style. But I recognize possible performance or behavior regressions are a real thing, so if anyone has any thoughts here, I'd love to hear them. >> >> > >> >> > ===== Problem 2: Overloaded strings ===== >> >> > Currently, `HsString s` is converted into `HsOverLit (HsIsString s)` in the renaming phase. Following Matthew's suggestion of resolving multiline string literals in the desugar step, this would mean that multiline string literals are post-processed after OverloadedStrings has already been applied. >> >> > >> >> > I don't like any of the solutions this approach brings up: >> >> > * Do post processing both when Desugaring HsMultilineString AND when Renaming HsMultilineString to HsOverLit - seems wrong to process multiline strings in two different phases >> >> > * Add HsIsStringMultiline and post process when desugaring both HsMultilineString and HsIsStringMultiline - would ideally like to avoid adding a variant of HsIsStringMultiline >> >> > >> >> > Instead, I propose we throw away the HsMultilineString idea and reuse HsString. The multiline syntax would still be preserved in the SourceText, and this also leaves the door open for future string features. For example, if we went with HsMultilineString, then adding raw strings would require adding both HsRawString and HsMultilineRawString. 
>> >> > >> >> > Here are two possible solutions for reusing HsString: >> >> > >> >> > (2.1) Add a HsStringType parameter to HsString >> >> > * HsStringType would define the format of the FastString stored in HsString: Normal => processed, Multiline => stores raw string, needs post-processing >> >> > * Post processing could occur in desugaring, with or without OverloadedStrings >> >> > * Pro: Shows the parsed multiline string before processing in -ddump-parsed >> >> > * Con: HsString containing Multiline strings would not contain the normalized representation mentioned in Note [Literal source text] >> >> > * Con: Breaking change in the GHC API >> >> > >> >> > (2.2) Post-process multiline strings in lexer >> >> > * Lexer would do all the post processing (for example, in conjunction with solution 1.4) and just return a normal HsString >> >> > * Pro: Multiline string is immediately desugared and behaves as expected for OverloadedStrings (and any other behaviors of string literals, existing or future) for free >> >> > * Pro: HsString would still always contain the normalized representation >> >> > * Con: No way of inspecting the raw multiline parse output before processing, e.g. via -ddump-parsed >> >> > >> >> > I'm leaning towards solution 2.1, but curious what people's thoughts are. >> >> > >> >> > ===== Closing remarks ===== >> >> > Again, sorry for the long email. My head is spinning trying to figure out this feature. Any help would be greatly appreciated. >> >> > >> >> > As an aside, I last worked on GHC back in 2020 or 2021, and my goodness. The Hadrian build is so much smoother (and faster!? Not sure if it's just my new laptop though) than what it was last time I touched the codebase. Huge thanks to the maintainers, both for the tooling and the docs in the wiki. This is a much more enjoyable experience. 
>> >> > >> >> > Thanks, >> >> > Brandon >> >> > >> >> > _______________________________________________ >> >> > ghc-devs mailing list >> >> > ghc-devs at haskell.org >> >> > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simon.peytonjones at gmail.com Thu Feb 8 16:18:06 2024 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Thu, 8 Feb 2024 16:18:06 +0000 Subject: HEAD breakage Message-ID: I have rebased a patch onto HEAD, and I'm getting this Reading interface for ghc-internal:GHC.Exception.Type; reason: Need decl for SomeException ...not found updating EPS /home/simonpj/code/HEAD-12/_build/stage1/inplace/../libraries/ghc-internal/build/GHC/Err.hi Declaration for errorWithoutStackTrace Unfolding of errorWithoutStackTrace: SomeException ErrorWithoutFlag Failed to load interface for ‘GHC.Exception.Type’. There are files missing in the ‘ghc-internal-0.1.0.0’ package, try running 'ghc-pkg check'. Use -v to see a list of the files searched for. forkM failed: Unfolding of errorWithoutStackTrace IOEnv failure } ending fork (badly) Unfolding of errorWithoutStackTrace Cannot continue after interface file error Does anyone have any idea what is going on? I have made no changes to GHC.Err or exceptions etc. Thanks Simon -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Thu Feb 8 16:22:15 2024 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Thu, 8 Feb 2024 16:22:15 +0000 Subject: HEAD breakage In-Reply-To: References: Message-ID: Is this from a clean tree? 
On Thu, Feb 8, 2024 at 4:18 PM Simon Peyton Jones wrote: > > I have rebased a patch onto HEAD, and I'm getting this > > Reading interface for ghc-internal:GHC.Exception.Type; > reason: Need decl for SomeException > ...not found > updating EPS > /home/simonpj/code/HEAD-12/_build/stage1/inplace/../libraries/ghc-internal/build/GHC/Err.hi > Declaration for errorWithoutStackTrace > Unfolding of errorWithoutStackTrace: > SomeException ErrorWithoutFlag > Failed to load interface for ‘GHC.Exception.Type’. > There are files missing in the ‘ghc-internal-0.1.0.0’ package, > try running 'ghc-pkg check'. > Use -v to see a list of the files searched for. > forkM failed: Unfolding of errorWithoutStackTrace IOEnv failure > } ending fork (badly) Unfolding of errorWithoutStackTrace > Cannot continue after interface file error > > Does anyone have any idea what is going on? I have made no changes to GHC.Err or exceptions etc. > > > Thanks > > Simon > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From mpg at mpg.is Fri Feb 9 02:35:02 2024 From: mpg at mpg.is (=?UTF-8?Q?Matth=C3=ADas_P=C3=A1ll_Gissurarson?=) Date: Fri, 9 Feb 2024 03:35:02 +0100 Subject: Trouble with package imports Message-ID: Hey all! I’m in the process of updating PropR from 8.10 to 9.8, which uses the GHC API extensively. Before, I could simply add a package to DynFlags, then setSessionDynFlags and the packages would be recognized and imported, provided I had `cabal install --lib ` before. However, after the update, it doesn’t seem to find packages, even when I manually point the package database to one with the package in question! Is there some additional loading that must be done? The package is picked up in GHCi and normal compilation, but not from within `runGhc` in the API. I’m using GHC via ghcup if that’s relevant. Thanks for any pointers you can provide! Best regards, Matthías Páll Gissurarson. 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From jgbailey at gmail.com Thu Feb 15 19:07:13 2024
From: jgbailey at gmail.com (Justin Bailey)
Date: Thu, 15 Feb 2024 11:07:13 -0800
Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1
Message-ID: 

Hi!

I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2).

When building with -O1, memory on the GHC process climbs until it
reaches the limit of my machine (64G) and then crashes with a
segfault.

With -O0, that does not happen.

How would I go about diagnosing what's happening? Using RTS flags to
limit the heap to 32G produced the same behavior, just faster.

Strangely, `-v5` does not produce any more output in the console
(passed via cabal's --ghc-options). Maybe I'm doing it wrong?

Pointers to existing issues or documentation welcome! Thank you!

Justin

From teofilcamarasu at gmail.com Thu Feb 15 19:36:10 2024
From: teofilcamarasu at gmail.com (Teofil Camarasu)
Date: Thu, 15 Feb 2024 19:36:10 +0000
Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1
In-Reply-To: References: Message-ID: 

Hi Justin,

From your description, it sounds to me like there's something in your source code that's causing the optimiser to generate too much code, which then causes the crash because of memory exhaustion (though I might be wrong about this).
In the past, when I've run into similar things, I've followed the following vague process to help find a minimal reproducer of the issue.

- pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here for docs on these flags: https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html)
These will write some extra debugging information to either your `dist-newstyle` or `.stack-work` directory depending on whether you use cabal or stack.
They will create for each source file a `.dump-simpl` file that will give you the compiler's intermediate output.
And a `.dump-timings` file that will show timings information about how long each phase of compilation took.

- The first step is to home in on the problematic module or modules. Maybe you already have a good idea of where in your build the compiler crashes.
But if not, you can use the `.dump-timings` files and/or a tool that summarises them, like https://github.com/codedownio/time-ghc-modules, to get a sense of where the problem lies.

- Once you've found your worst module, the next step is to determine what about that module is causing the issue.
I find that often you can just try to find which top-level identifiers in your `.dump-simpl` file are big. This will give a good idea of which part of your source code is to blame.
Then I tend to try to delete everything that is irrelevant, and check again. Incrementally you get something that is smaller and smaller, and in time you tend to end up with something that is small enough to write up as a ticket.

I hope that helps. I've found this process to work quite well for hunting down issues where GHC's optimiser goes wrong, but it is a bit of a labour-intensive process.

One last thing. You mention that you are on M2. If it's easily doable for you, try to reproduce on x86_64 just to make sure it's not some bug specific to M2.

Cheers,
Teo

On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey wrote:
> Hi!
>
> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2).
>
> When building with -O1, memory on the GHC process climbs until it
> reaches the limit of my machine (64G) and then crashes with a
> segfault.
>
> With -O0, that does not happen.
>
> How would I go about diagnosing what's happening? Using RTS flags to
> limit the heap to 32G produced the same behavior, just faster.
>
> Strangely, `-v5` does not produce any more output in the console
> (passed via cabal's --ghc-options). Maybe I'm doing it wrong?
>
> Pointers to existing issues or documentation welcome! Thank you!
> > Justin > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgbailey at gmail.com Thu Feb 15 21:31:13 2024 From: jgbailey at gmail.com (Justin Bailey) Date: Thu, 15 Feb 2024 13:31:13 -0800 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: I did notice this in CI (which are linux machines running x86_64) so at least it is not limited to M2. Great tips! Much appreciated! On Thu, Feb 15, 2024 at 11:36 AM Teofil Camarasu wrote: > > Hi Justin, > > From your description, it sounds to me like there's something in your source code that's causing the optimiser to generate too much code, which then causes the crash because of memory exhaustion (though I might be wrong about this). > In the past, when I've run into similar things. I've followed the following vague process to help find a minimal reproducer of the issue. > > - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here for docs on these flags: https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) > These will write some extra debugging information to either your `dist-newstyle` or `.stack-work` directory depending on whether you use cabal or stack. > They will create for each source file a `.dump-simpl` file that will give you the compiler's intermediate output. And a `.dump-timings` file that will show timings information about how long each phase of compilation took. > > - The first step is to hone down on the problematic module or modules. Maybe you already have a good idea from where in your build the compiler crashes. > But if not, you can use the `.dump-timings` files and/or a tool that summarises them like https://github.com/codedownio/time-ghc-modules. To get a sense of where the problem lies. 
> > - Once you've found your worst module, the next step is to determine what about that module is causing the issue. > I find that often you can just try to find what top level identifiers in your `.dump-simpl` file are big. This will give a good idea of which part of your source code is to blame. > Then I tend to try to delete everything that is irrelevant, and check again. Incrementally you get something that is smaller and smaller, and in time you tend to end up with something that is small enough to write up as a ticket. > > I hope that helps. I've found this process to work quite well for hunting down issues where GHC's optimiser goes wrong, but it is a bit of a labour intensive process. > > One last thing. You mention that you are on M2. If it's easily doable for you, try to reproduce on x86_64 just to make sure it's not some bug specific to M2. > > Cheers, > Teo > > On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey wrote: >> >> Hi! >> >> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2). >> >> When building with -01, memory on the GHC process climbs until it >> reaches the limit of my machine (64G) and then crashes with a >> segfault. >> >> With -00, that does not happen. >> >> How would I go about diagnosing what's happening? Using RTS flags to >> limit the heap to 32G produced the same behavior, just faster. >> >> Strangely, `-v5` does not produce any more output in the console >> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? >> >> Pointers to existing issues or documentation welcome! Thank you! 
>> >> Justin >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From simon.peytonjones at gmail.com Thu Feb 15 23:56:05 2024 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Thu, 15 Feb 2024 23:56:05 +0000 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: Using `-dshow-passes` is very helpful too. It shows the program size after each pass of the compiler. Simon On Thu, 15 Feb 2024 at 19:36, Teofil Camarasu wrote: > Hi Justin, > > From your description, it sounds to me like there's something in your > source code that's causing the optimiser to generate too much code, which > then causes the crash because of memory exhaustion (though I might be wrong > about this). > In the past, when I've run into similar things. I've followed the > following vague process to help find a minimal reproducer of the issue. > > - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here for > docs on these flags: > https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) > These will write some extra debugging information to either your > `dist-newstyle` or `.stack-work` directory depending on whether you use > cabal or stack. > They will create for each source file a `.dump-simpl` file that will give > you the compiler's intermediate output. And a `.dump-timings` file that > will show timings information about how long each phase of compilation took. > > - The first step is to hone down on the problematic module or modules. > Maybe you already have a good idea from where in your build the compiler > crashes. > But if not, you can use the `.dump-timings` files and/or a tool that > summarises them like https://github.com/codedownio/time-ghc-modules. To > get a sense of where the problem lies. 
> > - Once you've found your worst module, the next step is to determine what > about that module is causing the issue. > I find that often you can just try to find what top level identifiers in > your `.dump-simpl` file are big. This will give a good idea of which part > of your source code is to blame. > Then I tend to try to delete everything that is irrelevant, and check > again. Incrementally you get something that is smaller and smaller, and in > time you tend to end up with something that is small enough to write up as > a ticket. > > I hope that helps. I've found this process to work quite well for hunting > down issues where GHC's optimiser goes wrong, but it is a bit of a labour > intensive process. > > One last thing. You mention that you are on M2. If it's easily doable for > you, try to reproduce on x86_64 just to make sure it's not some bug > specific to M2. > > Cheers, > Teo > > On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey wrote: > >> Hi! >> >> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2). >> >> When building with -01, memory on the GHC process climbs until it >> reaches the limit of my machine (64G) and then crashes with a >> segfault. >> >> With -00, that does not happen. >> >> How would I go about diagnosing what's happening? Using RTS flags to >> limit the heap to 32G produced the same behavior, just faster. >> >> Strangely, `-v5` does not produce any more output in the console >> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? >> >> Pointers to existing issues or documentation welcome! Thank you! 
>> >> Justin >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jgbailey at gmail.com Fri Feb 16 01:35:44 2024 From: jgbailey at gmail.com (Justin Bailey) Date: Thu, 15 Feb 2024 17:35:44 -0800 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: Well, after running with these flags, one of the `.dump-simpl` files is 26 GB! That's also the module it seems to hang on, so pretty sure something is going wrong there! I was seeing output indicating GHC had allocated 146GB during some of the passes ??? ``` *** Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: Result size of Simplifier iteration=1 = {terms: 9,134, types: 49,937, coercions: 388,802,399, joins: 53/289} Result size of Simplifier iteration=2 = {terms: 8,368, types: 46,864, coercions: 176,356,474, joins: 25/200} Result size of Simplifier = {terms: 8,363, types: 46,848, coercions: 176,356,474, joins: 25/200} !!! Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: finished in 294595.62 milliseconds, allocated 146497.087 megabytes ``` So anyways I'll continue whittling this down. This module does use a lot of higher-kinded types and fancy stuff. On Thu, Feb 15, 2024 at 3:56 PM Simon Peyton Jones wrote: > > Using `-dshow-passes` is very helpful too. It shows the program size after each pass of the compiler. 
> > Simon > > On Thu, 15 Feb 2024 at 19:36, Teofil Camarasu wrote: >> >> Hi Justin, >> >> From your description, it sounds to me like there's something in your source code that's causing the optimiser to generate too much code, which then causes the crash because of memory exhaustion (though I might be wrong about this). >> In the past, when I've run into similar things. I've followed the following vague process to help find a minimal reproducer of the issue. >> >> - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here for docs on these flags: https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) >> These will write some extra debugging information to either your `dist-newstyle` or `.stack-work` directory depending on whether you use cabal or stack. >> They will create for each source file a `.dump-simpl` file that will give you the compiler's intermediate output. And a `.dump-timings` file that will show timings information about how long each phase of compilation took. >> >> - The first step is to hone down on the problematic module or modules. Maybe you already have a good idea from where in your build the compiler crashes. >> But if not, you can use the `.dump-timings` files and/or a tool that summarises them like https://github.com/codedownio/time-ghc-modules. To get a sense of where the problem lies. >> >> - Once you've found your worst module, the next step is to determine what about that module is causing the issue. >> I find that often you can just try to find what top level identifiers in your `.dump-simpl` file are big. This will give a good idea of which part of your source code is to blame. >> Then I tend to try to delete everything that is irrelevant, and check again. Incrementally you get something that is smaller and smaller, and in time you tend to end up with something that is small enough to write up as a ticket. >> >> I hope that helps. 
I've found this process to work quite well for hunting down issues where GHC's optimiser goes wrong, but it is a bit of a labour intensive process. >> >> One last thing. You mention that you are on M2. If it's easily doable for you, try to reproduce on x86_64 just to make sure it's not some bug specific to M2. >> >> Cheers, >> Teo >> >> On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey wrote: >>> >>> Hi! >>> >>> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2). >>> >>> When building with -01, memory on the GHC process climbs until it >>> reaches the limit of my machine (64G) and then crashes with a >>> segfault. >>> >>> With -00, that does not happen. >>> >>> How would I go about diagnosing what's happening? Using RTS flags to >>> limit the heap to 32G produced the same behavior, just faster. >>> >>> Strangely, `-v5` does not produce any more output in the console >>> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? >>> >>> Pointers to existing issues or documentation welcome! Thank you! >>> >>> Justin >>> _______________________________________________ >>> ghc-devs mailing list >>> ghc-devs at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From ben at smart-cactus.org Fri Feb 16 02:23:02 2024 From: ben at smart-cactus.org (Ben Gamari) Date: Thu, 15 Feb 2024 21:23:02 -0500 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: <87jzn5du64.fsf@smart-cactus.org> Justin Bailey writes: > Well, after running with these flags, one of the `.dump-simpl` files > is 26 GB! That's also the module it seems to hang on, so pretty sure > something is going wrong there! > > I was seeing output indicating GHC had allocated 146GB during some of > the passes ??? 
> The high coercion sizes here suggest that this is some variant of #8095. Having another minimal reproducer is always useful. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: From simon.peytonjones at gmail.com Fri Feb 16 08:47:55 2024 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Fri, 16 Feb 2024 08:47:55 +0000 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: Sorry about that! Maybe you have a giant data type with deriving(Generic)? GHC tends to behave badly on those. And yes, you seem to have a lot of type-family stuff going on! Usually we see 10k coercion sizes; you have 400k. Quite a lot of improvements have happened in this area, which may (or may not) help. Once you have whittled a bit, perhaps it'd be possible to test with HEAD? This was better with ... 9.6? 9.4? Simon On Fri, 16 Feb 2024 at 01:36, Justin Bailey wrote: > Well, after running with these flags, one of the `.dump-simpl` files > is 26 GB! That's also the module it seems to hang on, so pretty sure > something is going wrong there! > > I was seeing output indicating GHC had allocated 146GB during some of > the passes ??? > > ``` > > *** Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: > Result size of Simplifier iteration=1 > = {terms: 9,134, > types: 49,937, > coercions: 388,802,399, > joins: 53/289} > Result size of Simplifier iteration=2 > = {terms: 8,368, > types: 46,864, > coercions: 176,356,474, > joins: 25/200} > Result size of Simplifier > = {terms: 8,363, > types: 46,848, > coercions: 176,356,474, > joins: 25/200} > !!! Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: finished in > 294595.62 milliseconds, allocated 146497.087 megabytes > ``` > > So anyways I'll continue whittling this down. This module does use a > lot of higher-kinded types and fancy stuff. 
> > On Thu, Feb 15, 2024 at 3:56 PM Simon Peyton Jones > wrote: > > > > Using `-dshow-passes` is very helpful too. It shows the program size > after each pass of the compiler. > > > > Simon > > > > On Thu, 15 Feb 2024 at 19:36, Teofil Camarasu > wrote: > >> > >> Hi Justin, > >> > >> From your description, it sounds to me like there's something in your > source code that's causing the optimiser to generate too much code, which > then causes the crash because of memory exhaustion (though I might be wrong > about this). > >> In the past, when I've run into similar things. I've followed the > following vague process to help find a minimal reproducer of the issue. > >> > >> - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here > for docs on these flags: > https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) > >> These will write some extra debugging information to either your > `dist-newstyle` or `.stack-work` directory depending on whether you use > cabal or stack. > >> They will create for each source file a `.dump-simpl` file that will > give you the compiler's intermediate output. And a `.dump-timings` file > that will show timings information about how long each phase of compilation > took. > >> > >> - The first step is to hone down on the problematic module or modules. > Maybe you already have a good idea from where in your build the compiler > crashes. > >> But if not, you can use the `.dump-timings` files and/or a tool that > summarises them like https://github.com/codedownio/time-ghc-modules. To > get a sense of where the problem lies. > >> > >> - Once you've found your worst module, the next step is to determine > what about that module is causing the issue. > >> I find that often you can just try to find what top level identifiers > in your `.dump-simpl` file are big. This will give a good idea of which > part of your source code is to blame. 
> >> Then I tend to try to delete everything that is irrelevant, and check > again. Incrementally you get something that is smaller and smaller, and in > time you tend to end up with something that is small enough to write up as > a ticket. > >> > >> I hope that helps. I've found this process to work quite well for > hunting down issues where GHC's optimiser goes wrong, but it is a bit of a > labour intensive process. > >> > >> One last thing. You mention that you are on M2. If it's easily doable > for you, try to reproduce on x86_64 just to make sure it's not some bug > specific to M2. > >> > >> Cheers, > >> Teo > >> > >> On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey > wrote: > >>> > >>> Hi! > >>> > >>> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an > M2). > >>> > >>> When building with -01, memory on the GHC process climbs until it > >>> reaches the limit of my machine (64G) and then crashes with a > >>> segfault. > >>> > >>> With -00, that does not happen. > >>> > >>> How would I go about diagnosing what's happening? Using RTS flags to > >>> limit the heap to 32G produced the same behavior, just faster. > >>> > >>> Strangely, `-v5` does not produce any more output in the console > >>> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? > >>> > >>> Pointers to existing issues or documentation welcome! Thank you! > >>> > >>> Justin > >>> _______________________________________________ > >>> ghc-devs mailing list > >>> ghc-devs at haskell.org > >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> > >> _______________________________________________ > >> ghc-devs mailing list > >> ghc-devs at haskell.org > >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nr at cs.tufts.edu Sun Feb 18 19:06:57 2024
From: nr at cs.tufts.edu (Norman Ramsey)
Date: Sun, 18 Feb 2024 14:06:57 -0500
Subject: git rerere considered helpful
Message-ID: <20240218190657.8061F2C5672@homedog.cs.tufts.edu>

Today I learned that git has a feature that will replay resolutions
of conflicts you've already resolved in a previous merge or rebase.
When I was working on GHC and frequently rebasing on master, I would
have killed for this feature.

https://git-scm.com/book/en/v2/Git-Tools-Rerere

Norman

From evan_greenup at protonmail.com Mon Feb 19 04:43:40 2024
From: evan_greenup at protonmail.com (Evan Greenup)
Date: Mon, 19 Feb 2024 04:43:40 +0000
Subject: Who is in charge for approve the gitlab.haskell.org registration request?
Message-ID: 

I have successfully signed up for an account for code contributions and bug reports. But when I try to log in with this account, it said that my account needs to be approved by an administrator of this GitLab instance.

How can I get in touch with this administrator? And what procedures do I need to take to get my account approved?

From matthewtpickering at gmail.com Mon Feb 19 09:23:27 2024
From: matthewtpickering at gmail.com (Matthew Pickering)
Date: Mon, 19 Feb 2024 09:23:27 +0000
Subject: Who is in charge for approve the gitlab.haskell.org registration request?
In-Reply-To: References: Message-ID: 

I have approved your account now.

On Mon, Feb 19, 2024 at 4:44 AM Evan Greenup via ghc-devs wrote:
>
> I have successfully signed up for an account for code contributions and bug reports. But when I try to log in with this account, it said that my account needs to be approved by an administrator of this GitLab instance.
>
> How can I get in touch with this administrator? And what procedures do I need to take to get my account approved?
> > > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs From jgbailey at gmail.com Wed Feb 21 21:59:21 2024 From: jgbailey at gmail.com (Justin Bailey) Date: Wed, 21 Feb 2024 13:59:21 -0800 Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1 In-Reply-To: References: Message-ID: I've narrowed this down to a pretty small example, and can show that, as the number of fields in my data type increases, compilation takes longer and longer (seems exponential). For example, on my M2, with GHC 9.8, I get these timings & peak memory usage: * 1 field - 2.5s, 198MB peak * 2 fields - 7s, 1.0GB peak * 3 fields - 26.8s, 4.6GB peak * 4 fields - 82.9s, 14.5GB peak For GHC 9.6, those stay pretty much flat up to 10 fields. I didn't test past 4 with GHC 9.8. The project does use `UndecidableInstances` which worries me. In any case I reported a bug at https://gitlab.haskell.org/ghc/ghc/-/issues/24462. Thanks for help narrowing the problem! On Fri, Feb 16, 2024 at 12:48 AM Simon Peyton Jones wrote: > > Sorry about that! > > Maybe you have a giant data type with deriving(Generic)? GHC tends to behave badly on those. And yes, you seem to have a lot of type-family stuff going on! Usually we see 10k coercion sizes; you have 400k. > > Quite a lot of improvements have happened in this area, which may (or may not) help. Once you have whittled a bit, perhaps it'd be possible to test with HEAD? > > This was better with ... 9.6? 9.4? > > Simon > > On Fri, 16 Feb 2024 at 01:36, Justin Bailey wrote: >> >> Well, after running with these flags, one of the `.dump-simpl` files >> is 26 GB! That's also the module it seems to hang on, so pretty sure >> something is going wrong there! >> >> I was seeing output indicating GHC had allocated 146GB during some of >> the passes ??? 
>> >> ``` >> >> *** Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: >> Result size of Simplifier iteration=1 >> = {terms: 9,134, >> types: 49,937, >> coercions: 388,802,399, >> joins: 53/289} >> Result size of Simplifier iteration=2 >> = {terms: 8,368, >> types: 46,864, >> coercions: 176,356,474, >> joins: 25/200} >> Result size of Simplifier >> = {terms: 8,363, >> types: 46,848, >> coercions: 176,356,474, >> joins: 25/200} >> !!! Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: finished in >> 294595.62 milliseconds, allocated 146497.087 megabytes >> ``` >> >> So anyways I'll continue whittling this down. This module does use a >> lot of higher-kinded types and fancy stuff. >> >> On Thu, Feb 15, 2024 at 3:56 PM Simon Peyton Jones >> wrote: >> > >> > Using `-dshow-passes` is very helpful too. It shows the program size after each pass of the compiler. >> > >> > Simon >> > >> > On Thu, 15 Feb 2024 at 19:36, Teofil Camarasu wrote: >> >> >> >> Hi Justin, >> >> >> >> From your description, it sounds to me like there's something in your source code that's causing the optimiser to generate too much code, which then causes the crash because of memory exhaustion (though I might be wrong about this). >> >> In the past, when I've run into similar things. I've followed the following vague process to help find a minimal reproducer of the issue. >> >> >> >> - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See here for docs on these flags: https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) >> >> These will write some extra debugging information to either your `dist-newstyle` or `.stack-work` directory depending on whether you use cabal or stack. >> >> They will create for each source file a `.dump-simpl` file that will give you the compiler's intermediate output. And a `.dump-timings` file that will show timings information about how long each phase of compilation took. 
>> >> >> >> - The first step is to hone down on the problematic module or modules. Maybe you already have a good idea from where in your build the compiler crashes. >> >> But if not, you can use the `.dump-timings` files and/or a tool that summarises them like https://github.com/codedownio/time-ghc-modules. To get a sense of where the problem lies. >> >> >> >> - Once you've found your worst module, the next step is to determine what about that module is causing the issue. >> >> I find that often you can just try to find what top level identifiers in your `.dump-simpl` file are big. This will give a good idea of which part of your source code is to blame. >> >> Then I tend to try to delete everything that is irrelevant, and check again. Incrementally you get something that is smaller and smaller, and in time you tend to end up with something that is small enough to write up as a ticket. >> >> >> >> I hope that helps. I've found this process to work quite well for hunting down issues where GHC's optimiser goes wrong, but it is a bit of a labour intensive process. >> >> >> >> One last thing. You mention that you are on M2. If it's easily doable for you, try to reproduce on x86_64 just to make sure it's not some bug specific to M2. >> >> >> >> Cheers, >> >> Teo >> >> >> >> On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey wrote: >> >>> >> >>> Hi! >> >>> >> >>> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an M2). >> >>> >> >>> When building with -01, memory on the GHC process climbs until it >> >>> reaches the limit of my machine (64G) and then crashes with a >> >>> segfault. >> >>> >> >>> With -00, that does not happen. >> >>> >> >>> How would I go about diagnosing what's happening? Using RTS flags to >> >>> limit the heap to 32G produced the same behavior, just faster. >> >>> >> >>> Strangely, `-v5` does not produce any more output in the console >> >>> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? 
>> >>>
>> >>> Pointers to existing issues or documentation welcome! Thank you!
>> >>>
>> >>> Justin
>> >>> _______________________________________________
>> >>> ghc-devs mailing list
>> >>> ghc-devs at haskell.org
>> >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
>> >>
>> >> _______________________________________________
>> >> ghc-devs mailing list
>> >> ghc-devs at haskell.org
>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs

From carter.schonwald at gmail.com Thu Feb 22 02:13:21 2024
From: carter.schonwald at gmail.com (Carter Schonwald)
Date: Wed, 21 Feb 2024 21:13:21 -0500
Subject: Diagnosing excessive memory usage / crash when compiling - 9.8.1
In-Reply-To: References: Message-ID: 

Undecidable instances should never be the issue. It looks like the example code you shared is using generic deriving, and then using some generic deriving code gen on the Barbie package classes FunctorB and ConstraintB.

So it seems like, given that -O0 doesn’t trigger the problem, some of the generic deriving code in Barbie trips up GHC's optimizer.

On Wed, Feb 21, 2024 at 5:00 PM Justin Bailey wrote:
> I've narrowed this down to a pretty small example, and can show that,
> as the number of fields in my data type increases, compilation takes
> longer and longer (seems exponential).
>
> For example, on my M2, with GHC 9.8, I get these timings & peak memory
> usage:
>
> * 1 field - 2.5s, 198MB peak
> * 2 fields - 7s, 1.0GB peak
> * 3 fields - 26.8s, 4.6GB peak
> * 4 fields - 82.9s, 14.5GB peak
>
> For GHC 9.6, those stay pretty much flat up to 10 fields. I didn't
> test past 4 with GHC 9.8.
>
> The project does use `UndecidableInstances` which worries me.
>
> In any case I reported a bug at
> https://gitlab.haskell.org/ghc/ghc/-/issues/24462. Thanks for help
> narrowing the problem!
>
> On Fri, Feb 16, 2024 at 12:48 AM Simon Peyton Jones
> wrote:
> >
> > Sorry about that!
> >
> > Maybe you have a giant data type with deriving(Generic)?
GHC tends to > behave badly on those. And yes, you seem to have a lot of type-family > stuff going on! Usually we see 10k coercion sizes; you have 400k. > > > > Quite a lot of improvements have happened in this area, which may (or > may not) help. Once you have whittled a bit, perhaps it'd be possible to > test with HEAD? > > > > This was better with ... 9.6? 9.4? > > > > Simon > > > > On Fri, 16 Feb 2024 at 01:36, Justin Bailey wrote: > >> > >> Well, after running with these flags, one of the `.dump-simpl` files > >> is 26 GB! That's also the module it seems to hang on, so pretty sure > >> something is going wrong there! > >> > >> I was seeing output indicating GHC had allocated 146GB during some of > >> the passes ??? > >> > >> ``` > >> > >> *** Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: > >> Result size of Simplifier iteration=1 > >> = {terms: 9,134, > >> types: 49,937, > >> coercions: 388,802,399, > >> joins: 53/289} > >> Result size of Simplifier iteration=2 > >> = {terms: 8,368, > >> types: 46,864, > >> coercions: 176,356,474, > >> joins: 25/200} > >> Result size of Simplifier > >> = {terms: 8,363, > >> types: 46,848, > >> coercions: 176,356,474, > >> joins: 25/200} > >> !!! Simplifier [xxx.AirTrafficControl.Types.ATCMessage]: finished in > >> 294595.62 milliseconds, allocated 146497.087 megabytes > >> ``` > >> > >> So anyways I'll continue whittling this down. This module does use a > >> lot of higher-kinded types and fancy stuff. > >> > >> On Thu, Feb 15, 2024 at 3:56 PM Simon Peyton Jones > >> wrote: > >> > > >> > Using `-dshow-passes` is very helpful too. It shows the program size > after each pass of the compiler. 
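For context, the shape of program this thread is chasing — a higher-kinded record whose fields all sit under a type-constructor parameter, with a derived Generic instance feeding barbies-style classes — can be sketched as below. The type and field names are hypothetical, not the actual project's:

```haskell
{-# LANGUAGE DeriveGeneric #-}
module ATCSketch where

import GHC.Generics (Generic)

-- Higher-kinded data: every field is wrapped in the parameter f,
-- e.g. ATCMessage Maybe for a partially filled record, or
-- ATCMessage Identity for a complete one.
data ATCMessage f = ATCMessage
  { callSign :: f String
  , altitude :: f Int
  , heading  :: f Int
  , speed    :: f Int
  } deriving (Generic)
```

Each extra field enlarges the Generic representation that barbies-style generic code must traverse, which matches the per-field blow-up in compile time and coercion size reported above.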
> >> > > >> > Simon > >> > > >> > On Thu, 15 Feb 2024 at 19:36, Teofil Camarasu < > teofilcamarasu at gmail.com> wrote: > >> >> > >> >> Hi Justin, > >> >> > >> >> From your description, it sounds to me like there's something in > your source code that's causing the optimiser to generate too much code, > which then causes the crash because of memory exhaustion (though I might be > wrong about this). > >> >> In the past, when I've run into similar things. I've followed the > following vague process to help find a minimal reproducer of the issue. > >> >> > >> >> - pass `-ddump-simpl -ddump-timings -ddump-to-file` to GHC. (See > here for docs on these flags: > https://downloads.haskell.org/ghc/latest/docs/users_guide/debugging.html) > >> >> These will write some extra debugging information to either your > `dist-newstyle` or `.stack-work` directory depending on whether you use > cabal or stack. > >> >> They will create for each source file a `.dump-simpl` file that will > give you the compiler's intermediate output. And a `.dump-timings` file > that will show timings information about how long each phase of compilation > took. > >> >> > >> >> - The first step is to hone down on the problematic module or > modules. Maybe you already have a good idea from where in your build the > compiler crashes. > >> >> But if not, you can use the `.dump-timings` files and/or a tool that > summarises them like https://github.com/codedownio/time-ghc-modules. To > get a sense of where the problem lies. > >> >> > >> >> - Once you've found your worst module, the next step is to determine > what about that module is causing the issue. > >> >> I find that often you can just try to find what top level > identifiers in your `.dump-simpl` file are big. This will give a good idea > of which part of your source code is to blame. > >> >> Then I tend to try to delete everything that is irrelevant, and > check again. 
Incrementally you get something that is smaller and smaller, > and in time you tend to end up with something that is small enough to write > up as a ticket. > >> >> > >> >> I hope that helps. I've found this process to work quite well for > hunting down issues where GHC's optimiser goes wrong, but it is a bit of a > labour intensive process. > >> >> > >> >> One last thing. You mention that you are on M2. If it's easily > doable for you, try to reproduce on x86_64 just to make sure it's not some > bug specific to M2. > >> >> > >> >> Cheers, > >> >> Teo > >> >> > >> >> On Thu, Feb 15, 2024 at 7:08 PM Justin Bailey > wrote: > >> >>> > >> >>> Hi! > >> >>> > >> >>> I'm trying to upgrade our (large) codebase to use 9.8.1. (I'm on an > M2). > >> >>> > >> >>> When building with -01, memory on the GHC process climbs until it > >> >>> reaches the limit of my machine (64G) and then crashes with a > >> >>> segfault. > >> >>> > >> >>> With -00, that does not happen. > >> >>> > >> >>> How would I go about diagnosing what's happening? Using RTS flags to > >> >>> limit the heap to 32G produced the same behavior, just faster. > >> >>> > >> >>> Strangely, `-v5` does not produce any more output in the console > >> >>> (passed via cabal's --ghc-options). Maybe I'm doing it wrong? > >> >>> > >> >>> Pointers to existing issues or documentation welcome! Thank you! 
> >> >>> > >> >>> Justin > >> >>> _______________________________________________ > >> >>> ghc-devs mailing list > >> >>> ghc-devs at haskell.org > >> >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > >> >> > >> >> _______________________________________________ > >> >> ghc-devs mailing list > >> >> ghc-devs at haskell.org > >> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From me at amelia.how Thu Feb 22 17:06:03 2024 From: me at amelia.how (=?UTF-8?B?QW3DqWxpYSBMaWFv?=) Date: Thu, 22 Feb 2024 14:06:03 -0300 Subject: Understanding the restrictions on instance head shapes Message-ID: Dear ghc-devs, Prompted by a toot by Conor McBride [1], I got sent on a rabbit-chase down the hole that are the restrictions on instance head shapes. Re-assembling the relevant parts, Conor's trouble arises from code of the form data Nat data CdB (p :: Nat -> *) (m :: Nat) :: * where type Shown (p :: Nat -> *) = forall n. Show (p n) instance Shown p => Shown (CdB p) where show = ... After some shallow investigation, a similar issue came up as #13267, whose fix adds a note mentioning that the "show is not a method" error pops up in the renamer, and so is a bit tricky to improve. However, even after expanding Conor's type synonyms, the type checker now complains that an instance head can not have a forall. I do understand the reasons behind forbidding 'forall's in instance heads, but I don't see a good reason to forbid instances headed by a type synonym --- the original concern, writing an instance for a constraint tuple, is (now) forbidden regardless of whether 'checkValidInstance' uses 'splitTyConApp_maybe', by the call to 'checkValidInstanceHead'. 
Would a patch allowing instances headed by a type synonym be welcome, or would this need to go through a proposal? I think that it would be mostly straightforward, though laborious, to implement this; Essentially, changing 'rnMethodBinds' to annotate method bindings with a set of /all possible/ method 'Name's that match what the user wrote, together with their classes, and only deciding on the actual 'Name' in 'tcMethods', where we know the actual class name for the instance head. If this work would be welcome, I'd appreciate some pointers on dealing with TTG before I write a massive pull request that wastes everyone's time. In particular, my immediate thought for storing the extra information needed is to change the type of 'cid_binds' for 'ClsInstDecl' in *'GhcRn' only*, but trying this out locally causes quite a bit of breakage. It might make more sense to add a 'MethodBind' constructor to 'HsBindLR' which is inserted by the renamer, like 'VarBind', and removed by the type checker; but this /also/ requires quite a bit of code motion. I'm happy to open an issue on GitLab if discussing possible implementation ideas would be easier over there. Of course, simply allowing instances headed by a type synonym wouldn't fix Conor's original example, since having 'forall' in an instance head would still be forbidden. It's my understanding that the reason those are forbidden is due to nasty interactions with ScopedTypeVariables, where instance forall a. C a => forall b. D a b where brings 'b' into scope in the methods' RHSes, whereas the RHS of a top-level function with the same signature would /not/ have 'b' in scope. But given that a higher-ranked type synonym would not bring the variables it quantifies over into scope /anyway/, is there a reason to forbid it then? 
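For comparison, the encoding that GHC does accept today replaces the type synonym by a class whose superclass carries the quantified constraint. This is only a sketch of the standard "constraint synonym as a class" trick under QuantifiedConstraints; it makes 'Shown p' usable as a constraint, but it still does not recover Conor's 'instance Shown p => Shown (CdB p)' with its 'show' method:

```haskell
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE UndecidableInstances  #-}
{-# LANGUAGE FlexibleInstances     #-}
{-# LANGUAGE KindSignatures        #-}
{-# LANGUAGE DataKinds             #-}
module ShownSketch where

import Data.Kind (Type)

data Nat  -- empty type used only as a kind-level index, as above

-- Rather than 'type Shown (p :: Nat -> Type) = forall n. Show (p n)',
-- which GHC rejects as an instance head, make Shown a class whose
-- superclass is the quantified constraint, plus one catch-all instance:
class    (forall n. Show (p n)) => Shown (p :: Nat -> Type)
instance (forall n. Show (p n)) => Shown p
```

The catch-all instance needs UndecidableInstances because its context is no smaller than its head; nothing here is new machinery, which is part of why the synonym form feels like it ought to be allowed.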
Cheers, Amélia [1]: https://types.pl/@pigworker/111975161248256507 From zubin at well-typed.com Fri Feb 23 10:55:32 2024 From: zubin at well-typed.com (Zubin Duggal) Date: Fri, 23 Feb 2024 16:25:32 +0530 Subject: [ANNOUNCE] GHC 9.8.2 is now available Message-ID: The GHC developers are happy to announce the availability of GHC 9.8.2. Binary distributions, source distributions, and documentation are available on the [release page](https://www.haskell.org/ghc/download_ghc_9_8_2.html). GHC Blog Post: https://www.haskell.org/ghc/blog/20240223-ghc-9.8.2-released.html This release is primarily a bugfix release addressing many issues found in the 9.8 series. These include: * A fix for a bug where certain warnings flags were not recognised (#24071) * Fixes for bugs in the renamer and typechecker (#24084, #24134, #24279, #24083) * Fixes for bugs in the simplifier and code generator (#24160, #24242, #23628, #23659, #24160, #23862, #24295, #24370) * Fixes for some memory leaks in GHCi (#24107, #24118) * Improvements to error messages (#21097, #16996, #11050, #24196, #24275, #23768, #23784, #23778) * A fix for a recompilation checking bug where GHC may miss changes in transitive dependencies when deciding to relink a program (#23724). * And many more fixes A full accounting of changes can be found in the [release notes]. As some of the fixed issues do affect correctness users are encouraged to upgrade promptly. We would like to thank Microsoft Azure, GitHub, IOG, the Zw3rk stake pool, Well-Typed, Tweag I/O, Serokell, Equinix, SimSpace, Haskell Foundation, and other anonymous contributors whose on-going financial and in-kind support has facilitated GHC maintenance and release management over the years. Finally, this release would not have been possible without the hundreds of open-source contributors whose work comprise this release. As always, do give this release a try and open a [ticket][] if you see anything amiss. Enjoy! 
-Zubin [ticket]: https://gitlab.haskell.org/ghc/ghc/-/issues/new [release notes]: https://downloads.haskell.org/~ghc/9.8.2/docs/users_guide/9.8.2-notes.html -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From elijahiff at gmail.com Sat Feb 24 00:23:55 2024 From: elijahiff at gmail.com (Eli F-F) Date: Fri, 23 Feb 2024 16:23:55 -0800 Subject: Gitlab Account Message-ID: Hello! I'm requesting my account be approved. I put my username down as HexTheDragon Eli F-F -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Sat Feb 24 07:39:51 2024 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Sat, 24 Feb 2024 07:39:51 +0000 Subject: Gitlab Account In-Reply-To: References: Message-ID: I have confirmed your account On Sat, 24 Feb 2024, 00:24 Eli F-F, wrote: > Hello! I'm requesting my account be approved. I put my username down as > HexTheDragon > > Eli F-F > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From adi.obilisetty at gmail.com Sun Feb 25 12:58:45 2024 From: adi.obilisetty at gmail.com (Adithya Kumar) Date: Sun, 25 Feb 2024 18:28:45 +0530 Subject: Modifying the "StgHeader" struct Message-ID: Hello, I want to add a field for collecting information about the number of GCs a particular heap object survived. I've done the following to implement this: - Add a field "StgWord gc_id" to struct "StgHeader" and "StgThunkHeader" - Update "SET_HDR" macro in "ClosureMacros.h" to update the "gc_id" - Update the "closureTypeHeaderSize" function in "Heap/ClosureTypes.hs" to account for that 1 extra word. Compilation leads to a segmentation fault. The stage 2 compiler is built. 
But usage of it leads to a segmentation fault. Build output: ``` chmod +x inplace/bin/ghc-stage2 "inplace/bin/ghc-stage2" -hisuf dyn_hi -osuf dyn_o -hcsuf dyn_hc -fPIC -dynamic -O0 -H64m -Wall -hide-all-packages -package-env - -i -iutils/check-api-annotations/. -iutils/check-api-annotations/dist-install/build -Iutils/check-api-annotations/dist-install/build -iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen -Iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen -optP-include -optPutils/check-api-annotations/dist-install/build/check-api-annotations/autogen/cabal_macros.h -package-id Cabal-3.2.1.0 -package-id base-4.14.3.0 -package-id containers-0.6.5.1 -package-id directory-1.3.6.0 -package-id ghc-8.10.7.20240224 -Wall -XHaskell2010 -no-user-package-db -rtsopts -Wnoncanonical-monad-instances -outputdir utils/check-api-annotations/dist-install/build -c utils/check-api-annotations/./Main.hs -o utils/check-api-annotations/dist-install/build/Main.dyn_o utils/check-api-annotations/ghc.mk:18: recipe for target 'utils/check-api-annotations/dist-install/build/Main.dyn_o' failed make[1]: *** [utils/check-api-annotations/dist-install/build/Main.dyn_o] Segmentation fault (core dumped) Makefile:123: recipe for target 'all' failed make: *** [all] Error 2 ``` Is there any other place I've missed follow-up modifications? I appreciate any help provided. Best, Adithya -------------- next part -------------- An HTML attachment was scrubbed... URL: From csaba.hruska at gmail.com Sun Feb 25 15:42:57 2024 From: csaba.hruska at gmail.com (Csaba Hruska) Date: Sun, 25 Feb 2024 16:42:57 +0100 Subject: Modifying the "StgHeader" struct In-Reply-To: References: Message-ID: Hi, Are you using profiling mode? Regards, Csaba On Sun, Feb 25, 2024 at 1:59 PM Adithya Kumar wrote: > Hello, > > I want to add a field for collecting information about the number of GCs a > particular heap object survived. 
> > I've done the following to implement this: > - Add a field "StgWord gc_id" to struct "StgHeader" and "StgThunkHeader" > - Update "SET_HDR" macro in "ClosureMacros.h" to update the "gc_id" > - Update the "closureTypeHeaderSize" function in "Heap/ClosureTypes.hs" to > account for that 1 extra word. > > Compilation leads to a segmentation fault. The stage 2 compiler is built. > But > usage of it leads to a segmentation fault. > > Build output: > ``` > chmod +x > inplace/bin/ghc-stage2 > "inplace/bin/ghc-stage2" -hisuf dyn_hi -osuf dyn_o -hcsuf dyn_hc -fPIC > -dynamic -O0 -H64m -Wall -hide-all-packages -package-env - -i > -iutils/check-api-annotations/. > -iutils/check-api-annotations/dist-install/build > -Iutils/check-api-annotations/dist-install/build > -iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen > -Iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen > -optP-include > -optPutils/check-api-annotations/dist-install/build/check-api-annotations/autogen/cabal_macros.h > -package-id Cabal-3.2.1.0 -package-id base-4.14.3.0 -package-id > containers-0.6.5.1 -package-id directory-1.3.6.0 -package-id > ghc-8.10.7.20240224 -Wall -XHaskell2010 -no-user-package-db -rtsopts > -Wnoncanonical-monad-instances -outputdir > utils/check-api-annotations/dist-install/build -c > utils/check-api-annotations/./Main.hs -o > utils/check-api-annotations/dist-install/build/Main.dyn_o > utils/check-api-annotations/ghc.mk:18: recipe for target > 'utils/check-api-annotations/dist-install/build/Main.dyn_o' failed > make[1]: *** [utils/check-api-annotations/dist-install/build/Main.dyn_o] > Segmentation fault (core dumped) > Makefile:123: recipe for target 'all' failed > make: *** [all] Error 2 > ``` > > Is there any other place I've missed follow-up modifications? > > I appreciate any help provided. 
> > Best, > Adithya > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthewtpickering at gmail.com Sun Feb 25 16:02:12 2024 From: matthewtpickering at gmail.com (Matthew Pickering) Date: Sun, 25 Feb 2024 16:02:12 +0000 Subject: Modifying the "StgHeader" struct In-Reply-To: References: Message-ID: Hi Adithya, It seems that eras profiling mode allows you to determine this information. In eras profiling the era the object is created is stored in the profiling header, you can automatically increment the era with the --automatic-era-increment at each major collection, you can work out "how many collections an object has been live" by subtracting the current era from the era the object was created. Cheers, Matt On Sun, 25 Feb 2024, 15:43 Csaba Hruska, wrote: > Hi, > Are you using profiling mode? > Regards, > Csaba > > On Sun, Feb 25, 2024 at 1:59 PM Adithya Kumar > wrote: > >> Hello, >> >> I want to add a field for collecting information about the number of GCs a >> particular heap object survived. >> >> I've done the following to implement this: >> - Add a field "StgWord gc_id" to struct "StgHeader" and "StgThunkHeader" >> - Update "SET_HDR" macro in "ClosureMacros.h" to update the "gc_id" >> - Update the "closureTypeHeaderSize" function in "Heap/ClosureTypes.hs" to >> account for that 1 extra word. >> >> Compilation leads to a segmentation fault. The stage 2 compiler is built. >> But >> usage of it leads to a segmentation fault. >> >> Build output: >> ``` >> chmod +x >> inplace/bin/ghc-stage2 >> "inplace/bin/ghc-stage2" -hisuf dyn_hi -osuf dyn_o -hcsuf dyn_hc -fPIC >> -dynamic -O0 -H64m -Wall -hide-all-packages -package-env - -i >> -iutils/check-api-annotations/. 
>> -iutils/check-api-annotations/dist-install/build >> -Iutils/check-api-annotations/dist-install/build >> -iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen >> -Iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen >> -optP-include >> -optPutils/check-api-annotations/dist-install/build/check-api-annotations/autogen/cabal_macros.h >> -package-id Cabal-3.2.1.0 -package-id base-4.14.3.0 -package-id >> containers-0.6.5.1 -package-id directory-1.3.6.0 -package-id >> ghc-8.10.7.20240224 -Wall -XHaskell2010 -no-user-package-db -rtsopts >> -Wnoncanonical-monad-instances -outputdir >> utils/check-api-annotations/dist-install/build -c >> utils/check-api-annotations/./Main.hs -o >> utils/check-api-annotations/dist-install/build/Main.dyn_o >> utils/check-api-annotations/ghc.mk:18: recipe for target >> 'utils/check-api-annotations/dist-install/build/Main.dyn_o' failed >> make[1]: *** [utils/check-api-annotations/dist-install/build/Main.dyn_o] >> Segmentation fault (core dumped) >> Makefile:123: recipe for target 'all' failed >> make: *** [all] Error 2 >> ``` >> >> Is there any other place I've missed follow-up modifications? >> >> I appreciate any help provided. >> >> Best, >> Adithya >> _______________________________________________ >> ghc-devs mailing list >> ghc-devs at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs >> > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rodrigo.m.mesquita at gmail.com Sun Feb 25 16:03:20 2024 From: rodrigo.m.mesquita at gmail.com (Rodrigo Mesquita) Date: Sun, 25 Feb 2024 16:03:20 +0000 Subject: Modifying the "StgHeader" struct In-Reply-To: References: Message-ID: Hello Adithya, This sounds just like what has been implemented as the default mode for eras profiling: https://well-typed.com/blog/2024/01/ghc-eras-profiling — which tracks an “era” for each heap object — and, by default, uses a major garbage collection as the increment trigger. Could you check it out and see if it would supersede what you are trying to do? Cheers, Rodrigo > On 25 Feb 2024, at 12:58, Adithya Kumar wrote: > > Hello, > > I want to add a field for collecting information about the number of GCs a > particular heap object survived. > > I've done the following to implement this: > - Add a field "StgWord gc_id" to struct "StgHeader" and "StgThunkHeader" > - Update "SET_HDR" macro in "ClosureMacros.h" to update the "gc_id" > - Update the "closureTypeHeaderSize" function in "Heap/ClosureTypes.hs" to > account for that 1 extra word. > > Compilation leads to a segmentation fault. The stage 2 compiler is built. But > usage of it leads to a segmentation fault. > > Build output: > ``` > chmod +x inplace/bin/ghc-stage2 > "inplace/bin/ghc-stage2" -hisuf dyn_hi -osuf dyn_o -hcsuf dyn_hc -fPIC -dynamic -O0 -H64m -Wall -hide-all-packages -package-env - -i -iutils/check-api-annotations/. 
-iutils/check-api-annotations/dist-install/build -Iutils/check-api-annotations/dist-install/build -iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen -Iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen -optP-include -optPutils/check-api-annotations/dist-install/build/check-api-annotations/autogen/cabal_macros.h -package-id Cabal-3.2.1.0 -package-id base-4.14.3.0 -package-id containers-0.6.5.1 -package-id directory-1.3.6.0 -package-id ghc-8.10.7.20240224 -Wall -XHaskell2010 -no-user-package-db -rtsopts -Wnoncanonical-monad-instances -outputdir utils/check-api-annotations/dist-install/build -c utils/check-api-annotations/./Main.hs -o utils/check-api-annotations/dist-install/build/Main.dyn_o > utils/check-api-annotations/ghc.mk:18 : recipe for target 'utils/check-api-annotations/dist-install/build/Main.dyn_o' failed > make[1]: *** [utils/check-api-annotations/dist-install/build/Main.dyn_o] Segmentation fault (core dumped) > Makefile:123: recipe for target 'all' failed > make: *** [all] Error 2 > ``` > > Is there any other place I've missed follow-up modifications? > > I appreciate any help provided. > > Best, > Adithya > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs -------------- next part -------------- An HTML attachment was scrubbed... URL: From adi.obilisetty at gmail.com Mon Feb 26 03:22:05 2024 From: adi.obilisetty at gmail.com (Adithya Kumar) Date: Mon, 26 Feb 2024 08:52:05 +0530 Subject: Modifying the "StgHeader" struct In-Reply-To: References: Message-ID: Hello, Appreciate the swift responses. I need this feature in GHC 8.10.7. This, I believe is only available post 9.10. I can try replicating the same kind of behavior in GHC 8.10.7. 
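For reference, the 9.10 eras workflow being discussed can be sketched roughly as below. The RTS flags and the GHC.Profiling.Eras names are taken from the linked blog post and are an assumption here — none of this interface exists in 8.10.7, which is exactly what would have to be back-ported:

```haskell
-- Run a profiled build with era profiling enabled, e.g.
-- (flag names per the blog post):
--   ./myprog +RTS -he --automatic-era-increment -RTS
-- With --automatic-era-increment, each major GC bumps the era, so an
-- object's "major collections survived" is the current era minus the
-- era stored in the object's profiling header.
module ErasSketch where

import GHC.Profiling.Eras (setUserEra, getUserEra)  -- assumed API

main :: IO ()
main = do
  setUserEra 1        -- tag allocations from here on with era 1
  -- ... run the request-serving workload here ...
  era <- getUserEra   -- the era currently in effect
  print era
```

The same era word could also be set manually around suspect phases instead of per-GC, which keeps the runtime overhead to a single header word per heap object.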
Some more context to the problem at hand: The overhead of the profiling I want to enable should be as minimal as possible as we plan to run this build in production. The memory leak I'm trying to debug happens over a few days after over a million requests and we aren't able to replicate it offline. I can take inspiration from the work done in 9.10 and replicate it in 8.10.7 in a way that it can be used without other features of profiling. This is what I was trying to do. Ie. Add a new field "era" in the "StgHeader" instead of "StgProfHeader". Making the follow-up changes was the problem as I might've missed the bookkeeping in a few places. But now I can look at 9.10. So I have a baseline to compare with. I will try this, and get back to this thread. Best, Adithya On Sun, Feb 25, 2024 at 9:33 PM Rodrigo Mesquita < rodrigo.m.mesquita at gmail.com> wrote: > Hello Adithya, > > This sounds just like what has been implemented as the default mode for > eras profiling: https://well-typed.com/blog/2024/01/ghc-eras-profiling — > which tracks an “era” for each heap object — and, by default, uses a major > garbage collection as the increment trigger. > Could you check it out and see if it would supersede what you are trying > to do? > > Cheers, > Rodrigo > > On 25 Feb 2024, at 12:58, Adithya Kumar wrote: > > Hello, > > I want to add a field for collecting information about the number of GCs a > particular heap object survived. > > I've done the following to implement this: > - Add a field "StgWord gc_id" to struct "StgHeader" and "StgThunkHeader" > - Update "SET_HDR" macro in "ClosureMacros.h" to update the "gc_id" > - Update the "closureTypeHeaderSize" function in "Heap/ClosureTypes.hs" to > account for that 1 extra word. > > Compilation leads to a segmentation fault. The stage 2 compiler is built. > But > usage of it leads to a segmentation fault. 
> > Build output: > ``` > chmod +x > inplace/bin/ghc-stage2 > "inplace/bin/ghc-stage2" -hisuf dyn_hi -osuf dyn_o -hcsuf dyn_hc -fPIC > -dynamic -O0 -H64m -Wall -hide-all-packages -package-env - -i > -iutils/check-api-annotations/. > -iutils/check-api-annotations/dist-install/build > -Iutils/check-api-annotations/dist-install/build > -iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen > -Iutils/check-api-annotations/dist-install/build/check-api-annotations/autogen > -optP-include > -optPutils/check-api-annotations/dist-install/build/check-api-annotations/autogen/cabal_macros.h > -package-id Cabal-3.2.1.0 -package-id base-4.14.3.0 -package-id > containers-0.6.5.1 -package-id directory-1.3.6.0 -package-id > ghc-8.10.7.20240224 -Wall -XHaskell2010 -no-user-package-db -rtsopts > -Wnoncanonical-monad-instances -outputdir > utils/check-api-annotations/dist-install/build -c > utils/check-api-annotations/./Main.hs -o > utils/check-api-annotations/dist-install/build/Main.dyn_o > utils/check-api-annotations/ghc.mk:18: recipe for target > 'utils/check-api-annotations/dist-install/build/Main.dyn_o' failed > make[1]: *** [utils/check-api-annotations/dist-install/build/Main.dyn_o] > Segmentation fault (core dumped) > Makefile:123: recipe for target 'all' failed > make: *** [all] Error 2 > ``` > > Is there any other place I've missed follow-up modifications? > > I appreciate any help provided. > > Best, > Adithya > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From simon.peytonjones at gmail.com Mon Feb 26 13:15:17 2024 From: simon.peytonjones at gmail.com (Simon Peyton Jones) Date: Mon, 26 Feb 2024 13:15:17 +0000 Subject: Understanding the restrictions on instance head shapes In-Reply-To: References: Message-ID: Hi Amelia I do understand the reasons behind forbidding 'forall's in instance heads, but I don't see a good reason to forbid instances headed by a type synonym --- the original concern, writing an instance for a constraint tuple, is (now) forbidden regardless of whether 'checkValidInstance' uses 'splitTyConApp_maybe', by the call to 'checkValidInstanceHead'. Would a patch allowing instances headed by a type synonym be welcome, or would this need to go through a proposal? If you want to change the source language, you definitely need a proposal. I must say that I'm very un-clear what specific change you propose, and why it would be a desirable change. But elucidating all that is precisely what the GHC proposals process is all about! Simon On Thu, 22 Feb 2024 at 17:06, Amélia Liao wrote: > Dear ghc-devs, > > Prompted by a toot by Conor McBride [1], I got sent on a rabbit-chase > down the hole that are the restrictions on instance head shapes. > Re-assembling the relevant parts, Conor's trouble arises from code of > the form > > data Nat > data CdB (p :: Nat -> *) (m :: Nat) :: * where > > type Shown (p :: Nat -> *) = forall n. Show (p n) > instance Shown p => Shown (CdB p) where > show = ... > > After some shallow investigation, a similar issue came up as #13267, > whose fix adds a note mentioning that the "show is not a method" error > pops up in the renamer, and so is a bit tricky to improve. However, even > after expanding Conor's type synonyms, the type checker now complains > that an instance head can not have a forall. 
> > I do understand the reasons behind forbidding 'forall's in instance > heads, but I don't see a good reason to forbid instances headed by a > type synonym --- the original concern, writing an instance for a > constraint tuple, is (now) forbidden regardless of whether > 'checkValidInstance' uses 'splitTyConApp_maybe', by the call to > 'checkValidInstanceHead'. Would a patch allowing instances headed by a > type synonym be welcome, or would this need to go through a proposal? > > I think that it would be mostly straightforward, though laborious, to > implement this; Essentially, changing 'rnMethodBinds' to annotate method > bindings with a set of /all possible/ method 'Name's that match what the > user wrote, together with their classes, and only deciding on the actual > 'Name' in 'tcMethods', where we know the actual class name for the > instance head. > > If this work would be welcome, I'd appreciate some pointers on dealing > with TTG before I write a massive pull request that wastes everyone's > time. > > In particular, my immediate thought for storing the extra information > needed is to change the type of 'cid_binds' for 'ClsInstDecl' in > *'GhcRn' only*, but trying this out locally causes quite a bit of > breakage. It might make more sense to add a 'MethodBind' constructor to > 'HsBindLR' which is inserted by the renamer, like 'VarBind', and removed > by the type checker; but this /also/ requires quite a bit of code > motion. I'm happy to open an issue on GitLab if discussing possible > implementation ideas would be easier over there. > > Of course, simply allowing instances headed by a type synonym wouldn't > fix Conor's original example, since having 'forall' in an instance head > would still be forbidden. It's my understanding that the reason those > are forbidden is due to nasty interactions with ScopedTypeVariables, > where > > instance forall a. C a => forall b. 
D a b where > > brings 'b' into scope in the methods' RHSes, whereas the RHS of a > top-level function with the same signature would /not/ have 'b' in > scope. But given that a higher-ranked type synonym would not bring the > variables it quantifies over into scope /anyway/, is there a reason to > forbid it then? > > Cheers, > Amélia > > [1]: https://types.pl/@pigworker/111975161248256507 > _______________________________________________ > ghc-devs mailing list > ghc-devs at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben at well-typed.com Thu Feb 29 22:57:21 2024 From: ben at well-typed.com (Ben Gamari) Date: Thu, 29 Feb 2024 17:57:21 -0500 Subject: GHC 9.10.1 alpha 1 delay Message-ID: <87o7bydgkx.fsf@smart-cactus.org> Hi all, Unfortunately due to persistent trouble merging the last few patches destined for 9.10.1-alpha1 I will need to delay the release by a few days. As I will be away for the first two days of next week I will be delaying by one week, placing the new alpha 1 release date on 7 March 2024. Cheers, - Ben -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 487 bytes Desc: not available URL: