From clintonmead at gmail.com Wed Feb 23 02:08:20 2022 From: clintonmead at gmail.com (Clinton Mead) Date: Wed, 23 Feb 2022 13:08:20 +1100 Subject: Are constructors matched against using "switch" or chained if-else Message-ID: Hi All I'm developing an unbounded integer type, which I won't go into the details here but in some circumstances has better performance than the standard "Integer". Anyway, whilst there are complex cases, the most common case is a standard machine int multiplication. Hence I want the type to be optimised for that case. I'm going to have a few constructors, anyway, so I first considered something like this: `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" strict/unpack, hopefully eliminating the indirection, or perhaps just making them primitive directly. This has 7 constructors, which quite nicely I believe fits into the three spare bits in a 64 bit pointer which GHC optimises I believe. However, this approach I thought of because I assumed that GHC would do a switch style statement on the constructors, so once I have more than one I might as well have as many as I want (well, up to 7, until I lose the optimisation). But if GHC compiles this to a "if ... else if..." chain, a better representation is the following: `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` That way I can match against BasicInt first, and knock that out of the way as my "hot" case. However, using "Int" instead of "Word" does make the logic a bit more complex, but it may be worth it if I'm reducing the number of patterns I have to check against for all arguments. I was just wondering if anyone could share some insight on what GHC does in these circumstances? For example, does the order I list my cases in a case statement matter if they're non-overlapping? Will GHC match them in the order I list, or will it just make them into a switch statement so it doesn't matter if I reorder them? I guess I could benchmark all this (and probably will) but some insights and general guidance would be good. Thanks, Clinton -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.feuer at gmail.com Wed Feb 23 02:34:13 2022 From: david.feuer at gmail.com (David Feuer) Date: Tue, 22 Feb 2022 21:34:13 -0500 Subject: Are constructors matched against using "switch" or chained if-else In-Reply-To: References: Message-ID: I can answer one of your questions for sure: the order of your case branches doesn't matter at all. However, the order of the data constructors in the type declaration does matter. Put your most likely one first. On Tue, Feb 22, 2022, 9:09 PM Clinton Mead wrote: > Hi All > > I'm developing an unbounded integer type, which I won't go into the > details here but in some circumstances has better performance than the > standard "Integer". > > Anyway, whilst there are complex cases, the most common case is a standard > machine int multiplication. > > Hence I want the type to be optimised for that case. > > I'm going to have a few constructors, anyway, so I first considered > something like this: > > `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA ... > | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` > > I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" strict/unpack, > hopefully eliminating the indirection, or perhaps just making them > primitive directly. > > This has 7 constructors, which quite nicely I believe fits into the three > spare bits in a 64 bit pointer which GHC optimises I believe. > > However, this approach I thought of because I assumed that GHC would do a > switch style statement on the constructors, so once I have more than one I > might as well have as many as I want (well, up to 7, until I lose the > optimisation). > > But if GHC compiles this to a "if ... else if..." chain, a better > representation is the following: > > `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | > ComplexPosB ... | ComplexNegB ...` > > That way I can match against BasicInt first, and knock that out of the way > as my "hot" case. However, using "Int" instead of "Word" does make the > logic a bit more complex, but it may be worth it if I'm reducing the number > of patterns I have to check against for all arguments. > > I was just wondering if anyone could share some insight on what GHC does > in these circumstances? For example, does the order I list my cases in a > case statement matter if they're non-overlapping? Will GHC match them in > the order I list, or will it just make them into a switch statement so it > doesn't matter if I reorder them? > > I guess I could benchmark all this (and probably will) but some insights > and general guidance would be good. > > Thanks, > Clinton > _______________________________________________ > Glasgow-haskell-users mailing list > Glasgow-haskell-users at haskell.org > http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From clintonmead at gmail.com Wed Feb 23 02:52:00 2022 From: clintonmead at gmail.com (Clinton Mead) Date: Wed, 23 Feb 2022 13:52:00 +1100 Subject: Are constructors matched against using "switch" or chained if-else In-Reply-To: References: Message-ID: Thanks David. Can I ask why? Is it because the first constructor is treated specially? (perhaps because it has zeroed tag bits)? Or is it just because there's always an if/else chain in order of the constructor definition regardless of the order of the case statement so the higher up the list the better? On Wed, Feb 23, 2022 at 1:34 PM David Feuer wrote: > I can answer one of your questions for sure: the order of your case > branches doesn't matter at all. However, the order of the data constructors > in the type declaration does matter. Put your most likely one first. > > On Tue, Feb 22, 2022, 9:09 PM Clinton Mead wrote: > >> Hi All >> >> I'm developing an unbounded integer type, which I won't go into the >> details here but in some circumstances has better performance than the >> standard "Integer". >> >> Anyway, whilst there are complex cases, the most common case is a >> standard machine int multiplication. >> >> Hence I want the type to be optimised for that case. >> >> I'm going to have a few constructors, anyway, so I first considered >> something like this: >> >> `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA ... >> | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` >> >> I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" >> strict/unpack, hopefully eliminating the indirection, or perhaps just >> making them primitive directly. >> >> This has 7 constructors, which quite nicely I believe fits into the three >> spare bits in a 64 bit pointer which GHC optimises I believe. >> >> However, this approach I thought of because I assumed that GHC would do a >> switch style statement on the constructors, so once I have more than one I >> might as well have as many as I want (well, up to 7, until I lose the >> optimisation). >> >> But if GHC compiles this to a "if ... else if..." chain, a better >> representation is the following: >> >> `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | >> ComplexPosB ... | ComplexNegB ...` >> >> That way I can match against BasicInt first, and knock that out of the >> way as my "hot" case. However, using "Int" instead of "Word" does make the >> logic a bit more complex, but it may be worth it if I'm reducing the number >> of patterns I have to check against for all arguments. >> >> I was just wondering if anyone could share some insight on what GHC does >> in these circumstances? For example, does the order I list my cases in a >> case statement matter if they're non-overlapping? Will GHC match them in >> the order I list, or will it just make them into a switch statement so it >> doesn't matter if I reorder them? >> >> I guess I could benchmark all this (and probably will) but some insights >> and general guidance would be good. >> >> Thanks, >> Clinton >> _______________________________________________ >> Glasgow-haskell-users mailing list >> Glasgow-haskell-users at haskell.org >> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.feuer at gmail.com Wed Feb 23 02:59:18 2022 From: david.feuer at gmail.com (David Feuer) Date: Tue, 22 Feb 2022 21:59:18 -0500 Subject: Are constructors matched against using "switch" or chained if-else In-Reply-To: References: Message-ID: You can ask, but someone else will have to answer. Sorry. On Tue, Feb 22, 2022 at 9:52 PM Clinton Mead wrote: > > Thanks David. Can I ask why? Is it because the first constructor is treated specially? (perhaps because it has zeroed tag bits)? Or is it just because there's always an if/else chain in order of the constructor definition regardless of the order of the case statement so the higher up the list the better? > > On Wed, Feb 23, 2022 at 1:34 PM David Feuer wrote: >> >> I can answer one of your questions for sure: the order of your case branches doesn't matter at all. However, the order of the data constructors in the type declaration does matter. Put your most likely one first. >> >> On Tue, Feb 22, 2022, 9:09 PM Clinton Mead wrote: >>> >>> Hi All >>> >>> I'm developing an unbounded integer type, which I won't go into the details here but in some circumstances has better performance than the standard "Integer". >>> >>> Anyway, whilst there are complex cases, the most common case is a standard machine int multiplication. >>> >>> Hence I want the type to be optimised for that case. >>> >>> I'm going to have a few constructors, anyway, so I first considered something like this: >>> >>> `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` >>> >>> I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" strict/unpack, hopefully eliminating the indirection, or perhaps just making them primitive directly. >>> >>> This has 7 constructors, which quite nicely I believe fits into the three spare bits in a 64 bit pointer which GHC optimises I believe. >>> >>> However, this approach I thought of because I assumed that GHC would do a switch style statement on the constructors, so once I have more than one I might as well have as many as I want (well, up to 7, until I lose the optimisation). >>> >>> But if GHC compiles this to a "if ... else if..." chain, a better representation is the following: >>> >>> `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` >>> >>> That way I can match against BasicInt first, and knock that out of the way as my "hot" case. However, using "Int" instead of "Word" does make the logic a bit more complex, but it may be worth it if I'm reducing the number of patterns I have to check against for all arguments. >>> >>> I was just wondering if anyone could share some insight on what GHC does in these circumstances? For example, does the order I list my cases in a case statement matter if they're non-overlapping? Will GHC match them in the order I list, or will it just make them into a switch statement so it doesn't matter if I reorder them? >>> >>> I guess I could benchmark all this (and probably will) but some insights and general guidance would be good. >>> >>> Thanks, >>> Clinton >>> _______________________________________________ >>> Glasgow-haskell-users mailing list >>> Glasgow-haskell-users at haskell.org >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users From clintonmead at gmail.com Wed Feb 23 05:08:29 2022 From: clintonmead at gmail.com (Clinton Mead) Date: Wed, 23 Feb 2022 16:08:29 +1100 Subject: Are constructors matched against using "switch" or chained if-else In-Reply-To: References: Message-ID: David, A random google search has revealed this StackOverflow answer , presumably by yourself or your evil twin, which mentions a binary search being performed. However the particular case you mention is a case match on "Ints", which are far from a dense set, whereas presumably you could do something nicer on the tag bits of constructors, which are a dense set? On Wed, Feb 23, 2022 at 1:59 PM David Feuer wrote: > You can ask, but someone else will have to answer. Sorry. > > On Tue, Feb 22, 2022 at 9:52 PM Clinton Mead > wrote: > > > > Thanks David. Can I ask why? Is it because the first constructor is > treated specially? (perhaps because it has zeroed tag bits)? Or is it just > because there's always an if/else chain in order of the constructor > definition regardless of the order of the case statement so the higher up > the list the better? > > > > On Wed, Feb 23, 2022 at 1:34 PM David Feuer > wrote: > >> > >> I can answer one of your questions for sure: the order of your case > branches doesn't matter at all. However, the order of the data constructors > in the type declaration does matter. Put your most likely one first. > >> > >> On Tue, Feb 22, 2022, 9:09 PM Clinton Mead > wrote: > >>> > >>> Hi All > >>> > >>> I'm developing an unbounded integer type, which I won't go into the > details here but in some circumstances has better performance than the > standard "Integer". > >>> > >>> Anyway, whilst there are complex cases, the most common case is a > standard machine int multiplication. > >>> > >>> Hence I want the type to be optimised for that case. > >>> > >>> I'm going to have a few constructors, anyway, so I first considered > something like this: > >>> > >>> `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA > ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` > >>> > >>> I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" > strict/unpack, hopefully eliminating the indirection, or perhaps just > making them primitive directly. > >>> > >>> This has 7 constructors, which quite nicely I believe fits into the > three spare bits in a 64 bit pointer which GHC optimises I believe. > >>> > >>> However, this approach I thought of because I assumed that GHC would > do a switch style statement on the constructors, so once I have more than > one I might as well have as many as I want (well, up to 7, until I lose the > optimisation). > >>> > >>> But if GHC compiles this to a "if ... else if..." chain, a better > representation is the following: > >>> > >>> `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | > ComplexPosB ... | ComplexNegB ...` > >>> > >>> That way I can match against BasicInt first, and knock that out of the > way as my "hot" case. However, using "Int" instead of "Word" does make the > logic a bit more complex, but it may be worth it if I'm reducing the number > of patterns I have to check against for all arguments. > >>> > >>> I was just wondering if anyone could share some insight on what GHC > does in these circumstances? For example, does the order I list my cases in > a case statement matter if they're non-overlapping? Will GHC match them in > the order I list, or will it just make them into a switch statement so it > doesn't matter if I reorder them? > >>> > >>> I guess I could benchmark all this (and probably will) but some > insights and general guidance would be good. > >>> > >>> Thanks, > >>> Clinton > >>> _______________________________________________ > >>> Glasgow-haskell-users mailing list > >>> Glasgow-haskell-users at haskell.org > >>> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.feuer at gmail.com Wed Feb 23 05:13:37 2022 From: david.feuer at gmail.com (David Feuer) Date: Wed, 23 Feb 2022 00:13:37 -0500 Subject: Are constructors matched against using "switch" or chained if-else In-Reply-To: References: Message-ID: That answer (which I did indeed write once upon a time) is somewhat incomplete. Case matches on Ints actually work in about three different ways: 1. Linear search 2. Binary search 3. Jump table There's some amount of mix and match available for gappy things. I don't know all the details. As for tag matching, I know none of the mechanics. I just know that I've always been told to make the first constructor hot. On Wed, Feb 23, 2022, 12:08 AM Clinton Mead wrote: > David, > > A random google search has revealed this StackOverflow answer > , presumably by yourself or > your evil twin, which mentions a binary search being performed. However the > particular case you mention is a case match on "Ints", which are far from a > dense set, whereas presumably you could do something nicer on the tag bits > of constructors, which are a dense set? > > > On Wed, Feb 23, 2022 at 1:59 PM David Feuer wrote: > >> You can ask, but someone else will have to answer. Sorry. >> >> On Tue, Feb 22, 2022 at 9:52 PM Clinton Mead >> wrote: >> > >> > Thanks David. Can I ask why? Is it because the first constructor is >> treated specially? (perhaps because it has zeroed tag bits)? Or is it just >> because there's always an if/else chain in order of the constructor >> definition regardless of the order of the case statement so the higher up >> the list the better? >> > >> > On Wed, Feb 23, 2022 at 1:34 PM David Feuer >> wrote: >> >> >> >> I can answer one of your questions for sure: the order of your case >> branches doesn't matter at all. However, the order of the data constructors >> in the type declaration does matter. Put your most likely one first. >> >> >> >> On Tue, Feb 22, 2022, 9:09 PM Clinton Mead >> wrote: >> >>> >> >>> Hi All >> >>> >> >>> I'm developing an unbounded integer type, which I won't go into the >> details here but in some circumstances has better performance than the >> standard "Integer". >> >>> >> >>> Anyway, whilst there are complex cases, the most common case is a >> standard machine int multiplication. >> >>> >> >>> Hence I want the type to be optimised for that case. >> >>> >> >>> I'm going to have a few constructors, anyway, so I first considered >> something like this: >> >>> >> >>> `data MyInt = BasicZero | BasicPos Word | BasicNeg Word | ComplexPosA >> ... | ComplexNegA ... | ComplexPosB ... | ComplexNegB ...` >> >>> >> >>> I'd naturally make the "Word"s in "BasicPos" and "BasicNeg" >> strict/unpack, hopefully eliminating the indirection, or perhaps just >> making them primitive directly. >> >>> >> >>> This has 7 constructors, which quite nicely I believe fits into the >> three spare bits in a 64 bit pointer which GHC optimises I believe. >> >>> >> >>> However, this approach I thought of because I assumed that GHC would >> do a switch style statement on the constructors, so once I have more than >> one I might as well have as many as I want (well, up to 7, until I lose the >> optimisation). >> >>> >> >>> But if GHC compiles this to a "if ... else if..." chain, a better >> representation is the following: >> >>> >> >>> `data MyInt = BasicInt Int | ComplexPosA ... | ComplexNegA ... | >> ComplexPosB ... | ComplexNegB ...` >> >>> >> >>> That way I can match against BasicInt first, and knock that out of >> the way as my "hot" case. However, using "Int" instead of "Word" does make >> the logic a bit more complex, but it may be worth it if I'm reducing the >> number of patterns I have to check against for all arguments. >> >>> >> >>> I was just wondering if anyone could share some insight on what GHC >> does in these circumstances? For example, does the order I list my cases in >> a case statement matter if they're non-overlapping? Will GHC match them in >> the order I list, or will it just make them into a switch statement so it >> doesn't matter if I reorder them? >> >>> >> >>> I guess I could benchmark all this (and probably will) but some >> insights and general guidance would be good. >> >>> >> >>> Thanks, >> >>> Clinton >> >>> _______________________________________________ >> >>> Glasgow-haskell-users mailing list >> >>> Glasgow-haskell-users at haskell.org >> >>> >> http://mail.haskell.org/cgi-bin/mailman/listinfo/glasgow-haskell-users >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: