[Git][ghc/ghc][wip/general-catgeory] unicode: Don't inline bitmap in generalCategory

Matthew Pickering (@mpickering) gitlab at gitlab.haskell.org
Mon Feb 13 12:00:33 UTC 2023



Matthew Pickering pushed to branch wip/general-catgeory at Glasgow Haskell Compiler / GHC


Commits:
400894c6 by Matthew Pickering at 2023-02-13T11:59:25+00:00
unicode: Don't inline bitmap in generalCategory

generalCategory contains a huge literal string but is marked INLINE, so
the string is duplicated into every use site of generalCategory. In
particular, generalCategory is used in functions like isSpace, and the
literal gets inlined into those functions, which makes them massive.

https://github.com/haskell/core-libraries-committee/issues/130

Fixes #22949

-------------------------
Metric Decrease:
    T4029
-------------------------
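To illustrate the problem described in the commit message, here is a minimal, self-contained sketch of the split it introduces. A small wrapper that is cheap to inline does the range check, and a separate NOINLINE function owns the literal, so the literal is compiled once instead of being copied into every caller. The names `classify`, `lookupTable`, `tableSize` and `defaultClass`, the eight-entry table, and the one-byte-per-entry encoding are all hypothetical. This does not reproduce the generated GeneralCategory.hs or `lookupIntN`.

```haskell
{-# LANGUAGE MagicHash #-}
-- Sketch only: a cheap INLINE wrapper plus a NOINLINE lookup that owns
-- the table literal. All names and the table contents are hypothetical.

import Data.Word (Word8)
import Foreign.Storable (peekElemOff)
import GHC.Ptr (Ptr (..))
import System.IO.Unsafe (unsafeDupablePerformIO)

-- Number of entries in the (hypothetical) table.
tableSize :: Int
tableSize = 8

-- Value returned for indices outside the table.
defaultClass :: Int
defaultClass = 0

-- Small wrapper: cheap to inline because it holds no literal data,
-- only a bounds check and a call.
{-# INLINE classify #-}
classify :: Int -> Int
classify n
  | n < 0 || n >= tableSize = defaultClass
  | otherwise               = lookupTable n

-- The literal lives only here. NOINLINE stops GHC from copying this
-- function's body (and with it the literal) into callers.
{-# NOINLINE lookupTable #-}
lookupTable :: Int -> Int
lookupTable n = fromIntegral (unsafeDupablePerformIO (peekElemOff table n))
  where
    -- Primitive string literal (type Addr#), one byte per entry.
    table :: Ptr Word8
    table = Ptr "\3\1\4\1\5\9\2\6"#
```

With `lookupTable` marked NOINLINE, inlining `classify` into a caller should copy only the comparison and a call, not the table bytes.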

- - - - -


3 changed files:

- libraries/base/GHC/Unicode/Internal/Char/UnicodeData/GeneralCategory.hs
- libraries/base/changelog.md
- libraries/base/tools/ucd2haskell/exe/Parser/Text.hs


Changes:

=====================================
libraries/base/GHC/Unicode/Internal/Char/UnicodeData/GeneralCategory.hs
=====================================
The diff for this file was not included because it is too large.

=====================================
libraries/base/changelog.md
=====================================
@@ -4,6 +4,8 @@
   * Add `Data.List.!?` ([CLC proposal #110](https://github.com/haskell/core-libraries-committee/issues/110))
   * `maximumBy`/`minimumBy` are now marked as `INLINE` improving performance for unpackable
     types significantly.
+  * Refactor `generalCategory` to stop a very large literal string from being inlined into call-sites.
+      ([CLC proposal #130](https://github.com/haskell/core-libraries-committee/issues/130))
 
 ## 4.18.0.0 *TBA*
   * `Foreign.C.ConstPtr.ConstrPtr` was added to encode `const`-qualified


=====================================
libraries/base/tools/ucd2haskell/exe/Parser/Text.hs
=====================================
@@ -205,7 +205,11 @@ genEnumBitmap funcName def as = unlines
                <> show (length as)
                <> " then "
                <> show (fromEnum def)
-               <> " else lookupIntN bitmap# n"
+               <> " else lookup_bitmap n"
+
+    , "{-# NOINLINE lookup_bitmap #-}"
+    , "lookup_bitmap :: Int -> Int"
+    , "lookup_bitmap n = lookupIntN bitmap# n"
     , "  where"
     , "    bitmap# = \"" <> enumMapToAddrLiteral as "\"#"
     ]
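The generator change can be pictured with a cut-down, hypothetical stand-in for genEnumBitmap. The name `genBitmapSketch`, its signature, the assumption that the wrapper stays INLINE, and the simple decimal byte escaping (such as `\3`) are all assumptions; the real code uses `enumMapToAddrLiteral` and emits lines that this hunk does not show. What it shows is that the `where` clause holding `bitmap#` now hangs off `lookup_bitmap`, which carries the NOINLINE pragma, rather than off the function that callers inline.

```haskell
-- Hypothetical, simplified stand-in for genEnumBitmap, showing only the
-- shape of the code it emits after this change. Not the ucd2haskell code.

-- | Emit a wrapper that range-checks its argument and a NOINLINE helper
-- that owns the table literal. Entries are assumed to fit in one byte.
genBitmapSketch :: String -> Int -> [Int] -> String
genBitmapSketch funcName def as = unlines
  [ "{-# INLINE " <> funcName <> " #-}"  -- assumption: wrapper stays INLINE
  , funcName <> " :: Int -> Int"
  , funcName <> " n = if n >= " <> show (length as)
      <> " then " <> show def
      <> " else lookup_bitmap n"
  , ""
  , "{-# NOINLINE lookup_bitmap #-}"
  , "lookup_bitmap :: Int -> Int"
  , "lookup_bitmap n = lookupIntN bitmap# n"
  , "  where"
    -- The literal is attached to lookup_bitmap, not to the wrapper.
  , "    bitmap# = \"" <> concatMap (\b -> '\\' : show b) as <> "\"#"
  ]
```

For example, `putStr (genBitmapSketch "classify" 0 [3, 1, 4])` prints a wrapper whose `else` branch calls `lookup_bitmap`, followed by the NOINLINE helper and its `bitmap#` literal.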



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/400894c68cbf93bcdcbdca2705ccadb8d2ddabff



More information about the ghc-commits mailing list