[Git][ghc/ghc][wip/fabu/T24026-early-reject-type-failing-rules] 6 commits: ucd2haskell: remove Streamly dependency + misc

Fabricio Nascimento (@fabu) gitlab at gitlab.haskell.org
Fri Jun 14 08:43:17 UTC 2024



Fabricio Nascimento pushed to branch wip/fabu/T24026-early-reject-type-failing-rules at Glasgow Haskell Compiler / GHC


Commits:
7b23ce8b by Pierre Le Marre at 2024-06-13T15:35:04-04:00
ucd2haskell: remove Streamly dependency + misc

- Remove dead code.
- Remove `streamly` dependency.
- Process files with `bytestring`.
- Replace Unicode files parsers with the corresponding ones from the
  package `unicode-data-parser`.
- Simplify cabal file and rename module
- Regenerate `ghc-internal` Unicode files with new header

- - - - -
4570319f by Jacco Krijnen at 2024-06-13T15:35:41-04:00
Document how to run haddocks tests (#24976)

Also remove ghc 9.7 requirement

- - - - -
fb629e24 by amesgen at 2024-06-14T00:28:20-04:00
compiler: refactor lower_CmmExpr_Ptr

- - - - -
def46c8c by amesgen at 2024-06-14T00:28:20-04:00
compiler: handle CmmRegOff in lower_CmmExpr_Ptr

- - - - -
ce76bf78 by Simon Peyton Jones at 2024-06-14T00:28:56-04:00
Small documentation update in Quick Look

- - - - -
7677c545 by Fabricio de Sousa Nascimento at 2024-06-14T08:42:55+00:00
compiler: Rejects RULES whose LHS immediately fails to type-check

Fixes GHC crashing on `decomposeRuleLhs` due to ignoring coercion values. This
happens when we have a RULE that does not type check, and enable
`-fdefer-type-errors`. We prevent this to happen by rejecting RULES with an
immediately LHS type error.

Fixes #24026

- - - - -


21 changed files:

- compiler/GHC/CmmToAsm/Wasm/FromCmm.hs
- compiler/GHC/HsToCore/Binds.hs
- compiler/GHC/Tc/Gen/App.hs
- compiler/GHC/Tc/Gen/Rule.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Char/DerivedCoreProperties.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/GeneralCategory.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleLowerCaseMapping.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleTitleCaseMapping.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleUpperCaseMapping.hs
- libraries/ghc-internal/src/GHC/Internal/Unicode/Version.hs
- − libraries/ghc-internal/tools/ucd2haskell/exe/Parser/Text.hs
- libraries/ghc-internal/tools/ucd2haskell/exe/UCD2Haskell.hs
- + libraries/ghc-internal/tools/ucd2haskell/exe/UCD2Haskell/ModuleGenerators.hs
- libraries/ghc-internal/tools/ucd2haskell/ucd2haskell.cabal
- + testsuite/tests/typecheck/T24026/T24026a.hs
- + testsuite/tests/typecheck/T24026/T24026a.stderr
- + testsuite/tests/typecheck/T24026/T24026b.hs
- + testsuite/tests/typecheck/T24026/T24026b.stderr
- + testsuite/tests/typecheck/T24026/all.T
- utils/haddock/CONTRIBUTING.md
- utils/haddock/cabal.project


Changes:

=====================================
compiler/GHC/CmmToAsm/Wasm/FromCmm.hs
=====================================
@@ -1045,22 +1045,16 @@ lower_CmmExpr_Typed lbl ty expr = do
 lower_CmmExpr_Ptr :: CLabel -> CmmExpr -> WasmCodeGenM w (WasmExpr w w, Int)
 lower_CmmExpr_Ptr lbl ptr = do
   ty_word <- wasmWordTypeM
-  case ptr of
-    CmmLit (CmmLabelOff lbl o)
-      | o >= 0 -> do
-          instrs <-
-            lower_CmmExpr_Typed
-              lbl
-              ty_word
-              (CmmLit $ CmmLabel lbl)
-          pure (instrs, o)
-    CmmMachOp (MO_Add _) [base, CmmLit (CmmInt o _)]
-      | o >= 0 -> do
-          instrs <- lower_CmmExpr_Typed lbl ty_word base
-          pure (instrs, fromInteger o)
-    _ -> do
-      instrs <- lower_CmmExpr_Typed lbl ty_word ptr
-      pure (instrs, 0)
+  let (ptr', o) = case ptr of
+        CmmLit (CmmLabelOff lbl o)
+          | o >= 0 -> (CmmLit $ CmmLabel lbl, o)
+        CmmRegOff reg o
+          | o >= 0 -> (CmmReg reg, o)
+        CmmMachOp (MO_Add _) [base, CmmLit (CmmInt o _)]
+          | o >= 0 -> (base, fromInteger o)
+        _ -> (ptr, 0)
+  instrs <- lower_CmmExpr_Typed lbl ty_word ptr'
+  pure (instrs, o)
 
 -- | Push a series of values onto the wasm value stack, returning the
 -- result stack type.


=====================================
compiler/GHC/HsToCore/Binds.hs
=====================================
@@ -1031,9 +1031,12 @@ decomposeRuleLhs dflags orig_bndrs orig_lhs rhs_fvs
                 extra_bndrs = scopedSort extra_tvs ++ extra_dicts
                   where
                     extra_tvs   = [ v | v <- extra_vars, isTyVar v ]
+
+                -- isEvVar: this includes coercions, matching what
+                --          happens in `split_lets` (isDictId, isCoVar)
                 extra_dicts =
-                  [ mkLocalId (localiseName (idName d)) ManyTy (idType d)
-                  | d <- extra_vars, isDictId d ]
+                  [ mkLocalIdOrCoVar (localiseName (idName d)) ManyTy (idType d)
+                    | d <- extra_vars, isEvVar d ]
                 extra_vars  =
                   [ v
                   | v <- exprsFreeVarsList args


=====================================
compiler/GHC/Tc/Gen/App.hs
=====================================
@@ -271,6 +271,12 @@ tcApp works like this:
    Otherwise, delegate back to tcExpr, which
      infers an (instantiated) TcRhoType
 
+   This isn't perfect. Consider this (which uses visible type application):
+    (let { f :: forall a. a -> a; f x = x } in f) @Int
+   Since 'let' is not among the special cases for tcInferAppHead,
+   we'll delegate back to tcExpr, which will instantiate f's type
+   and the type application to @Int will fail.  Too bad!
+
 3. Use tcInstFun to instantiate the function, Quick-Looking as we go.  This
    implements the |-inst judgement in Fig 4, plus the modification in Fig 5, of
    the QL paper: "A quick look at impredicativity" (ICFP'20).
@@ -325,16 +331,15 @@ application; but it also does a couple of gruesome final checks:
   * Horrible newtype check
   * Special case for tagToEnum
 
-
-Some cases that /won't/ work:
-
-1. Consider this (which uses visible type application):
-
-    (let { f :: forall a. a -> a; f x = x } in f) @Int
-
-   Since 'let' is not among the special cases for tcInferAppHead,
-   we'll delegate back to tcExpr, which will instantiate f's type
-   and the type application to @Int will fail.  Too bad!
+(TCAPP2) There is a lurking difficulty in the above plan:
+  * Before calling tcInstFun, we set the ambient level in the monad
+    to QLInstVar (Step 2 above).
+  * Then, when kind-checking the visible type args of the application,
+    we may perhaps build an implication constraint.
+  * That means we'll try to add 1 to the ambient level; which is a no-op.
+  * So skolem escape checks won't work right.
+  This is pretty exotic, so I'm just deferring it for now, leaving
+  this note to alert you to the possiblity.
 
 Note [Quick Look for particular Ids]
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -401,7 +406,7 @@ tcApp rn_expr exp_res_ty
        -- Step 3: Instantiate the function type (taking a quick look at args)
        ; do_ql <- wantQuickLook rn_fun
        ; (inst_args, app_res_rho)
-              <- setQLInstLevel do_ql $  -- See (TCAPP1) in
+              <- setQLInstLevel do_ql $  -- See (TCAPP1) and (TCAPP2) in
                                          -- Note [tcApp: typechecking applications]
                  tcInstFun do_ql True tc_head fun_sigma rn_args
 
@@ -2008,6 +2013,8 @@ That is the entire point of qlUnify!   Wrinkles:
   discard the constraints and the coercion, and do not update the instantiation
   variable.  But see "Sadly discarded design alternative" below.)
 
+  See also (TCAPP2) in Note [tcApp: typechecking applications].
+
 (UQL3) Instantiation variables don't really have a settled level yet;
   they have level QLInstVar (see Note [The QLInstVar TcLevel] in GHC.Tc.Utils.TcType.
   You might worry that we might unify


=====================================
compiler/GHC/Tc/Gen/Rule.hs
=====================================
@@ -108,11 +108,13 @@ tcRules decls = mapM (wrapLocMA tcRuleDecls) decls
 tcRuleDecls :: RuleDecls GhcRn -> TcM (RuleDecls GhcTc)
 tcRuleDecls (HsRules { rds_ext = src
                      , rds_rules = decls })
-   = do { tc_decls <- mapM (wrapLocMA tcRule) decls
+   = do { maybe_tc_decls <- mapM (wrapLocMA tcRule) decls
+        ; let tc_decls = [L loc rule | (L loc (Just rule)) <- maybe_tc_decls]
         ; return $ HsRules { rds_ext   = src
                            , rds_rules = tc_decls } }
 
-tcRule :: RuleDecl GhcRn -> TcM (RuleDecl GhcTc)
+
+tcRule :: RuleDecl GhcRn -> TcM (Maybe (RuleDecl GhcTc))
 tcRule (HsRule { rd_ext  = ext
                , rd_name = rname@(L _ name)
                , rd_act  = act
@@ -181,7 +183,17 @@ tcRule (HsRule { rd_ext  = ext
        ; (rhs_implic, rhs_binds) <- buildImplicationFor tc_lvl (getSkolemInfo skol_info) qtkvs
                                          lhs_evs rhs_wanted
        ; emitImplications (lhs_implic `unionBags` rhs_implic)
-       ; return $ HsRule { rd_ext = ext
+
+       -- A type error on the LHS of a rule will be reported earlier while solving for
+       -- lhs_implic. However, we should also drop the rule entirely for cases where
+       -- compilation continues regardless of the error. For example with
+       -- `-fdefer-type-errors`, where this ill-typed LHS rule may cause follow-on errors
+       -- (#24026).
+       ; if anyBag insolubleImplic lhs_implic
+        then
+          return Nothing -- The RULE LHS does not type-check and will be dropped.
+        else
+          return . Just $ HsRule { rd_ext = ext
                          , rd_name = rname
                          , rd_act = act
                          , rd_tyvs = ty_bndrs -- preserved for ppr-ing


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Char/DerivedCoreProperties.hs
=====================================
@@ -8,9 +8,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Char.DerivedCoreProperties
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/GeneralCategory.hs
=====================================
@@ -8,9 +8,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Char.UnicodeData.GeneralCategory
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleLowerCaseMapping.hs
=====================================
@@ -7,9 +7,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Char.UnicodeData.SimpleLowerCaseMapping
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleTitleCaseMapping.hs
=====================================
@@ -7,9 +7,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Char.UnicodeData.SimpleTitleCaseMapping
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Char/UnicodeData/SimpleUpperCaseMapping.hs
=====================================
@@ -7,9 +7,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Char.UnicodeData.SimpleUpperCaseMapping
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/src/GHC/Internal/Unicode/Version.hs
=====================================
@@ -6,9 +6,8 @@
 -----------------------------------------------------------------------------
 -- |
 -- Module      : GHC.Internal.Unicode.Version
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>
 -- Stability   : internal
 -----------------------------------------------------------------------------
 


=====================================
libraries/ghc-internal/tools/ucd2haskell/exe/Parser/Text.hs deleted
=====================================
@@ -1,1127 +0,0 @@
-{-# LANGUAGE LambdaCase #-}
-
--- |
--- Module      : Parser.Text
--- Copyright   : (c) 2020 Composewell Technologies and Contributors
---               (c) 2016-2017 Harendra Kumar
---               (c) 2014-2015 Antonio Nikishaev
--- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
--- Stability   : internal
-
--- This code was taken from https://github.com/composewell/unicode-data.
--- The original Unicode database parser was taken from
--- https://github.com/composewell/unicode-transforms but was completely
--- rewritten from scratch to parse from UCD text files instead of XML, only
--- some types remain the same. That code in turn was originally taken from
--- https://github.com/llelf/prose (Antonio Nikishaev) and heavily modified by
--- Harendra Kumar.
---
-module Parser.Text (genModules) where
-
-import Control.Exception (catch, IOException)
-import Control.Monad (void)
-import Data.Bits (Bits(..))
-import Data.Word (Word8)
-import Data.Char (chr, ord, isSpace)
-import Data.Functor ((<&>))
-import Data.Function ((&))
-import Data.List (intersperse, unfoldr)
-import Data.List.Split (splitWhen)
-import Numeric (showHex)
-import Streamly.Data.Fold (Fold)
-import System.Directory (createDirectoryIfMissing)
-import System.Environment (getEnv)
-import System.FilePath ((</>), (<.>))
-
--- import qualified Data.Set as Set
-import Streamly.Data.Stream (Stream)
-import qualified Streamly.Data.Stream.Prelude as Stream
-import qualified Streamly.Data.Fold as Fold
-import qualified Streamly.Internal.Data.Fold as Fold
-import qualified Streamly.Data.Unfold as Unfold
-import qualified Streamly.FileSystem.Handle as Handle
-import qualified Streamly.Unicode.Stream as Unicode
-import qualified Streamly.Internal.Unicode.Stream as Unicode
-import qualified System.IO as Sys
-
-import Prelude hiding (pred)
-
--------------------------------------------------------------------------------
--- Types
--------------------------------------------------------------------------------
-
-data GeneralCategory =
-    Lu|Ll|Lt|             --LC
-    Lm|Lo|                --L
-    Mn|Mc|Me|             --M
-    Nd|Nl|No|             --N
-    Pc|Pd|Ps|Pe|Pi|Pf|Po| --P
-    Sm|Sc|Sk|So|          --S
-    Zs|Zl|Zp|             --Z
-    Cc|Cf|Cs|Co|Cn        --C
-    deriving (Show, Bounded, Enum, Read)
-
-data DecompType =
-       DTCanonical | DTCompat  | DTFont
-     | DTNoBreak   | DTInitial | DTMedial   | DTFinal
-     | DTIsolated  | DTCircle  | DTSuper    | DTSub
-     | DTVertical  | DTWide    | DTNarrow
-     | DTSmall     | DTSquare  | DTFraction
-    deriving (Show, Eq)
-
-data Decomp = DCSelf | DC [Char] deriving (Show, Eq)
-
--- data DType = Canonical | Kompat
-
-data DetailedChar =
-    DetailedChar
-        { _char :: Char
-        , _name :: String
-        , _generalCategory :: GeneralCategory
-        , _combiningClass :: Int
-        , _decompositionType :: Maybe DecompType
-        , _decomposition :: Decomp
-        , _simpleUppercaseMapping :: Maybe Char
-        , _simpleLowercaseMapping :: Maybe Char
-        , _simpleTitlecaseMapping :: Maybe Char
-        }
-    deriving (Show)
-
-{- [NOTE] Used by disabled generator
-
--- See: https://www.unicode.org/reports/tr44/#Default_Values_Table
-mkDefaultDetailedChar :: Char -> DetailedChar
-mkDefaultDetailedChar c = DetailedChar
-  { _char = c
-  , _name = mempty
-  , _generalCategory = Cn
-  , _combiningClass = 0
-  , _decompositionType = Nothing
-  , _decomposition = DCSelf
-  , _simpleUppercaseMapping = Nothing
-  , _simpleLowercaseMapping = Nothing
-  , _simpleTitlecaseMapping = Nothing }
--}
-
--------------------------------------------------------------------------------
--- Helpers
--------------------------------------------------------------------------------
-
-headerRule :: String
-headerRule = "-----------------------------------------------------------------------------"
-
-mkModuleHeader :: String -> String
-mkModuleHeader modName =
-    unlines
-        [ headerRule
-        , "-- |"
-        , "-- Module      : " <> modName
-        , "-- Copyright   : (c) 2020 Composewell Technologies and Contributors"
-        , "-- License     : BSD-3-Clause"
-        -- [FIXME] Update maintainer
-        , "-- Maintainer  : streamly at composewell.com"
-        , "-- Stability   : internal"
-        , headerRule
-        ]
-
-readCodePoint :: String -> Char
-readCodePoint = chr . read . ("0x"<>)
-
-readCodePointM :: String -> Maybe Char
-readCodePointM "" = Nothing
-readCodePointM u  = Just (readCodePoint u)
-
-genSignature :: String -> String
-genSignature = (<> " :: Char -> Bool")
-
--- | Check that var is between minimum and maximum of orderList
-genRangeCheck :: String -> [Int] -> String
-genRangeCheck var ordList =
-    var
-        <> " >= "
-        <> show (minimum ordList)
-        <> " && " <> var <> " <= " <> show (maximum ordList)
-
-genBitmap :: String -> [Int] -> String
-genBitmap funcName ordList =
-    unlines
-        [ "{-# INLINE " <> funcName <> " #-}"
-        , genSignature funcName
-        , funcName <> " = \\c -> let n = ord c in "
-              <> genRangeCheck "n" ordList <> " && lookupBit64 bitmap# n"
-        , "  where"
-        , "    bitmap# = \"" <> bitMapToAddrLiteral (positionsToBitMap ordList) "\"#"
-        ]
-
-positionsToBitMap :: [Int] -> [Bool]
-positionsToBitMap = go 0
-
-    where
-
-    go _ [] = []
-    go i xxs@(x:xs)
-        | i < x = False : go (i + 1) xxs
-        | otherwise = True : go (i + 1) xs
-
-bitMapToAddrLiteral ::
-  -- | Values to encode
-  [Bool] ->
-  -- | String to append
-  String ->
-  String
-bitMapToAddrLiteral bs cs = foldr encode cs (unfoldr mkChunks bs)
-
-    where
-
-    mkChunks :: [a] -> Maybe ([a], [a])
-    mkChunks [] = Nothing
-    mkChunks xs = Just $ splitAt 8 xs
-
-    encode :: [Bool] -> String -> String
-    encode chunk acc = '\\' : shows (toByte (padTo8 chunk)) acc
-
-    padTo8 :: [Bool] -> [Bool]
-    padTo8 xs
-        | length xs >= 8 = xs
-        | otherwise = xs <> replicate (8 - length xs) False
-
-    toByte :: [Bool] -> Int
-    toByte xs = sum $ map (\i -> if xs !! i then 1 `shiftL` i else 0) [0..7]
-
-genEnumBitmap ::
-  forall a. (Bounded a, Enum a, Show a) =>
-  -- | Function name
-  String ->
-  -- | Default value
-  a ->
-  -- | List of values to encode
-  [a] ->
-  String
-genEnumBitmap funcName def as = unlines
-    [ "{-# INLINE " <> funcName <> " #-}"
-    , funcName <> " :: Char -> Int"
-    , funcName <> " c = let n = ord c in if n >= "
-               <> show (length as)
-               <> " then "
-               <> show (fromEnum def)
-               <> " else lookup_bitmap n"
-
-    , "{-# NOINLINE lookup_bitmap #-}"
-    , "lookup_bitmap :: Int -> Int"
-    , "lookup_bitmap n = lookupIntN bitmap# n"
-    , "  where"
-    , "    bitmap# = \"" <> enumMapToAddrLiteral as "\"#"
-    ]
-
-{-| Encode a list of values as a byte map, using their 'Enum' instance.
-
-__Note:__ 'Enum' instance must respect the following:
-
-* @fromEnum minBound >= 0x00@
-* @fromEnum maxBound <= 0xff@
--}
-enumMapToAddrLiteral ::
-  forall a. (Bounded a, Enum a, Show a) =>
-  -- | Values to encode
-  [a] ->
-  -- | String to append
-  String ->
-  String
-enumMapToAddrLiteral xs cs = foldr go cs xs
-
-    where
-
-    go :: a -> String -> String
-    go x acc = '\\' : shows (toWord8 x) acc
-
-    toWord8 :: a -> Word8
-    toWord8 a = let w = fromEnum a in if 0 <= w && w <= 0xff
-        then fromIntegral w
-        else error $ "Cannot convert to Word8: " <> show a
-
-{- [NOTE] Disabled generator (normalization)
--- This bit of code is duplicated but this duplication allows us to reduce 2
--- dependencies on the executable.
-
-jamoLCount :: Int
-jamoLCount = 19
-
-jamoVCount :: Int
-jamoVCount = 21
-
-jamoTCount :: Int
-jamoTCount = 28
-
-hangulFirst :: Int
-hangulFirst = 0xac00
-
-hangulLast :: Int
-hangulLast = hangulFirst + jamoLCount * jamoVCount * jamoTCount - 1
-
-isHangul :: Char -> Bool
-isHangul c = n >= hangulFirst && n <= hangulLast
-    where n = ord c
--}
-
-genUnicodeVersion :: FilePath -> IO ()
-genUnicodeVersion outdir = do
-  version <- catch
-              (getEnv "UNICODE_VERSION")
-              (\(_ :: IOException) -> return "<unknown>")
-  Stream.fold f (Stream.fromList (body version))
-  where
-    moduleName = "GHC.Internal.Unicode.Version"
-    f = moduleFileEmitter Nothing outdir
-          (moduleName, \_ -> Fold.foldMap (<> "\n"))
-    body :: String -> [String]
-    body version =
-      [ "{-# LANGUAGE NoImplicitPrelude #-}"
-      , "{-# OPTIONS_HADDOCK hide #-}"
-      , ""
-      , mkModuleHeader moduleName
-      , "module " <> moduleName
-      , "(unicodeVersion)"
-      , "where"
-      , ""
-      , "import {-# SOURCE #-} GHC.Internal.Data.Version"
-      , ""
-      , "-- | Version of Unicode standard used by @base@:"
-      , "-- [" <> version <> "](https://www.unicode.org/versions/Unicode" <> version <> "/)."
-      , "--"
-      , "-- @since base-4.15.0.0"
-      , "unicodeVersion :: Version"
-      , "unicodeVersion = makeVersion [" <> mkVersion version <> "]" ]
-    mkVersion = foldr (\c acc -> case c of {'.' -> ',':' ':acc; _ -> c:acc}) mempty
-
--------------------------------------------------------------------------------
--- Parsers
--------------------------------------------------------------------------------
-
--------------------------------------------------------------------------------
--- Parsing UnicodeData.txt
--------------------------------------------------------------------------------
-
-genGeneralCategoryModule
-    :: Monad m
-    => String
-    -> Fold m DetailedChar String
-genGeneralCategoryModule moduleName =
-    done <$> Fold.foldl' step initial
-
-    where
-
-    -- (categories, expected char)
-    initial = ([], '\0')
-
-    step (acc, p) a = if p < _char a
-        -- Fill missing char entry with default category Cn
-        -- See: https://www.unicode.org/reports/tr44/#Default_Values_Table
-        then step (Cn : acc, succ p) a
-        -- Regular entry
-        else (_generalCategory a : acc, succ (_char a))
-
-    done (acc, _) = unlines
-        [ "{-# LANGUAGE NoImplicitPrelude #-}"
-        , "{-# LANGUAGE MagicHash #-}"
-        , "{-# OPTIONS_HADDOCK hide #-}"
-        , ""
-        , mkModuleHeader moduleName
-        , "module " <> moduleName
-        , "(generalCategory)"
-        , "where"
-        , ""
-        , "import GHC.Internal.Base (Char, Int, Ord(..), ord)"
-        , "import GHC.Internal.Unicode.Bits (lookupIntN)"
-        , ""
-        , genEnumBitmap "generalCategory" Cn (reverse acc)
-        ]
-
-readDecomp :: String -> (Maybe DecompType, Decomp)
-readDecomp s =
-    if null wrds
-    then (Nothing, DCSelf)
-    else decmps wrds
-
-    where
-
-    decmps [] = error "Unreachable flow point"
-    decmps y@(x:xs) =
-        case dtmap x of
-            DTCanonical -> (,) (Just DTCanonical) (readCP y)
-            other -> (,) (Just other) (readCP xs)
-
-    wrds = words s
-
-    readCP ws = DC $ map readCodePoint ws
-
-    dtmap "<compat>" = DTCompat
-    dtmap "<circle>" = DTCircle
-    dtmap "<final>" = DTFinal
-    dtmap "<font>" = DTFont
-    dtmap "<fraction>" = DTFraction
-    dtmap "<initial>" = DTInitial
-    dtmap "<isolated>" = DTIsolated
-    dtmap "<medial>" = DTMedial
-    dtmap "<narrow>" = DTNarrow
-    dtmap "<noBreak>" = DTNoBreak
-    dtmap "<small>" = DTSmall
-    dtmap "<square>" = DTSquare
-    dtmap "<sub>" = DTSub
-    dtmap "<super>" = DTSuper
-    dtmap "<vertical>" = DTVertical
-    dtmap "<wide>" = DTWide
-    dtmap _ = DTCanonical
-
-{- [NOTE] Disabled generators
-
-filterNonHangul :: Monad m => Fold m DetailedChar a -> Fold m DetailedChar a
-filterNonHangul = Fold.filter (not . isHangul . _char)
-
-filterDecomposableType ::
-       Monad m => DType -> Fold m DetailedChar a -> Fold m DetailedChar a
-filterDecomposableType dtype =
-    Fold.filter ((/= DCSelf) . _decomposition)
-        . Fold.filter (predicate . _decompositionType)
-
-    where
-
-    predicate =
-        case dtype of
-            Canonical -> (== Just DTCanonical)
-            Kompat -> const True
-
-genDecomposableModule ::
-       Monad m => String -> DType -> Fold m DetailedChar String
-genDecomposableModule moduleName dtype =
-    filterNonHangul
-        $ filterDecomposableType dtype $ done <$> Fold.foldl' step initial
-
-    where
-
-    initial = []
-
-    step st a = ord (_char a) : st
-
-    done st =
-        unlines
-            [ "{-# LANGUAGE NoImplicitPrelude #-}"
-            , "{-# LANGUAGE MagicHash #-}"
-            , "{-# OPTIONS_HADDOCK hide #-}"
-            , ""
-            , mkModuleHeader moduleName
-            , "module " <> moduleName
-            , "(isDecomposable)"
-            , "where"
-            , ""
-            , "import Data.Char (ord)"
-            , "import GHC.Internal.Unicode.Bits (lookupBit64)"
-            , ""
-            , genBitmap "isDecomposable" (reverse st)
-            ]
-
-genCombiningClassModule :: Monad m => String -> Fold m DetailedChar String
-genCombiningClassModule moduleName =
-    Fold.filter (\dc -> _combiningClass dc /= 0)
-        $ done <$> Fold.foldl' step initial
-
-    where
-
-    initial = ([], [])
-
-    step (st1, st2) a = (genCombiningClassDef a : st1, ord (_char a) : st2)
-
-    done (st1, st2) =
-        unlines
-            [ "{-# LANGUAGE NoImplicitPrelude #-}"
-            , "{-# LANGUAGE MagicHash #-}"
-            , "{-# OPTIONS_HADDOCK hide #-}"
-            , ""
-            , mkModuleHeader moduleName
-            , "module " <> moduleName
-            , "(combiningClass, isCombining)"
-            , "where"
-            , ""
-            , "import Data.Char (ord)"
-            , "import GHC.Internal.Unicode.Bits (lookupBit64)"
-            , ""
-            , "combiningClass :: Char -> Int"
-            , unlines (reverse st1)
-            , "combiningClass _ = 0\n"
-            , ""
-            , genBitmap "isCombining" (reverse st2)
-            ]
-
-    genCombiningClassDef dc =
-        "combiningClass "
-            <> show (_char dc) <> " = " <> show (_combiningClass dc)
-
-genDecomposeDefModule ::
-       Monad m
-    => String
-    -> [String]
-    -> [String]
-    -> DType
-    -> (Int -> Bool)
-    -> Fold m DetailedChar String
-genDecomposeDefModule moduleName before after dtype pred =
-    Fold.filter (pred . ord . _char)
-        $ filterNonHangul
-        $ filterDecomposableType dtype $ done <$> Fold.foldl' step initial
-
-    where
-
-    decomposeChar c DCSelf = [c]
-    decomposeChar _c (DC ds) = ds
-
-    genHeader =
-        [ "{-# OPTIONS_GHC -fno-warn-incomplete-patterns #-}"
-        , ""
-        , mkModuleHeader moduleName
-        , "module " <> moduleName
-        , "(decompose)"
-        , "where"
-        , ""
-        ]
-    genSign =
-        [ "-- Note: this is a partial function we do not expect to call"
-        , "-- this if isDecomposable returns false."
-        , "{-# NOINLINE decompose #-}"
-        , "decompose :: Char -> [Char]"
-        ]
-    initial = []
-
-    step st dc = genDecomposeDef dc : st
-
-    done st =
-        let body = mconcat [genHeader, before, genSign, reverse st, after]
-        in unlines body
-
-    genDecomposeDef dc =
-        "decompose "
-            <> show (_char dc)
-            <> " = " <> show (decomposeChar (_char dc) (_decomposition dc))
-
-genCompositionsModule ::
-       Monad m
-    => String
-    -> [Int]
-    -> [Int]
-    -> Fold m DetailedChar String
-genCompositionsModule moduleName compExclu non0CC =
-    Fold.filter (not . flip elem compExclu . ord . _char)
-        $ filterNonHangul
-        $ Fold.filter (isDecompositionLen2 . _decomposition)
-        $ filterDecomposableType Canonical $ done <$> Fold.foldl' step initial
-
-    where
-
-    isDecompositionLen2 DCSelf = False
-    isDecompositionLen2 (DC ds) = length ds == 2
-
-    genComposePairDef name dc =
-        name
-            <> " "
-            <> show (head d01)
-            <> " " <> show (d01 !! 1) <> " = Just " <> show (_char dc)
-
-        where
-
-        d01 = decompPair dc
-
-    decompPair dc =
-        case _decomposition dc of
-            DCSelf -> error "toCompFormat: DCSelf"
-            (DC ds) ->
-                if length ds == 2
-                then ds
-                else error "toCompFormat: length /= 2"
-
-    initial = ([], [], [])
-
-    step (dec, sp, ss) dc = (dec1, sp1, ss1)
-
-        where
-
-        d01 = decompPair dc
-        d1Ord = ord $ d01 !! 1
-        dec1 = genComposePairDef "compose" dc : dec
-        sp1 =
-            if d1Ord `notElem` non0CC
-            then genComposePairDef "composeStarters" dc : sp
-            else sp
-        ss1 =
-            if d1Ord `notElem` non0CC
-            then d1Ord : ss
-            else ss
-
-    header =
-        [ "{-# LANGUAGE NoImplicitPrelude #-}"
-        , "{-# LANGUAGE MagicHash #-}"
-        , "{-# OPTIONS_GHC -fno-warn-incomplete-patterns #-}"
-        , "{-# OPTIONS_HADDOCK hide #-}"
-        , ""
-        , mkModuleHeader moduleName
-        , "module " <> moduleName
-        , "(compose, composeStarters, isSecondStarter)"
-        , "where"
-        , ""
-        , "import GHC.Internal.Base (Char, ord)"
-        , "import GHC.Internal.Unicode.Bits (lookupBit64)"
-        , ""
-        ]
-
-    composePair decomps =
-        [ "{-# NOINLINE compose #-}"
-        , "compose :: Char -> Char -> Maybe Char"
-        , unlines decomps
-        , "compose _ _ = " <> "Nothing" <> "\n"
-        , ""
-        ]
-
-    composeStarterPair starterPairs =
-        [ "composeStarters :: Char -> Char -> Maybe Char"
-        , unlines starterPairs
-        , "composeStarters _ _ = " <> "Nothing" <> "\n"
-        , ""
-        ]
-
-    isSecondStarter secondStarters =
-        [genBitmap "isSecondStarter" secondStarters]
-
-    done (dec, sp, ss) =
-        unlines
-            $ header
-            <> composePair (reverse dec)
-            <> composeStarterPair (reverse sp)
-            <> isSecondStarter (Set.toList (Set.fromList ss))
--}
-genSimpleCaseMappingModule
-    :: Monad m
-    => String
-    -> String
-    -> (DetailedChar -> Maybe Char)
-    -> Fold m DetailedChar String
-genSimpleCaseMappingModule moduleName funcName field =
-    done <$> Fold.foldl' step initial
-
-    where
-
-    genHeader =
-        [ "{-# LANGUAGE NoImplicitPrelude, LambdaCase #-}"
-        , "{-# OPTIONS_HADDOCK hide #-}"
-        , ""
-        , mkModuleHeader moduleName
-        , "module " <> moduleName
-        , "(" <> funcName <> ")"
-        , "where"
-        , ""
-        , "import GHC.Internal.Base (Char)"
-        , ""
-        ]
-    genSign =
-        [ "{-# NOINLINE " <> funcName <> " #-}"
-        , funcName <> " :: Char -> Char"
-        , funcName <> " = \\case"
-        ]
-    initial = []
-
-    step ds dc = case mkEntry dc of
-        Nothing -> ds
-        Just d  -> d : ds
-
-    after = ["  c -> c"]
-
-    done st =
-        let body = mconcat [genHeader, genSign, reverse st, after]
-        in unlines body
-
-    mkEntry dc = field dc <&> \c -> mconcat
-        [ "  '\\x"
-        , showHexChar (_char dc) "' -> '\\x"
-        , showHexChar c "'"
-        ]
-
-    showHexChar c = showHex (ord c)
-
-genCorePropertiesModule ::
-       Monad m => String -> (String -> Bool) -> Fold m (String, [Int]) String
-genCorePropertiesModule moduleName isProp =
-    Fold.filter (\(name, _) -> isProp name) $ done <$> Fold.foldl' step initial
-
-    where
-
-    prop2FuncName x = "is" <> x
-
-    initial = ([], [])
-
-    step (props, bitmaps) (name, bits) =
-        (name : props, genBitmap (prop2FuncName name) bits : bitmaps)
-
-    done (props, bitmaps) = unlines $ header props <> bitmaps
-
-    header exports =
-        [ "{-# LANGUAGE NoImplicitPrelude #-}"
-        , "{-# LANGUAGE MagicHash #-}"
-        , "{-# OPTIONS_HADDOCK hide #-}"
-        , ""
-        , mkModuleHeader moduleName
-        , "module " <> moduleName
-        , "(" <> unwords (intersperse "," (map prop2FuncName exports)) <> ")"
-        , "where"
-        , ""
-        , "import GHC.Internal.Base (Bool, Char, Ord(..), (&&), ord)"
-        , "import GHC.Internal.Unicode.Bits (lookupBit64)"
-        , ""
-        ]
-
-{- [NOTE] Disabled generator
-genUnicode002TestResults :: Monad m => Fold m DetailedChar String
-genUnicode002TestResults = done <$> Fold.foldl' step initial
-
-    where
-
-    header = "Code C P S U L A D"
-    -- (output, expected char)
-    initial = ([], '\0')
-    -- [TODO] Increase the number of tested char?
-    -- maxChar = '\xF0000' -- First codepoint of the last private use areas.
-    -- maxChar = '\xFFFF' -- Last codepoint of BMP.
-    maxChar = chr 6553 -- Value in GHC 9.2.2 base test
-
-    step (acc, c) dc = if c > maxChar
-      then (acc, c)
-      else if c < _char dc
-        -- Fill missing char entry with default values
-        -- See: https://www.unicode.org/reports/tr44/#Default_Values_Table
-        then step (mkEntry (mkDefaultDetailedChar c) : acc, succ c) dc
-        -- Regular entry
-        else (mkEntry dc : acc, succ (_char dc))
-
-    done (acc, _) = unlines (header : reverse acc)
-
-    mkEntry dc = mconcat
-        [ showCode (_char dc)
-        -- [TODO] General category
-        , showBool (isControl' dc)
-        , showBool (isPrint'   dc)
-        , showBool (isSpace'   dc)
-        -- [TODO] isSeparator
-        , showBool (isUpper'   dc)
-        , showBool (isLower'   dc)
-        , showBool (isAlpha'   dc)
-        -- [TODO] isAlphaNum
-        , showBool (isDigit'   dc)
-        -- [TODO] isNumber
-        -- [TODO] isMark
-        -- [TODO] isPunctuation
-        -- [TODO] isSymbol
-        ]
-
-    padding = length (show (ord maxChar))
-    showCode c = take padding (shows (ord c) (repeat ' '))
-    -- [TODO] use showHex
-    -- showCode c =
-    --   let code = showHex (ord c) mempty
-    --   in replicate (padding - length code) '0' <> code
-    showBool b = if b then " T" else " F"
-
-    -- [NOTE] The following functions replicates Data.Char. Keep them up to date!
-
-    isControl' dc = case _generalCategory dc of
-      Cc -> True   -- Control
-      _  -> False
-
-    isPrint' dc = case _generalCategory dc of
-      Zl -> False -- LineSeparator
-      Zp -> False -- ParagraphSeparator
-      Cc -> False -- Control
-      Cf -> False -- Format
-      Cs -> False -- Surrogate
-      Co -> False -- PrivateUse
-      Cn -> False -- NotAssigned
-      _  -> True
-
-    isSpace' dc = case _char dc of
-      '\t' -> True
-      '\n' -> True
-      '\v' -> True
-      '\f' -> True
-      '\r' -> True
-      _    -> case _generalCategory dc of
-        Zs -> True -- Space
-        _  -> False
-
-    isUpper' dc = case _generalCategory dc of
-      Lu -> True -- UppercaseLetter
-      Lt -> True -- TitlecaseLetter
-      _  -> False
-
-    isLower' dc = case _generalCategory dc of
-      Ll -> True -- LowercaseLetter
-      _  -> False
-
-    isAlpha' dc = case _generalCategory dc of
-      Lu -> True -- UppercaseLetter
-      Ll -> True -- LowercaseLetter
-      Lt -> True -- TitlecaseLetter
-      Lm -> True -- ModifierLetter
-      Lo -> True -- OtherLetter
-      _  -> False
-
-    isDigit' dc = let c = _char dc
-                  in (fromIntegral (ord c - ord '0') :: Word) <= 9
--}
-
--------------------------------------------------------------------------------
--- Parsing property files
--------------------------------------------------------------------------------
-
-type PropertyLine = (String, [Int])
-
-trim :: String -> String
-trim = takeWhile (not . isSpace) . dropWhile isSpace
-
-emptyPropertyLine :: PropertyLine
-emptyPropertyLine = ("", [])
-
-combinePropertyLines :: PropertyLine -> PropertyLine -> PropertyLine
-combinePropertyLines t1@(n1, o1) t2@(n2, o2)
-    | n1 == "" = t2
-    | n2 == "" = t1
-    | n1 == n2 = (n1, o1 <> o2)
-    | otherwise = error $ "Cannot group " <> n1 <> " with " <> n2
-
-parsePropertyLine :: String -> PropertyLine
-parsePropertyLine ln
-    | null ln = emptyPropertyLine
-    | head ln == '#' = emptyPropertyLine
-    | otherwise = parseLineJ ln
-
-    where
-
-    parseLineJ :: String -> (String, [Int])
-    parseLineJ line =
-        let (rangeLn, line1) = span (/= ';') line
-            propLn = takeWhile (/= '#') (tail line1)
-         in (trim propLn, parseRange (trim rangeLn))
-
-    parseRange :: String -> [Int]
-    parseRange rng =
-        if '.' `elem` rng
-        then let low = read $ "0x" <> takeWhile (/= '.') rng
-                 high =
-                     read $ "0x" <> reverse (takeWhile (/= '.') (reverse rng))
-              in [low .. high]
-        else [read $ "0x" <> rng]
-
-isDivider :: String -> Bool
-isDivider x = x == "# ================================================"
-
-parsePropertyLines :: (Monad m) => Stream m String -> Stream m PropertyLine
-parsePropertyLines =
-    Stream.splitOn isDivider
-        $ Fold.lmap parsePropertyLine
-        $ Fold.foldl' combinePropertyLines emptyPropertyLine
-
--- | A range entry in @UnicodeData.txt at .
-data UnicodeDataRange
-    = SingleCode    !DetailedChar
-    -- ^ Regular entry for one code point
-    | FirstCode     !String !DetailedChar
-    -- ^ A partial range for entry with a name as: @\<RANGE_IDENTIFIER, First\>@
-    | CompleteRange !String !DetailedChar !DetailedChar
-    -- ^ A complete range, requiring 2 continuous entries with respective names:
-    --
-    -- * @\<RANGE_IDENTIFIER, First\>@
-    -- * @\<RANGE_IDENTIFIER, Last\>@
-
-{-| Parse UnicodeData.txt lines
-
-Parse ranges according to https://www.unicode.org/reports/tr44/#Code_Point_Ranges.
-
-__Note:__ this does /not/ fill missing char entries,
-i.e. entries with no explicit entry nor within a range.
--}
-parseUnicodeDataLines :: forall m. (Monad m) => Stream m String -> Stream m DetailedChar
-parseUnicodeDataLines
-    = Stream.unfoldMany (Unfold.unfoldr unitToRange)
-    . Stream.foldMany ( Fold.lmap parseDetailedChar
-                      $ Fold.foldt' step initial id)
-
-    where
-
-    step :: Maybe UnicodeDataRange
-         -> DetailedChar
-         -> Fold.Step (Maybe UnicodeDataRange) (Maybe UnicodeDataRange)
-    step Nothing dc = case span (/= ',') (_name dc) of
-        (range, ", First>") -> Fold.Partial (Just (FirstCode range dc))
-        _                   -> Fold.Done (Just (SingleCode dc))
-    step (Just (FirstCode range1 dc1)) dc2 = case span (/= ',') (_name dc2) of
-        (range2, ", Last>") -> if range1 == range2 && _char dc1 < _char dc2
-            then Fold.Done (Just (CompleteRange range1 dc1 dc2))
-            else error $ "Cannot create range: incompatible ranges" <> show (dc1, dc2)
-        _ -> error $ "Cannot create range: missing <range, Last> entry correspong to: " <> show range1
-    step _ _ = error "impossible case"
-
-    initial :: Fold.Step (Maybe UnicodeDataRange) (Maybe UnicodeDataRange)
-    initial = Fold.Partial Nothing
-
-    unitToRange :: Maybe UnicodeDataRange -> Maybe (DetailedChar, Maybe UnicodeDataRange)
-    unitToRange = fmap $ \case
-        SingleCode          dc      -> (dc, Nothing)
-        FirstCode     _     dc      -> error $ "Incomplete range: " <> show dc
-        CompleteRange range dc1 dc2 -> if _char dc1 < _char dc2
-            -- [TODO] Create the proper name
-            then (dc1{_name="TODO"}, Just (CompleteRange range dc1{_char=succ (_char dc1)} dc2))
-            else (dc2{_name="TODO"}, Nothing)
-
--- | Parse a single entry of @UnicodeData.txt@
-parseDetailedChar :: String -> DetailedChar
-parseDetailedChar line = case splitWhen (== ';') line of
-  char
-    :name
-    :gc
-    :combining
-    :_bidi
-    :decomposition
-    :_decimal
-    :_digit
-    :_numeric
-    :_bidiM
-    :_uni1Name
-    :_iso
-    :sUpper
-    :sLower
-    :sTitle
-    :_ ->
-      let (dctype, dcval) = readDecomp decomposition
-      in DetailedChar
-          { _char = readCodePoint char
-          , _name = name
-          , _generalCategory = read gc
-          , _combiningClass = read combining
-          , _decompositionType = dctype
-          , _decomposition = dcval
-          , _simpleUppercaseMapping = readCodePointM sUpper
-          , _simpleLowercaseMapping = readCodePointM sLower
-          , _simpleTitlecaseMapping = readCodePointM sTitle
-          }
-  _ -> error ("Unsupported line: " <> line)
-
--------------------------------------------------------------------------------
--- Generation
--------------------------------------------------------------------------------
-
-readLinesFromFile :: String -> Stream IO String
-readLinesFromFile file =
-    withFile file Sys.ReadMode
-        $ \h -> Handle.read h & Unicode.decodeUtf8 & Unicode.lines Fold.toList
-
-    where
-    withFile file_ mode =
-        Stream.bracketIO (Sys.openFile file_ mode) (Sys.hClose)
-
-
-moduleToFileName :: String -> String
-moduleToFileName = map (\x -> if x == '.' then '/' else x)
-
-dirFromFileName :: String -> String
-dirFromFileName = reverse . dropWhile (/= '/') . reverse
-
-data FileRecipe a
-    = ModuleRecipe
-      -- ^ A recipe to create a Haskell module file.
-        String
-        -- ^ Module name
-        (String -> Fold IO a String)
-        -- ^ Function that generate the module, given the module name.
-    | TestOutputRecipe
-      -- ^ A recipe to create a test output file.
-        String
-        -- ^ Test name
-        (Fold IO a String)
-        -- ^ Test output generator
-
--- ModuleRecipe is a tuple of the module name and a function that generates the
--- module using the module name
-type ModuleRecipe a = (String, String -> Fold IO a String)
-type TestOutputRecipe a = (FilePath, Fold IO a String)
-
--- GeneratorRecipe is a list of ModuleRecipe
-type GeneratorRecipe a = [FileRecipe a]
-
-moduleFileEmitter :: Maybe FilePath -> FilePath -> ModuleRecipe a -> Fold IO a ()
-moduleFileEmitter mfile outdir (modName, fldGen) = Fold.rmapM action $ fldGen modName
-
-    where
-
-    pretext version = case mfile of
-      Just file -> mconcat
-        [ "-- DO NOT EDIT: This file is automatically generated by the internal tool ucd2haskell,\n"
-        , "-- with data from: https://www.unicode.org/Public/"
-        , version
-        , "/ucd/"
-        , file
-        ,".\n\n"
-        ]
-      Nothing -> "-- DO NOT EDIT: This file is automatically generated by the internal tool ucd2haskell.\n\n"
-    outfile = outdir </> moduleToFileName modName <.> ".hs"
-    outfiledir = dirFromFileName outfile
-    action c = do
-        version <-
-            catch
-                (getEnv "UNICODE_VERSION")
-                (\(_ :: IOException) -> return "<unknown>")
-        createDirectoryIfMissing True outfiledir
-        writeFile outfile (pretext version <> c)
-
-testOutputFileEmitter :: FilePath -> TestOutputRecipe a -> Fold IO a ()
-testOutputFileEmitter outdir (name, fldGen) = Fold.rmapM action fldGen
-
-    where
-
-    outfile = outdir </> "tests" </> name <.> ".stdout"
-    outfiledir = dirFromFileName outfile
-    action c
-        = createDirectoryIfMissing True outfiledir
-        *> writeFile outfile c
-
-runGenerator ::
-       FilePath
-    -> FilePath
-    -> (Stream IO String -> Stream IO a)
-    -> FilePath
-    -> GeneratorRecipe a
-    -> IO ()
-runGenerator indir file transformLines outdir recipes =
-    readLinesFromFile (indir <> file) & transformLines & Stream.fold combinedFld
-
-    where
-
-    generatedFolds = recipes <&> \case
-      ModuleRecipe     name f -> moduleFileEmitter (Just file) outdir (name, f)
-      TestOutputRecipe name f -> testOutputFileEmitter         outdir (name, f)
-    combinedFld = void $ Fold.distribute generatedFolds
-
-genModules :: String -> String -> [String] -> IO ()
-genModules indir outdir props = do
-    genUnicodeVersion outdir
-
-    -- [NOTE] Disabled generator
-    -- compExclu <-
-    --     readLinesFromFile (indir <> "DerivedNormalizationProps.txt")
-    --         & parsePropertyLines
-    --         & Stream.find (\(name, _) -> name == "Full_Composition_Exclusion")
-    --         & fmap (snd . fromMaybe ("", []))
-
-    -- [NOTE] Disabled generator
-    -- non0CC <-
-    --     readLinesFromFile (indir <> "extracted/DerivedCombiningClass.txt")
-    --         & parsePropertyLines
-    --         & Stream.filter (\(name, _) -> name /= "0")
-    --         & Stream.map snd
-    --         & Stream.fold (Fold.foldl' (<>) [])
-
-    runGenerator
-        indir
-        "UnicodeData.txt"
-        parseUnicodeDataLines
-        outdir
-        -- [NOTE] Disabled generators
-        -- [ uncurry ModuleRecipe compositions compExclu non0CC
-        -- , uncurry ModuleRecipe combiningClass
-        -- , uncurry ModuleRecipe decomposable
-        -- , uncurry ModuleRecipe decomposableK
-        -- , uncurry ModuleRecipe decompositions
-        -- , uncurry ModuleRecipe decompositionsK2
-        -- , uncurry ModuleRecipe decompositionsK
-        [ uncurry ModuleRecipe generalCategory
-        , uncurry ModuleRecipe simpleUpperCaseMapping
-        , uncurry ModuleRecipe simpleLowerCaseMapping
-        , uncurry ModuleRecipe simpleTitleCaseMapping
-        -- , uncurry TestOutputRecipe unicode002Test
-        ]
-
-    -- [NOTE] Disabled generator
-    -- runGenerator
-    --     indir
-    --     "PropList.txt"
-    --     parsePropertyLines
-    --     outdir
-    --     [ uncurry ModuleRecipe propList ]
-
-    runGenerator
-        indir
-        "DerivedCoreProperties.txt"
-        parsePropertyLines
-        outdir
-        [ uncurry ModuleRecipe derivedCoreProperties ]
-
-    where
-
-    -- [NOTE] Disabled generator
-    -- propList =
-    --     ("GHC.Internal.Unicode.Char.PropList"
-    --     , (`genCorePropertiesModule` (`elem` props)))
-
-    derivedCoreProperties =
-        ("GHC.Internal.Unicode.Char.DerivedCoreProperties"
-        , (`genCorePropertiesModule` (`elem` props)))
-
-    -- [NOTE] Disabled generator
-    -- compositions exc non0 =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.Compositions"
-    --     , \m -> genCompositionsModule m exc non0)
-
-    -- [NOTE] Disabled generator
-    -- combiningClass =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.CombiningClass"
-    --     , genCombiningClassModule)
-
-    -- [NOTE] Disabled generator
-    -- decomposable =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.Decomposable"
-    --     , (`genDecomposableModule` Canonical))
-
-    -- [NOTE] Disabled generator
-    -- decomposableK =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.DecomposableK"
-    --     , (`genDecomposableModule` Kompat))
-
-    -- [NOTE] Disabled generator
-    -- decompositions =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.Decompositions"
-    --     , \m -> genDecomposeDefModule m [] [] Canonical (const True))
-
-    -- [NOTE] Disabled generator
-    -- decompositionsK2 =
-    --     ( "GHC.Internal.Unicode.Char.UnicodeData.DecompositionsK2"
-    --     , \m -> genDecomposeDefModule m [] [] Kompat (>= 60000))
-
-    -- [NOTE] Disabled generator
-    -- decompositionsK =
-    --     let pre = ["import qualified " <> fst decompositionsK2 <> " as DK2", ""]
-    --         post = ["decompose c = DK2.decompose c"]
-    --      in ( "GHC.Internal.Unicode.Char.UnicodeData.DecompositionsK"
-    --         , \m -> genDecomposeDefModule m pre post Kompat (< 60000))
-
-    generalCategory =
-         ( "GHC.Internal.Unicode.Char.UnicodeData.GeneralCategory"
-         , genGeneralCategoryModule)
-
-    simpleUpperCaseMapping =
-         ( "GHC.Internal.Unicode.Char.UnicodeData.SimpleUpperCaseMapping"
-         , \m -> genSimpleCaseMappingModule m "toSimpleUpperCase" _simpleUppercaseMapping)
-
-    simpleLowerCaseMapping =
-         ( "GHC.Internal.Unicode.Char.UnicodeData.SimpleLowerCaseMapping"
-         , \m -> genSimpleCaseMappingModule m "toSimpleLowerCase" _simpleLowercaseMapping)
-
-    simpleTitleCaseMapping =
-         ( "GHC.Internal.Unicode.Char.UnicodeData.SimpleTitleCaseMapping"
-         , \m -> genSimpleCaseMappingModule m "toSimpleTitleCase" _simpleTitlecaseMapping)
-
-    -- unicode002Test =
-    --      ( "unicode002"
-    --      , genUnicode002TestResults)


=====================================
libraries/ghc-internal/tools/ucd2haskell/exe/UCD2Haskell.hs
=====================================
@@ -2,14 +2,17 @@
 -- Module      : Main
 -- Copyright   : (c) 2020 Composewell Technologies and Contributors
 -- License     : BSD-3-Clause
--- Maintainer  : streamly at composewell.com
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>"
 -- Stability   : internal
 --
 module Main where
 
-import WithCli (HasArguments(..), withCli)
-import Parser.Text (genModules)
+import qualified Data.ByteString.Char8 as B8
+import qualified Data.ByteString.Short as BS
 import GHC.Generics (Generic)
+import WithCli (HasArguments(..), withCli)
+
+import UCD2Haskell.ModuleGenerators (genModules)
 
 data CLIOptions =
     CLIOptions
@@ -20,7 +23,10 @@ data CLIOptions =
     deriving (Show, Generic, HasArguments)
 
 cliClient :: CLIOptions -> IO ()
-cliClient opts = genModules (input opts) (output opts) (core_prop opts)
+cliClient opts = genModules
+    opts.input
+    opts.output
+    (BS.toShort . B8.pack <$> opts.core_prop)
 
 main :: IO ()
 main = withCli cliClient


=====================================
libraries/ghc-internal/tools/ucd2haskell/exe/UCD2Haskell/ModuleGenerators.hs
=====================================
@@ -0,0 +1,517 @@
+-- |
+-- Module      : UCD2Haskell.ModuleGenerators
+-- Copyright   : (c) 2020 Composewell Technologies and Contributors
+--               (c) 2016-2017 Harendra Kumar
+--               (c) 2014-2015 Antonio Nikishaev
+--               (c) 2022-2024 Pierre Le Marre
+-- License     : BSD-3-Clause
+-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>"
+-- Stability   : internal
+
+-- Code history:
+--
+-- This code was adapted from https://github.com/composewell/unicode-data/
+-- (around commit c4aa52ed932ad8badf97296858932c3389b275b8) by Pierre Le Marre.
+-- The original Unicode database parser was taken from
+-- https://github.com/composewell/unicode-transforms but was completely
+-- rewritten from scratch to parse from UCD text files instead of XML, only
+-- some types remain the same. That code in turn was originally taken from
+-- https://github.com/llelf/prose (Antonio Nikishaev) and heavily modified by
+-- Harendra Kumar.
+--
+module UCD2Haskell.ModuleGenerators (genModules) where
+
+import Control.Exception (catch, IOException)
+import Data.Bits (Bits(..))
+import Data.Word (Word8)
+import Data.Char (ord)
+import Data.Functor ((<&>), ($>))
+import Data.List (intersperse, unfoldr)
+import System.Directory (createDirectoryIfMissing)
+import System.Environment (getEnv)
+import System.FilePath ((</>), (<.>))
+import Data.String (IsString)
+import Data.Foldable (Foldable(..))
+
+import qualified Data.ByteString.Builder as BB
+import qualified Data.ByteString as B
+import qualified Data.ByteString.Lazy as BL
+import qualified Data.ByteString.Short as BS
+
+import qualified Unicode.CharacterDatabase.Parser.Common as C
+import qualified Unicode.CharacterDatabase.Parser.UnicodeData as UD
+import qualified Unicode.CharacterDatabase.Parser.Properties.Multiple as P
+
+import Prelude hiding (pred)
+
+--------------------------------------------------------------------------------
+-- Helpers
+--------------------------------------------------------------------------------
+
+unlinesBB :: [BB.Builder] -> BB.Builder
+unlinesBB = (<> "\n") . mconcat . intersperse "\n"
+
+unwordsBB :: [BB.Builder] -> BB.Builder
+unwordsBB = mconcat . intersperse " "
+
+headerRule :: BB.Builder
+headerRule = "-----------------------------------------------------------------------------"
+
+mkModuleHeader :: BB.Builder -> BB.Builder
+mkModuleHeader modName =
+    unlinesBB
+        [ headerRule
+        , "-- |"
+        , "-- Module      : " <> modName
+        , "-- License     : BSD-3-Clause"
+        , "-- Maintainer  : The GHC Developers <ghc-devs at haskell.org>"
+        , "-- Stability   : internal"
+        , headerRule
+        ]
+
+genSignature :: BB.Builder -> BB.Builder
+genSignature = (<> " :: Char -> Bool")
+
+-- | Check that var is between minimum and maximum of orderList
+genRangeCheck :: BB.Builder -> [Int] -> BB.Builder
+genRangeCheck var ordList =
+    var
+        <> " >= "
+        <> BB.intDec (minimum ordList)
+        <> " && " <> var <> " <= " <> BB.intDec (maximum ordList)
+
+genBitmap :: BB.Builder -> [Int] -> BB.Builder
+genBitmap funcName ordList =
+    unlinesBB
+        [ "{-# INLINE " <> funcName <> " #-}"
+        , genSignature funcName
+        , funcName <> " = \\c -> let n = ord c in "
+              <> genRangeCheck "n" ordList <> " && lookupBit64 bitmap# n"
+        , "  where"
+        , "    bitmap# = \"" <> bitMapToAddrLiteral (positionsToBitMap ordList) "\"#"
+        ]
+
+positionsToBitMap :: [Int] -> [Bool]
+positionsToBitMap = go 0
+
+    where
+
+    go _ [] = []
+    go i xxs@(x:xs)
+        | i < x = False : go (i + 1) xxs
+        | otherwise = True : go (i + 1) xs
+
+bitMapToAddrLiteral ::
+  -- | Values to encode
+  [Bool] ->
+  -- | String to append
+  BB.Builder ->
+  BB.Builder
+bitMapToAddrLiteral bs cs = foldr encode cs (unfoldr mkChunks bs)
+
+    where
+
+    mkChunks :: [a] -> Maybe ([a], [a])
+    mkChunks [] = Nothing
+    mkChunks xs = Just $ splitAt 8 xs
+
+    encode :: [Bool] -> BB.Builder -> BB.Builder
+    encode chunk acc = BB.char7 '\\' <> BB.intDec (toByte (padTo8 chunk)) <> acc
+
+    padTo8 :: [Bool] -> [Bool]
+    padTo8 xs
+        | length xs >= 8 = xs
+        | otherwise = xs <> replicate (8 - length xs) False
+
+    toByte :: [Bool] -> Int
+    toByte xs = sum $ map (\i -> if xs !! i then 1 `shiftL` i else 0) [0..7]
+
+genEnumBitmap ::
+  forall a. (Bounded a, Enum a, Show a) =>
+  -- | Function name
+  BB.Builder ->
+  -- | Default value
+  a ->
+  -- | List of values to encode
+  [a] ->
+  BB.Builder
+genEnumBitmap funcName def as = unlinesBB
+    [ "{-# INLINE " <> funcName <> " #-}"
+    , funcName <> " :: Char -> Int"
+    , funcName <> " c = let n = ord c in if n >= "
+               <> BB.intDec (length as)
+               <> " then "
+               <> BB.intDec (fromEnum def)
+               <> " else lookup_bitmap n"
+
+    , "{-# NOINLINE lookup_bitmap #-}"
+    , "lookup_bitmap :: Int -> Int"
+    , "lookup_bitmap n = lookupIntN bitmap# n"
+    , "  where"
+    , "    bitmap# = \"" <> enumMapToAddrLiteral as "\"#"
+    ]
+
+{-| Encode a list of values as a byte map, using their 'Enum' instance.
+
+__Note:__ 'Enum' instance must respect the following:
+
+* @fromEnum minBound >= 0x00@
+* @fromEnum maxBound <= 0xff@
+-}
+enumMapToAddrLiteral ::
+  forall a. (Bounded a, Enum a, Show a) =>
+  -- | Values to encode
+  [a] ->
+  -- | String to append
+  BB.Builder ->
+  BB.Builder
+enumMapToAddrLiteral xs cs = foldr go cs xs
+
+    where
+
+    go :: a -> BB.Builder -> BB.Builder
+    go x acc = BB.char7 '\\' <> BB.word8Dec (toWord8 x) <> acc
+
+    toWord8 :: a -> Word8
+    toWord8 a = let w = fromEnum a in if 0 <= w && w <= 0xff
+        then fromIntegral w
+        else error $ "Cannot convert to Word8: " <> show a
+
+genUnicodeVersion :: FilePath -> IO ()
+genUnicodeVersion outdir = do
+  version <- catch
+              (getEnv "UNICODE_VERSION")
+              (\(_ :: IOException) -> return "<unknown>")
+  runFold f [body version]
+  where
+    moduleName :: (IsString a) => a
+    moduleName = "GHC.Internal.Unicode.Version"
+    f = moduleFileEmitter Nothing outdir
+          (moduleName, \_ -> Fold (\_ x -> x) mempty id)
+    body :: String -> BB.Builder
+    body version = unlinesBB
+      [ "{-# LANGUAGE NoImplicitPrelude #-}"
+      , "{-# OPTIONS_HADDOCK hide #-}"
+      , ""
+      , mkModuleHeader moduleName
+      , "module " <> moduleName
+      , "(unicodeVersion)"
+      , "where"
+      , ""
+      , "import {-# SOURCE #-} GHC.Internal.Data.Version"
+      , ""
+      , "-- | Version of Unicode standard used by @base@:"
+      , "-- [" <> BB.string7 version <> "](https://www.unicode.org/versions/Unicode" <> BB.string7 version <> "/)."
+      , "--"
+      , "-- @since base-4.15.0.0"
+      , "unicodeVersion :: Version"
+      , "unicodeVersion = makeVersion [" <> mkVersion version <> "]" ]
+    mkVersion = foldMap (\c -> case c of {'.' -> BB.char7 ',' <> BB.char7 ' '; _ -> BB.char7 c})
+
+--------------------------------------------------------------------------------
+-- Fold
+--------------------------------------------------------------------------------
+
+data Fold a b = forall s. Fold
+  { _step :: s -> a -> s
+  , _initial :: s
+  , _final :: s -> b }
+
+data Pair a b = Pair !a !b
+
+teeWith :: (a -> b -> c) -> Fold x a -> Fold x b -> Fold x c
+teeWith f (Fold stepL initialL finalL) (Fold stepR initialR finalR) =
+  Fold step initial final
+  where
+    step (Pair sL sR) x = Pair (stepL sL x) (stepR sR x)
+    initial = Pair initialL initialR
+    final (Pair sL sR) = f (finalL sL) (finalR sR)
+
+distribute :: [Fold a b] -> Fold a [b]
+distribute = foldr (teeWith (:)) (Fold const () (const []))
+
+rmapFold :: (b -> c) -> Fold a b -> Fold a c
+rmapFold f (Fold step initial final) = Fold step initial (f . final)
+
+runFold :: Fold a b -> [a] -> b
+runFold (Fold step initial final) = final . foldl' step initial
+
+--------------------------------------------------------------------------------
+-- Modules generators
+--------------------------------------------------------------------------------
+
+data GeneralCategoryAcc = GeneralCategoryAcc
+  { _categories :: ![UD.GeneralCategory]
+  , _expectedChar :: !Char
+  }
+
+genGeneralCategoryModule :: BB.Builder -> Fold UD.Entry BB.Builder
+genGeneralCategoryModule moduleName = Fold step initial done
+
+    where
+
+    -- (categories, expected char)
+    initial = GeneralCategoryAcc [] '\0'
+
+    step (GeneralCategoryAcc acc p) e@(UD.Entry r d)
+      | p < r.start
+      -- Fill missing char entry with default category Cn
+      -- See: https://www.unicode.org/reports/tr44/#Default_Values_Table
+      = step (GeneralCategoryAcc (replicate (ord r.start - ord p) UD.Cn <> acc) r.start) e
+      -- Regular entry
+      | otherwise = case r of
+        C.SingleChar ch -> GeneralCategoryAcc
+          (d.generalCategory : acc)
+          (succ ch)
+        C.CharRange ch1 ch2 -> GeneralCategoryAcc
+          (replicate (ord ch2 - ord ch1 + 1) d.generalCategory <> acc)
+          (succ ch2)
+
+    done (GeneralCategoryAcc acc _) = unlinesBB
+        [ "{-# LANGUAGE NoImplicitPrelude #-}"
+        , "{-# LANGUAGE MagicHash #-}"
+        , "{-# OPTIONS_HADDOCK hide #-}"
+        , ""
+        , mkModuleHeader moduleName
+        , "module " <> moduleName
+        , "(generalCategory)"
+        , "where"
+        , ""
+        , "import GHC.Internal.Base (Char, Int, Ord(..), ord)"
+        , "import GHC.Internal.Unicode.Bits (lookupIntN)"
+        , ""
+        , genEnumBitmap "generalCategory" UD.Cn (reverse acc)
+        ]
+
+genSimpleCaseMappingModule
+    :: BB.Builder
+    -> BB.Builder
+    -> (UD.CharDetails -> Maybe Char)
+    -> Fold UD.Entry BB.Builder
+genSimpleCaseMappingModule moduleName funcName field =
+    Fold step initial done
+
+    where
+
+    genHeader =
+        [ "{-# LANGUAGE NoImplicitPrelude, LambdaCase #-}"
+        , "{-# OPTIONS_HADDOCK hide #-}"
+        , ""
+        , mkModuleHeader moduleName
+        , "module " <> moduleName
+        , "(" <> funcName <> ")"
+        , "where"
+        , ""
+        , "import GHC.Internal.Base (Char)"
+        , ""
+        ]
+    genSign =
+        [ "{-# NOINLINE " <> funcName <> " #-}"
+        , funcName <> " :: Char -> Char"
+        , funcName <> " = \\case"
+        ]
+    initial = []
+
+    step ds dc = case mkEntry dc of
+        Nothing -> ds
+        Just d  -> d : ds
+
+    after = ["  c -> c"]
+
+    done st =
+        let body = mconcat [genHeader, genSign, reverse st, after]
+        in unlinesBB body
+
+    mkEntry (UD.Entry r dc) = case r of
+      C.SingleChar ch -> field dc <&> \c -> mconcat
+        [ "  '\\x"
+        , showHexChar ch
+        , "' -> '\\x"
+        , showHexChar c
+        , "'"
+        ]
+      C.CharRange{} -> field dc $> error ("genSimpleCaseMappingModule: unexpected char range: " <> show r)
+
+    showHexChar c = BB.wordHex (fromIntegral (ord c))
+
+data PropertiesAcc = PropertiesAcc
+  { _properties :: ![BS.ShortByteString]
+  , _bitmaps :: ![BB.Builder]
+  , _currentBitmap :: ![[Int]] }
+
+genCorePropertiesModule ::
+       BB.Builder -> (BS.ShortByteString -> Bool) -> Fold P.Entry BB.Builder
+genCorePropertiesModule moduleName isProp = Fold step initial done
+    where
+    prop2FuncName x = "is" <> BB.shortByteString x
+
+    initial = PropertiesAcc [] [] []
+
+    step acc@(PropertiesAcc props bitmaps bits) P.Entry{..}
+      | not (isProp property) = acc -- property filtered out
+      | otherwise = case props of
+        prop' : _
+          | prop' == property -> PropertiesAcc props bitmaps (rangeToBits range : bits)
+          | otherwise -> PropertiesAcc
+              { _properties = property : props
+              , _bitmaps = genBitmap' prop' bits : bitmaps
+              , _currentBitmap = [rangeToBits range] }
+        _ -> PropertiesAcc [property] bitmaps [rangeToBits range]
+
+    rangeToBits = \case
+      C.SingleChar ch -> [ord ch]
+      C.CharRange ch1 ch2 -> [ord ch1 .. ord ch2]
+
+    genBitmap' prop bits = genBitmap (prop2FuncName prop) (mconcat (reverse bits))
+
+    done (PropertiesAcc props bitmaps bits) = unlinesBB (header props <> bitmaps')
+      where
+        lastProp = case props of
+          prop : _ -> prop
+          [] -> error "impossible"
+        bitmaps' = genBitmap' lastProp bits : bitmaps
+
+    header exports =
+        [ "{-# LANGUAGE NoImplicitPrelude #-}"
+        , "{-# LANGUAGE MagicHash #-}"
+        , "{-# OPTIONS_HADDOCK hide #-}"
+        , ""
+        , mkModuleHeader moduleName
+        , "module " <> moduleName
+        , "(" <> unwordsBB (intersperse "," (map prop2FuncName exports)) <> ")"
+        , "where"
+        , ""
+        , "import GHC.Internal.Base (Bool, Char, Ord(..), (&&), ord)"
+        , "import GHC.Internal.Unicode.Bits (lookupBit64)"
+        , ""
+        ]
+
+--------------------------------------------------------------------------------
+-- Generation
+--------------------------------------------------------------------------------
+
+moduleToFileName :: String -> String
+moduleToFileName = map (\x -> if x == '.' then '/' else x)
+
+dirFromFileName :: String -> String
+dirFromFileName = reverse . dropWhile (/= '/') . reverse
+
+data FileRecipe a
+    = ModuleRecipe
+      -- ^ A recipe to create a Haskell module file.
+        String
+        -- ^ Module name
+        (BB.Builder -> Fold a BB.Builder)
+        -- ^ Function that generate the module, given the module name.
+    | TestOutputRecipe
+      -- ^ A recipe to create a test output file.
+        String
+        -- ^ Test name
+        (Fold a BB.Builder)
+        -- ^ Test output generator
+
+-- ModuleRecipe is a tuple of the module name and a function that generates the
+-- module using the module name
+type ModuleRecipe a = (String, BB.Builder -> Fold a BB.Builder)
+type TestOutputRecipe a = (FilePath, Fold a BB.Builder)
+
+-- GeneratorRecipe is a list of ModuleRecipe
+type GeneratorRecipe a = [FileRecipe a]
+
+moduleFileEmitter :: Maybe FilePath -> FilePath -> ModuleRecipe a -> Fold a (IO ())
+moduleFileEmitter mfile outdir (modName, fldGen) = rmapFold action $ fldGen (BB.string7 modName)
+
+    where
+
+    pretext version = case mfile of
+      Just file -> mconcat
+        [ "-- DO NOT EDIT: This file is automatically generated by the internal tool ucd2haskell,\n"
+        , "-- with data from: https://www.unicode.org/Public/"
+        , BB.string7 version
+        , "/ucd/"
+        , BB.string7 file
+        ,".\n\n"
+        ]
+      Nothing -> "-- DO NOT EDIT: This file is automatically generated by the internal tool ucd2haskell.\n\n"
+    outfile = outdir </> moduleToFileName modName <.> ".hs"
+    outfiledir = dirFromFileName outfile
+    action c = do
+        version <-
+            catch
+                (getEnv "UNICODE_VERSION")
+                (\(_ :: IOException) -> return "<unknown>")
+        createDirectoryIfMissing True outfiledir
+        B.writeFile outfile (BL.toStrict (BB.toLazyByteString (pretext version <> c)))
+
+testOutputFileEmitter :: FilePath -> TestOutputRecipe a -> Fold a (IO ())
+testOutputFileEmitter outdir (name, fldGen) = rmapFold action fldGen
+
+    where
+
+    outfile = outdir </> "tests" </> name <.> ".stdout"
+    outfiledir = dirFromFileName outfile
+    action c
+        = createDirectoryIfMissing True outfiledir
+        *> B.writeFile outfile (BL.toStrict (BB.toLazyByteString c))
+
+runGenerator ::
+       FilePath
+    -> FilePath
+    -> (B.ByteString -> [a])
+    -> FilePath
+    -> GeneratorRecipe a
+    -> IO ()
+runGenerator indir file transformLines outdir recipes = do
+    raw <- B.readFile (indir <> file)
+    sequence_ (runFold combinedFld (transformLines raw))
+
+    where
+
+    generatedFolds = recipes <&> \case
+      ModuleRecipe     name f -> moduleFileEmitter (Just file) outdir (name, f)
+      TestOutputRecipe name f -> testOutputFileEmitter         outdir (name, f)
+    combinedFld = distribute generatedFolds
+
+genModules :: FilePath -> FilePath -> [BS.ShortByteString] -> IO ()
+genModules indir outdir props = do
+    genUnicodeVersion outdir
+
+    runGenerator
+        indir
+        "UnicodeData.txt"
+        UD.parse
+        outdir
+        [ generalCategory
+        , simpleUpperCaseMapping
+        , simpleLowerCaseMapping
+        , simpleTitleCaseMapping
+        ]
+
+    runGenerator
+        indir
+        "DerivedCoreProperties.txt"
+        P.parse
+        outdir
+        [ derivedCoreProperties ]
+
+    where
+
+    derivedCoreProperties = ModuleRecipe
+        "GHC.Internal.Unicode.Char.DerivedCoreProperties"
+        (`genCorePropertiesModule` (`elem` props))
+
+    generalCategory = ModuleRecipe
+         "GHC.Internal.Unicode.Char.UnicodeData.GeneralCategory"
+         genGeneralCategoryModule
+
+    simpleUpperCaseMapping = ModuleRecipe
+         "GHC.Internal.Unicode.Char.UnicodeData.SimpleUpperCaseMapping"
+         (\m -> genSimpleCaseMappingModule m "toSimpleUpperCase" UD.simpleUpperCaseMapping)
+
+    simpleLowerCaseMapping = ModuleRecipe
+         "GHC.Internal.Unicode.Char.UnicodeData.SimpleLowerCaseMapping"
+         (\m -> genSimpleCaseMappingModule m "toSimpleLowerCase" UD.simpleLowerCaseMapping)
+
+    simpleTitleCaseMapping = ModuleRecipe
+         "GHC.Internal.Unicode.Char.UnicodeData.SimpleTitleCaseMapping"
+         (\m -> genSimpleCaseMappingModule m "toSimpleTitleCase" UD.simpleTitleCaseMapping)


=====================================
libraries/ghc-internal/tools/ucd2haskell/ucd2haskell.cabal
=====================================
@@ -1,6 +1,6 @@
 cabal-version:       2.2
 name:                ucd2haskell
-version:             0.3.0
+version:             0.4.0
 synopsis:            Converter from Unicode character database to Haskell.
 description:
   The Haskell data structures are generated programmatically from the
@@ -10,12 +10,12 @@ description:
 license:             BSD-3-Clause
 license-file:        LICENSE
 author:              Composewell Technologies and Contributors
-maintainer:          streamly at composewell.com
-copyright:           2020 Composewell Technologies and Contributors
+maintainer:          The GHC Developers <ghc-devs at haskell.org>
+bug-reports:         https://gitlab.haskell.org/ghc/ghc/-/issues/new
 category:            Data,Text,Unicode
 stability:           Experimental
 build-type:          Simple
-tested-with:         GHC==9.2.2
+tested-with:         GHC==9.8.2
 
 extra-source-files:
     README.md
@@ -23,18 +23,14 @@ extra-source-files:
 
 common default-extensions
   default-extensions:
-      BangPatterns
       DeriveGeneric
-      MagicHash
-      RecordWildCards
-      ScopedTypeVariables
-      TupleSections
-      FlexibleContexts
-
-      -- Experimental, may lead to issues
       DeriveAnyClass
-      TemplateHaskell
-      UnboxedTuples
+      ExistentialQuantification
+      LambdaCase
+      OverloadedStrings
+      OverloadedRecordDot
+      ScopedTypeVariables
+      RecordWildCards
 
 common compile-options
   ghc-options: -Wall
@@ -42,21 +38,20 @@ common compile-options
                -fwarn-incomplete-record-updates
                -fwarn-incomplete-uni-patterns
                -fwarn-tabs
-  default-language: Haskell2010
+  default-language: GHC2021
 
 executable ucd2haskell
   import: default-extensions, compile-options
-  default-language: Haskell2010
   ghc-options: -O2
   hs-source-dirs: exe
   main-is: UCD2Haskell.hs
-  other-modules: Parser.Text
+  other-modules: UCD2Haskell.ModuleGenerators
   build-depends:
-      base             >= 4.7   && < 4.20
-    , streamly-core    >= 0.2.2 && < 0.3
-    , streamly         >= 0.10   && < 0.11
-    , split            >= 0.2.3 && < 0.3
-    , getopt-generics  >= 0.13  && < 0.14
-    , containers       >= 0.5   && < 0.7
-    , directory        >= 1.3.6 && < 1.3.8
-    , filepath         >= 1.4.2 && < 1.5
+      base                >= 4.7   && < 5
+    , bytestring          >= 0.11  && < 0.13
+    , containers          >= 0.5   && < 0.7
+    , directory           >= 1.3.6 && < 1.3.8
+    , filepath            >= 1.4.2 && < 1.5
+    , getopt-generics     >= 0.13  && < 0.14
+    , split               >= 0.2.3 && < 0.3
+    , unicode-data-parser >= 0.2.0 && < 0.4


=====================================
testsuite/tests/typecheck/T24026/T24026a.hs
=====================================
@@ -0,0 +1,7 @@
+-- This rule has a type error on the LHS
+module T24026a where
+
+{-# RULES "f" forall (x :: Bool). f x = 0 #-}
+
+f :: Int -> Int
+f x = 0


=====================================
testsuite/tests/typecheck/T24026/T24026a.stderr
=====================================
@@ -0,0 +1,9 @@
+T24026a.hs:4:11: warning: [GHC-95396] [-Winline-rule-shadowing (in -Wdefault)]
+    Rule "f" may never fire because ‘f’ might inline first
+    Suggested fix: Add an INLINE[n] or NOINLINE[n] pragma for ‘f’
+
+T24026a.hs:4:37: warning: [GHC-83865] [-Wdeferred-type-errors (in -Wdefault)]
+    • Couldn't match expected type ‘Int’ with actual type ‘Bool’
+    • In the first argument of ‘f’, namely ‘x’
+      In the expression: f x
+      When checking the rewrite rule "f"
\ No newline at end of file


=====================================
testsuite/tests/typecheck/T24026/T24026b.hs
=====================================
@@ -0,0 +1,7 @@
+-- This rule has a type error on the LHS
+module T24026b where
+
+{-# RULES "f" forall (x :: Bool). f x = 0 #-}
+
+f :: Int -> Int
+f x = 0


=====================================
testsuite/tests/typecheck/T24026/T24026b.stderr
=====================================
@@ -0,0 +1,5 @@
+T24026b.hs:4:37: error: [GHC-83865]
+    • Couldn't match expected type ‘Int’ with actual type ‘Bool’
+    • In the first argument of ‘f’, namely ‘x’
+      In the expression: f x
+      When checking the rewrite rule "f"
\ No newline at end of file


=====================================
testsuite/tests/typecheck/T24026/all.T
=====================================
@@ -0,0 +1,2 @@
+test('T24026a', normal, compile, ['-dlint -fdefer-type-errors'])
+test('T24026b', normal, compile_fail, [''])
\ No newline at end of file


=====================================
utils/haddock/CONTRIBUTING.md
=====================================
@@ -28,6 +28,17 @@ Then, run the following command from the top-level:
 $ ./hadrian/build -j --flavour=Quick --freeze1 _build/stage1/bin/haddock
 ```
 
+### Running the test suites
+
+Currently, this cannot be done with hadrian but has to be done with a
+`cabal-install` built from `master`.
+
+```
+cabal test -w <absolute-path-to-ghc-repo>/_build/stage1/bin/ghc
+```
+
+For more details, see https://gitlab.haskell.org/ghc/ghc/-/issues/24976.
+
 ## Working with the codebase
 
 The project provides a Makefile with rules to accompany you during development:


=====================================
utils/haddock/cabal.project
=====================================
@@ -1,5 +1,3 @@
-with-compiler: ghc-9.7
-
 packages: ./
           ./haddock-api
           ./haddock-library



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/de0abf3607d31f1e8ab0b9265a540338b2f81c31...7677c545f69ef3fbbd56cbd7319a46bf6a98a56f

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/de0abf3607d31f1e8ab0b9265a540338b2f81c31...7677c545f69ef3fbbd56cbd7319a46bf6a98a56f
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20240614/26a574a0/attachment-0001.html>


More information about the ghc-commits mailing list