[Git][ghc/ghc][wip/unboxed-codebuffer] 7 commits: docs: release notes, user guide: add js backend
Josh Meredith (@JoshMeredith)
gitlab at gitlab.haskell.org
Wed Feb 15 17:18:17 UTC 2023
Josh Meredith pushed to branch wip/unboxed-codebuffer at Glasgow Haskell Compiler / GHC
Commits:
08c0822c by doyougnu at 2023-02-15T00:16:39-05:00
docs: release notes, user guide: add js backend
Follow up from #21078
- - - - -
79d8fd65 by Bryan Richter at 2023-02-15T00:17:15-05:00
Allow failure in nightly-x86_64-linux-deb10-no_tntc-validate
See #22343
- - - - -
9ca51f9e by Cheng Shao at 2023-02-15T00:17:53-05:00
rts: add the rts_clearMemory function
This patch adds the rts_clearMemory function that does its best to
zero out unused RTS memory for a wasm backend use case. See the
comment above rts_clearMemory() prototype declaration for more
detailed explanation. Closes #22920.
- - - - -
18700351 by Josh Meredith at 2023-02-15T17:16:27+00:00
CodeBuffer: change to use unboxed tuples for encoders/decoders
Updates submodules for filepath and haskeline
- - - - -
f2e7f89d by Josh Meredith at 2023-02-15T17:16:27+00:00
Lint
- - - - -
2b51e003 by Josh Meredith at 2023-02-15T17:16:27+00:00
Fix build
- - - - -
edd10397 by Josh Meredith at 2023-02-15T17:16:27+00:00
Tabs
- - - - -
24 changed files:
- .gitlab/gen_ci.hs
- .gitlab/jobs.yaml
- docs/users_guide/9.6.1-notes.rst
- docs/users_guide/codegens.rst
- libraries/base/GHC/IO/Encoding.hs
- libraries/base/GHC/IO/Encoding/Failure.hs
- libraries/base/GHC/IO/Encoding/Iconv.hs
- libraries/base/GHC/IO/Encoding/Latin1.hs
- libraries/base/GHC/IO/Encoding/Types.hs
- libraries/base/GHC/IO/Encoding/UTF16.hs
- libraries/base/GHC/IO/Encoding/UTF32.hs
- libraries/base/GHC/IO/Encoding/UTF8.hs
- libraries/base/GHC/IO/Handle/Internals.hs
- libraries/filepath
- libraries/haskeline
- rts/RtsSymbols.c
- rts/include/RtsAPI.h
- rts/sm/BlockAlloc.c
- rts/sm/BlockAlloc.h
- rts/sm/NonMoving.h
- rts/sm/NonMovingSweep.c
- rts/sm/Storage.c
- rts/sm/Storage.h
- testsuite/tests/ffi/should_run/ffi023_c.c
Changes:
=====================================
.gitlab/gen_ci.hs
=====================================
@@ -713,6 +713,10 @@ modifyJobs = fmap
modifyValidateJobs :: (a -> a) -> JobGroup a -> JobGroup a
modifyValidateJobs f jg = jg { v = f <$> v jg }
+-- | Modify just the nightly jobs in a 'JobGroup'
+modifyNightlyJobs :: (a -> a) -> JobGroup a -> JobGroup a
+modifyNightlyJobs f jg = jg { n = f <$> n jg }
+
-- Generic helpers
addJobRule :: Rule -> Job -> Job
@@ -854,7 +858,9 @@ job_groups =
, fastCI (validateBuilds Amd64 (Linux Debian10) unreg)
, fastCI (validateBuilds Amd64 (Linux Debian10) debug)
, modifyValidateJobs manual tsan_jobs
- , modifyValidateJobs manual (validateBuilds Amd64 (Linux Debian10) noTntc)
+ , -- Nightly allowed to fail: #22343
+ modifyNightlyJobs allowFailure
+ (modifyValidateJobs manual (validateBuilds Amd64 (Linux Debian10) noTntc))
, addValidateRule LLVMBackend (validateBuilds Amd64 (Linux Debian10) llvm)
, disableValidate (standardBuilds Amd64 (Linux Debian11))
=====================================
.gitlab/jobs.yaml
=====================================
@@ -978,7 +978,7 @@
".gitlab/ci.sh clean",
"cat ci_timings"
],
- "allow_failure": false,
+ "allow_failure": true,
"artifacts": {
"expire_in": "8 weeks",
"paths": [
=====================================
docs/users_guide/9.6.1-notes.rst
=====================================
@@ -132,6 +132,15 @@ Compiler
presented in this GHC version as a technology preview, bugs and
missing features are expected.
+- The JavaScript backend has been merged. GHC is now able to be built as a
+ cross-compiler targeting the JavaScript platform. The backend should be
+ considered a technology preview. As such it is not ready for use in
+ production, is not distributed in the GHC release bindists and requires the
+ user to manually build GHC as a cross-compiler. See the JavaScript backend
+ `wiki <https://gitlab.haskell.org/ghc/ghc/-/wikis/javascript-backend>`_ page
+ on the GHC wiki for the current status, project roadmap, build instructions
+ and demos.
+
- The :extension:`TypeInType` is now marked as deprecated. Its meaning has been included
in :extension:`PolyKinds` and :extension:`DataKinds`.
=====================================
docs/users_guide/codegens.rst
=====================================
@@ -95,6 +95,36 @@ was built this way. If it has then the native code generator probably
won't be available. You can check this information by calling
``ghc --info`` (see :ghc-flag:`--info`).
+.. _javascript-code-gen:
+
+JavaScript Code Generator
+------------------------------
+
+.. index::
+ single: JavaScript code generator
+
+This is an alternative code generator included in GHC 9.6 and above. It
+generates `ECMA-262 <https://tc39.es/ecma262/>`_ compliant JavaScript and is
+included as a technical preview. At time of writing, it is being actively
+developed but is not suitable for serious projects and production environments.
+The JavaScript backend is not distributed in the GHC bindist and requires a
+manual build. See `building the JavaScript backend
+<https://gitlab.haskell.org/ghc/ghc/-/wikis/javascript-backend/building>`_ page
+on the GHC wiki for build instructions.
+
+A JavaScript cross-compiling GHC produces an executable script, and a directory
+of the same name suffixed with ``.jsexe``. For example, compiling a file named
+``Foo.hs`` will produce an executable script ``Foo`` and a ``Foo.jsexe``
+directory. The script is a thin wrapper that calls `Node.js
+<https://nodejs.org/en/>`_ on the payload of the compiled Haskell code and can
+be run in the usual way, e.g., ``./Foo``, as long as ``node`` is in your
+environment . The actual payload is in ``<ModuleName>.jsexe/all.js``, for
+example ``Foo.jsexe/all.js``. This file is the Haskell program cross-compiled to
+JavaScript *concrete syntax* and can be wrapped in a ``<script>`` HTML tag. For
+a breakdown of the rest of the build artifacts see the `compiler output
+<https://gitlab.haskell.org/ghc/ghc/-/wikis/javascript-backend/building#compiler-output-and-build-artifacts>`_
+section in the wiki.
+
.. _unreg:
Unregisterised compilation
=====================================
libraries/base/GHC/IO/Encoding.hs
=====================================
@@ -1,5 +1,6 @@
{-# LANGUAGE Trustworthy #-}
{-# LANGUAGE CPP, NoImplicitPrelude #-}
+{-# LANGUAGE UnboxedTuples, MagicHash #-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
-----------------------------------------------------------------------------
@@ -336,11 +337,13 @@ mkTextEncoding' cfm enc =
latin1_encode :: CharBuffer -> Buffer Word8 -> IO (CharBuffer, Buffer Word8)
-latin1_encode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_encode input output -- unchecked, used for char8
+latin1_encode input output = IO $ \st -> case Latin1.latin1_encode input output st of
+ (# st', _why, input', output' #) -> (# st', (input', output') #) -- unchecked, used for char8
--latin1_encode = unsafePerformIO $ do mkTextEncoder Iconv.latin1 >>= return.encode
latin1_decode :: Buffer Word8 -> CharBuffer -> IO (Buffer Word8, CharBuffer)
-latin1_decode input output = fmap (\(_why,input',output') -> (input',output')) $ Latin1.latin1_decode input output
+latin1_decode input output = IO $ \st -> case Latin1.latin1_decode input output st of
+ (# st', _why, input', output' #) -> (# st', (input',output') #)
--latin1_decode = unsafePerformIO $ do mkTextDecoder Iconv.latin1 >>= return.encode
unknownEncodingErr :: String -> IO a
=====================================
libraries/base/GHC/IO/Encoding/Failure.hs
=====================================
@@ -1,5 +1,7 @@
{-# LANGUAGE Trustworthy #-}
{-# LANGUAGE NoImplicitPrelude #-}
+{-# LANGUAGE UnboxedTuples #-}
+{-# LANGUAGE MagicHash #-}
-----------------------------------------------------------------------------
-- |
@@ -18,7 +20,8 @@
module GHC.IO.Encoding.Failure (
CodingFailureMode(..), codingFailureModeSuffix,
isSurrogate,
- recoverDecode, recoverEncode
+ recoverDecode, recoverEncode,
+ recoverDecode#, recoverEncode#,
) where
import GHC.IO
@@ -142,6 +145,12 @@ unescapeRoundtripCharacterSurrogate c
| otherwise = Nothing
where x = ord c
+recoverDecode# :: CodingFailureMode -> Buffer Word8 -> Buffer Char
+ -> State# RealWorld -> (# State# RealWorld, Buffer Word8, Buffer Char #)
+recoverDecode# cfm input output st =
+ let (# st', (bIn, bOut) #) = unIO (recoverDecode cfm input output) st
+ in (# st', bIn, bOut #)
+
recoverDecode :: CodingFailureMode -> Buffer Word8 -> Buffer Char
-> IO (Buffer Word8, Buffer Char)
recoverDecode cfm input at Buffer{ bufRaw=iraw, bufL=ir, bufR=_ }
@@ -160,6 +169,12 @@ recoverDecode cfm input at Buffer{ bufRaw=iraw, bufL=ir, bufR=_ }
ow' <- writeCharBuf oraw ow (escapeToRoundtripCharacterSurrogate b)
return (input { bufL=ir+1 }, output { bufR=ow' })
+recoverEncode# :: CodingFailureMode -> Buffer Char -> Buffer Word8
+ -> State# RealWorld -> (# State# RealWorld, Buffer Char, Buffer Word8 #)
+recoverEncode# cfm input output st =
+ let (# st', (bIn, bOut) #) = unIO (recoverEncode cfm input output) st
+ in (# st', bIn, bOut #)
+
recoverEncode :: CodingFailureMode -> Buffer Char -> Buffer Word8
-> IO (Buffer Char, Buffer Word8)
recoverEncode cfm input at Buffer{ bufRaw=iraw, bufL=ir, bufR=_ }
=====================================
libraries/base/GHC/IO/Encoding/Iconv.hs
=====================================
@@ -2,6 +2,8 @@
{-# LANGUAGE CPP
, NoImplicitPrelude
, NondecreasingIndentation
+ , UnboxedTuples
+ , MagicHash
#-}
{-# OPTIONS_HADDOCK not-home #-}
@@ -133,19 +135,24 @@ newIConv from to rec fn =
withCAString to $ \ to_str -> do
iconvt <- throwErrnoIfMinus1 "mkTextEncoding" $ hs_iconv_open to_str from_str
let iclose = throwErrnoIfMinus1_ "Iconv.close" $ hs_iconv_close iconvt
+ fn_iconvt ibuf obuf st = case unIO (fn iconvt ibuf obuf) st of
+ (# st', (prog, ibuf', obuf') #) -> (# st', prog, ibuf', obuf' #)
return BufferCodec{
- encode = fn iconvt,
- recover = rec,
- close = iclose,
+ encode# = fn_iconvt,
+ recover# = rec#,
+ close = iclose,
-- iconv doesn't supply a way to save/restore the state
getState = return (),
setState = const $ return ()
}
+ where
+ rec# ibuf obuf st = case unIO (rec ibuf obuf) st of
+ (# st', (ibuf', obuf') #) -> (# st', ibuf', obuf' #)
-iconvDecode :: IConv -> DecodeBuffer
+iconvDecode :: IConv -> Buffer Word8 -> Buffer Char -> IO (CodingProgress, Buffer Word8, Buffer Char)
iconvDecode iconv_t ibuf obuf = iconvRecode iconv_t ibuf 0 obuf char_shift
-iconvEncode :: IConv -> EncodeBuffer
+iconvEncode :: IConv -> Buffer Char -> Buffer Word8 -> IO (CodingProgress, Buffer Char, Buffer Word8)
iconvEncode iconv_t ibuf obuf = iconvRecode iconv_t ibuf char_shift obuf 0
iconvRecode :: IConv -> Buffer a -> Int -> Buffer b -> Int
=====================================
libraries/base/GHC/IO/Encoding/Latin1.hs
=====================================
@@ -2,6 +2,8 @@
{-# LANGUAGE NoImplicitPrelude
, BangPatterns
, NondecreasingIndentation
+ , UnboxedTuples
+ , MagicHash
#-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
@@ -57,8 +59,8 @@ mkLatin1 cfm = TextEncoding { textEncodingName = "ISO-8859-1",
latin1_DF :: CodingFailureMode -> IO (TextDecoder ())
latin1_DF cfm =
return (BufferCodec {
- encode = latin1_decode,
- recover = recoverDecode cfm,
+ encode# = latin1_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -67,8 +69,8 @@ latin1_DF cfm =
latin1_EF :: CodingFailureMode -> IO (TextEncoder ())
latin1_EF cfm =
return (BufferCodec {
- encode = latin1_encode,
- recover = recoverEncode cfm,
+ encode# = latin1_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -86,8 +88,8 @@ mkLatin1_checked cfm = TextEncoding { textEncodingName = "ISO-8859-1",
latin1_checked_EF :: CodingFailureMode -> IO (TextEncoder ())
latin1_checked_EF cfm =
return (BufferCodec {
- encode = latin1_checked_encode,
- recover = recoverEncode cfm,
+ encode# = latin1_checked_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -109,8 +111,8 @@ mkAscii cfm = TextEncoding { textEncodingName = "ASCII",
ascii_DF :: CodingFailureMode -> IO (TextDecoder ())
ascii_DF cfm =
return (BufferCodec {
- encode = ascii_decode,
- recover = recoverDecode cfm,
+ encode# = ascii_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -119,8 +121,8 @@ ascii_DF cfm =
ascii_EF :: CodingFailureMode -> IO (TextEncoder ())
ascii_EF cfm =
return (BufferCodec {
- encode = ascii_encode,
- recover = recoverEncode cfm,
+ encode# = ascii_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -138,65 +140,78 @@ latin1_decode :: DecodeBuffer
latin1_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- ow' <- writeCharBuf oraw ow (unsafeChr (fromIntegral c0))
- loop (ir+1) ow'
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir) st0
+ !(# st2, ow' #) = unIO (writeCharBuf oraw ow (unsafeChr (fromIntegral c0))) st1
+ loop (ir+1) ow' st2
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
ascii_decode :: DecodeBuffer
ascii_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- if c0 > 0x7f then invalid else do
- ow' <- writeCharBuf oraw ow (unsafeChr (fromIntegral c0))
- loop (ir+1) ow'
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir) st0
+ if c0 > 0x7f then invalid st1 else do
+ let !(# st2, ow' #) = unIO (writeCharBuf oraw ow (unsafeChr (fromIntegral c0))) st1
+ loop (ir+1) ow' st2
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
latin1_encode :: EncodeBuffer
latin1_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
- writeWord8Buf oraw ow (fromIntegral (ord c))
- loop ir' (ow+1)
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow (fromIntegral (ord c))) st1
+ loop ir' (ow+1) st2
in
- loop ir0 ow0
+ loop ir0 ow0 st
latin1_checked_encode :: EncodeBuffer
latin1_checked_encode input output
@@ -210,21 +225,26 @@ single_byte_checked_encode :: Int -> EncodeBuffer
single_byte_checked_encode max_legal_char
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
- if ord c > max_legal_char then invalid else do
- writeWord8Buf oraw ow (fromIntegral (ord c))
- loop ir' (ow+1)
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
+ if ord c > max_legal_char then invalid st1 else do
+ let (# st2, () #) = unIO (writeWord8Buf oraw ow (fromIntegral (ord c))) st1
+ loop ir' (ow+1) st2
where
- invalid = done InvalidSequence ir ow
+ invalid :: EncodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
in
- loop ir0 ow0
+ loop ir0 ow0 st
{-# INLINE single_byte_checked_encode #-}
=====================================
libraries/base/GHC/IO/Encoding/Types.hs
=====================================
@@ -1,5 +1,6 @@
{-# LANGUAGE Trustworthy #-}
{-# LANGUAGE NoImplicitPrelude, ExistentialQuantification #-}
+{-# LANGUAGE MagicHash, UnboxedTuples #-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
-----------------------------------------------------------------------------
@@ -21,7 +22,9 @@ module GHC.IO.Encoding.Types (
TextEncoding(..),
TextEncoder, TextDecoder,
CodeBuffer, EncodeBuffer, DecodeBuffer,
- CodingProgress(..)
+ EncodingBuffer, DecodingBuffer,
+ CodingProgress(..),
+ recover, encode,
) where
import GHC.Base
@@ -34,7 +37,7 @@ import GHC.IO.Buffer
-- Text encoders/decoders
data BufferCodec from to state = BufferCodec {
- encode :: CodeBuffer from to,
+ encode# :: CodeBuffer# from to,
-- ^ The @encode@ function translates elements of the buffer @from@
-- to the buffer @to at . It should translate as many elements as possible
-- given the sizes of the buffers, including translating zero elements
@@ -50,7 +53,7 @@ data BufferCodec from to state = BufferCodec {
-- library in order to report translation errors at the point they
-- actually occur, rather than when the buffer is translated.
- recover :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
+ recover# :: Buffer from -> Buffer to -> State# RealWorld -> (# State# RealWorld, Buffer from, Buffer to #),
-- ^ The @recover@ function is used to continue decoding
-- in the presence of invalid or unrepresentable sequences. This includes
-- both those detected by @encode@ returning @InvalidSequence@ and those
@@ -92,9 +95,24 @@ data BufferCodec from to state = BufferCodec {
-- call to 'getState'.
}
-type CodeBuffer from to = Buffer from -> Buffer to -> IO (CodingProgress, Buffer from, Buffer to)
-type DecodeBuffer = CodeBuffer Word8 Char
-type EncodeBuffer = CodeBuffer Char Word8
+{-# INLINE encode #-}
+encode :: BufferCodec from to state -> CodeBuffer from to
+encode codec from to = IO $ \s -> case encode# codec from to s of
+ (# s', progress, from', to' #) -> (# s', (progress, from', to') #)
+
+{-# INLINE recover #-}
+recover :: BufferCodec from to state -> Buffer from -> Buffer to -> IO (Buffer from, Buffer to)
+recover codec from to = IO $ \s -> case recover# codec from to s of
+ (# s', from', to' #) -> (# s', (from', to') #)
+
+
+type CodeBuffer from to = Buffer from -> Buffer to -> IO (CodingProgress, Buffer from, Buffer to)
+type CodingBuffer# from to = State# RealWorld -> (# State# RealWorld, CodingProgress, Buffer from, Buffer to #)
+type CodeBuffer# from to = Buffer from -> Buffer to -> CodingBuffer# from to
+type DecodingBuffer = CodingBuffer# Word8 Char
+type DecodeBuffer = CodeBuffer# Word8 Char
+type EncodingBuffer = CodingBuffer# Char Word8
+type EncodeBuffer = CodeBuffer# Char Word8
type TextDecoder state = BufferCodec Word8 CharBufElem state
type TextEncoder state = BufferCodec CharBufElem Word8 state
=====================================
libraries/base/GHC/IO/Encoding/UTF16.hs
=====================================
@@ -3,6 +3,7 @@
, BangPatterns
, NondecreasingIndentation
, MagicHash
+ , UnboxedTuples
#-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
@@ -65,8 +66,8 @@ utf16_DF :: CodingFailureMode -> IO (TextDecoder (Maybe DecodeBuffer))
utf16_DF cfm = do
seen_bom <- newIORef Nothing
return (BufferCodec {
- encode = utf16_decode seen_bom,
- recover = recoverDecode cfm,
+ encode# = utf16_decode seen_bom,
+ recover# = recoverDecode# cfm,
close = return (),
getState = readIORef seen_bom,
setState = writeIORef seen_bom
@@ -76,8 +77,8 @@ utf16_EF :: CodingFailureMode -> IO (TextEncoder Bool)
utf16_EF cfm = do
done_bom <- newIORef False
return (BufferCodec {
- encode = utf16_encode done_bom,
- recover = recoverEncode cfm,
+ encode# = utf16_encode done_bom,
+ recover# = recoverEncode# cfm,
close = return (),
getState = readIORef done_bom,
setState = writeIORef done_bom
@@ -86,39 +87,41 @@ utf16_EF cfm = do
utf16_encode :: IORef Bool -> EncodeBuffer
utf16_encode done_bom input
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow, bufSize=os }
+ st0
= do
- b <- readIORef done_bom
- if b then utf16_native_encode input output
+ let !(# st1, b #) = unIO (readIORef done_bom) st0
+ if b then utf16_native_encode input output st1
else if os - ow < 2
- then return (OutputUnderflow,input,output)
+ then (# st1,OutputUnderflow,input,output #)
else do
- writeIORef done_bom True
- writeWord8Buf oraw ow bom1
- writeWord8Buf oraw (ow+1) bom2
- utf16_native_encode input output{ bufR = ow+2 }
+ let !(# st2, () #) = unIO (writeIORef done_bom True) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw ow bom1) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+1) bom2) st3
+ utf16_native_encode input output{ bufR = ow+2 } st4
utf16_decode :: IORef (Maybe DecodeBuffer) -> DecodeBuffer
utf16_decode seen_bom
input at Buffer{ bufRaw=iraw, bufL=ir, bufR=iw, bufSize=_ }
output
+ st0
= do
- mb <- readIORef seen_bom
+ let !(# st1, mb #) = unIO (readIORef seen_bom) st0
case mb of
- Just decode -> decode input output
+ Just decode -> decode input output st1
Nothing ->
- if iw - ir < 2 then return (InputUnderflow,input,output) else do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
+ if iw - ir < 2 then (# st1,InputUnderflow,input,output #) else do
+ let !(# st2, c0 #) = unIO (readWord8Buf iraw ir ) st1
+ !(# st3, c1 #) = unIO (readWord8Buf iraw (ir+1)) st2
case () of
- _ | c0 == bomB && c1 == bomL -> do
- writeIORef seen_bom (Just utf16be_decode)
- utf16be_decode input{ bufL= ir+2 } output
- | c0 == bomL && c1 == bomB -> do
- writeIORef seen_bom (Just utf16le_decode)
- utf16le_decode input{ bufL= ir+2 } output
- | otherwise -> do
- writeIORef seen_bom (Just utf16_native_decode)
- utf16_native_decode input output
+ _ | c0 == bomB && c1 == bomL ->
+ let !(# st4, () #) = unIO (writeIORef seen_bom (Just utf16be_decode)) st3
+ in utf16be_decode input{ bufL= ir+2 } output st4
+ | c0 == bomL && c1 == bomB ->
+ let !(# st4, () #) = unIO (writeIORef seen_bom (Just utf16le_decode)) st3
+ in utf16le_decode input{ bufL= ir+2 } output st4
+ | otherwise ->
+ let !(# st4, () #) = unIO (writeIORef seen_bom (Just utf16_native_decode)) st3
+ in utf16_native_decode input output st4
bomB, bomL, bom1, bom2 :: Word8
@@ -150,8 +153,8 @@ mkUTF16be cfm = TextEncoding { textEncodingName = "UTF-16BE",
utf16be_DF :: CodingFailureMode -> IO (TextDecoder ())
utf16be_DF cfm =
return (BufferCodec {
- encode = utf16be_decode,
- recover = recoverDecode cfm,
+ encode# = utf16be_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -160,8 +163,8 @@ utf16be_DF cfm =
utf16be_EF :: CodingFailureMode -> IO (TextEncoder ())
utf16be_EF cfm =
return (BufferCodec {
- encode = utf16be_encode,
- recover = recoverEncode cfm,
+ encode# = utf16be_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -179,8 +182,8 @@ mkUTF16le cfm = TextEncoding { textEncodingName = "UTF16-LE",
utf16le_DF :: CodingFailureMode -> IO (TextDecoder ())
utf16le_DF cfm =
return (BufferCodec {
- encode = utf16le_decode,
- recover = recoverDecode cfm,
+ encode# = utf16le_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -189,8 +192,8 @@ utf16le_DF cfm =
utf16le_EF :: CodingFailureMode -> IO (TextEncoder ())
utf16le_EF cfm =
return (BufferCodec {
- encode = utf16le_encode,
- recover = recoverEncode cfm,
+ encode# = utf16le_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -201,91 +204,104 @@ utf16be_decode :: DecodeBuffer
utf16be_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
- | ir + 1 == iw = done InputUnderflow ir ow
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | ir + 1 == iw = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir ) st0
+ !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
let x1 = fromIntegral c0 `shiftL` 8 + fromIntegral c1
if validate1 x1
- then do ow' <- writeCharBuf oraw ow (unsafeChr (fromIntegral x1))
- loop (ir+2) ow'
- else if iw - ir < 4 then done InputUnderflow ir ow else do
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
- let x2 = fromIntegral c2 `shiftL` 8 + fromIntegral c3
- if not (validate2 x1 x2) then invalid else do
- ow' <- writeCharBuf oraw ow (chr2 x1 x2)
- loop (ir+4) ow'
+ then let !(# st3, ow' #) = unIO (writeCharBuf oraw ow (unsafeChr (fromIntegral x1))) st2
+ in loop (ir+2) ow' st3
+ else if iw - ir < 4 then done InputUnderflow ir ow st2 else do
+ let !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ !(# st4, c3 #) = unIO (readWord8Buf iraw (ir+3)) st3
+ x2 = fromIntegral c2 `shiftL` 8 + fromIntegral c3
+ if not (validate2 x1 x2) then invalid st4 else do
+ let !(# st5, ow' #) = unIO (writeCharBuf oraw ow (chr2 x1 x2)) st4
+ loop (ir+4) ow' st5
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input { bufL = 0, bufR = 0 } else input { bufL = ir }
+ !ro = output{ bufR = ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf16le_decode :: DecodeBuffer
utf16le_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
- | ir + 1 == iw = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | ir + 1 == iw = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
- let x1 = fromIntegral c1 `shiftL` 8 + fromIntegral c0
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir ) st0
+ !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ x1 = fromIntegral c1 `shiftL` 8 + fromIntegral c0
if validate1 x1
- then do ow' <- writeCharBuf oraw ow (unsafeChr (fromIntegral x1))
- loop (ir+2) ow'
- else if iw - ir < 4 then done InputUnderflow ir ow else do
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
- let x2 = fromIntegral c3 `shiftL` 8 + fromIntegral c2
- if not (validate2 x1 x2) then invalid else do
- ow' <- writeCharBuf oraw ow (chr2 x1 x2)
- loop (ir+4) ow'
+ then let !(# st3, ow' #) = unIO (writeCharBuf oraw ow (unsafeChr (fromIntegral x1))) st2
+ in loop (ir+2) ow' st3
+ else if iw - ir < 4 then done InputUnderflow ir ow st2 else do
+ let !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ !(# st4, c3 #) = unIO (readWord8Buf iraw (ir+3)) st3
+ x2 = fromIntegral c3 `shiftL` 8 + fromIntegral c2
+ if not (validate2 x1 x2) then invalid st4 else do
+ let !(# st5, ow' #) = unIO (writeCharBuf oraw ow (chr2 x1 x2)) st4
+ loop (ir+4) ow' st5
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL = 0, bufR = 0 } else input{ bufL = ir }
+ !ro = output{ bufR = ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf16be_encode :: EncodeBuffer
utf16be_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ir >= iw = done InputUnderflow ir ow
- | os - ow < 2 = done OutputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL = 0, bufR = 0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | os - ow < 2 = done OutputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
case ord c of
- x | x < 0x10000 -> if isSurrogate c then done InvalidSequence ir ow else do
- writeWord8Buf oraw ow (fromIntegral (x `shiftR` 8))
- writeWord8Buf oraw (ow+1) (fromIntegral x)
- loop ir' (ow+2)
+ x | x < 0x10000 -> if isSurrogate c then done InvalidSequence ir ow st1 else do
+ let !(# st2, () #) = unIO (writeWord8Buf oraw ow (fromIntegral (x `shiftR` 8))) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) (fromIntegral x)) st2
+ loop ir' (ow+2) st3
| otherwise -> do
- if os - ow < 4 then done OutputUnderflow ir ow else do
+ if os - ow < 4 then done OutputUnderflow ir ow st1 else do
let
n1 = x - 0x10000
c1 = fromIntegral (n1 `shiftR` 18 + 0xD8)
@@ -294,35 +310,39 @@ utf16be_encode
c3 = fromIntegral (n2 `shiftR` 8 + 0xDC)
c4 = fromIntegral n2
--
- writeWord8Buf oraw ow c1
- writeWord8Buf oraw (ow+1) c2
- writeWord8Buf oraw (ow+2) c3
- writeWord8Buf oraw (ow+3) c4
- loop ir' (ow+4)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c1) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c2) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c3) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+3) c4) st4
+ loop ir' (ow+4) st5
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf16le_encode :: EncodeBuffer
utf16le_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ir >= iw = done InputUnderflow ir ow
- | os - ow < 2 = done OutputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL = 0, bufR = 0 } else input{ bufL = ir }
+ !ro = output{ bufR = ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | os - ow < 2 = done OutputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
case ord c of
- x | x < 0x10000 -> if isSurrogate c then done InvalidSequence ir ow else do
- writeWord8Buf oraw ow (fromIntegral x)
- writeWord8Buf oraw (ow+1) (fromIntegral (x `shiftR` 8))
- loop ir' (ow+2)
+ x | x < 0x10000 -> if isSurrogate c then done InvalidSequence ir ow st1 else do
+ let !(# st2, () #) = unIO (writeWord8Buf oraw ow (fromIntegral x)) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) (fromIntegral (x `shiftR` 8))) st2
+ loop ir' (ow+2) st3
| otherwise ->
- if os - ow < 4 then done OutputUnderflow ir ow else do
+ if os - ow < 4 then done OutputUnderflow ir ow st1 else do
let
n1 = x - 0x10000
c1 = fromIntegral (n1 `shiftR` 18 + 0xD8)
@@ -331,13 +351,13 @@ utf16le_encode
c3 = fromIntegral (n2 `shiftR` 8 + 0xDC)
c4 = fromIntegral n2
--
- writeWord8Buf oraw ow c2
- writeWord8Buf oraw (ow+1) c1
- writeWord8Buf oraw (ow+2) c4
- writeWord8Buf oraw (ow+3) c3
- loop ir' (ow+4)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c2) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c1) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c4) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+3) c3) st4
+ loop ir' (ow+4) st5
in
- loop ir0 ow0
+ loop ir0 ow0 st
chr2 :: Word16 -> Word16 -> Char
chr2 (W16# a#) (W16# b#) = C# (chr# (upper# +# lower# +# 0x10000#))
=====================================
libraries/base/GHC/IO/Encoding/UTF32.hs
=====================================
@@ -3,6 +3,7 @@
, BangPatterns
, NondecreasingIndentation
, MagicHash
+ , UnboxedTuples
#-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
@@ -65,8 +66,8 @@ utf32_DF :: CodingFailureMode -> IO (TextDecoder (Maybe DecodeBuffer))
utf32_DF cfm = do
seen_bom <- newIORef Nothing
return (BufferCodec {
- encode = utf32_decode seen_bom,
- recover = recoverDecode cfm,
+ encode# = utf32_decode seen_bom,
+ recover# = recoverDecode# cfm,
close = return (),
getState = readIORef seen_bom,
setState = writeIORef seen_bom
@@ -76,8 +77,8 @@ utf32_EF :: CodingFailureMode -> IO (TextEncoder Bool)
utf32_EF cfm = do
done_bom <- newIORef False
return (BufferCodec {
- encode = utf32_encode done_bom,
- recover = recoverEncode cfm,
+ encode# = utf32_encode done_bom,
+ recover# = recoverEncode# cfm,
close = return (),
getState = readIORef done_bom,
setState = writeIORef done_bom
@@ -86,43 +87,45 @@ utf32_EF cfm = do
utf32_encode :: IORef Bool -> EncodeBuffer
utf32_encode done_bom input
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow, bufSize=os }
+ st0
= do
- b <- readIORef done_bom
- if b then utf32_native_encode input output
+ let !(# st1, b #) = unIO (readIORef done_bom) st0
+ if b then utf32_native_encode input output st1
else if os - ow < 4
- then return (OutputUnderflow, input,output)
+ then (# st1,OutputUnderflow,input,output #)
else do
- writeIORef done_bom True
- writeWord8Buf oraw ow bom0
- writeWord8Buf oraw (ow+1) bom1
- writeWord8Buf oraw (ow+2) bom2
- writeWord8Buf oraw (ow+3) bom3
- utf32_native_encode input output{ bufR = ow+4 }
+ let !(# st2, () #) = unIO (writeIORef done_bom True) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw ow bom0) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+1) bom1) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+2) bom2) st4
+ !(# st6, () #) = unIO (writeWord8Buf oraw (ow+3) bom3) st5
+ utf32_native_encode input output{ bufR = ow+4 } st6
utf32_decode :: IORef (Maybe DecodeBuffer) -> DecodeBuffer
utf32_decode seen_bom
input at Buffer{ bufRaw=iraw, bufL=ir, bufR=iw, bufSize=_ }
output
+ st0
= do
- mb <- readIORef seen_bom
+ let !(# st1, mb #) = unIO (readIORef seen_bom) st0
case mb of
- Just decode -> decode input output
+ Just decode -> decode input output st1
Nothing ->
- if iw - ir < 4 then return (InputUnderflow, input,output) else do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
+ if iw - ir < 4 then (# st1,InputUnderflow,input,output #) else do
+ let !(# st2, c0 #) = unIO (readWord8Buf iraw ir ) st1
+ !(# st3, c1 #) = unIO (readWord8Buf iraw (ir+1)) st2
+ !(# st4, c2 #) = unIO (readWord8Buf iraw (ir+2)) st3
+ !(# st5, c3 #) = unIO (readWord8Buf iraw (ir+3)) st4
case () of
- _ | c0 == bom0 && c1 == bom1 && c2 == bom2 && c3 == bom3 -> do
- writeIORef seen_bom (Just utf32be_decode)
- utf32be_decode input{ bufL= ir+4 } output
- _ | c0 == bom3 && c1 == bom2 && c2 == bom1 && c3 == bom0 -> do
- writeIORef seen_bom (Just utf32le_decode)
- utf32le_decode input{ bufL= ir+4 } output
- | otherwise -> do
- writeIORef seen_bom (Just utf32_native_decode)
- utf32_native_decode input output
+ _ | c0 == bom0 && c1 == bom1 && c2 == bom2 && c3 == bom3 ->
+ let !(# st6, () #) = unIO (writeIORef seen_bom (Just utf32be_decode)) st5
+ in utf32be_decode input{ bufL= ir+4 } output st6
+ _ | c0 == bom3 && c1 == bom2 && c2 == bom1 && c3 == bom0 ->
+ let !(# st6, () #) = unIO (writeIORef seen_bom (Just utf32le_decode)) st5
+ in utf32le_decode input{ bufL= ir+4 } output st6
+ | otherwise ->
+ let !(# st6, () #) = unIO (writeIORef seen_bom (Just utf32_native_decode)) st5
+ in utf32_native_decode input output st6
bom0, bom1, bom2, bom3 :: Word8
@@ -153,8 +156,8 @@ mkUTF32be cfm = TextEncoding { textEncodingName = "UTF-32BE",
utf32be_DF :: CodingFailureMode -> IO (TextDecoder ())
utf32be_DF cfm =
return (BufferCodec {
- encode = utf32be_decode,
- recover = recoverDecode cfm,
+ encode# = utf32be_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -163,8 +166,8 @@ utf32be_DF cfm =
utf32be_EF :: CodingFailureMode -> IO (TextEncoder ())
utf32be_EF cfm =
return (BufferCodec {
- encode = utf32be_encode,
- recover = recoverEncode cfm,
+ encode# = utf32be_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -183,8 +186,8 @@ mkUTF32le cfm = TextEncoding { textEncodingName = "UTF-32LE",
utf32le_DF :: CodingFailureMode -> IO (TextDecoder ())
utf32le_DF cfm =
return (BufferCodec {
- encode = utf32le_decode,
- recover = recoverDecode cfm,
+ encode# = utf32le_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -193,8 +196,8 @@ utf32le_DF cfm =
utf32le_EF :: CodingFailureMode -> IO (TextEncoder ())
utf32le_EF cfm =
return (BufferCodec {
- encode = utf32le_encode,
- recover = recoverEncode cfm,
+ encode# = utf32le_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -205,105 +208,122 @@ utf32be_decode :: DecodeBuffer
utf32be_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | iw - ir < 4 = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | iw - ir < 4 = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir ) st0
+ !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ !(# st4, c3 #) = unIO (readWord8Buf iraw (ir+3)) st3
let x1 = chr4 c0 c1 c2 c3
- if not (validate x1) then invalid else do
- ow' <- writeCharBuf oraw ow x1
- loop (ir+4) ow'
+ if not (validate x1) then invalid st4 else do
+ let !(# st5, ow' #) = unIO (writeCharBuf oraw ow x1) st4
+ loop (ir+4) ow' st5
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf32le_decode :: DecodeBuffer
utf32le_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | iw - ir < 4 = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | iw - ir < 4 = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir ) st0
+ !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ !(# st4, c3 #) = unIO (readWord8Buf iraw (ir+3)) st3
let x1 = chr4 c3 c2 c1 c0
- if not (validate x1) then invalid else do
- ow' <- writeCharBuf oraw ow x1
- loop (ir+4) ow'
+ if not (validate x1) then invalid st4 else do
+ let (# st5, ow' #) = unIO (writeCharBuf oraw ow x1) st4
+ loop (ir+4) ow' st5
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf32be_encode :: EncodeBuffer
utf32be_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ir >= iw = done InputUnderflow ir ow
- | os - ow < 4 = done OutputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | os - ow < 4 = done OutputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
- if isSurrogate c then done InvalidSequence ir ow else do
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
+ if isSurrogate c then done InvalidSequence ir ow st1 else do
let (c0,c1,c2,c3) = ord4 c
- writeWord8Buf oraw ow c0
- writeWord8Buf oraw (ow+1) c1
- writeWord8Buf oraw (ow+2) c2
- writeWord8Buf oraw (ow+3) c3
- loop ir' (ow+4)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c0) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c1) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c2) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+3) c3) st4
+ loop ir' (ow+4) st5
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf32le_encode :: EncodeBuffer
utf32le_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ir >= iw = done InputUnderflow ir ow
- | os - ow < 4 = done OutputUnderflow ir ow
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL=0, bufR=0 } else input{ bufL=ir }
+ !ro = output{ bufR=ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
+ | os - ow < 4 = done OutputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
- if isSurrogate c then done InvalidSequence ir ow else do
+ let !(# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
+ if isSurrogate c then done InvalidSequence ir ow st1 else do
let (c0,c1,c2,c3) = ord4 c
- writeWord8Buf oraw ow c3
- writeWord8Buf oraw (ow+1) c2
- writeWord8Buf oraw (ow+2) c1
- writeWord8Buf oraw (ow+3) c0
- loop ir' (ow+4)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c3) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c2) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c1) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+3) c0) st4
+ loop ir' (ow+4) st5
in
- loop ir0 ow0
+ loop ir0 ow0 st
chr4 :: Word8 -> Word8 -> Word8 -> Word8 -> Char
chr4 (W8# x1#) (W8# x2#) (W8# x3#) (W8# x4#) =
=====================================
libraries/base/GHC/IO/Encoding/UTF8.hs
=====================================
@@ -3,6 +3,7 @@
, BangPatterns
, NondecreasingIndentation
, MagicHash
+ , UnboxedTuples
#-}
{-# OPTIONS_GHC -funbox-strict-fields #-}
@@ -57,8 +58,8 @@ mkUTF8 cfm = TextEncoding { textEncodingName = "UTF-8",
utf8_DF :: CodingFailureMode -> IO (TextDecoder ())
utf8_DF cfm =
return (BufferCodec {
- encode = utf8_decode,
- recover = recoverDecode cfm,
+ encode# = utf8_decode,
+ recover# = recoverDecode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -67,8 +68,8 @@ utf8_DF cfm =
utf8_EF :: CodingFailureMode -> IO (TextEncoder ())
utf8_EF cfm =
return (BufferCodec {
- encode = utf8_encode,
- recover = recoverEncode cfm,
+ encode# = utf8_encode,
+ recover# = recoverEncode# cfm,
close = return (),
getState = return (),
setState = const $ return ()
@@ -86,8 +87,8 @@ utf8_bom_DF :: CodingFailureMode -> IO (TextDecoder Bool)
utf8_bom_DF cfm = do
ref <- newIORef True
return (BufferCodec {
- encode = utf8_bom_decode ref,
- recover = recoverDecode cfm,
+ encode# = utf8_bom_decode ref,
+ recover# = recoverDecode# cfm,
close = return (),
getState = readIORef ref,
setState = writeIORef ref
@@ -97,8 +98,8 @@ utf8_bom_EF :: CodingFailureMode -> IO (TextEncoder Bool)
utf8_bom_EF cfm = do
ref <- newIORef True
return (BufferCodec {
- encode = utf8_bom_encode ref,
- recover = recoverEncode cfm,
+ encode# = utf8_bom_encode ref,
+ recover# = recoverEncode# cfm,
close = return (),
getState = readIORef ref,
setState = writeIORef ref
@@ -108,39 +109,41 @@ utf8_bom_decode :: IORef Bool -> DecodeBuffer
utf8_bom_decode ref
input at Buffer{ bufRaw=iraw, bufL=ir, bufR=iw, bufSize=_ }
output
+ st0
= do
- first <- readIORef ref
+ let (# st1, first #) = unIO (readIORef ref) st0
if not first
- then utf8_decode input output
+ then utf8_decode input output st1
else do
- let no_bom = do writeIORef ref False; utf8_decode input output
- if iw - ir < 1 then return (InputUnderflow,input,output) else do
- c0 <- readWord8Buf iraw ir
+ let no_bom = let !(# st', () #) = unIO (writeIORef ref False) st1 in utf8_decode input output st'
+ if iw - ir < 1 then (# st1,InputUnderflow,input,output #) else do
+ let !(# st2, c0 #) = unIO (readWord8Buf iraw ir) st1
if (c0 /= bom0) then no_bom else do
- if iw - ir < 2 then return (InputUnderflow,input,output) else do
- c1 <- readWord8Buf iraw (ir+1)
+ if iw - ir < 2 then (# st2,InputUnderflow,input,output #) else do
+ let !(# st3, c1 #) = unIO (readWord8Buf iraw (ir+1)) st2
if (c1 /= bom1) then no_bom else do
- if iw - ir < 3 then return (InputUnderflow,input,output) else do
- c2 <- readWord8Buf iraw (ir+2)
+ if iw - ir < 3 then (# st3,InputUnderflow,input,output #) else do
+ let !(# st4, c2 #) = unIO (readWord8Buf iraw (ir+2)) st3
if (c2 /= bom2) then no_bom else do
-- found a BOM, ignore it and carry on
- writeIORef ref False
- utf8_decode input{ bufL = ir + 3 } output
+ let !(# st5, () #) = unIO (writeIORef ref False) st4
+ utf8_decode input{ bufL = ir + 3 } output st5
utf8_bom_encode :: IORef Bool -> EncodeBuffer
utf8_bom_encode ref input
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow, bufSize=os }
+ st0
= do
- b <- readIORef ref
- if not b then utf8_encode input output
+ let !(# st1, b #) = unIO (readIORef ref) st0
+ if not b then utf8_encode input output st1
else if os - ow < 3
- then return (OutputUnderflow,input,output)
+ then (# st1,OutputUnderflow,input,output #)
else do
- writeIORef ref False
- writeWord8Buf oraw ow bom0
- writeWord8Buf oraw (ow+1) bom1
- writeWord8Buf oraw (ow+2) bom2
- utf8_encode input output{ bufR = ow+3 }
+ let !(# st2, () #) = unIO (writeIORef ref False) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw ow bom0) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+1) bom1) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+2) bom2) st4
+ utf8_encode input output{ bufR = ow+3 } st5
bom0, bom1, bom2 :: Word8
bom0 = 0xef
@@ -151,111 +154,120 @@ utf8_decode :: DecodeBuffer
utf8_decode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ loop :: Int -> Int -> DecodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- c0 <- readWord8Buf iraw ir
+ let !(# st1, c0 #) = unIO (readWord8Buf iraw ir) st0
case c0 of
_ | c0 <= 0x7f -> do
- ow' <- writeCharBuf oraw ow (unsafeChr (fromIntegral c0))
- loop (ir+1) ow'
- | c0 >= 0xc0 && c0 <= 0xc1 -> invalid -- Overlong forms
+ let !(# st2, ow' #) = unIO (writeCharBuf oraw ow (unsafeChr (fromIntegral c0))) st1
+ loop (ir+1) ow' st2
+ | c0 >= 0xc0 && c0 <= 0xc1 -> invalid st1 -- Overlong forms
| c0 >= 0xc2 && c0 <= 0xdf ->
- if iw - ir < 2 then done InputUnderflow ir ow else do
- c1 <- readWord8Buf iraw (ir+1)
- if (c1 < 0x80 || c1 >= 0xc0) then invalid else do
- ow' <- writeCharBuf oraw ow (chr2 c0 c1)
- loop (ir+2) ow'
+ if iw - ir < 2 then done InputUnderflow ir ow st1 else do
+ let !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ if (c1 < 0x80 || c1 >= 0xc0) then invalid st2 else do
+ let !(# st3, ow' #) = unIO (writeCharBuf oraw ow (chr2 c0 c1)) st2
+ loop (ir+2) ow' st3
| c0 >= 0xe0 && c0 <= 0xef ->
case iw - ir of
- 1 -> done InputUnderflow ir ow
+ 1 -> done InputUnderflow ir ow st1
2 -> do -- check for an error even when we don't have
-- the full sequence yet (#3341)
- c1 <- readWord8Buf iraw (ir+1)
+ let !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
if not (validate3 c0 c1 0x80)
- then invalid else done InputUnderflow ir ow
+ then invalid st2 else done InputUnderflow ir ow st2
_ -> do
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
- if not (validate3 c0 c1 c2) then invalid else do
- ow' <- writeCharBuf oraw ow (chr3 c0 c1 c2)
- loop (ir+3) ow'
+ let !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ let !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ if not (validate3 c0 c1 c2) then invalid st3 else do
+ let !(# st4, ow' #) = unIO (writeCharBuf oraw ow (chr3 c0 c1 c2)) st3
+ loop (ir+3) ow' st4
| c0 >= 0xf0 ->
case iw - ir of
- 1 -> done InputUnderflow ir ow
+ 1 -> done InputUnderflow ir ow st1
2 -> do -- check for an error even when we don't have
-- the full sequence yet (#3341)
- c1 <- readWord8Buf iraw (ir+1)
+ let (# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
if not (validate4 c0 c1 0x80 0x80)
- then invalid else done InputUnderflow ir ow
+ then invalid st2 else done InputUnderflow ir ow st2
3 -> do
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
+ let !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
if not (validate4 c0 c1 c2 0x80)
- then invalid else done InputUnderflow ir ow
+ then invalid st3 else done InputUnderflow ir ow st3
_ -> do
- c1 <- readWord8Buf iraw (ir+1)
- c2 <- readWord8Buf iraw (ir+2)
- c3 <- readWord8Buf iraw (ir+3)
- if not (validate4 c0 c1 c2 c3) then invalid else do
- ow' <- writeCharBuf oraw ow (chr4 c0 c1 c2 c3)
- loop (ir+4) ow'
+ let !(# st2, c1 #) = unIO (readWord8Buf iraw (ir+1)) st1
+ !(# st3, c2 #) = unIO (readWord8Buf iraw (ir+2)) st2
+ !(# st4, c3 #) = unIO (readWord8Buf iraw (ir+3)) st3
+ if not (validate4 c0 c1 c2 c3) then invalid st4 else do
+ let !(# st5, ow' #) = unIO (writeCharBuf oraw ow (chr4 c0 c1 c2 c3)) st4
+ loop (ir+4) ow' st5
| otherwise ->
- invalid
+ invalid st1
where
- invalid = done InvalidSequence ir ow
+ invalid :: DecodingBuffer
+ invalid st' = done InvalidSequence ir ow st'
-- lambda-lifted, to avoid thunks being built in the inner-loop:
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> DecodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL = 0, bufR = 0} else input{ bufL = ir }
+ !ro = output { bufR = ow }
+ in (# st', why, ri, ro #)
in
- loop ir0 ow0
+ loop ir0 ow0 st
utf8_encode :: EncodeBuffer
utf8_encode
input at Buffer{ bufRaw=iraw, bufL=ir0, bufR=iw, bufSize=_ }
output at Buffer{ bufRaw=oraw, bufL=_, bufR=ow0, bufSize=os }
+ st
= let
- done why !ir !ow = return (why,
- if ir == iw then input{ bufL=0, bufR=0 }
- else input{ bufL=ir },
- output{ bufR=ow })
- loop !ir !ow
- | ow >= os = done OutputUnderflow ir ow
- | ir >= iw = done InputUnderflow ir ow
+ {-# NOINLINE done #-}
+ done :: CodingProgress -> Int -> Int -> EncodingBuffer
+ done why !ir !ow st' =
+ let !ri = if ir == iw then input{ bufL = 0, bufR = 0 } else input{ bufL = ir }
+ !ro = output{ bufR = ow }
+ in (# st', why, ri, ro #)
+ loop :: Int -> Int -> EncodingBuffer
+ loop !ir !ow st0
+ | ow >= os = done OutputUnderflow ir ow st0
+ | ir >= iw = done InputUnderflow ir ow st0
| otherwise = do
- (c,ir') <- readCharBuf iraw ir
+ let (# st1, (c,ir') #) = unIO (readCharBuf iraw ir) st0
case ord c of
x | x <= 0x7F -> do
- writeWord8Buf oraw ow (fromIntegral x)
- loop ir' (ow+1)
+ let !(# st2, () #) = unIO (writeWord8Buf oraw ow (fromIntegral x)) st1
+ loop ir' (ow+1) st2
| x <= 0x07FF ->
- if os - ow < 2 then done OutputUnderflow ir ow else do
+ if os - ow < 2 then done OutputUnderflow ir ow st1 else do
let (c1,c2) = ord2 c
- writeWord8Buf oraw ow c1
- writeWord8Buf oraw (ow+1) c2
- loop ir' (ow+2)
- | x <= 0xFFFF -> if isSurrogate c then done InvalidSequence ir ow else do
- if os - ow < 3 then done OutputUnderflow ir ow else do
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c1) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c2) st2
+ loop ir' (ow+2) st3
+ | x <= 0xFFFF -> if isSurrogate c then done InvalidSequence ir ow st1 else do
+ if os - ow < 3 then done OutputUnderflow ir ow st1 else do
let (c1,c2,c3) = ord3 c
- writeWord8Buf oraw ow c1
- writeWord8Buf oraw (ow+1) c2
- writeWord8Buf oraw (ow+2) c3
- loop ir' (ow+3)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c1) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c2) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c3) st3
+ loop ir' (ow+3) st4
| otherwise -> do
- if os - ow < 4 then done OutputUnderflow ir ow else do
+ if os - ow < 4 then done OutputUnderflow ir ow st1 else do
let (c1,c2,c3,c4) = ord4 c
- writeWord8Buf oraw ow c1
- writeWord8Buf oraw (ow+1) c2
- writeWord8Buf oraw (ow+2) c3
- writeWord8Buf oraw (ow+3) c4
- loop ir' (ow+4)
+ !(# st2, () #) = unIO (writeWord8Buf oraw ow c1) st1
+ !(# st3, () #) = unIO (writeWord8Buf oraw (ow+1) c2) st2
+ !(# st4, () #) = unIO (writeWord8Buf oraw (ow+2) c3) st3
+ !(# st5, () #) = unIO (writeWord8Buf oraw (ow+3) c4) st4
+ loop ir' (ow+4) st5
in
- loop ir0 ow0
+ loop ir0 ow0 st
-- -----------------------------------------------------------------------------
-- UTF-8 primitives, lifted from Data.Text.Fusion.Utf8
=====================================
libraries/base/GHC/IO/Handle/Internals.hs
=====================================
@@ -59,7 +59,7 @@ module GHC.IO.Handle.Internals (
import GHC.IO
import GHC.IO.IOMode
import GHC.IO.Encoding as Encoding
-import GHC.IO.Encoding.Types (CodeBuffer)
+import GHC.IO.Encoding.Types (CodeBuffer, encode, recover)
import GHC.IO.Handle.Types
import GHC.IO.Buffer
import GHC.IO.BufferedIO (BufferedIO)
=====================================
libraries/filepath
=====================================
@@ -1 +1 @@
-Subproject commit 7139cd3383a2aae440a57b5604a8182d9a983715
+Subproject commit fefa911af3a67d913ef2f28cdf79a5c6b6633552
=====================================
libraries/haskeline
=====================================
@@ -1 +1 @@
-Subproject commit ad40faf532ca86ae6d0839a299234db2ce4fc424
+Subproject commit a3876935fb3a5c5969992b5abe58b84d63a68cab
=====================================
rts/RtsSymbols.c
=====================================
@@ -925,6 +925,7 @@ extern char **environ;
SymI_HasProto(newArena) \
SymI_HasProto(arenaAlloc) \
SymI_HasProto(arenaFree) \
+ SymI_HasProto(rts_clearMemory) \
RTS_USER_SIGNALS_SYMBOLS \
RTS_INTCHAR_SYMBOLS
=====================================
rts/include/RtsAPI.h
=====================================
@@ -599,6 +599,51 @@ extern StgWord base_GHCziTopHandler_runNonIO_closure[];
/* ------------------------------------------------------------------------ */
+// This is a public RTS API function that does its best to zero out
+// unused RTS memory. rts_clearMemory() takes the storage manager
+// lock. It's only safe to call rts_clearMemory() when all mutators
+// have stopped and either minor/major garbage collection has just
+// been run.
+//
+// rts_clearMemory() works for all RTS ways on all platforms, though
+// the main intended use case is the pre-initialization of a
+// wasm32-wasi reactor module (#22920). A reactor module is like
+// shared library on other platforms, with foreign exported Haskell
+// functions as entrypoints. At run-time, the user calls hs_init_ghc()
+// to initialize the RTS, after that they can invoke Haskell
+// computation by calling the exported Haskell functions, persisting
+// the memory state across these invocations.
+//
+// Besides hs_init_ghc(), the user may want to invoke some Haskell
+// function to initialize some global state in the user code, this
+// global state is used by subsequent invocations. Now, it's possible
+// to run hs_init_ghc() & custom init logic in Haskell, then snapshot
+// the entire memory into a new wasm module! And the user can call the
+// new wasm module's exports directly, thus eliminating the
+// initialization overhead at run-time entirely.
+//
+// There's one problem though. After the custom init logic runs, the
+// RTS memory contains a lot of garbage data in various places. These
+// garbage data will be snapshotted into the new wasm module, causing
+// a significant size bloat. Therefore, we need an RTS API function
+// that zeros out unused RTS memory.
+//
+// At the end of the day, the custom init function will be a small C
+// function that first calls hs_init_ghc(), then calls a foreign
+// exported Haskell function to initialize whatever global state the
+// other Haskell functions need, followed by a hs_perform_gc() call to
+// do a major GC, and finally an rts_clearMemory() call to zero out
+// the unused RTS memory.
+//
+// Why add rts_clearMemory(), where there's the -DZ RTS flag that
+// zeros freed memory on GC? The -DZ flag actually fills freed memory
+// with a garbage byte like 0xAA, and the flag only works in debug
+// RTS. Why not add a new RTS flag that zeros freed memory on the go?
+// Because it only makes sense to do the zeroing once before
+// snapshotting the memory, but there's no point to pay for the
+// zeroing overhead at the new module's run-time.
+void rts_clearMemory(void);
+
#if defined(__cplusplus)
}
#endif
=====================================
rts/sm/BlockAlloc.c
=====================================
@@ -1395,3 +1395,17 @@ reportUnmarkedBlocks (void)
}
#endif
+
+void clear_free_list(void) {
+ for (uint32_t node = 0; node < n_numa_nodes; ++node) {
+ for (bdescr *bd = free_mblock_list[node]; bd != NULL; bd = bd->link) {
+ clear_blocks(bd);
+ }
+
+ for (int ln = 0; ln < NUM_FREE_LISTS; ++ln) {
+ for (bdescr *bd = free_list[node][ln]; bd != NULL; bd = bd->link) {
+ clear_blocks(bd);
+ }
+ }
+ }
+}
=====================================
rts/sm/BlockAlloc.h
=====================================
@@ -32,4 +32,6 @@ void reportUnmarkedBlocks (void);
extern W_ n_alloc_blocks; // currently allocated blocks
extern W_ hw_alloc_blocks; // high-water allocated blocks
+RTS_PRIVATE void clear_free_list(void);
+
#include "EndPrivate.h"
=====================================
rts/sm/NonMoving.h
=====================================
@@ -356,6 +356,10 @@ void print_thread_list(StgTSO* tso);
#endif
+RTS_PRIVATE void clear_segment(struct NonmovingSegment*);
+
+RTS_PRIVATE void clear_segment_free_blocks(struct NonmovingSegment*);
+
#include "EndPrivate.h"
#endif // CMINUSMINUS
=====================================
rts/sm/NonMovingSweep.c
=====================================
@@ -106,14 +106,16 @@ void nonmovingGcCafs()
debug_caf_list_snapshot = (StgIndStatic*)END_OF_CAF_LIST;
}
-static void
+#endif
+
+void
clear_segment(struct NonmovingSegment* seg)
{
size_t end = ((size_t)seg) + NONMOVING_SEGMENT_SIZE;
memset(&seg->bitmap, 0, end - (size_t)&seg->bitmap);
}
-static void
+void
clear_segment_free_blocks(struct NonmovingSegment* seg)
{
unsigned int block_size = nonmovingSegmentBlockSize(seg);
@@ -125,8 +127,6 @@ clear_segment_free_blocks(struct NonmovingSegment* seg)
}
}
-#endif
-
GNUC_ATTR_HOT void nonmovingSweep(void)
{
while (nonmovingHeap.sweep_list) {
=====================================
rts/sm/Storage.c
=====================================
@@ -1924,3 +1924,46 @@ The compacting collector does nothing to improve megablock
level fragmentation. The role of the compacting GC is to remove object level
fragmentation and to use less memory when collecting. - see #19248
*/
+
+void rts_clearMemory(void) {
+ ACQUIRE_SM_LOCK;
+
+ clear_free_list();
+
+ for (uint32_t i = 0; i < n_nurseries; ++i) {
+ for (bdescr *bd = nurseries[i].blocks; bd; bd = bd->link) {
+ clear_blocks(bd);
+ }
+ }
+
+ for (unsigned int i = 0; i < getNumCapabilities(); ++i) {
+ for (bdescr *bd = getCapability(i)->pinned_object_empty; bd; bd = bd->link) {
+ clear_blocks(bd);
+ }
+
+ for (bdescr *bd = gc_threads[i]->free_blocks; bd; bd = bd->link) {
+ clear_blocks(bd);
+ }
+ }
+
+ if (RtsFlags.GcFlags.useNonmoving)
+ {
+ for (struct NonmovingSegment *seg = nonmovingHeap.free; seg; seg = seg->link) {
+ clear_segment(seg);
+ }
+
+ for (int i = 0; i < NONMOVING_ALLOCA_CNT; ++i) {
+ struct NonmovingAllocator *alloc = nonmovingHeap.allocators[i];
+
+ for (struct NonmovingSegment *seg = alloc->active; seg; seg = seg->link) {
+ clear_segment_free_blocks(seg);
+ }
+
+ for (unsigned int j = 0; j < getNumCapabilities(); ++j) {
+ clear_segment_free_blocks(alloc->current[j]);
+ }
+ }
+ }
+
+ RELEASE_SM_LOCK;
+}
=====================================
rts/sm/Storage.h
=====================================
@@ -206,4 +206,8 @@ extern StgIndStatic * dyn_caf_list;
extern StgIndStatic * debug_caf_list;
extern StgIndStatic * revertible_caf_list;
+STATIC_INLINE void clear_blocks(bdescr *bd) {
+ memset(bd->start, 0, BLOCK_SIZE * bd->blocks);
+}
+
#include "EndPrivate.h"
=====================================
testsuite/tests/ffi/should_run/ffi023_c.c
=====================================
@@ -5,5 +5,6 @@
HsInt out (HsInt x)
{
performMajorGC();
+ rts_clearMemory();
return incall(x);
}
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1523be1cc721f1ee435f0640cf910c64e91a63f7...edd103973ad9fac0b4e3d3a36e9829384f060a14
--
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/compare/1523be1cc721f1ee435f0640cf910c64e91a63f7...edd103973ad9fac0b4e3d3a36e9829384f060a14
You're receiving this email because of your account on gitlab.haskell.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20230215/6528e274/attachment-0001.html>
More information about the ghc-commits
mailing list