[Git][ghc/ghc][wip/chunked-unpackCString] ghc-prim: Strictly in chunks of 32 characters

Ben Gamari gitlab at gitlab.haskell.org
Fri Apr 10 02:07:36 UTC 2020



Ben Gamari pushed to branch wip/chunked-unpackCString at Glasgow Haskell Compiler / GHC


Commits:
0b02c227 by Ben Gamari at 2020-04-10T02:07:27+00:00
ghc-prim: Strictly in chunks of 32 characters

- - - - -


1 changed file:

- libraries/ghc-prim/GHC/CString.hs


Changes:

=====================================
libraries/ghc-prim/GHC/CString.hs
=====================================
@@ -93,23 +93,45 @@ the string and the current offset, saving a word for each character unpacked.
 
 unpackCString# :: Addr# -> [Char]
 {-# NOINLINE CONLIKE unpackCString# #-}
-unpackCString# addr
-    | isTrue# (ch `eqChar#` '\0'#) = []
-    | True                         = C# ch : unpackCString# (addr `plusAddr#` 1#)
-      where
-        -- See Note [unpackCString# iterating over addr]
-        !ch = indexCharOffAddr# addr 0#
-
+     -- See the NOINLINE note on unpackCString#
+unpackCString# addr = unpackAppendCString'# [] addr
 
 unpackAppendCString# :: Addr# -> [Char] -> [Char]
 {-# NOINLINE unpackAppendCString# #-}
      -- See the NOINLINE note on unpackCString#
-unpackAppendCString# addr rest
-    | isTrue# (ch `eqChar#` '\0'#) = rest
-    | True                         = C# ch : unpackAppendCString# (addr `plusAddr#` 1#) rest
-      where
-        -- See Note [unpackCString# iterating over addr]
-        !ch = indexCharOffAddr# addr 0#
+unpackAppendCString# addr rest = unpackAppendCString'# rest addr
+
+-- | This is an local helper to reduce duplication between
+-- 'unpackCString#' and 'unpackAppendCString#'. Because it is inlined the
+-- this gets specialised to @rest = []@ in the former case.
+unpackAppendCString'# :: [Char] -> Addr# -> [Char]
+{-# INLINE unpackAppendCString'# #-}
+unpackAppendCString'# rest0 addr0 = unpackChunk addr0
+  where
+    -- Laziness is expensive: it involves allocating a thunk, then an indirect
+    -- jump, perhaps some cache misses, etc. However, in practice we find that
+    -- most applications tend to use at least *some* of their unpacked string.
+    -- Consequently we unpack eagerly in chunks of this many characters.
+    -- Compared to fully-lazy unpacking this improves runtime of GHC by about
+    -- 0.5%.
+    unpackChunkLen = 32#
+
+    unpackChunk :: Addr# -> [Char]
+    unpackChunk addr = goStrict addr unpackChunkLen
+
+    goStrict :: Addr# -> Int# -> [Char]
+    goStrict addr n
+      | isTrue# (ch `eqChar#` '\0'#) = rest0
+      | isTrue# (n ==# 0#) =
+          -- We've reached the end of our chunk, lazily unpack the next chunk
+          let rest = unpackChunk (addr `plusAddr#` 1#)
+          in C# ch : rest
+      | True =
+          let !rest = goStrict (addr `plusAddr#` 1#) (n -# 1#)
+          in C# ch : rest
+        where
+          -- See Note [unpackCString# iterating over addr]
+          !ch = indexCharOffAddr# addr 0#
 
 unpackFoldrCString# :: Addr# -> (Char  -> a -> a) -> a -> a
 



View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/0b02c227a83b3a0711a5d4e51b515a768f712c46

-- 
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/0b02c227a83b3a0711a5d4e51b515a768f712c46
You're receiving this email because of your account on gitlab.haskell.org.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/ghc-commits/attachments/20200409/2ec12683/attachment-0001.html>


More information about the ghc-commits mailing list