[Git][ghc/ghc][wip/T25375] Fix CRLF in multiline strings (#25375)
Brandon Chinn (@brandonchinn178)
gitlab at gitlab.haskell.org
Sun Nov 3 00:14:01 UTC 2024
Brandon Chinn pushed to branch wip/T25375 at Glasgow Haskell Compiler / GHC
Commits:
7e963f9e by Brandon Chinn at 2024-11-02T17:13:42-07:00
Fix CRLF in multiline strings (#25375)
- - - - -
6 changed files:
- .gitattributes
- compiler/GHC/Parser/String.hs
- docs/users_guide/exts/multiline_strings.rst
- + testsuite/tests/parser/should_run/T25375.hs
- + testsuite/tests/parser/should_run/T25375.stdout
- testsuite/tests/parser/should_run/all.T
Changes:
=====================================
.gitattributes
=====================================
@@ -2,3 +2,4 @@
# don't convert anything on checkout
* text=auto eol=lf
mk/win32-tarballs.md5sum text=auto eol=LF
+testsuite/tests/parser/should_run/T25375.hs text=auto eol=crlf
=====================================
compiler/GHC/Parser/String.hs
=====================================
@@ -261,12 +261,23 @@ lexMultilineString = lexStringWith processChars processChars
     processChars :: HasChar c => [c] -> Either (c, LexErr) [c]
     processChars =
           collapseGaps             -- Step 1
-      >>> expandLeadingTabs        -- Step 3
-      >>> rmCommonWhitespacePrefix -- Step 4
-      >>> collapseOnlyWsLines      -- Step 5
-      >>> rmFirstNewline           -- Step 7a
-      >>> rmLastNewline            -- Step 7b
-      >>> resolveEscapes           -- Step 8
+      >>> normalizeEOL             -- Step 2
+      >>> expandLeadingTabs        -- Step 4
+      >>> rmCommonWhitespacePrefix -- Step 5
+      >>> collapseOnlyWsLines      -- Step 6
+      >>> rmFirstNewline           -- Step 8a
+      >>> rmLastNewline            -- Step 8b
+      >>> resolveEscapes           -- Step 9
+
+    normalizeEOL :: HasChar c => [c] -> [c]
+    normalizeEOL =
+      let go = \case
+            Char '\r' : c@(Char '\n') : cs -> c : go cs
+            c@(Char '\r') : cs -> setChar '\n' c : go cs
+            c@(Char '\f') : cs -> setChar '\n' c : go cs
+            c : cs -> c : go cs
+            [] -> []
+       in go

     -- expands all tabs, since the lexer will verify that tabs can only appear
     -- as leading indentation
@@ -354,15 +365,16 @@ the same behavior as HsString, which contains the normalized string
The canonical steps for post processing a multiline string are:
1. Collapse string gaps
-2. Split the string by newlines
-3. Convert leading tabs into spaces
+2. Normalize newline characters
+3. Split the string by newlines
+4. Convert leading tabs into spaces
* In each line, any tabs preceding non-whitespace characters are replaced with spaces up to the next tab stop
-4. Remove common whitespace prefix in every line except the first (see below)
-5. If a line contains only whitespace, remove all of the whitespace
-6. Join the string back with `\n` delimiters
-7a. If the first character of the string is a newline, remove it
-7b. If the last character of the string is a newline, remove it
-8. Interpret escaped characters
+5. Remove common whitespace prefix in every line except the first (see below)
+6. If a line contains only whitespace, remove all of the whitespace
+7. Join the string back with `\n` delimiters
+8a. If the first character of the string is a newline, remove it
+8b. If the last character of the string is a newline, remove it
+9. Interpret escaped characters
The common whitespace prefix can be informally defined as "The longest
prefix of whitespace shared by all lines in the string, excluding the
=====================================
docs/users_guide/exts/multiline_strings.rst
=====================================
@@ -14,6 +14,8 @@ With this extension, GHC now recognizes multiline string literals with ``"""`` d
Normal string literals are lexed, then string gaps are collapsed, then escape characters are resolved. Multiline string literals add the following post-processing steps between collapsing string gaps and resolving escape characters:
+#. Convert all newlines (``\r\n``, ``\n``, ``\r``, ``\f``) to ``\n``
+
#. Split the string by newlines
#. Replace leading tabs with spaces up to the next tab stop
@@ -26,6 +28,8 @@ Normal string literals are lexed, then string gaps are collapsed, then escape ch
#. If the first character of the string is a newline, remove it
+#. If the last character of the string is a newline, remove it
+
Examples
~~~~~~~~
=====================================
testsuite/tests/parser/should_run/T25375.hs
=====================================
@@ -0,0 +1,38 @@
+{-# LANGUAGE MultilineStrings #-}
+
+str1 = unlines
+  [ "aaa"
+  , "bbb"
+  , "ccc"
+  ]
+
+str2 = "aaa\n\
+       \bbb\n\
+       \ccc\n"
+
+str3 = """
+  aaa
+  bbb
+  ccc
+  """
+
+str4 = """
+
+  aaa
+  bbb
+  ccc
+
+  """
+
+str5 = """
+  aaa
+  bbb
+  ccc\n
+  """
+
+main = do
+  print str1
+  print str2
+  print str3
+  print str4
+  print str5
=====================================
testsuite/tests/parser/should_run/T25375.stdout
=====================================
@@ -0,0 +1,5 @@
+"aaa\nbbb\nccc\n"
+"aaa\nbbb\nccc\n"
+"aaa\nbbb\nccc"
+"\naaa\nbbb\nccc\n"
+"aaa\nbbb\nccc\n"
=====================================
testsuite/tests/parser/should_run/all.T
=====================================
@@ -23,3 +23,4 @@ test('RecordDotSyntax5', normal, compile_and_run, [''])
test('ListTuplePunsConstraints', extra_files(['ListTuplePunsConstraints.hs']), ghci_script, ['ListTuplePunsConstraints.script'])
test('MultilineStrings', normal, compile_and_run, [''])
test('MultilineStringsOverloaded', normal, compile_and_run, [''])
+test('T25375', normal, compile_and_run, [''])
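
For readers who want to experiment with the new normalization step outside the compiler, here is a minimal sketch on plain `String`. It is not the code from the patch: the patch works over GHC's `HasChar` abstraction (via `Char` and `setChar`), while this version pattern-matches on characters directly. The helper `postProcessSketch` and its sub-steps are hypothetical names introduced only for illustration. The sketch skips string gaps, tab expansion, and escape resolution, and assumes indentation is made of spaces only.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Step 2 of the patch, on plain String: "\r\n", a lone '\r', and '\f'
-- all become '\n'.
normalizeEOL :: String -> String
normalizeEOL = go
  where
    go ('\r' : '\n' : cs) = '\n' : go cs
    go ('\r' : cs)        = '\n' : go cs
    go ('\f' : cs)        = '\n' : go cs
    go (c : cs)           = c : go cs
    go []                 = []

-- Split on '\n', keeping a trailing empty line (unlike Prelude's 'lines').
splitLines :: String -> [String]
splitLines s = case break (== '\n') s of
  (l, [])       -> [l]
  (l, _ : rest) -> l : splitLines rest

-- Remove the common leading-space prefix from every line except the first.
-- Whitespace-only lines do not take part in computing the prefix.
rmCommonIndent :: [String] -> [String]
rmCommonIndent []             = []
rmCommonIndent (first : rest) = first : map (drop n) rest
  where
    indents = [length (takeWhile (== ' ') l) | l <- rest, not (all isSpace l)]
    n       = if null indents then 0 else minimum indents

-- Replace whitespace-only lines with empty lines.
blankIfWs :: String -> String
blankIfWs l = if all isSpace l then "" else l

rmFirstNewline :: String -> String
rmFirstNewline ('\n' : cs) = cs
rmFirstNewline cs          = cs

rmLastNewline :: String -> String
rmLastNewline s = case reverse s of
  '\n' : r -> reverse r
  _        -> s

-- Hypothetical simplified pipeline: normalize, split, dedent, blank
-- whitespace-only lines, join, then trim one newline at each end.
postProcessSketch :: String -> String
postProcessSketch =
    rmLastNewline
  . rmFirstNewline
  . intercalate "\n"
  . map blankIfWs
  . rmCommonIndent
  . splitLines
  . normalizeEOL
```

Loaded in GHCi, `postProcessSketch "\r\n  aaa\r\n  bbb\r\n  ccc\r\n  "` evaluates to `"aaa\nbbb\nccc"`, which is the third line of T25375.stdout. The input stands in for the raw body of `str3` when the test file is checked out with CRLF line endings.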
View it on GitLab: https://gitlab.haskell.org/ghc/ghc/-/commit/7e963f9e23f50d8aa2678ca823270d17c6da9afd