[Haskell-beginners] Escaping special characters in text
Stefan Höck
efasckenoth at gmail.com
Thu Jun 11 06:55:15 UTC 2015
On Thu, Jun 11, 2015 at 03:53:41PM +1000, Thomas Koster wrote:
> My program needs to escape and unescape "special characters" in text
> (Data.Text.Text), using my own definition of "special character"
> (isSpecial :: Char -> Bool). I am looking for a library that provides
> functions that implement or help me implement this functionality. I
> don't really care exactly how the special characters are escaped, but
> my preference is to prefix them with backslashes.
Hi Thomas
The answer to your question depends on whether your program needs
additional functionality. If all you need is to replace each special
character with an escape character followed by a substitute character,
this can be done with very little code using functions from Data.Text:
import Data.Text (Text)
import qualified Data.Text as T

-- Character used for escaping
ec :: Char
ec = '$'

-- Replace a character to be escaped with its substitute
escapeChar :: Char -> Char
escapeChar = id

-- Inverse of escapeChar
unescapeChar :: Char -> Char
unescapeChar = id

-- True if given char needs to be escaped
isSpecial :: Char -> Bool
isSpecial = ('?' ==)

-- Escape chars in a given text
escape :: Text -> Text
escape = T.concatMap handleChar
  where handleChar c | isSpecial c = T.pack [ec, escapeChar c]
                     | otherwise   = T.singleton c

-- Unescape chars in a given text
unescape :: Text -> Text
unescape t = case T.break (ec ==) t of
  (a,b) | T.null b  -> a
        | otherwise -> let b' = T.tail b
                           e  = unescapeChar $ T.head b'
                       in T.append a $
                            T.cons e $ unescape (T.tail b')

This code was loaded into ghci and tested there, so it should compile
(GHC 7.10).
Example:
escape $ T.pack "This?Is?A?Test??"
yields
"This$?Is$?A$?Test$?$?"
Applying 'unescape' to this result yields the original string. Note that
the implementation does not handle a trailing escape character:
unescaping "This$?Is$?A$" throws an exception (T.head is applied to an
empty Text), but this can be remedied with very little additional code.
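
A minimal sketch of one such remedy, using T.uncons instead of
T.head/T.tail so that the function is total. The name 'unescapeSafe' and
the decision to keep a dangling escape character literally are
assumptions for illustration; dropping it or reporting an error would be
equally reasonable.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Same placeholder definitions as in the code above.
ec :: Char
ec = '$'

unescapeChar :: Char -> Char
unescapeChar = id

-- Total variant of unescape (hypothetical name). A trailing escape
-- character with nothing after it is kept as-is instead of crashing.
unescapeSafe :: Text -> Text
unescapeSafe t =
  let (a, b) = T.break (ec ==) t
  in case T.uncons b of
       Nothing      -> a                    -- no escape character left
       Just (_, b') -> case T.uncons b' of
         Nothing        -> T.snoc a ec      -- dangling escape: keep it
         Just (c, rest) ->
           T.append a (T.cons (unescapeChar c) (unescapeSafe rest))
```

With this version, unescapeSafe (T.pack "This$?Is$?A$") returns
"This?Is?A$" rather than throwing.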
You must of course provide suitable implementations of 'ec',
'escapeChar', and 'unescapeChar' yourself. You would need to do so no
matter which library you use.
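
As an illustration, here is a sketch of what these definitions might
look like for backslash escaping. The set of special characters and the
substitutes ('n' for newline, 't' for tab) are assumptions chosen for
the example, not something the original question specifies. Note that
the escape character itself is treated as special; otherwise a literal
backslash in the input would not survive a round trip through 'escape'
and 'unescape'.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

ec :: Char
ec = '\\'

-- Hypothetical set of special characters. The escape character is
-- included so that a literal backslash can be restored by unescape.
isSpecial :: Char -> Bool
isSpecial c = c `elem` ['\\', '\n', '\t']

-- Substitute shown after the backslash (assumed mapping).
escapeChar :: Char -> Char
escapeChar '\n' = 'n'
escapeChar '\t' = 't'
escapeChar c    = c

-- Inverse of escapeChar.
unescapeChar :: Char -> Char
unescapeChar 'n' = '\n'
unescapeChar 't' = '\t'
unescapeChar c   = c

escape :: Text -> Text
escape = T.concatMap handleChar
  where handleChar c | isSpecial c = T.pack [ec, escapeChar c]
                     | otherwise   = T.singleton c

-- Same as the unescape above: still partial on a trailing backslash.
unescape :: Text -> Text
unescape t = case T.break (ec ==) t of
  (a,b) | T.null b  -> a
        | otherwise -> let b' = T.tail b
                           e  = unescapeChar $ T.head b'
                       in T.append a $
                            T.cons e $ unescape (T.tail b')
```

A plain 'n' or 't' in the input is left alone, because 'unescapeChar' is
only applied to the character that directly follows a backslash.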
If, on the other hand, you want to escape special characters with blocks
of text (instead of single characters, as in my code), you probably also
need a second character to mark the end of an escape. Even then, the code
should not get much more involved than the example above.
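
A rough sketch of that variant, under assumed conventions: '$' starts an
escape, ';' ends it, and each special character is replaced by a name
from a small table. The table 'names', the marker characters, and the
function names are all hypothetical. Malformed input (an unknown name or
a missing end marker) is simply left untouched here.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Assumed start and end markers of an escape block.
startC, endC :: Char
startC = '$'
endC   = ';'

-- Hypothetical table of special characters and their block names.
-- The start marker is listed so that a literal '$' round-trips.
names :: [(Char, Text)]
names = [('?', T.pack "quest"), ('$', T.pack "dollar")]

escapeBlock :: Text -> Text
escapeBlock = T.concatMap handleChar
  where handleChar c = case lookup c names of
          Just n  -> T.cons startC (T.snoc n endC)
          Nothing -> T.singleton c

unescapeBlock :: Text -> Text
unescapeBlock t = case T.break (startC ==) t of
  (a, b)
    | T.null b  -> a
    | otherwise ->
        let (name, rest) = T.break (endC ==) (T.drop 1 b)
        in case [c | (c, n) <- names, n == name] of
             (c : _) | not (T.null rest) ->
               T.append a (T.cons c (unescapeBlock (T.drop 1 rest)))
             -- Unknown name or missing end marker: keep the rest as-is.
             _ -> T.append a b
```

The end marker does not need escaping itself, since it only has meaning
after a start marker.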
Text validation and error handling before unescaping add some more
code, but again should be straightforward to add using Either
as a return type.
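
A sketch of how that could look. The error type 'UnescapeError', the
function name 'unescapeE', and the validation rule (an escape is only
accepted if it decodes to a special character) are assumptions for
illustration.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Same placeholder definitions as in the code above.
ec :: Char
ec = '$'

unescapeChar :: Char -> Char
unescapeChar = id

isSpecial :: Char -> Bool
isSpecial = ('?' ==)

-- Hypothetical error type for malformed input.
data UnescapeError
  = DanglingEscape       -- escape character at the very end
  | InvalidEscape Char   -- escape followed by an unexpected character
  deriving (Eq, Show)

-- Unescape with validation: returns Left on the first problem found.
unescapeE :: Text -> Either UnescapeError Text
unescapeE t =
  let (a, b) = T.break (ec ==) t
  in if T.null b
       then Right a
       else case T.uncons (T.drop 1 b) of
         Nothing -> Left DanglingEscape
         Just (c, rest)
           | isSpecial (unescapeChar c) ->
               T.append a . T.cons (unescapeChar c) <$> unescapeE rest
           | otherwise -> Left (InvalidEscape c)
```

For example, unescapeE (T.pack "This$?Is$?A$") gives Left DanglingEscape
instead of an exception, and unescapeE (T.pack "a$b") gives
Left (InvalidEscape 'b').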
So, either this is all you need, or we need more information.
Cheers
Stefan