[Haskell-beginners] Escaping special characters in text

Stefan Höck efasckenoth at gmail.com
Thu Jun 11 06:55:15 UTC 2015


On Thu, Jun 11, 2015 at 03:53:41PM +1000, Thomas Koster wrote:
> My program needs to escape and unescape "special characters" in text
> (Data.Text.Text), using my own definition of "special character"
> (isSpecial :: Char -> Bool). I am looking for a library that provides
> functions that implement or help me implement this functionality. I
> don't really care exactly how the special characters are escaped, but
> my preference is to prefix them with backslashes.

Hi Thomas

The answer to your question depends on whether your program needs
additional functionality. If all you need is to replace each special
character with an escape character followed by a substitute character,
this can be done with very little code using functions from Data.Text:

  import Data.Text (Text)
  import qualified Data.Text as T
  
  -- Character used for escaping
  ec :: Char
  ec = '$'
  
  -- Replace a character to be escaped with its substitute
  escapeChar :: Char -> Char
  escapeChar = id
  
  -- Inverse of escapeChar
  unescapeChar :: Char -> Char
  unescapeChar = id
  
  -- True if given char needs to be escaped
  isSpecial :: Char -> Bool
  isSpecial = ('?' ==)
  
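  -- Escape chars in a given text. The escape character itself is
  -- escaped as well, so that unescape can invert escape for every input.
  escape :: Text -> Text
  escape = T.concatMap handleChar
    where handleChar c | isSpecial c || c == ec = T.pack [ec, escapeChar c]
                       | otherwise              = T.singleton c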
  
  -- Unescape chars in a given text.
  -- Partial: T.head fails if the text ends with a lone escape character.
  unescape :: Text -> Text
  unescape t = case T.break (ec ==) t of
                 (a,b) | T.null b  -> a
                       | otherwise -> let b' = T.tail b  -- drop the escape character
                                          e  = unescapeChar $ T.head b'
                                      in T.append a $
                                         T.cons e $ unescape (T.tail b')

This code was loaded into ghci and tested there, so it should compile
(GHC 7.10).

Example:

  escape $ T.pack "This?Is?A?Test??"

yields

  "This$?Is$?A$?Test$?$?"

Applying 'unescape' to this result yields the original string. Note that
the implementation does not handle a trailing escape character:
unescaping "This$?Is$?A$" throws an exception, because T.head is applied
to an empty text. This can be remedied with very little additional code.
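
One possible remedy, as a minimal sketch: pattern match with T.uncons
instead of calling T.head and T.tail. The name 'unescapeLenient' and the
policy of keeping a lone trailing escape character unchanged are my
assumptions here, not requirements; 'ec' and 'unescapeChar' are the same
placeholders as above.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Character used for escaping (placeholder, as in the code above)
ec :: Char
ec = '$'

-- Inverse of escapeChar (placeholder, as in the code above)
unescapeChar :: Char -> Char
unescapeChar = id

-- Total variant of unescape: T.uncons replaces T.head and T.tail, so no
-- input can raise an exception.
-- Assumption: a lone escape character at the end is kept unchanged.
unescapeLenient :: Text -> Text
unescapeLenient t =
  case T.break (ec ==) t of
    (a, b) -> case T.uncons b of
      Nothing        -> a                 -- no escape character left
      Just (_, rest) -> case T.uncons rest of
        Nothing      -> T.snoc a ec       -- trailing escape character
        Just (c, cs) ->
          T.append a (T.cons (unescapeChar c) (unescapeLenient cs))
```

With these definitions, unescapeLenient (T.pack "This$?Is$?A$") evaluates
to "This?Is?A$" instead of throwing.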

Of course, you must provide the correct implementations of 'ec',
'escapeChar', and 'unescapeChar' yourself. You would need to implement
these no matter which library you use.
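
Since you mentioned a preference for backslashes, here is a sketch of
what non-trivial implementations could look like. The choice of special
characters (newline, tab and the backslash itself) and the substitutes
'n' and 't' are purely illustrative assumptions; replace them with your
own definitions. The 'unescape' here goes through String for brevity,
which is simpler but less efficient than staying in Text.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Assumption: backslash is the escape character
ec :: Char
ec = '\\'

-- Assumption: newline, tab and the backslash itself are special.
-- Including the escape character keeps unescape an inverse of escape.
isSpecial :: Char -> Bool
isSpecial c = c `elem` "\n\t\\"

-- Substitute written after the escape character
escapeChar :: Char -> Char
escapeChar '\n' = 'n'
escapeChar '\t' = 't'
escapeChar c    = c

-- Inverse of escapeChar
unescapeChar :: Char -> Char
unescapeChar 'n' = '\n'
unescapeChar 't' = '\t'
unescapeChar c   = c

-- Escape chars in a given text
escape :: Text -> Text
escape = T.concatMap handleChar
  where handleChar c | isSpecial c = T.pack [ec, escapeChar c]
                     | otherwise   = T.singleton c

-- Unescape chars in a given text; a lone trailing escape character is
-- left unchanged (an assumption of this sketch)
unescape :: Text -> Text
unescape = T.pack . go . T.unpack
  where go (c:d:rest) | c == ec = unescapeChar d : go rest
        go (c:rest)             = c : go rest
        go []                   = []
```

For example, escape (T.pack "a\tb\\c\n") gives the text a\tb\\c\n with
literal backslashes (written "a\\tb\\\\c\\n" as a Haskell string), and
'unescape' turns it back into the original.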

If, on the other hand, you want to escape special characters with blocks
of text (instead of single characters as in my code), you probably also
need a second character to mark the end of an escape. Even then, the code
should not get much more involved than the example above. Text validation
and error handling before unescaping add some more bloat, but again
should be straightforward to add using Either as a return type.
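
To give an idea of the Either variant for the single-character case, here
is a rough sketch. The error type 'UnescapeError', its constructors and
the name 'unescapeE' are made up for illustration, and which inputs count
as invalid is an assumption: here, a trailing escape character and a
special character that appears without being escaped.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Placeholders, as in the code above
ec :: Char
ec = '$'

isSpecial :: Char -> Bool
isSpecial = ('?' ==)

unescapeChar :: Char -> Char
unescapeChar = id

-- Hypothetical error type for invalid input
data UnescapeError
  = TrailingEscape          -- text ends with a lone escape character
  | UnescapedSpecial Char   -- special character without escape prefix
  deriving (Eq, Show)

-- Validate and unescape in one pass; the first problem found is reported
unescapeE :: Text -> Either UnescapeError Text
unescapeE t =
  let (a, b) = T.break (ec ==) t
  in case T.find isSpecial a of
       Just c  -> Left (UnescapedSpecial c)
       Nothing -> case T.uncons b of
         Nothing        -> Right a
         Just (_, rest) -> case T.uncons rest of
           Nothing      -> Left TrailingEscape
           Just (c, cs) ->
             fmap (\r -> T.append a (T.cons (unescapeChar c) r))
                  (unescapeE cs)
```

With these definitions, unescapeE (T.pack "This$?Is$?A$") is
Left TrailingEscape, and unescapeE (T.pack "What?") is
Left (UnescapedSpecial '?').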

So, either this is all you need, or we need more information.

Cheers

Stefan

