[Haskell-cafe] Escaping of string literals

Michael Snoyman michael at snoyman.com
Fri May 27 06:55:09 CEST 2011


Hi all,

I'm working on a program right now that will involve embedding some
static files inside my Haskell program as bytestrings. I've done this
in the past with file-embed[1]. In this case, I have a strange
requirement: I need to be able to modify the embedded data after the
compiler has run. In general, I've gotten the code to work (and will
soon be following up with a blog post explaining). But in this
process, I ran into something I found interesting. If I run this
program:

    {-# LANGUAGE TemplateHaskell #-}
    import Data.ByteString.Unsafe (unsafePackAddressLen)
    import Language.Haskell.TH.Syntax
    import qualified Data.ByteString as S

    main = do
        fromAddr <- unsafePackAddressLen 7 $(return $ LitE $
StringPrimL "123\0\&456")
        print fromAddr
        let fromStr = S.pack $ map (toEnum . fromEnum) $(return $ LitE
$ StringL "123\0\&456")
        print fromStr

I get the result:

    "123\192\128\&45"
    "123\NUL456"

My understanding is that GHC is automatically escaping the non-ASCII
characters in the strings. When I use StringL, GHC also automatically
unescapes the bytes, so everything works as expected. In the case of
StringPrimL, there is no point at which the unescaping occurs.

I haven't been able to find the rules on how this escaping functions.
Can anyone point it out to me? Also, for my case, it would be very
useful to be able to bypass the escaping. I can easily do so with a
post-process on the executable to embed raw bytes, but I'm wondering
if there's some way to convince StringPrimL not to do any escaping.

Thanks,
Michael

[1] http://hackage.haskell.org/package/file-embed



More information about the Haskell-Cafe mailing list