[Haskell-cafe] Text.JSON and utf8
Martin Hilbig
lists at mhilbig.de
Mon Feb 11 14:56:04 CET 2013
hi,
tl;dr: i propose this patch to Text/JSON/String.hs and would like to
know why it is needed:
@@ -375,7 +375,7 @@
   where
   go s1 =
     case s1 of
-      (x :xs) | x < '\x20' || x > '\x7e' -> '\\' : encControl x (go xs)
+      (x :xs) | x < '\x20' -> '\\' : encControl x (go xs)
       ('"' :xs) -> '\\' : '"' : go xs
       ('\\':xs) -> '\\' : '\\' : go xs
       (x :xs) -> x : go xs
i recently stumbled upon CouchDB telling me i'm sending invalid json.
i basically read lines from a utf8 file with german umlauts and send
them to CouchDB using Text.JSON and Database.CouchDB.
$ file lines.txt
lines.txt: UTF-8 Unicode text
let's take 'ö' as an example. i use LANG=de_DE.utf8.
ghci tells me:
> 'ö'
'\246'
> putChar '\246'
ö
> putChar 'ö'
ö
> :m + Text.JSON Database.CouchDB
> runCouchDB' $ newNamedDoc (db "foo") (doc "bar") (showJSON $
toJSObject [("test","ö")])
*** Exception: HTTP/1.1 400 Bad Request
Server: CouchDB/1.2.1 (Erlang OTP/R15B03)
Date: Mon, 11 Feb 2013 13:24:49 GMT
Content-Type: text/plain; charset=utf-8
Content-Length: 48
Cache-Control: must-revalidate
couchdb log says:
Invalid JSON: {{error,{10,"lexical error: invalid bytes in UTF8
string.\n"}},<<"{\"test\":\"<F6>\"}">>}
F6 is indeed the hex code point of ö:
> :m + Numeric
> putChar $ toEnum $ fst $ head $ readHex "f6"
ö
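for comparison, here is a minimal sketch of how a single Char maps to UTF-8 bytes. it uses only base; utf8Encode is a hypothetical helper and not part of Text.JSON or Database.CouchDB, and it does not treat surrogate code points specially. it shows that ö (U+00F6) is the two bytes C3 B6 in UTF-8, whereas the lone byte F6 from the couchdb log is not a valid UTF-8 sequence.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- utf8Encode is a hypothetical helper, not part of Text.JSON.
-- It maps one Unicode code point to its UTF-8 byte sequence.
utf8Encode :: Char -> [Word8]
utf8Encode c
  | n < 0x80    = [fromIntegral n]                           -- 1 byte (ASCII)
  | n < 0x800   = [0xC0 .|. hi 6, cont 0]                    -- 2 bytes
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6, cont 0]           -- 3 bytes
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]  -- 4 bytes
  where
    n = ord c
    hi, cont :: Int -> Word8
    -- payload bits that go into the lead byte
    hi s = fromIntegral (n `shiftR` s)
    -- continuation byte: 10xxxxxx carrying six payload bits
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)
```

in ghci, utf8Encode '\246' evaluates to [195,182], i.e. C3 B6.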
if i apply the above patch and reinstall JSON and CouchDB the doc
creation works:
> runCouchDB' $ newNamedDoc (db "db") (doc "foo") (showJSON $
toJSObject [("test", "ö")])
Right someRev
but i don't get back the ö i expected:
> Just (_,_,x) <-runCouchDB' $ getDoc (db "foo") (doc "bar") :: IO
(Maybe (Doc,Rev,JSObject String))
> let Ok y = valFromObj "test" =<< readJSON x :: Result String
> y
"\195\188"
> putStrLn y
ü
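the string "\195\188" looks like UTF-8 bytes that were handed back one Char per byte without being decoded. a minimal sketch of the missing decode step, assuming every Char in the input really is a byte value (0..255): utf8DecodeChars is a hypothetical helper written against base only, not something Text.JSON or Database.CouchDB provides, and it passes malformed sequences through unchanged rather than rejecting them.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr, ord)

-- utf8DecodeChars is a hypothetical helper. It treats each Char of the
-- input as one byte and reassembles UTF-8 sequences into code points.
-- Malformed input is passed through unchanged instead of being rejected.
utf8DecodeChars :: String -> String
utf8DecodeChars [] = []
utf8DecodeChars (c:cs)
  | b < 0x80              = c : utf8DecodeChars cs    -- plain ASCII
  | b >= 0xC0 && b < 0xE0 = multi 1 (b .&. 0x1F)      -- 2-byte sequence
  | b >= 0xE0 && b < 0xF0 = multi 2 (b .&. 0x0F)      -- 3-byte sequence
  | b >= 0xF0 && b < 0xF8 = multi 3 (b .&. 0x07)      -- 4-byte sequence
  | otherwise             = c : utf8DecodeChars cs    -- stray byte, keep it
  where
    b = ord c
    -- consume k continuation bytes and combine them with the lead bits
    multi k lead =
      let (conts, rest) = splitAt k cs
          n = foldl step lead conts
      in if length conts == k && all isCont conts && n <= 0x10FFFF
           then chr n : utf8DecodeChars rest
           else c : utf8DecodeChars cs
    step acc x = (acc `shiftL` 6) .|. (ord x .&. 0x3F)
    isCont x = ord x .&. 0xC0 == 0x80
```

in ghci, utf8DecodeChars "\195\188" evaluates to "\252" (ü), and utf8DecodeChars "\195\182" evaluates to "\246" (ö).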
apparently with curl everything works fine:
$ curl localhost:5984/db/foo -XPUT -d '{"test": "ö"}'
{"ok":true,"id":"foo","rev":"someOtherRev"}
$ curl localhost:5984/db/foo
{"_id":"bars","_rev":"someOtherRev","test":"ö"}
so how can i get my precious ö back? what am i doing wrong or does
Text.JSON need another patch?
another question: why does encControl in Text/JSON/String.hs handle the
cases x < '\x100' and x < '\x1000' even though they can never be
reached with the old predicate in encJSString (x < '\x20')?
finally: is '\x7e' the right literal for the job?
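for context on both questions: '\x7e' is '~', the last printable ASCII character, so a test of x > '\x7e' catches DEL ('\x7f') and everything non-ASCII. a JSON \u escape always takes exactly four hex digits, so code points below 0x1000 need zero padding. the following is a minimal stand-alone sketch of that kind of escaping, not the Text.JSON source; escapeChar, escapeString and utf16Units are hypothetical names, and unlike encControl it does not emit the short forms such as \n or \t.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Numeric (showHex)

-- Hypothetical helpers for illustration only; not the Text.JSON code.

-- Escape one character so that the result is pure printable ASCII.
escapeChar :: Char -> String
escapeChar c
  | c == '"'                 = "\\\""
  | c == '\\'                = "\\\\"
  | c < '\x20' || c > '\x7e' = concatMap u4 (utf16Units (ord c))
  | otherwise                = [c]
  where
    -- \u followed by exactly four hex digits, zero padded on the left
    u4 n  = "\\u" ++ pad (showHex n "")
    pad s = replicate (4 - length s) '0' ++ s

-- Code points above U+FFFF are written as a UTF-16 surrogate pair.
utf16Units :: Int -> [Int]
utf16Units n
  | n < 0x10000 = [n]
  | otherwise   = [0xD800 + (m `shiftR` 10), 0xDC00 + (m .&. 0x3FF)]
  where
    m = n - 0x10000

escapeString :: String -> String
escapeString = concatMap escapeChar
```

in ghci, putStrLn (escapeString "\246") prints \u00f6, which is plain ASCII and therefore independent of how the bytes are encoded on the wire.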
thanks for reading
have fun
martin