[Haskell-cafe] UTF-8 BOM

Tony Morris tonymorris at gmail.com
Wed Jan 5 02:08:22 CET 2011


I am reading files with System.IO.readFile. Some of these files start
with a UTF-8 Byte Order Mark (the bytes 0xef 0xbb 0xbf). Some of the
functions that process the resulting String choke on it, so I drop the
BOM as shown below. This feels particularly hacky, but I am not in
control of many of these functions (which perhaps could use ByteString
for a better solution).
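For the ByteString route, here is a minimal sketch that strips the three raw bytes before any decoding happens, so the BOM never becomes a character at all. It assumes the bytestring package, and the names utf8BOM, dropBOMBytes and readFileNoBOM are hypothetical, not part of any library:

```haskell
import qualified Data.ByteString as B

-- | The UTF-8 encoding of U+FEFF: the bytes 0xEF 0xBB 0xBF.
utf8BOM :: B.ByteString
utf8BOM = B.pack [0xEF, 0xBB, 0xBF]

-- | Hypothetical helper: drop a leading UTF-8 BOM from raw bytes,
-- leaving any other input untouched.
dropBOMBytes :: B.ByteString -> B.ByteString
dropBOMBytes bs
  | utf8BOM `B.isPrefixOf` bs = B.drop (B.length utf8BOM) bs
  | otherwise                 = bs

-- | Hypothetical helper: read a file strictly as bytes, minus any BOM.
readFileNoBOM :: FilePath -> IO B.ByteString
readFileNoBOM p = dropBOMBytes `fmap` B.readFile p
```

This only helps when the downstream functions can accept bytes; decoding the result to a String is still left to the caller.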

I'm wondering if there is a better way of achieving this goal. Thanks
for any tips.


dropBOM ::
  String
  -> String
dropBOM [] =
  []
dropBOM s@(x:xs) =
  let unicodeMarker = '\65279' -- U+FEFF, decoded from the UTF-8 bytes 0xef 0xbb 0xbf
  in if x == unicodeMarker then xs else s

readBOMFile ::
  FilePath
  -> IO String
readBOMFile p =
  dropBOM `fmap` readFile p
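A possibly cleaner alternative, assuming GHC's base library: readFile decodes with the handle's default encoding (localeEncoding), so the BOM only appears as '\65279' under a UTF-8 locale in the first place. Setting the handle's encoding to utf8_bom from System.IO instead decodes UTF-8 regardless of locale and, when decoding, removes a leading BOM if one is present. The name readFileUtf8Bom is hypothetical:

```haskell
import System.IO

-- | Hypothetical helper: read a file as UTF-8 text, discarding a leading
-- byte order mark if there is one. base's utf8_bom encoding treats the
-- BOM as optional on input and removes it when present.
readFileUtf8Bom :: FilePath -> IO String
readFileUtf8Bom path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8_bom
    s <- hGetContents h
    -- Force the whole string before withFile closes the handle.
    length s `seq` return s
```

Unlike readFile, this reads the file strictly, so the handle is closed as soon as the function returns; swap in openFile and a plain hGetContents if lazy reading matters.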




-- 
Tony Morris
http://tmorris.net/




