[Haskell-cafe] UTF-8 BOM

Gregory Collins greg at gregorycollins.net
Wed Jan 5 19:22:23 CET 2011


Use the text library instead?
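
A minimal sketch of what that could look like, assuming the files really are
UTF-8 and that the text and bytestring packages are available. The names
stripBOM, readBOMFileText and readBOMFileString are made up for illustration.
As far as I know, decodeUtf8 leaves a leading BOM in place as the character
U+FEFF, so it still has to be dropped explicitly. The second function is a
different technique that needs only base: the utf8_bom encoding from
System.IO, which skips a BOM at the start of the input.

```haskell
import           Data.Maybe         (fromMaybe)
import qualified Data.ByteString    as B
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE
import           System.IO

-- | Drop a single leading U+FEFF if present; otherwise return the input as is.
-- (Hypothetical helper, not part of the text library.)
stripBOM :: T.Text -> T.Text
stripBOM t = fromMaybe t (T.stripPrefix (T.singleton '\xFEFF') t)

-- | Read the raw bytes, decode them as UTF-8 and discard a leading BOM.
-- Note that decodeUtf8 throws an exception on invalid UTF-8 input.
readBOMFileText :: FilePath -> IO T.Text
readBOMFileText p = (stripBOM . TE.decodeUtf8) `fmap` B.readFile p

-- | Alternative using only base: read a String through the utf8_bom
-- encoding, which ignores a BOM at the beginning of the stream.
-- The handle is closed once the lazily read contents are fully consumed.
readBOMFileString :: FilePath -> IO String
readBOMFileString p = do
  h <- openFile p ReadMode
  hSetEncoding h utf8_bom
  hGetContents h
```

If the downstream functions insist on String, T.unpack on the result of
readBOMFileText gets you back there, at the cost of the conversion.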

On Jan 5, 2011 2:09 AM, "Tony Morris" <tonymorris at gmail.com> wrote:
>
> I am reading files with System.IO.readFile. Some of these files start
> with a UTF-8 Byte Order Marker (0xef 0xbb 0xbf). For some functions that
> process this String, this causes choking so I drop the BOM as shown
> below. This feels particularly hacky, but I am not in control of many of
> these functions (that perhaps could use ByteString with a better
> solution).
>
> I'm wondering if there is a better way of achieving this goal. Thanks
> for any tips.
>
>
> dropBOM ::
>  String
>  -> String
> dropBOM [] =
>  []
> dropBOM s@(x:xs) =
>  let unicodeMarker = '\65279' -- U+FEFF, the BOM as a decoded Char (bytes EF BB BF in UTF-8)
>  in if x == unicodeMarker then xs else s
>
> readBOMFile ::
>  FilePath
>  -> IO String
> readBOMFile p =
>  dropBOM `fmap` readFile p
>
>
>
>
> --
> Tony Morris
> http://tmorris.net/