[Haskell-cafe] empty fields are dropped in bytestring csv

Tom Doris tomdoris at gmail.com
Sat Feb 18 21:34:26 CET 2012


For future reference, here is a hacky patch against bytestring-csv-0.1.2
that fixes this. The cost center (SCC) annotations were used to verify,
anecdotally, that the change doesn't significantly impact performance.
(Interestingly, the Alex lexer in bytestring-csv appears to allocate 1.5GB
while lexing a 1.6MB CSV file!?)

Text/CSV/ByteString.hs

65c65
<         fields       = [ unquote s | Item s <- line ]
---
>         fields       = [ unquote s | Item s <- pline line]
76a77,86
>
>
> pline fs@(Item x : []) = fs
> pline (Item x : Comma : []) = {-# SCC "plinea" #-} Item x : Comma : Item S.empty :  []
> pline (Item x : Comma : rs) = {-# SCC "plineb" #-} Item x : Comma : pline rs
> pline (Comma : []) = {-# SCC "plinec" #-} Comma : Item S.empty : Comma : Item S.empty : []
> pline (Comma : rs) = {-# SCC "plined" #-} Item S.empty : Comma : pline rs
> pline (Newline : rs ) = []
> pline [] = []
>
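
To try the idea outside the package, here is a minimal standalone sketch of
the same padding logic. The Token type below is a stand-in I am assuming for
illustration; the real lexer token type inside bytestring-csv may be named or
shaped differently. The SCC pragmas and the unquote step are left out, and the
final catch-all clause is not part of the patch (it only makes the function
total for token sequences the patch does not match).

```haskell
import qualified Data.ByteString.Char8 as S

-- Hypothetical stand-in for the lexer's token type (an assumption, not the
-- package's actual definition).
data Token = Item S.ByteString | Comma | Newline
  deriving (Eq, Show)

-- Re-insert an empty Item wherever a Comma is not preceded or followed by one.
pline :: [Token] -> [Token]
pline fs@[Item _]           = fs
pline [Item x, Comma]       = [Item x, Comma, Item S.empty]
pline (Item x : Comma : rs) = Item x : Comma : pline rs
pline [Comma]               = [Comma, Item S.empty, Comma, Item S.empty]
pline (Comma : rs)          = Item S.empty : Comma : pline rs
pline (Newline : _)         = []
pline []                    = []
-- Not in the patch: pass any other token through so the function is total.
pline (t : rs)              = t : pline rs

-- Mirrors the patched list comprehension, minus unquote.
fields :: [Token] -> [S.ByteString]
fields ts = [ s | Item s <- pline ts ]
```

In GHCi, fields [Item (S.pack "1"), Comma, Comma, Item (S.pack "9")] should
give ["1","","9"], i.e. the empty middle field is kept.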


On 17 February 2012 23:16, Tom Doris <tomdoris at gmail.com> wrote:

> The bytestring-csv package appears to have a bug whereby empty fields are
> dropped completely from the row. This differs from Text.CSV, which returns
> an empty field in the parse result. I'd argue this is a bug in
> bytestring-csv. Does anyone know whether it has been raised before, or know
> of a workaround?
>
> Prelude Data.Maybe Data.List Text.CSV.ByteString Data.ByteString.Char8>
> parseCSV $ pack "a,b,c\n1,2,3\n1,,9\n"
> Just [["a","b","c"],["1","2","3"],["1","9"]]
>
> -- the last row has two fields ^
>
> Prelude Text.CSV> parseCSV "/tmp/err" "a,b,c\n1,2,3\n1,,9\n"
> Right [["a","b","c"],["1","2","3"],["1","","9"],[""]]
>
>
>
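
On the workaround question in the quoted message: if the input is known to
contain no quoted fields, one option is to skip the parser and split on commas
directly. This is only a sketch under that assumption. splitCSV is a name I
made up, it does no quote or escape handling, and it does not strip a trailing
'\r' from CRLF input.

```haskell
import qualified Data.ByteString.Char8 as B

-- Naive splitter: one row per line, one field per comma-separated chunk.
-- Empty fields are kept. Quoted fields are NOT supported. An empty line
-- yields an empty row, because B.split returns [] for empty input.
splitCSV :: B.ByteString -> [[B.ByteString]]
splitCSV = map (B.split ',') . B.lines
```

For the example input above, splitCSV (B.pack "a,b,c\n1,2,3\n1,,9\n") gives
[["a","b","c"],["1","2","3"],["1","","9"]].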