[Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

Sebastian Sylvan sebastian.sylvan at gmail.com
Fri Jun 15 19:57:17 EDT 2007


On 15/06/07, Jim Burton <jim at sdf-eu.org> wrote:
>
> Sebastian Sylvan wrote:
> > On 15/06/07, Jim Burton <jim at sdf-eu.org> wrote:
> [snip]
> > Hi,
> Hi Sebastian,
> > I haven't compiled this, but you get the general idea:
> >
> > import qualified Data.ByteString.Lazy.Char8 as B
> > -- takes a bytestring representing the file, concats the lines
> > -- then splits it up into "real" lines using the delimiter
> > clean :: Char -> B.ByteString -> [B.ByteString]
> > clean d = B.split d . B.concat . B.lines
>
> I think that would only work if there was one column per line...I didn't
> make it clear that as well as being comma separated, the delimiter is
> around each column, of which there are several on a line so if the
> delimiter is ~ a file might look like:
>
> ~sdlkfj~, ~dsdkjf~ #eo row1
> ~sdf
> dfkj~, ~dfsd~      #eo row 2



Ah, sorry, I thought the delimiter was a line delimiter. I'm trying to get
that fusion goodness by using built-in functions as much as possible...

How about this one:

clean del = map (B.filter (/= '\n')) . B.groupBy (\x y -> (x, y) /=
(del, '\n'))

That groupBy is meant to split the input into groups which don't have the
delimiter followed by a newline in them (which is the sequence your rows
end with); the filter then removes the remaining newlines from each row.
You might want to filter out spaces first (if there are any) so that you
don't get a space between the delimiter and the newline at the end...
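
One caveat with the groupBy idea: as with Data.List.groupBy, the ByteString
groupBy compares each element against the first element of the current
group rather than against its immediate neighbour, so it may not split where
a delimiter/newline pair occurs. Here is a minimal sketch of the same intent
built on B.lines instead. It assumes a row ends exactly where the delimiter
is the last character before a newline (no trailing spaces), and that a stray
line break never falls directly after a delimiter inside a row. The helper
names merge and endsRow are mine, not from the thread.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B

-- Rebuild logical rows from physical lines. A physical line that does
-- not end with the delimiter is taken to be the first part of a row
-- broken by a stray newline, so it is glued onto the following line.
clean :: Char -> B.ByteString -> [B.ByteString]
clean del = merge . B.lines
  where
    merge [] = []
    merge (l : ls)
      | endsRow l = l : merge ls
      | otherwise =
          case ls of
            []         -> [l]                          -- unterminated last row
            (m : rest) -> merge (B.append l m : rest)  -- drop the stray newline

    -- Assumption: the delimiter sits directly before the newline.
    endsRow l = not (B.null l) && B.last l == del
```

For instance, with '~' as the delimiter, the input
"~a~, ~b~\n~c\nd~, ~e~\n" comes out as the two rows "~a~, ~b~" and
"~cd~, ~e~".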


-- 
Sebastian Sylvan
+44(0)7857-300802
UIN: 44640862