[Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

Donn Cave donn at drizzle.com
Sat Jun 16 03:43:45 EDT 2007


Quoth Tomasz Zielonka <tomasz.zielonka at gmail.com>:
| On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote:
| > I think that would only work if there was one column per line...I didn't 
| > make it clear that as well as being comma separated, the delimiter is 
| > around each column, of which there are several on a line so if the 
| > delimiter is ~ a file might look like:
| > 
| > ~sdlkfj~, ~dsdkjf~ #eo row1
| > ~sdf
| > dfkj~, ~dfsd~      #eo row 2
|
| It would be easier to experiment if you could provide us with an
| example input file. If you are worried about revealing sensitive
| information, you can change all characters other than newline,
| ~ and , to "A"s, for example. An accompanying output file, for checking
| correctness, would be even nicer.

Yes, especially if there's anyone else as little acquainted with CSV
files as I am!

I have never bothered to learn to work with multiple lines in sed, but
from what I gather so far, the following awk would do it --

   awk '{ if (/~$/) print; else printf "%s", $0 }'

(The ~ separator is written literally for legibility.)  I know we're not exactly looking
for an awk or sed solution here, but thought it might add some context
to the exercise anyway.
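
For anyone who wants to try it, here is a small sketch of that one-liner run
against the rows from Jim's example. It assumes the "#eo row" notes are
annotations rather than file content, and that every complete row ends in the
delimiter with no trailing whitespace. The wrapper name join_rows is made up
for illustration.

```shell
# Hypothetical wrapper around the one-liner: a line ending in "~" closes a
# row and is printed with its newline; any other line is printed without
# one, so the next line is appended to it.
join_rows() {
  awk '{ if (/~$/) print; else printf "%s", $0 }'
}

# Jim's sample, minus the "#eo row" annotations: row 2 is split inside
# its first column.
printf '%s\n' '~sdlkfj~, ~dsdkjf~' '~sdf' 'dfkj~, ~dfsd~' | join_rows
# prints:
# ~sdlkfj~, ~dsdkjf~
# ~sdfdfkj~, ~dfsd~
```

Note that the embedded newline is simply dropped, so "sdf" and "dfkj" run
together. The test is also only a heuristic: a line that breaks immediately
after an opening ~ would be mistaken for the end of a row.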

	Donn Cave, donn at drizzle.com

