[Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

Sat Jun 16 03:43:45 EDT 2007

Quoth Tomasz Zielonka <tomasz.zielonka at gmail.com>:
| On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote:
| > I think that would only work if there was one column per line...I didn't 
| > make it clear that as well as being comma separated, the delimiter is 
| > around each column, of which there are several on a line so if the 
| > delimiter is ~ a file might look like:
| > 
| > ~sdlkfj~, ~dsdkjf~ #eo row1
| > ~sdf
| > dfkj~, ~dfsd~      #eo row 2
|
| It would be easier to experiment if you could provide us with an
| example input file. If you are worried about revealing sensitive
| information, you can change all characters other than newline,
| ~ and , to "A"s, for example. An accompanying output file, for checking
| correctness, would be even nicer.

Yes, especially if there's anyone else as little acquainted with CSV
files as I am!

I have never bothered to learn to work with multiple lines in sed, but
from what I gather so far, the following awk would do it --

   awk '{ if (/~$/) print; else printf "%s", $0 }'

(The ~ separator is written literally for legibility.)  I know we're not exactly looking
for an awk or sed solution here, but thought it might add some context
to the exercise anyway.
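
For comparison, here is a minimal Haskell sketch of the same idea as the
awk one-liner: treat a physical line as finishing a row only when it ends
with the delimiter, and glue any other line onto the one that follows.
The name joinRows and the hard-coded "~" are assumptions made for
illustration, not anything from the original poster's code. Like the awk
version, this is a heuristic: it will split a row wrongly if a line break
inside a field happens to fall right after a ~.

```haskell
import Data.List (isSuffixOf)

-- | Join physical lines into logical rows. A row is considered complete
-- only when a physical line ends with the delimiter (assumed to be "~").
-- Lines that do not end with it are concatenated, without a separator,
-- onto the lines that follow, mirroring the awk printf "%s" branch.
joinRows :: [String] -> [String]
joinRows = go ""
  where
    -- End of input: emit any unterminated fragment as a final row.
    go acc []
      | null acc  = []
      | otherwise = [acc]
    go acc (l:ls)
      | "~" `isSuffixOf` l = (acc ++ l) : go "" ls   -- row is complete
      | otherwise          = go (acc ++ l) ls        -- keep accumulating
```

To use it as a filter, something like
main = interact (unlines . joinRows . lines) should do. One small
difference from the awk: a trailing unterminated fragment gets a final
newline here, whereas awk prints it without one.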

	Donn Cave, donn at drizzle.com