[Haskell-cafe] Sneaking haskell in the workplace -- cleaning csv files

Sat Jun 16 07:08:22 EDT 2007

Tomasz Zielonka wrote:
> On Fri, Jun 15, 2007 at 11:31:36PM +0100, Jim Burton wrote:
>> I think that would only work if there was one column per line...I didn't 
>> make it clear that as well as being comma separated, the delimiter is 
>> around each column, of which there are several on a line so if the 
>> delimiter is ~ a file might look like:
>>
>> ~sdlkfj~, ~dsdkjf~ #eo row1
>> ~sdf
>> dfkj~, ~dfsd~      #eo row 2
> 
> It would be easier to experiment if you could provide us with an
> example input file. If you are worried about revealing sensitive
> information, you can change all characters other than newline,
> ~ and , to "A"s, for example. An accompanying output file, for checking
> correctness, would be even nicer.
> 
Hi Tomasz, I can do that but they do essentially look like the example 
above, except with 10 - 30 columns, more data in each column, and more 
rows, maybe this side of a million. They are produced by an Oracle 
export which escapes the delimiter (often a tilde) from within the cols. 
The output file should have exactly one row per line, with extra 
newlines replaced by a string given as a param (it might be a space or an 
HTML tag -- I only just remembered this and my initial effort doesn't do 
it).
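
To make the requirement concrete, here is a minimal sketch of the newline
replacement, not the code from my initial effort. It rests on an assumption
the thread does not settle: that an escaped delimiter inside a column never
shows up as a single bare delimiter character (for instance because the
export doubles it, which toggles the state twice and cancels out). The name
cleanNewlines and its argument order are made up for illustration.

```haskell
-- Replace newlines that fall inside a delimited column with a
-- caller-supplied string; newlines that end a row are left alone.
--
-- ASSUMPTION: every bare delimiter opens or closes a column. If the
-- export escapes delimiters some other way (e.g. with a backslash),
-- the toggle below would need to skip the escaped ones.
cleanNewlines :: Char -> String -> String -> String
cleanNewlines delim repl = go False
  where
    -- The Bool tracks whether we are currently inside a column.
    go _ [] = []
    go inside (c:cs)
      | c == delim = c : go (not inside) cs
      -- Treat a Windows-style "\r\n" inside a column as one newline.
      | inside, c == '\r', ('\n':rest) <- cs = repl ++ go inside rest
      | inside, c == '\n' = repl ++ go inside cs
      | otherwise = c : go inside cs
```

Wired up as main = interact (cleanNewlines '~' " "), this would read the
export on stdin and write the cleaned rows to stdout. The function is lazy,
so it streams rather than loading the whole file, though String is likely
to be slow for files near a million rows.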

Thanks,

Jim

> Best regards
> Tomek