If a Libre Office Calc exports a csv file and there are cells that are multi-line, then there are newlines in the csv file at the end of every original row but also in every multi-line cell. I am trying to automate an application so manually editing the csv file is out of the question as well as being impossibly difficult on large Calc files.
This is a sample line output from a file I temporarily stored in a file called xxx for test purposes:
3580,"Lantern - Black Hand Lantern","Little Supreme #350","OR","Westchest
Light Co",,"Red Globe",100,"Y","Donated",,,2012,"Display Cabinet 1",,,"yes",
Note the newline makes this look like two lines instead of one. I decided to use sed to get rid of all newlines not immediately preceded by a comma and, in that case, replace the unwanted newline with a space. If there is a newline after a comma, then I want to keep the newline.
My test code looks like this:
cat xxx |sed 's/\([^,]\)|/\1x /'
This should find all newlines preceeded by any character but a comma and replace it with that character followed by a space but it doesn’t work. I read that sed has problems eliminating the last newline on a line but since it can’t tell what the real end of the line is, I tried this:
cat xxx | tr '\n' '|' | sed 's/\([^,]|\)|/\1x /'
This converts all newlines to vertical bars with tr and then uses the same sed but tries to replace any non-comma followed by a vertical bar to the non-comma character followed by a space. Same problem
Now, I know I must be doing something wrong but I can’t figure it out. I started using Unix nearly 40 years ago and used sed a lot but not very much in the last 20 since I retired. This is a volunteer project for a couple of non-profits.
Can someone point out the error in my code or provide an alternate fix that can be easily automated when only the csv file is available?