Ask Your Question
1

Using Calc to open a csv file that has cr/lf mid field

asked 2018-10-11 15:59:11 +0100

David Chapman gravatar image

updated 2018-10-11 17:26:04 +0100

erAck gravatar image

Good day. I'm not too good at explaining, but here goes:

I am trying to open a csv file in Calc. This file was created from the output of SLQCMD.EXE, is delimited with tilde (~) and has CR/LF line ends. Fields are variable length. The first field of each record is an 8 digit number and the last field is. There are several other fields with the value NULL in each record.

The problem I am having is that in some records there is one field containing one or more CR/LF mid-field. Calc is creating a new record at each instance. I would like to replace the mid-field CR/LF with space ' '.

I have tried opening the file in Calc, selecting the problem field and using Replace in column mode search for various combinations of \n, \r, \n\r, \r\n, [:cntrl:] Ctl-M, regular expressions checked. This did not work.

I have also tried sed, tr, and grep to no avail although I don't really know what I am doing.

The only method that has come close to working is to open the cvs file in vim and using substitute like: %s/\n<char>/ <char>/g where <char> is each character on the keyboard one at a time.

I hope this makes sense.... Any ideas appreciated!!

Thanks,

edit retag flag offensive close merge delete

2 Answers

Sort by » oldest newest most voted
1

answered 2018-10-11 17:24:04 +0100

erAck gravatar image

Sounds like a broken CSV format. If fields contain embedded line feeds they must be quoted, with your ~ tilde field delimiter that should look like

field1~"field2
with
line
feeds"~field3

The quote character could be a different one than ", you can choose in the Text Import dialog as String delimiter.

edit flag offensive delete link more
0

answered 2018-10-12 07:11:03 +0100

David Chapman gravatar image

Thank you for the quick response.

You are correct. The fields with line feeds are not quoted. Also, in examining the file with xxd there are times when there is only a line feed 0x0a mid field: 000a0920: 4b49 4e47 5061 796d 656e 7420 696e 2043 KINGPayment in C 000a0930: 6172 0a73 7475 6465 6e74 2070 6963 6b20 ar.student pick

other times CR/LF: 000e96d0: 6e74 0d0a 4f72 6465 7220 6861 7320 6d6f nt..Order has mo

Both cause Calc to create a new record.

So, I need a magical method of replacing <word>[LF|CRLF]<nextword> with "<word>[LF|CRLF]<nextword>" without changing [CRLF]<number>.

Anyone have any thoughts on how I can accomplish this? Or a better idea?

Thanks,

edit flag offensive delete link more

Comments

(You shouldn't use Answer to add to your question.) Thinking that you could use a text editor to Find and Replace the offending character, I tried several editors, NotepadPlus (old now) and Notepad++, but neither of them will search for a CRLF or Return character. There are many other editors; I just don't have them. Perhaps someone else can suggest an editor that can do this.

ve3oat gravatar imageve3oat ( 2018-10-14 01:09:14 +0100 )edit

I wound up using vim search and replace for each letter. Tiresome, but it worked. Since then I have found that cut from GNU Coreutils removes the mid-field CR/LF.

cut -d"~" -f1- infile > outfile

Thanks,

David Chapman gravatar imageDavid Chapman ( 2018-10-18 09:50:16 +0100 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2018-10-11 15:59:11 +0100

Seen: 50 times

Last updated: Oct 12 '18