Ask Your Question
0

error importing csv file

asked 2017-07-17 16:35:10 +0200

Paul_L gravatar image

I have been importing csv files provided by major bank computers into Excel for years. When I import them into Calc, Version: 5.3.4.2 (x64), Build ID: f82d347ccc0be322489bf7da61d7e4ad13fe2ff3, there are parsing errors. Here are some lines from a csv file.

211,04/06/17,"PAYMENT THANK YOU
",,857.06 ,(468.59),,"5,884.58 ",,"1,106.77 ",

212,04/07/17,"BJ WHOLESALE #0053 WAPPINGERS FA NY ",52.85 ,,(521.44),1,"5,937.43 ",,"1,106.77 ",

213,04/11/17,"WAL-MART #1810 FISHKILL NY ",96.54 ,,(617.98),1,"6,033.97 ",,"1,106.77 ",

214,04/12/17,TU *TRANSUNION 800-493-3292 CA,16.17 ,,(634.15),,"6,033.97 ",,"1,106.77 ",

The problem is in the third field, which is an alphabetic description field. The last line, where the third field is not quoted, imports correctly. The first three lines show the third field on two lines, doubling the height of those three rows. Excel does not do this!

Inspection of those lines with an editor which displays hexadecimal codes shows that the bank's computer sticks an Hex0A or ASCII 10 line feed character immediately before the closing double quote. Your import algorithm incorrectly interprets that as a line return. Within a quoted field line feed characters should probably be ignored.

Paul_L

edit retag flag offensive close merge delete

2 Answers

Sort by » oldest newest most voted
0

answered 2017-07-19 10:22:50 +0200

erAck gravatar image

Import is not wrong. Within quoted fields line breaks are part of the data of the field, that's how CSV is defined. See RFC 4180 and also its Informative References if interested in details.

If Excel does not import the line break then it might be that it ignores a trailing line break, assuming that the creator of the CSV was too stupid, but would fail if the trailing line break was intentional and relevant data.

edit flag offensive delete link more
0

answered 2017-07-17 19:45:15 +0200

librebel gravatar image

Hello @Paul_L,

"Within a quoted field line feed characters should probably be ignored."

As a workaround you could try :

  1. Make a copy of your csv-file,
  2. Find inside your csv-file all occurrences of the linefeed character directly followed by a double quote mark,
  3. Replace all found occurrences by a temporary unique string ending in a double quote mark, ( e.g. My_Unique_LF_string" ),
  4. Import the edited csv-file in Calc,
  5. If the import succeeds, then you can Find all occurrences of the temporary unique string in your Sheet, and Replace them by a linefeed character again.
edit flag offensive delete link more

Comments

Thanks. That's about what I did do yesterday. You should re-evaluate your algorithm's handling of the control characters in a quoted string.

Paul_L gravatar imagePaul_L ( 2017-07-18 10:47:22 +0200 )edit

You're welcome @Paul_L,

btw the algorithm is not mine in any way, i'm just a casual user just like yourself ( as most of us here are ).

librebel gravatar imagelibrebel ( 2017-07-18 13:26:28 +0200 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2017-07-17 16:35:10 +0200

Seen: 36 times

Last updated: Jul 19