Ask Your Question
0

Importing CSV and limiting the number of columns

asked 2017-11-21 12:31:48 +0100

rbaleksandar gravatar image

updated 2017-11-21 13:08:07 +0100

I've just encountered a silly problem with the way a CSV file that was given to me is formatted. For some reason the creator has used (blank space) as both the delimiter and also as a possible value in the last column where the values represent names (example: Bob Dylan, note the between the first and last name).

I'm trying to gather some stats using LibreCalc but I have no idea how to import this file properly. The Text Import wizard doesn't give me the ability to limit the number of columns that are to be recognized (it seem just like me the creators never expected for someone to format a CSV in such a silly way).

Let's say I have the following CSV content (I've used # to mark since it's more readable especially with multiple blanks)

a1###A
a2###B
a3###C
...
a10##A#B
a11##A.B.#C
a12##A#D#(ABC)
...
a998#Y
a999#Z

LibreCalc is smart enough to remove all the excessive blanks so multiple occurrences of a delimiter (### instead of just #) are treated as a single one and the rest is discarded. However setting the delimiter to a whitespace produces the following (obvious) result:

 A     B    C     D    E
----|----|-----|----|------
1   |a1  |A    |    |
2   |a2  |B    |    |
3   |a3  |C    |    |
... |... |...  |... |...
10  |a10 |A    |B   |
11  |a11 |A.B. |C   |
12  |a12 |A    |D   |(ABC)
... |... |...  |... |...
998 |a998|Y    |    |
999 |a999|Z    |    |

The result that I'm looking for is

 A     B    C
----|----|-----
1   |a1  |A
2   |a2  |B
3   |a3  |C
... |... |...
10  |a10 |A B
11  |a11 |A.B. C
12  |a12 |A D (ABC)
... |... |...
998 |a998|Y
999 |a999|Z

Is there a way to fix this CSV using LibreCalc or do I have to manually (or with some code of mine) do this tedious task?

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
1

answered 2017-11-21 13:31:08 +0100

pierre-yves samyn gravatar image

Hi

You could import without separator so as to get everything in one column then proceed in 2 steps:

  • Find & Replace
    • Find: ([^ ]*) *(.*)
    • Replace: $1;$2
    • Other options: tick Regular expressions & Current selection only (if needed)
    • Replace All
  • Select the range then DataText to Columns
    • Separated by: Semicolon

Explanation: you search for

  • anything that is not a space, regardless of the number of characters,
  • followed by one or more spaces,
  • followed by anything.

You replace by the first group, then a separator (; in the example), then the end

In this way you introduce a separator that you can use with the Text to Columns command

Regards

edit flag offensive delete link more
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2017-11-21 12:31:48 +0100

Seen: 259 times

Last updated: Nov 21 '17