Importing CSV and limiting the number of columns

I’ve just encountered a silly problem with the way a CSV file that was given to me is formatted. For some reason the creator has used (blank space) as both the delimiter and also as a possible value in the last column where the values represent names (example: Bob Dylan, note the between the first and last name).

I’m trying to gather some stats using LibreCalc but I have no idea how to import this file properly. The Text Import wizard doesn’t give me the ability to limit the number of columns that are to be recognized (it seem just like me the creators never expected for someone to format a CSV in such a silly way).

Let’s say I have the following CSV content (I’ve used # to mark since it’s more readable especially with multiple blanks)

a1###A
a2###B
a3###C
...
a10##A#B
a11##A.B.#C
a12##A#D#(ABC)
...
a998#Y
a999#Z

LibreCalc is smart enough to remove all the excessive blanks so multiple occurrences of a delimiter (### instead of just #) are treated as a single one and the rest is discarded. However setting the delimiter to a whitespace produces the following (obvious) result:

 A     B    C     D    E
----|----|-----|----|------
1   |a1  |A    |    |
2   |a2  |B    |    |
3   |a3  |C    |    |
... |... |...  |... |...
10  |a10 |A    |B   |
11  |a11 |A.B. |C   |
12  |a12 |A    |D   |(ABC)
... |... |...  |... |...
998 |a998|Y    |    |
999 |a999|Z    |    |

The result that I’m looking for is

 A     B    C
----|----|-----
1   |a1  |A
2   |a2  |B
3   |a3  |C
... |... |...
10  |a10 |A B
11  |a11 |A.B. C
12  |a12 |A D (ABC)
... |... |...
998 |a998|Y
999 |a999|Z

Is there a way to fix this CSV using LibreCalc or do I have to manually (or with some code of mine) do this tedious task?

Hi

You could import without separator so as to get everything in one column then proceed in 2 steps:

  • Find & Replace
    • Find: ([^ ]*) *(.*)
    • Replace: $1;$2
    • Other options: tick Regular expressions & Current selection only (if needed)
    • Replace All
  • Select the range then DataText to Columns
    • Separated by: Semicolon

Explanation: you search for

  • anything that is not a space, regardless of the number of characters,
  • followed by one or more spaces,
  • followed by anything.

You replace by the first group, then a separator (; in the example), then the end

In this way you introduce a separator that you can use with the Text to Columns command

Regards