How to parse text in Calc and chose a calculation depending on context

I am trying to take text strings that include a category and one or more numerical values. I have encountered difficulties and am reminded of the joke - “How do you eat an elephant? One bite at a time.”

To provide context, I created the scenario where I have a 3-D printing business and get requests to make 1-D, 2-D, or 3-D objects. Examples would be CIRCLE, DISK, CYLINDER, or SQUARE, TILE, CUBE. Each object would have 1 to 3 numbers associated with it, sufficient to describe the object.

In order to provide the cost, I need to calculate how much material each object requires. Thus I would need to calculate the linear length, area, or volume depending on which object I am making.

A sample data file follows the format: quantity, shape, date required, and 1-3 values delimited with ‘/’. I have no control over the initial format.

5 CIRCLE 10 APR 24 5
20 SQUARE 5 APR 24 2.5
1 RECTANGLE 1 JUL 24 10/15
1 CUBOID 1 JUL 24 10/12.5/18
1 ISOSCELES TRIANGLE 1 JUL 24 4/5/15

The first problem is to parse the data. I had the idea of using the ‘text to columns’ feature in Calc but using the space as delimiter does not work with the multi-word object and the date. One solution is to edit the data to strategically insert commas.

5, CIRCLE, 10 APR 24, 5
20, SQUARE, 5 APR 24, 2.5
1, RECTANGLE, 1 JUL 24, 10/15
1, CUBOID, 8 JUL 24, 10/12.5/18
1, ISOSCELES TRIANGLE, 14 MAR 24, 4/5/15

This edited data will properly expand using “text-to-columns.” I can then calculate new columns (Linear, Area, Volume) using a switch statement referencing the shape column.

At the moment I have a multi-step process with manual editing, some automation, and copy/paste into another spreadsheet or sheet tab for my record.

I am looking for ideas on how to streamline the process. For example, using FIND or REGEX expressions to add the commas and perhaps even bypass the 'text-to-columns" step. This idea has it’s own learning curve and I am not sure if this path would be a huge improvement.

Right now, it is almost easier to type in the data. I would appreciate any ideas on how I can proceed. Besides improving my data entry skills. :roll_eyes:

The decomposition (parsing) of strings with a roughly clean syntax can often be done with the help of the REGEX()) function which is available in LibO versions 6.2 or higher.
What you call a “date”, however, isn’t a date but a mess. You can’t get this fixed on a purely syntactical basis.
See attached:
disask103234stringDecompositionUsingRegex.ods (16.6 KB)
This is how dates look: 2024-03-09 .

3 Likes

Thank you so much. Your examples showing the REGEX function and expressions are very helpful. It would have taken me so much time to figure all that out on my own. I will have to scrutinize your work closely.

I better understand the flaws in that date format. I assumed that the 1- vs. 2-digit day would be the challenge but this part will be much more complex to solve. But for my purposes, I can keep the date field as text. Just getting it into its own column is a major accomplishment.

1 Like

Do you remember the “y2k hysteria”?
For me it’s incredible that now again there are people insisting on the usage of two-digit-year noatations.
Yet this is fact. In addition we have many millions of people believing their localized version of “formatting dates” like “MM/DD/YY” or variants with word-based abbreviations for months or with stubborn local denotation of ordinals might also be usable in data exchange.
There even was a questioner who got reported “dates” from worldwide spread shops and wanted help to distinguish them without offering any additinal information concerning “MM/DD/YY or DD/MM/YY”.

Since then, my attitude has been: “Then let them solve their own problems on their own.”
Those who insist on receiving or disseminating nonsense and dangerous disinformation as “data” will do so without asking me.

Because inconsistency is human, I have already proposed an extended solution to the question before I posted my answer above, and now I am also including it:
disask103234stringDecompositionUsingRegexEnh.ods (23.7 KB)

Warning: Anyone who tries to evaluate stupid as-if-dates is actually acting irresponsibly.