If delimited Cell contains - add id field to different column

markandeya · January 1, 2022, 2:44pm

I have a sheet about flowers with columns (name, id, colours). The colours column is comma delimited with colour numeric values. I.E. 1 = white, 2 = cream, 3 = pink. Each flower may have multiple colours as a comma delimited cell value. There are 900 rows for 900 flowers. so an example row may be A! = “rose” , B1 = 1, C1= “1,2,3”.
I need to get alll the ids for each colour. For exmple for 1 (white) all ids that have 1 in their colours cell. Then all the ids that have 2 (cream) in their colours cell.
The results for each colour can be in a separate column (maybe with colour number as column head) or any way you think will work. I will then make a separate sheet with these columns of ids one for each colour number. These will be made into JavaScript arrays for a program search algorithm.
Thank you so much for you kind consideration, ideas and time. markandeya

eeigor · January 1, 2022, 8:04pm

colors-and-flower-ids.ods (14.0 KB)
UPDATED:
colors-and-flower-ids (2).ods (14.5 KB)

Your data contains an array of color values, which does not allow you to process them efficiently (the first form of data normalization is broken). Transform your data into simple databse and analyze it in a pivot table (press the button and activate the “pivottable” sheet).

Warning! Do not remove the column headers in the “newdata” sheet (name, id, colour_code, colour_name).
NOTE: ~~If you add data to the “rawdata” sheet, then update the reference to the named range “flowers” (except for headers). Also update the reference to the “colours” range (except for headers)~~. Updated automatically.
Important: The current region must be surrounded by empty columns and rows. How it is done in the example.

In my opinion, this is not necessary. Try a pivot table.
And why are you working with identifiers? Leave them to the computer. Homo sapiens works with more meaningful information. So I added colour names.

joshua4 · January 2, 2022, 1:46am

I’m sure I spent a lot more time on this than @eeigor, but I was learning from eeigor’s work. What I noticed are:

Eeigor’s macro doesn’t erase the data area of the normalized database table before rebuilding. This can leave zombie lines at the end if the total number of color entries is reduced in the source table. I’ve inserted code to delete all but the header row before rebuilding the table.
The VLOOKUP function “secretly” encoded the exact position of the normalized database table in the macro, even though it looked like a person could just redefine the “database” named range and the macro might still work. I’ve used OFFSET to make the VLOOKUP relative during macro operation.
The OQ wanted columns with headers equal to the color IDs and entries for each flower ID going down the appropriate column. I borrowed a good form for the classic INDEX/SMALL/IF from ExcelJet and created a table that matches the exact OQ as I see it. The pivot table is more useful for reading, but it might not be for the envisioned automation.

(Notice that if you extend the array formulas in the column output table beyond the size of the normalized database table then you will get #N/A’s no matter what. ISERROR() won’t stop this because the #N/A’s are injected into the array-included cells when the returning array from the calculation just isn’t big enough to fill the array-included cells. This is avoidable, but it requires ever more use of the column output table location/dimensions, etc.)

Again, none of this is critical of eeigor. I just tried to learn by taking it up a notch and looking for the pitfalls that might happen in something a little closer to actual implementation. The result is thus perhaps a little closer to the OQ request.

colors-and-flower-ids-columns.ods (17.9 KB)

eeigor · January 2, 2022, 7:46am

@joshua4: Thanks for analyzing my example.

Used: =VLOOKUP(C2;colours;2;0)
Or: =VLOOKUP(RC[-1];colours;2;0)
NOTE: If we set the ExcelR1C1 option of formula syntax, then we will see that C2 is RC[-1]. This is a relative reference. OFFSET solves the same problem (Rows:=0;Columns:=-1), but OFFSET is volatile.

I’ll add something else as soon as I have free time.
EDIT: File above updated. See: colors-and-flower-ids (2).ods (14.5 KB).

Or try this:
colors-and-flower-ids (3).ods (14.7 KB)

If necessary, you can create a procedure that will create filter pages from a pivot table: separate sheets named for each color, containing the corresponding flower identifiers.
Unfortunately, the Excel “Show Report Filter Pages” function is missing in LO Calc pivot tables, so there is a reason to create it.
You can use an autofilter on the “newdata” sheet. In general, the ultimate goal of the author is unclear.

Villeroy · January 2, 2022, 10:53am

FlowersColours_72254.odb (207.5 KB)

eeigor · January 2, 2022, 2:08pm

@joshua4: I believe that a small data normalization (elimination of an array of colors, that is, repeating groups of data: color1, color2, etc.) solved the author’s problem (the autofilter allows you to get a set of flower identifiers). The pivot table is needed for a comprehensive analysis, because there are actually a lot of fields. In my example, 4 column references are hardcoded.
Data can be maintained as before, but for analysis it must be transformed to the correct form. Note: the transformed data in the “newdata” sheet contains a lot of redundant data that, if entered in this form, would lead to input errors.

Edit:
@joshua4, I took a peek at your NamedRange.ReferencePosition property, which denotes the cell address of the top left cell of the range. Thank you.

joshua4 · January 2, 2022, 10:03pm

@eeigor: On the “relative referencing” point, you took that as relative within the syntax of the resulting formula. Sorry, I wasn’t clear, by “relative during macro operation” I meant that “C” column was being encoded into the macro, so that there were actually three exterior references to the normalised database table: 1) the Named Range “database”, the Sheet name “newdata” and the constant string “C”. By “relative during macro operation” I meant to bring that to dependence only on the Named Range. R1C1 referencing certainly does that for the “C” question, if active.

I’ll look at the uploads…I want to see your use of the pivot table. Sounds like the OP is very satisfied.

eeigor · January 3, 2022, 11:59am

@joshua4: I’ve added an example just for you. Column C is no longer used. Take a look at the code for the ConvertFormula function. This solution is more versatile.
colors-and-flower-ids (4).ods (15.4 KB)

Always use in the code the R1C1 style (syntax grammar) and convert it to the one used on the sheet. The main advantage of this style is its uniformity. You don’t need to compute range names (addresses).

markandeya · January 2, 2022, 12:05pm

Thank you so much for your great solutions and considerations. They have helped me very much. I have 900 flowers and 9-12 attributes for each, like colour, no. of petals, plant type, leaf shape, leaf edges, plant type, flower cluster type, etc. So i have thousands of entries to make. With your help and instructions i now know how to enter the data correctly and how i can get it out in the form i need. This is a great great help and time saver that also gives me confidence to start data entry because i know it will be useful and the correct way. All my thanks for your generous time and ideas and knowledge. Markandeya

markandeya · January 2, 2022, 2:00pm

Ok 1 more great thanks! Your spread sheets worked with all 900 flower entries and 16 colours. So i can enter my data into the spread sheet example downloads you gave. Wow. You gave me the full solution with working example. I am filled with joy and gratitude. Happy new year to you too. Markandeya

eeigor · January 3, 2022, 9:21am

@markandeya: You have more fields, so I guess you can fix the code yourself. In the screenshot, you can see that 4 columns with indices 0-3 are hardcoded.
Снимок экрана от 2022-01-03 12-17-19