Solved: In calc, I have sheets that have record number fields for several years that I am combining. Some of the years have the same record number. How can I automate the find and replace to change the duplicate record number fields to a unique number?

For example, Column “A” are the existing record numbers, Column “B” are the duplicate numbers, and Column “C” are the new numbers (the 4-digit year plus the old number). I want “A” to be updated with the new value from “C” but only if it matches “B”.

EDIT
Solved by adding the 4-digit year to ALL record numbers. Detailed in my answer.

Based on your description (including the indication that this is a one-time process), here is how I suspect I would do it:

Let’s first insert a column. I would probably insert before column A, but that would require us to re-identify the columns in your post – so for the sake of discussion, let’s insert our column as D.

Here is the formula for our new Column D (we’ll assume it is placed in cell D2): “=IF(A2=B2,C2,A2)”

  1. Copy this formula to the remaining cells in Column D corresponding to your dataset.
  2. Review the results in Column D.
  3. Copy Column D to Column A. Use "Paste Special" or "Paste Only" to paste only the results you want (i.e., NOT formulas). For instance, let's assume your "4-digit year plus the old number" results in a text string. You could "Paste Only > Text" to paste the string (not the formula) into Column A from Column D.
  4. Delete Column D.
You have now updated Column A with Column C, but only in cells where Column A and Column B are the same.

Please click the checkmark next to the response you believe best answers your question.

Thanks for the help. I tried that and it did not work for my purpose. I have crash data from 10 years that has several .csv files for each year that need to be input into a database, so it is not exactly a one-time thing. It appears when I use your formula that the only comparison is between A2 and B2. For the sake of this discussion, I need A2 and subsequent “A” cells to be compared to the values in B2:B11. If there is a match, I need to replace only the matching value in “A”.

I’m not sure I understand why you had a problem. The spreadsheet formula contains relative referencing – that means it should respond as you want as you copy the formula from cell D2 to cells D3:D11.

If this is NOT a one-time thing, and if this involves many csv files (and many records), you would probably be better served by an automated approach involving a database.

Thanks. Using your formula, I get the same numbers in C as are in A. My duplicates are spread throughout so an ascending order sort will not lump them together. I will look into doing this with a database.

Just a thought, but have you checked your LO options – particularly those related to LO Calc? For instance, I have seen that enabling regular expressions can have a major impact on formula calculation results.

I had not thought of that and will do so. Thank you. EDIT- I checked and the regular expressions were enabled.

Let me clarify the above- I redid the test sheet and if column B has the duplicate numbers aligned with the same numbers in “A”, then that formula will work. The problem is that the duplicate numbers in “B” are not aligned with “A”.

The formula should work – unless there is something different about the data in columns A and B. If you can’t upload a file for us to see, can you give us a couple of examples with the actual data where the formula should work but doesn’t?

Honestly, I see no reason for this kind of setup, but it’s your problem.

An alternative to LKeith’s: in the combined sheet, put =ROW() as the column A’s cells’ content, then copypaste the other sheets completely, except for the columns containing those numbers, as combined sheet’s column B and so on.

Thank you. See my reply to LKeith for the reason, I have millions of rows with about 300 duplicates. I’m using a test sheet with fewer rows to learn how to do this. I have 20 old numbers in A2:A21, 10 duplicates in B2:B11, and 10 new numbers in C2:C11. I used =IF(ROW(B2:B6),C2,A2) for the formula in D1 and this returned correct values however it would not allow a reasonable replacement of the data in column A, I could not copy and paste since this would overwrite valid non-duplicate numbers.

Here’s what I did to solve this. While it appears to be more work, it is far less work than manually finding and replacing the duplicates. I added the 4-digit year to ALL record numbers for a particular year for each csv file:
New column “A” contains the 4-digit year and has that year as the heading in A1. Column “B” contains existing record numbers and has a heading of Record Number. New column “C” is titled Unique.

For cell C2, I used =($A$1&B2). For example, A1= 2007, B2= 123456, C2= 2007123456. Alternatively, you could use =(A2&B2) to save a few keystrokes.

I copied this formula for all of “C” and was rewarded with an entire set of new record numbers all beginning with the year for easier reference. Following LKeith’s advice, I copied “C” and used paste-special into “B” so only the results appear. I then deleted “A” and “C” and re-titled the old “B” as Record Number.