Remove duplicates -- "filter" is hard to understand

try an extension Remove Duplicates Fast » Extensions

I will try that. One concern I have is that sometimes I want to remove both instances of duplicates, and occasionally I just want to remove the duplicate and leave behind one instance. Sometimes I want to do some of each in the same document/process! The method I described above in Excel allows you to review all the “dupes” – as they have been highlighted/formatted by conditional format, and then sorted by their format/highlight so they’re all in one place – and then I can assess them and process them in batches, deleting both instances of some and deleting only the dupe on others.

That didn’t work. Again, Conditional Formatting finds several more dupes than this extension does. I took a sheet of 113 lines, 7 of which are/have duplicates according to Conditional Formatting (and verified by me). When I use the Remove Duplicates extension, it reports “We found and deleted 2 duplicated values.
Now we have only 111 unique values” which is just wrong – there are still 4 lines that are/have dupes in the list. The dupes are still highlighted by conditional formatting! That’s a really, really, really big error rate (greater than 50%!).

edit: It does find all 7 if I have it analyze only column A. If I have it analyze Columns A-E, it only finds 4 dupes. Weird. But I guess it works to some degree. Still, it doesn’t accomplish what I need which is the ability to identify dupes and organize them together for analysis before deleting most but not all.

also, this plugin only removes the dupe, not ALSO the original instance. More often than not, I want both instances removed. Only occasionally do I want to leave behind one instance as this extension does by default and it doesn’t appear to be an option the user can adjust.

One sheet for the marketing list. Other sheet for the transactions. A VLOOKUP formula near the marketing list echo the account number if the same account number is at the transaction sheet.

You can sort VLOOKUP and marketing list columns and delete all rows that show numbers for VLOOKUP.

No need to copy from transaction list to marketing list. Only check that VLOOKUP column has the formula.

See the sample file.

Made with LibreOffice 6.4.7.2 (x86); OS: Windows 6.1.


Add Answer is reserved for solutions.

Press edit below your question if you want to add more information; also can comment an answer.

Check the mark (Correct answer mark) to the left of the answer that solves your question.

=IFERROR(VLOOKUP(B3,$transaction.A:A,1,0),"–") that is so beyond my paygrade friend, like BIG lol – as I said elsewhere, I’ve never used a comma in a calculation, just to begin wtih. I think i’ll sooner hunt and peck for the lines that have been highlighted by Conditional Formatting. It’s just very hard from my competency level to gauge how much time and effort I will need to put in to get to a competency level where that calculation is sensible. I’m kinda thinking that just doing it the hard way will save me more time than learning how to do that, but really how can I know, but best guess. I mean…that just looks crazy, I can’t begin to make sense of it this moment. Sorry. As I said to the other person, I trust that this is a valid solution to my problem for a competent user, but it is not a solution I can use at my level.

@default_abuser, Just share a file with one row with fictitious data. I adapt the formula, and you only need to copy and paste the formula in the right place in a copy of your document to give it a try.

I had to remove the green Answer check due to this issue of lag. The lag is so substantial it often locks up Calc for a minute, sometimes 2, and I’m really afraid it’s going to corrupt the file. THis is absolutely NOT a reasonable solution given this. Sorting a column that contains calculations seems to kill LOCalc.

@default_abuser, But it wasn’t my answer that had the green check mark. Again, just share a file with one row with fictitious data. With my proposed answer the lag is negligible.

Indeed, sorting a calculated field in a large data set is inefficient. This only gives rise to further reflection.

  1. In my opinion, the best solution would be to use a database. Large arrays, selections, and sorting are the power of the database.
  2. After performing VLOOKUP calculations, you can disable the automatic recalculation mechanism. Choose Data-> Calculate-> AutoCalculate (uncheck).
    Then sort.
  3. Maybe sorting is unnecessary? Maybe a VLOOKUP filter is enough?