Remove duplicates -- "filter" is hard to understand

But you could use this method for a while in parallel with your usual one.

My boss won’t have much tolerance for mistakes I make while ramping up the learning curve; it’s the sort of penalty that makes learning new things not worth it. I don’t think they realize that LO lacks features Office 2010 had (sort by color/format), and they will not be forgiving of errors I make while learning VLOOKUP. I can understand their confusion and frustration: why does LO still lack features that were in Excel 10 years ago and that have been requested by LO users for 6 years? (I no longer have access to Excel.) They may realize it’s not my fault, but they won’t have anyone else to blame either. I may or may not experiment with VLOOKUP in my personal time, but in the meantime at work I’ll just be searching for and manually deleting the 200 lines that are intermixed with the other thousands or tens of thousands of lines, based on conditional formatting. It won’t be efficient, but it’ll be accurate.

Your position is quite understandable. Even if the proposed solution does not inspire confidence and cannot be considered fully reliable, it could still help solve at least part of the problem (with the rest done manually). Could we think through this task together? Can you link to an example of your document with at least a couple of lines of fictional data? It would help us understand the data structure and think it through. There is another option: use the Base application; its SQL query tool is very effective. Best regards!

I would love to think through this task together. I cannot, however, figure out how to upload a file and link to it as you did. Also, I’m not sure how to create a fake document that gets the point across, since the real documents are troves of sensitive customer data. There are two spreadsheets, each with several columns of customer data, one customer per row. I just copy the entire column of account numbers from the sales sheet, paste all those account numbers at the bottom of the column of account numbers on the marketing sheet, then highlight the dupes. Now all the customers who have made a transaction are highlighted so I can remove them. Then on the marketing sheet I remove the account numbers I had (temporarily) pasted from the sales sheet, and presto: a clean marketing sheet containing only accounts that are not on the sales sheet. And it only takes about 2 minutes, unless you can’t sort by color/format, in which case you’re hunting and pecking forever.
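
For reference, the “highlight the dupes” step can be done in Calc with a conditional-formatting rule of type “Formula is” (Format > Conditional > Condition in recent versions), much as in Excel. A minimal sketch, assuming the combined account numbers sit in column A starting at row 2; the range size is only a placeholder:

    COUNTIF($A$2:$A$20000,A2)>1

Every row whose account number appears more than once in the combined column then receives whatever cell style the rule applies.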

Before that, prepare fake documents.

  1. Make copies of your original documents and keep them in a safe place.

  2. Create a folder for fake documents.

  3. Save each of the documents with a new name (Save As) in this folder. The name should be very different so as not to confuse you in the future. For example, TempClients123, TempMarketing456.

  4. Open each of the fake documents and remove all data rows from it. Correct the column names if they contain any confidential data, and make sure filtering is disabled. Leave only the rows with column headings.

  5. Look again at all Sheets in the fake documents. Only the column names should remain.

  6. Close the documents and the application.

  7. Upload the files as described below.

@Frants, Answers are not intended for comments or adding information.

Every user can upload ODF files to his/her question. See AskLibO: How many "karma" are required for …? 2020

Thanks! There is a lot written there, but I could not figure it out, sorry. Please just help by giving the user step-by-step instructions for uploading documents. Best regards!
I think I’m starting to understand …

To upload your documents:

In the form of your question, click the Edit button.
Buttons will appear above the form; click the “attachment” button. A dialog box will appear for selecting a file. Repeat this for each document.

Don’t forget to click Save.

@Frants, I’m not angry; I just wanted to avoid confusion. I only changed a word in step 7 of your instructions to match the current position of the comments. Thanks for your help. I’m learning too. Regards!

It works. I don’t like it one bit, it took too long to figure out, and I fear it represents a lot of technical debt for me, but it works.

Update: it doesn’t work; see Frequent super lag when sort by column of calculations.

Try the extension Remove Duplicates Fast » Extensions.

I will try that. One concern I have is that sometimes I want to remove both instances of a duplicate, and occasionally I just want to remove the duplicate and leave one instance behind. Sometimes I want to do some of each in the same document/process! The method I described above in Excel lets me review all the “dupes” (highlighted by conditional formatting and then sorted by their format/highlight so they’re all in one place), assess them, and process them in batches, deleting both instances of some and only the dupe of others.
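
As a stop-gap for the missing sort-by-color, a helper column can carry the same test as the highlight so that an ordinary sort pulls the flagged rows together. A minimal sketch, again assuming the account numbers are in column A starting at row 2 (the range size is only a placeholder):

    =IF(COUNTIF($A$2:$A$20000,A2)>1,"DUP","")

Fill it down an empty column and sort by that column; all the “DUP” rows land in one block where they can be reviewed and deleted in batches, much like the format-based sort described above.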

That didn’t work. Again, Conditional Formatting finds several more dupes than this extension does. I took a sheet of 113 lines, 7 of which are/have duplicates according to Conditional Formatting (and verified by me). When I use the Remove Duplicates extension, it reports “We found and deleted 2 duplicated values. Now we have only 111 unique values”, which is just wrong: there are still 4 lines that are/have dupes in the list, and they are still highlighted by conditional formatting! That’s a really, really, really big error rate (greater than 50%!).

Edit: it does find all 7 if I have it analyze only column A. If I have it analyze columns A-E, it only finds 4 dupes. Weird. But I guess it works to some degree. Still, it doesn’t accomplish what I need, which is the ability to identify dupes and organize them together for analysis before deleting most, but not all, of them.
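
One possible explanation, offered only as a guess: with several columns selected, the extension may count a duplicate only when the entire row matches across all selected columns, not just the account number. The equivalent row-level test in a helper column would look something like this, assuming the data occupies columns A through E starting at row 2:

    =COUNTIFS($A:$A,A2,$B:$B,B2,$C:$C,C2,$D:$D,D2,$E:$E,E2)>1

Two rows that share an account number but differ anywhere in columns B through E return FALSE here, which would explain why fewer duplicates are reported when more columns are analyzed.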

Also, this extension only removes the dupe, not the original instance as well. More often than not, I want both instances removed. Only occasionally do I want to leave one instance behind, as this extension does by default, and that doesn’t appear to be an option the user can adjust.
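
For the keep-one versus remove-both distinction, a running count in a second helper column can separate first occurrences from later ones. A minimal sketch, again assuming account numbers in column A starting at row 2:

    =COUNTIF($A$2:A2,A2)

For a duplicated value, the first occurrence shows 1 and the later ones show 2, 3, and so on; after sorting the duplicates together you can delete only the rows counting above 1 (leave one instance) or, combined with the “DUP” column sketched earlier, every row whose value occurs more than once (remove both instances).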

One sheet holds the marketing list, the other the transactions. A VLOOKUP formula next to the marketing list echoes the account number if the same account number is on the transaction sheet.

You can sort by the VLOOKUP column together with the marketing list columns and delete all rows where VLOOKUP shows a number.

There is no need to copy from the transaction list to the marketing list. Just make sure the VLOOKUP column has the formula.

See the sample file.
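
For readers without the sample file, its formula is presumably along the lines of the one quoted later in this thread; it assumes the marketing account numbers start in column B (row 3) and the transaction account numbers are in column A of a sheet named transaction:

    =IFERROR(VLOOKUP(B3,$transaction.A:A,1,0),"–")

VLOOKUP echoes the account number when it finds a match on the transaction sheet; IFERROR turns the #N/A it would otherwise produce into “–”, so after filling the formula down you can sort by that column and delete every row that shows a number.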

Made with LibreOffice 6.4.7.2 (x86); OS: Windows 6.1.


Add Answer is reserved for solutions.

Press edit below your question if you want to add more information; you can also comment on an answer.

Check the mark (Correct answer mark) to the left of the answer that solves your question.

=IFERROR(VLOOKUP(B3,$transaction.A:A,1,0),"–") that is so beyond my paygrade, friend, like BIG lol. As I said elsewhere, I’ve never even used a comma in a calculation, just to begin with. I think I’ll sooner hunt and peck for the lines that have been highlighted by Conditional Formatting. From my competency level it’s very hard to gauge how much time and effort I would need to put in to reach a level where that calculation makes sense. I’m kinda thinking that just doing it the hard way will save me more time than learning how to do that; but really, how can I know? Best guess. I mean, that just looks crazy, and I can’t begin to make sense of it at this moment. Sorry. As I said to the other person, I trust that this is a valid solution to my problem for a competent user, but it is not a solution I can use at my level.

@default_abuser, Just share a file with one row of fictitious data. I will adapt the formula, and you will only need to copy and paste it into the right place in a copy of your document to give it a try.

I had to remove the green Answer check due to this issue of lag. The lag is so substantial it often locks up Calc for a minute, sometimes two, and I’m really afraid it’s going to corrupt the file. This is absolutely NOT a reasonable solution given that. Sorting a column that contains calculations seems to kill LO Calc.

@default_abuser, But it wasn’t my answer that had the green check mark. Again, just share a file with one row of fictitious data. With my proposed answer, the lag is negligible.