Check for repetitions inside / outside a given area

hiwwfler · December 11, 2017, 7:22pm

hi there,
I have a calc document with 12 lines of values forming a “block” (e.g. A5-AZ5 to (including) A16-AZ16).
Now I need to check whether there are any repetitions inside the block in the B-Cells (e.g. B5-B16) and if so delete them. Furthermore I need to check if the values of the AM-Cells are repeated anywhere outside the box (e.g. AM5-AM16) and if so delete them. This pattern would have to be repeated till line 1100 where the document ends. So the next “box” would be A17-AZ17 to A28-AZ28).
I have vague feeling that regex might be appropriate but I’ve no clue what to search for or do exactly in my case…
Any help would be much appreciated!

librebel · December 12, 2017, 3:45am

Hello @hiwwfler,

Please clarify regarding column AM:

Does the phrase “and if so delete them” mean: delete the values (rows) inside column AM that have a duplicate outside of column AM, or does it mean: delete the values outside of column AM that have a duplicate inside of column AM?

jimk · December 12, 2017, 5:32am

You made a pretty good attempt at asking the question, but we’ll need a lot more information to be able to answer fully. It sounds simple, but there are many variations on what exactly a duplicate means. Perhaps you could provide a small example showing the input and the desired result.

If B7 = B8 but C7 <> C8, then is that a duplicate? If they are duplicates, then does all of row 8 get deleted, or should only the cell B8 be set to empty? If a row gets deleted, should all of the data be

jimk · December 12, 2017, 5:32am

shifted up, or should the row be left empty?

Does row 7 get deleted too? Should all duplicate rows be deleted automatically, or should they be marked so that you can evaluate them on a case by case basis? Or, instead of deleting, do you simply want a report of all unique values? Is there complex formatting in the blocks, or could it all be moved into a database easily for processing?

jimk · December 12, 2017, 5:41am

It might also be better to split this question up into several smaller questions and ask each one separately. Finally, you might want to take a look at these suggested guidelines for questions, although as I mentioned earlier, you have followed this already to some extent.

This is certainly the longest comment I’ve ever posted. Sorry about that. At least my answer is reasonably short.

hiwwfler · December 12, 2017, 11:21am

I’d like to have the rows outside of column AM deleted, if they contain duplicate values.
I do not know what “<>” means in your next question but since your formula did already solve the B-Cells problem I guess it does not matter anymore.
I do not care if duplicate values are just marked or are instantaneously deleted. Since I have no experience with databases I do not know if they could be moved into one, but the AM-Cells e.g. contain street names and thus letters and spaces.

hiwwfler · December 12, 2017, 11:24am

thanks for honoring my attempt of explaining my problem! I’m glad you’ve managed to understand (most of) it and decided to help me!

jimk · December 12, 2017, 1:53pm

In a spreadsheet formula, <> means either greater or less than, in other words, not equal. It sounds like we could move the data into LibreOffice Base, but what you are asking can apparently be done well enough in Calc, so there is no need.

jimk · December 12, 2017, 5:17am

To start with, let’s say that the data is on “Sheet1” and that “Sheet2” is currently empty. (Otherwise, add another empty sheet to use instead).

In cell A5 on Sheet2, enter the following formula to calculate the block number for that row.

=FLOOR((ROW()-5)/12)

Now in B5, enter this formula to find duplicate values in the current block.

=IF(COUNTIF(INDIRECT("Sheet1.B"&A5*12+5&":Sheet1.B"&A5*12+16);"="&$Sheet1.B5)>1;Sheet1.B5;"")

For the AM cells, see if this does what you need. It simply checks for duplicates across all blocks. Enter it in C5.

=IF(COUNTIF(Sheet1.AM$5:Sheet1.AM$1100;"="&Sheet1.AM5)>1;Sheet1.AM5;"")

Now, select A5 to C5 and fill the formulas down to row 1100.

Now that you can see which values are duplicates, those rows could be selected and deleted. Or, you could use formulas or filters or even a macro to automatically hide or delete them.

EDIT:

Try this in C5 instead. This subtracts all matches from the number of matches in the current block. If there is a difference, then there must be at least one duplicate outside of the current block, so it marks the current row.

=IF(COUNTIF(Sheet1.AM$5:Sheet1.AM$1100;"="&Sheet1.AM5) - COUNTIF(INDIRECT("Sheet1.AM"&A5*12+5&":Sheet1.AM"&A5*12+16);"="&$Sheet1.AM5)>=1; Sheet1.AM5;"")

hiwwfler · December 12, 2017, 11:13am

Thank you so much for your help! Your formula worked flawlessly to detect duplicate values in the B-Cells! As for the AM-Cells I’m not sure what your formula is supposed to do. What it does is just showing the exact same values as in sheet1 for AM5-1100.

hiwwfler · December 12, 2017, 11:36am

To clarify: Let’s say AM5 contains “main street” and so does AM10, AM16 and AM205. Since AM5, AM10 and AM16 are within the same block I do not care, but AM205 is a duplicate outside that block and should be marked or deleted. That’s what I’m trying to achieve for blocks from line 5 to 1100 (2nd block is line 17-28 and so on).

jimk · December 12, 2017, 1:47pm

In that example, because AM5 = AM205, they are duplicates of each other and would both be marked. Likewise, AM10 = AM205 so it would be a duplicate as well, and the same for AM16 = AM205. So, wouldn’t all four of them be marked as duplicates? Now, if AM205 was empty, then none of them should be marked as duplicate. Is that how it should work? See edited answer for a formula that gives these results.

hiwwfler · December 12, 2017, 6:02pm

worked like a charm! thank you very much!

AlexKemp · October 22, 2020, 11:12pm