Macro to remove duplicates in a column

How do i set a column range to its unique values ?
ive seen some code that deletes or hides duplicate rows in a column but i just want the column to contain its unique values without deleting any row
Looking for such a macro

I feel to have read a few thousand questions concerning “duplicates” since I started to visit (and contribute to) support sites (CMS / Q&A) concerning LibO, OO.o, AOO.
In many cases it was about filtering rows concerning more than one columns.
There wasn’t a single questioner who took the trouble to precisely tell what he (f/m) regarded to be a duplicate. Regard the case concerning texts? If not, which specimen of the “duplicates” to keep as the “unique” one? Regard the text “123” a duplicate of the number 123? Regard number formats? 

There were many rejecting the suggestion to use the standard tool.
“Macro please!” Two words to make a volunteer probably spend a relevant amount of time on a question starting with guesses.
No word about the reasons for what a “macro” is thought to be needed instead of an interactive solution based on the UI. No word about what the questioner already had tried.

Throw a glance on
https://forum.openoffice.org/en/forum/search.php?keywords=%2Bmacro+%2Bremove+%2Bduplicates++

2 Likes

I cant understand what the confusion is about
if i haven’t mentioned specifically if its to do with texts numbers or whatever criteria it simply means that part is not important
All i want to know is, is there a way to set a given column to its unique values based on any criteria using a macro.
i just want to know which feature or property to use for it in code
i tried some examples that use properties like filter.duplicates=false etc.
but like i said in my question, it doesnt give me what i need
so is it possible at all to achieve that ?
if yes then someone can show any simple example code , thats all

Are you still waiting? Almost an hour ago @karolus gave you a working code, a minute ago you again write “show any simple example code
”

thanks very much Karry
trying it just now let me see

Whether there are ways or not often depends on details.
“
to set a given column to its unique values
” is very unclear to me.
As you were already told, you can use a filter. This can also be done by a macro (not exactly simple).
But why? The macro would need to be triggered. Would you want to do so automatically after any change in the contents of the column? 


That’s the reason for what we need to know what you actually think to need - exactly.
This is getting annoying.

It wont be annoying at all if u understand my question so I am not to blame.
so for eg. if I ask you how do I pop up a message box then are u going to need the reason and situation for the pop up ???
u would simply tell me the “msgbox” line that’s it
similarly the macro im asking for can be triggered by anything it doesn’t matter.
Its just about whether that particular action can be executed or not.
i said in my question quite clearly that the other things i tried is not what i need coz they delete or hide the duplicate rows, thus affecting its adjacent rows too.
I want to know if the deleting can be avoided and simply leave only the unique values in the column.
whether Libre has this available or not ?
anyway i will try the solution suggested by him in a while
but thanks for your interest and we are friends

Select the Column and go to

→Data→Filter→StandardFilter:
Criteria: <not empty>
Options:
    [x]no Duplicates

thanks but Macro please

but example of Spreadsheet please

u need example ??? for this ?
just take any 1 column in a sheet with some duplicate values and i just need a macro to remove the duplicates and leave the column with unique values
rows should not be deleted or hidden coz there could be other columns next to our column and they should not be affected

Do you want to get an answer or do you prefer to blame experienced volunteers for remarks based on their experience.
Unfortunately I can’t point you to a site for community support meeting your expectations.
Farewell !

aber bitteschön:
please select Cellrange in Question and run:

def remove_duplicates(*_):
    doc = XSCRIPTCONTEXT.getDocument()
    selection = doc.CurrentSelection
    sheet = selection.Spreadsheet
    data = selection.DataArray
    out = []
    for row in data:
        if not row in out and row[0]:
            out.append(row)
    cursor = sheet.createCursorByRange(selection)
    cursor.clearContents( 2**10-1 ) #sum of Cellflags[1]
    cursor.collapseToSize(1,len(out) )
    cursor.setDataArray(out)

[1]Constants' Group CellFlags

gerngeschehn

2 Likes

Some Basic example for filter is here: LibreOffice Calc: Filter via Macro?
I also know the good article, but in Czech (you can use some translator) OpenOffice.cz | Filtrovåní dat v buƈkåch

Example:
standard-filter.ods (242.8 kB)

1 Like

thanks Kammy

Based on python function by @karolus.
LO Basic. Well, there will be a little more lines.
Note 1: The Collection object here is responsible for “uniqueness”.
Note 2: All values are treated as strings. If the data contains the same numeric values in the form of a number and a string, the first one will be left.
Note 3: Case insensitive. The first occurrence will also be left.
Note 4: The cursor.DataArray can accept both an array of arrays and a 2D array.
Note 5: Formulas will be replaced with their values.
Note 6: All formats and comments on cells will be deleted.
Note 7: The merged cells will be converted to a single cell.
If there are merged cells, the current range can be interrupted when selecting using Ctrl+Shift+↓.

The advantages of Python data structures and operators (list, in) are obvious.

Before running the macro, select the required range (not the entire column).

Sub RemoveDuplicates
	Dim col As New Collection
	Dim selection As Object, sheet As Object, cursor As Object
	Dim data(), out()
	Dim i&, item
	Dim prompt$

	prompt = Chr(10) _
	 & "To remove duplicates, select the current region yourself" & Chr(10) _
	 & "and call this utility again."

	selection = ThisComponent.CurrentSelection
	If Not selection.supportsService("com.sun.star.sheet.SheetCellRange") Then
		MsgBox "The cell range is not selected." & prompt _
		 , MB_ICONEXCLAMATION, "Selection Error"
		Exit Sub
	ElseIf selection.Columns.Count > 1 Then
		MsgBox "Only a single-column cell range is supported." & prompt _
		 , MB_ICONEXCLAMATION, "Selection Error"
		Exit Sub
	End If
	sheet = selection.Spreadsheet
	If selection.Rows.Count = sheet.Rows.Count Then
		MsgBox "The entire column selection is not allowed." & prompt _
		 , MB_ICONEXCLAMATION, "Selection Error"
		Exit Sub		
	End If

	data = selection.DataArray

	On Error Resume Next
	For i = 0 To UBound(data)
		item = data(i)(0)
		col.Add item, CStr(item)
	Next
	On Error GoTo 0

	out = CollectionTo2DArray(col)
	cursor = sheet.createCursorByRange(selection)
	With cursor
		.clearContents(2^10 - 1)  'sum of CellFlags (1023)
		.collapseToSize(1, UBound(out) + 1)  'nColumns:=1, nRows
		.DataArray = out
	End With
End Sub

Function CollectionTo2DArray(c As Collection)
	Dim i&
	Dim a(): ReDim a(0 To c.Count - 1, 0)

	For i = 1 To c.Count
		a(i - 1, 0) = c(i)
	Next
	CollectionTo2DArray = a
End Function

Fixed an error:
col.Additem, CStr(item)

Quite a decent result. Usually you should allow the selection of the entire column and in the code intersect the selection with the area used current region.

2 Likes

Everything is as you asked.

1 Like

To get the unique values is also possible with the ScriptForge library like this.

Sub getUniques
	dim oDoc as object, oSheet as object, oRange as object, data(), dataUniques(), i&
	oDoc=ThisComponent
	oSheet=oDoc.CurrentController.ActiveSheet
	oRange=oSheet.getCellRangeByName("A1:A100") 'your column
	data=oRange.getDataArray()
	dim data1D(ubound(data))
	for i=lbound(data) to ubound(data) 'transform array from array(array(value)) to array(value)
		data1d(i)=data(i)(0)
	next i
	GlobalScope.BasicLibraries.loadLibrary("ScriptForge") 'load ScriptForge library
	dataUniques=Unique(data1D, true) 'true=case sensitive
	xray dataUniques
End Sub

But I think it will be faster in Python.

1 Like

LO Basic. This is an advantage, if it is important. But the Сollection object will cope with duplicates faster (no sorting & ReDim Preserve). Python doesn’t count.


Edit: And it looks like Unique() to change the order of the rows due to sorting (did not check).

1 Like