Macro Calc: Convert column to date - similar "Text to Columns..."

How to make a macro similar to the behavior of the command Main menu> Data> Text to Columns…

Have a lot of data that comes from a csv, some columns are dates but they are in text format. I need to convert to date format to do some sorting.

Via macro I was able to modify the column format to date, but the content is not changed by this formatting modification. Now I need a macro that does the same function as the “Text to Columns…” command, I tried using setFormulaArray to do this but the result was not as expected, some cells were in date format others in text format.

def test_date_format(*args):
    doc = XSCRIPTCONTEXT.getDocument()
    sheet = doc.getCurrentController().getActiveSheet()
    date_column = sheet['D:D']
    
    #Modify column D to date format (DD/MM/YYYY), it works OK
    #number_format_key = 36
    numbers = doc.NumberFormats
    locale = doc.CharLocale
    number_format_key = numbers.queryKey('DD/MM/AAAA', locale, True)
    date_column.NumberFormat = number_format_key

    #does not work as expected, some cells are in date format, others in text format
    content_range = sheet['D2:D30']
    content_value = content_range.DataArray
    content_range.setFormulaArray(content_value)

test_date.ods (31,2,KB)

Would anyone know how to do this, can it be vba or python macro.
Thanks

setFormulaArray:
https://forum.openoffice.org/en/forum/viewtopic.php?p=346911&sid=91487894ffe209a1c1dbd88975497cad#p346911

Create a c.s.s.util.ReplaceDescriptor and replace regex .+ with & within the text cells in question.

What makes you sure that the document’s char locale is the right one? Your macro would fail with most of my documents.

1 Like

date_conversion.ods (34.5 KB)
has some disjunct ranges with non-US English date strings (DD/MM/YYYY) but with USA number formats (style “en_US”). Some of them show a leading apostrophe in the formula bar, some don’t. This is because the text "01/02/2023 evaluates to second of January in US context whereas “13/02/2023” does not evaluate to anything in US context.

The Basic macro finds the date strings, applies cell style “en_UK” and replaces the found strings with themselves. This is like re-entering all the strings into the UK context.

csv is always text, if the data is still text after the import, you have imported with the wrong options!

1 Like

Importing your data properly is by far easier than writing a repair macro. You could also use your Python skills to preprocess the text file and convert the strings to ISO strings. But even then, there is a most simple rule: ALWAYS CHECK OPTION "DETECT SPECIAL NUMBERS" (this option should not even exist, let alone turned off by default)

2 Likes

Actually this option (ALWAYS CHECK OPTION “DETECT SPECIAL NUMBERS”) was disabled here for me and it was generating the error

And what would be the best approach for the locale? Set it static like this?

locale = uno.createUnoStruct('com.sun.star.lang.Locale')
locale.Language = 'pt'
locale.Country = 'BR'
locale.Variant = ''

An unsolicited remark:
Dates should always be strings (text) that comply with ISO 8601 using the separator "-" between YYYY, MM and DD to make the date human readable, and to distinguish it from ordinary numerical text.
This is also the only way to get dates reliably and globally transferable using csv formats.
Any numeric representation formatted as date depends on a NullDate, and reading a date string from a csv (or alike) using the option “Detect special numbers” will apply the NullDate currently set for the importing application. A later transfer as a number to a sheet using a different NullDate will fail without notice. Cells of Writer tables having set a number format produce an additional mess.
(I just tried it again using a LibO V 7.5 sheet document where I had chosen the 1904-01-02 NullDate to make sure that my statement is correct.)
Thus:

  1. Import dates as strings.
  2. If they are complying with the mentioned ISO based format, you can even use them directly in calculations (automatic conversion).
    Unfortunately this automatic conversion is currently (V 24.8.0) broken for pre-Gregorian dates. "1582-10-15" - 1582-10-04" = 11. Should be 1, imo, as youi get it using the DATEVALUE() function.
  3. In different cases make sure that you know the used format, and convert it to ISO by formulas or user code.
  4. No program can decide reliably if 2/1/23 is meant to mean YYYY-01-02 or YYYY-02-01 or what century should be assumed.

The only possible NullDate settings which might be globally introduced are the first day for which the Gregorian calender was defined (1582-10-15) or the day before (1582-10-04). I havent an Excel at hand, but I was told that it still can’t handle dates with a negative representation as numbers correctly. Please tell me if this has changed.
The difference between pre-Gregorian and proleptic Gregorian also seems to be not quite clear. (I’m using "pre-Gregorian" as "Julian". However, Calc is treating 1200 correctly as a leap year the Julian way.
See also tdf#144699 created by @mikekaganski.
(Sorry for the long text. The situation was made complicated already in old times.)

… or 11, as suggested by tdf#144699 :wink:

Anyway, evaluating the mentioned difference the automatic conversion and the explicit conversion using the DATEVALUE() function should yield the same result.
In addition my opinion: Proleptic Gregorian in the sense of =TEXT(DATEVALUE("1582-10-15") - 1; "YYYY-MM-DD") returning 1582-10-14 is a very bad idea even if there should be a “standard” to this effect. As we know that’s an “undate” while “1200-02-29” is a real date in history. As we know standards also can be insufficiently considered (as 1899-12-30=0 was, e.g.).

Really? And 1582-10-15 is a date? In which specific country? :wink:

No, that’s the only reasonable idea out there. Pretending that 1582-10-15 is some special date is telling that a specific event in a specific country is more important than any other country, while using a specific calendar irrespective of any historic event (just because of its objective qualities) is a neutral technology.

Factually the Gregorian calendar was accepted as a “global” calendar by next to all countries and cultures. Even all the “orthodox” christian societies which still calculate their religious holidays the Julian way, accepted it. So did “anti-papal” societies at different times, so did Japan and China, and so did Saudi Arabia in 2016. As far as I can see, also countries which didn’t introduce it explicitly use it factually.

Of course. And that is why it is reasonable to be used in technology like LibreOffice, as the primary calendar.

But the date of 1582-10-15 in this calendar is just a historic artifact, not part of the technology. And treating it specially (a) creates technical problems; and (b) introduces a historical (in this case, only meaningful for a single place in the world) artifact into the technology. E.g., I could possibly argue (indeed, I won’t, but see the point) that since I use it in Russia, let me use Gregorian since the date when it was introduced here (1918-02-14), and Julian for the dates before, so that my use of dates corresponds to the dates in documents of my country?

Let’s not introduce additional and meaningless artifacts - it’s no better than the use of dates like 01/02/03 in the first place.

I can’t agree, but I also don’t want to argue any longer.
Sigh.
Remains the question of consistency (implicit conversion vs, conversion by dedicated function).

Hallo

For your FormulaArray you need to cast the date_strings into iso-formatted datestrings
strftime + strptime

@Villeroy is right, the CharLocale may be different than some number-format-locale

from datetime import datetime as dt

def test_date_format(*args):
    doc = XSCRIPTCONTEXT.getDocument()
    sheet = doc.getCurrentController().getActiveSheet()
    source = sheet['A2:A30']
    data = source.DataArray
    iso_dates = [[f'{dt.strptime(entry,"%d/%m/%Y"):%Y-%m-%d}'
                  for entry in row ]
                 for row in data ]
    
    numbers = doc.NumberFormats
    locale = doc.CharLocale
    number_format_key = numbers.queryKey('DD/MM/AAAA', locale, True)

    target = sheet['D2:D30']
    target.NumberFormat = number_format_key   
    target.setFormulaArray(iso_dates)

edit: 2024-08-22 changed the date-convert to f’string’-formatting

1 Like

This is an approach that is also very useful when I import the contents of a csv file via python and write it to the spreadsheet.
Thanks

convert libreoffice date to python date:

I couldn’t find any BASIC solution, so here it is if someone wants it,

Sub FormatRangeAsNumber(oSheet, oLocale, ByRef oFormats, oRange, formatStr as String)
    ' BASIC equivalent of 'Text to Columns'
    Dim formatNum as Long, i as Integer
    Dim numberArray()

    If oRange.getCellByPosition(0,0).getValue() = 0 Then
        formatNum = oFormats.queryKey(formatStr, oLocale, False)
        ' add formatStr if it doesn't exist
        If formatNum = -1 Then
            formatNum = oFormats.addNew(formatStr, oLocale)
            If formatNum = -1 Then
                MsgBox "Cannot add " & formatStr & " as NumberFormat", 0, "Fatal"
                Exit Sub
            End If
        End If

        numberArray = oRange.getFormulaArray()
        For i = LBound(numberArray) to UBound(numberArray)
            ' Convert string to actual date values
            ' You may have to parse and set it to numberArray(i)(0), if it's different from locale set in libreoffice
            numberArray(i) = Array(CLng(CDate(numberArray(i)(0))))
        Next

        oRange.setDataArray(numberArray)
        oRange.NumberFormat = formatNum
    End If
End Sub

This function can be called as follows,

Dim oLocale as new com.sun.star.lang.Locale
oLocale .Language = 'pt'
oLocale .Country = 'BR'
FormatRangeAsNumber(oSheet, oLocale, oDoc.getNumberFormats(), someRange, "DD/MM/AAAA")

Basic is a problem, but the solution in this case. Just learn how to import csv data properly, and you never need a macro to fix wrongly imported data.
Your code won’t work in most cases of failure.

1 Like

So your code relies solely on Basic’s CDate, where you simply can’t choose anything like locale used for the conversion? Then for your case, the original setFormulaArray approach would work better.

And generally, it would work better from the start, if you use an intermediate locale for cells for conversion, and then apply another final locale for representation.

1 Like

I want to import csv properly, but I haven’t decided on a tool for preprocessing. I want this macro to run headless and regularly, so the import wizard is out of the question for now.

@mikekaganski I tried setFormulaArray method, but it doesn’t seem to work for me. It just re-sets the same strings again. I set it after setting the NumberFomat.

Only search and replace seems to do the trick,

		Dim oReplace
		Dim oLocale : oLocale = new com.sun.star.lang.Locale
		Dim formatStr : formatStr = "DD/MM"
		Dim oFormats : oFormats = ThisComponent.getNumberFormats()
		Dim formatNum as Long : formatNum = oFormats.queryKey(formatStr, _
			oLocale, False)
		oReplace = bookRange.createReplaceDescriptor()
		
		If formatNum = -1 Then
			formatNum = oFormats.addNew(formatStr, oLocale)
		End If
       ' "19/08/2024" changes to "'19/08/2024" as the type is still string
		bookRange.NumberFormat = formatNum

		With oReplace
			.searchString = ".+"
			.replaceString = "&"
			.SearchRegularExpression = True
		End With
       ' After replace, the date type gets inserted
		bookRange.replaceAll(oReplace)

And this is much simpler than the CDate one, thanks @Villeroy .