Tricky, almost impossible csv import due to thousands separator and decimal symbol

I receive data from a source that is formatted as the sample below:

,Date,Description, Amount , Balance , vat , ex vat ,
13,2024/01/30,DIGITAL PAYMENT DT ABSA BANK,-2 590.00,-2 590.00,-337.83,-2 252.17,-2 252.17
14,2024/01/30,TRANSACTION CHARGE,-10,-2 600.00,-1.3,-8.7,-2 260.87
15,2024/01/30,PROOF OF PMT EMAIL, -   ,-2 600.00, -   , -   ,-2 260.87
16,2024/01/30,TRANSACTION CHARGE,-1,-2 601.00,-0.13,-0.87,-2 261.74

When I attempt to import this, the is no Local Language to select that matches this format. There also doesn’t seem to be a manual config to specify this.

We use English (South Africa) which has a space as thousands separator and a comma as decimal character. If I select that as the locale, is should recognise the thousands space, but it doesn’t, most likely because the combination of space and comma is not valid.

Knowling why it doesn’t work is I suppose half of the solution, but how do I coax the other half out of LO? If there is a way to create a customer locale it would be useful, but I don’t see such an option, or is there?

tdf#159829

Armenian locale matches your numeric input (but not dates, which you will need to format per-column).

1 Like

Note that the mix here of group separator space and decimal separator period is yet a 3rd variant and even less likely.

I have since found that Georgian fits the format too. However, surely this is not a suitable arrangement? I can’t find a table that shows the different formats for each locale either, which makes it a bit of cludge, not so?

In the earlier versions of LO, or maybe it was OO, I seem to recall one could simply select these (thousand separator, decimal character, etc) to create a custom format, or doesn’t my memory serve me correctly?

No, arbitrary separators were not possible.

Let’s try.

Option Explicit

' Show Locales settings.
Sub ShowLocaleSettings
  Dim dataArray(), i As Long, locList, thousandSep As String, decSep as String
  Dim formatFixed As String, formatDateTime As String
  Dim oLocaleData As Object, oLocale As Object, oLocaleItem As Object, oFormats, oFormat As Object
  Dim oCalcDoc As Object, oRange As Object, oColumns as Object
  
  oLocaleData=createUnoService("com.sun.star.i18n.LocaleData2")  
  With oLocaleData
    locList=.getAllInstalledLocaleNames 
    ReDim dataArray(Ubound(locList)+1)
    dataArray(0)=Array("Country", "Lang", "Var", "ThousandSep", "DecSep", "FormatFixed", "FormatDateTime")
    For i=0 To Ubound(locList)
      oLocale=locList(i)
      oLocaleItem=.getLocaleItem(oLocale)
      thousandSep=oLocaleItem.thousandSeparator
      If thousandSep=" " Then thousandSep="Space"
      If thousandSep=Chr(160) Then thousandSep="No-break Space"
      If thousandSep="." Then thousandSep="Dot"
      If thousandSep="," Then thousandSep="Comma"
      If thousandSep="'" Then thousandSep="Apos"      
      If thousandSep=Chr(1644) Then thousandSep="Arabic ThousandSep"      
      
      decSep=oLocaleItem.DecimalSeparator
      If decSep="." Then decSep="Dot"
      If decSep="," Then decSep="Comma"
      If decSep=Chr(1643) Then decSep="Arabic DecSep"      
     
      oFormats=.getAllFormats(oLocale) 
      formatFixed=""
      formatDateTime=""
      For Each oFormat In oFormats
        If oFormat.formatKey="FixedFormatskey5" Then formatFixed=oFormat.formatCode
        If oFormat.formatKey="DateTimeFormatskey2" Then formatDateTime=oFormat.formatCode
      Next oFormat
      
      dataArray(i+1)=Array(oLocale.Country, oLocale.Language, oLocale.Variant, thousandSep, decSep, formatFixed, formatDateTime)
    Next i
  End With

  ' Show DataArray
  oCalcDoc = StarDesktop.LoadComponentFromUrl("private:factory/scalc","_default",0, Array())
  oRange=oCalcDoc.Sheets(0).getCellRangeByPosition(0, 0, Ubound(DataArray(0)), Ubound(DataArray))
  oRange.setDataArray dataArray
  
  ' Title style
  oRange.getCellRangeByPosition(0, 0, oRange.Columns.Count-1, 0).CellStyle="Accent"
  
  ' Don't check spelling
  oLocale.Country=""
  oLocale.Language="zxx"
  oRange.CharLocale=oLocale
  
  oColumns=oRange.Spreadsheet.Columns
    
  ' AutoFilter
  oCalcDoc.DatabaseRanges.addNewByName("db_1", oRange.RangeAddress)
  oCalcDoc.DatabaseRanges.getByName("db_1").autoFilter=True
  
  ' OptimalWidth
  For i = 0 To oRange.Columns.Count-1
    oColumns.getByIndex(i).OptimalWidth=True
  Next i
End Sub

1 Like

Nice table!

But this

is unreliable. For one, though most times the key names are used this way, they can be arbitrarily assigned by the locale file’s editor, and even if most times the same names are used across locales they can appear on different formats. Second, you won’t retrieve the actually specified default formats this way. For that you’ll have to use either the formatIndex value corresponding to the predefined NumberFormatIndex value, or for date+time even loop over the formats of the category (formatUsage) DATE_TIME and look for isDefault of formatType medium, as some locales have more complex definitions. So, these two lines instead should be

        If oFormat.isDefault And oFormat.formatUsage="DATE_TIME" And oFormat.formatType="medium" Then formatDateTime=oFormat.formatCode
        If oFormat.formatIndex=com.sun.star.i18n.NumberFormatIndex.NUMBER_SYSTEM Then formatFixed=oFormat.formatCode

Still, you’ll notice that zh-* locales do not define DATE_TIME medium default formats (which I think in this case is a miss-assignment in the locale data definition), for such you’d have to determine the default format of formatType long if there was no formatType medium.


For better readability I’d also add the language and country names, so

    dataArray(0)=Array("Country", "Lang", "Var", "LanguageName", "CountryName", "ThousandSep", "DecSep", "FormatFixed", "FormatDateTime")
...
      oInfo=.getLanguageCountryInfo(oLocale)
      dataArray(i+1)=Array(oLocale.Country, oLocale.Language, oLocale.Variant, oInfo.LanguageDefaultName, oInfo.CountryDefaultName, thousandSep, decSep, formatFixed, formatDateTime)
1 Like

The following code formats the first 256 columns of each row in the default format of the respective locale, so you can test the number recognition and default formats with arbitrary user input. My LO hangs if I format the entire rows.

import uno

def getLocaleString(oL):
    s = '-'
    a = [oL.Language]
    if oL.Country:a.append(oL.Country)
    if oL.Variant:a.append(oL.Variant)
    return s.join(a)

def printAllLocalesToNewSpreadSheet():
    ctx = uno.getComponentContext()
    smgr = ctx.ServiceManager
    StarDesktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)
    oDoc = StarDesktop.loadComponentFromURL("private:factory/scalc","_default",0,tuple())
    oNF = oDoc.getNumberFormats()
    oSheet = oDoc.getSheets().getByIndex(0)
    iEndCol = 255 # too much: oSheet.RangeAddress.EndColumn
    i18n = smgr.createInstance("com.sun.star.i18n.LocaleData")
    a = i18n.getAllInstalledLocaleNames()
    cCols = 7
    r = list()
    r.append(("Locale","Language","Country","Decimal","Date","Time","1000","List"))
    for i in a:
        oInfo = i18n.getLanguageCountryInfo(i)
        oItem = i18n.getLocaleItem(i)
        b = (
            getLocaleString(i),
    	    oInfo.LanguageDefaultName,
            oInfo.CountryDefaultName, 
            oItem.decimalSeparator, 
            oItem.dateSeparator, 
            oItem.timeSeparator, 
            oItem.thousandSeparator, 
            oItem.listSeparator
        )
        r.append(b)
        rc = len(r)
        nf = oNF.getStandardIndex(i)
        oSheet.getCellRangeByPosition(0, rc-1, iEndCol, rc-1).NumberFormat = nf
    oSheet.getCellRangeByPosition(0, 0, cCols, rc-1).setDataArray(tuple(r))

g_exportedScripts = (printAllLocalesToNewSpreadSheet,)

Nice indeed! Thank you!

There’s isn’t maybe a way to do the reverse of this, ie change the settings for a locale by using a macro to write new values and as such customize the locale?

No, the locales are read-only. To solve your problem, I would eliminate the “wrong” thousand separators. This can be done easily with the find/replace dialog. You may record a macro which performs the right replacements in the right columns.