Automatic locale when importing HTML into Calc

Hey there!

I’m trying to format data as HTML so that I can easily import it into Calc. One thing I’ve figured out is how to use special HTML tags to set the formatting (e.g., keep numbers at high precision but force LibreOffice to round them in a specified manner). However, I’m on a German computer with a German locale. I’d like my data exchange format to always write floating point values with a decimal point (“123.456”) instead of the German decimal comma (“123,456”). The automatic detection only seems to look at my system locale.

Is there a way to embed into the HTML file that the locale of the file itself is en_US?

Thanks a bunch!

There is no auto-detection. The locale is configurable via Tools > Options > Language Settings > Languages > Locale setting. By default, the locale setting is taken from the operating system.
The locale setting affects the entire office suite.

In the import options dialog, the default selection for “Select the Locale to Use for Import” says “Automatic”. This kind of implies there is auto-detection?

Heh, yes, the wording is misleading. It is hard-coded to mean the system locale.

Thanks, that definitively answers the question, although it’s not the answer I was hoping for :cry:

It was implemented that way in 2009 in i#102141. The issue contains links, e.g. to screenshots from that time, to Kohei’s blog (where the “Automatic” option is explained as intentionally equivalent to “system”), and even to the specification.

But this is (1) misleading, and (2) could definitely be improved, given that HTML has means to specify this data. So you are welcome to file a bug (to rename “Automatic” to something less misleading), or a feature request (to implement automatic detection of the locale based on HTML metadata, if present).

A macro to pick an HTML file and import it into Calc with the English (USA) locale:

```
Sub pick_Calc_HTML_en_US()
    ' Let the user pick an HTML file and open it in Calc with the en-US locale
    Dim url As String
    ' Pass all arguments positionally (named and positional arguments cannot be mixed)
    url = pickFile("Select HTML", "", "Calc HTML", "*.html")
    If url = "" Then Exit Sub
    Dim a(1) As New com.sun.star.beans.PropertyValue
    a(0).Name = "FilterName"
    a(0).Value = "calc_HTML_WebQuery"
    a(1).Name = "FilterOptions"
    ' 1033 = English (USA), followed by detection flags such as "Detect special numbers"
    a(1).Value = "1033 1 1"
    Dim doc As Object
    doc = StarDesktop.loadComponentFromURL(url, "_blank", 0, a())
End Sub

Function pickFile(sTitle$, sInit$, sFilterLabel$, sPattern$) As String
    REM Return a single file URL, or "" if the dialog is cancelled.
    REM The dialog starts in the office default directory if sInit = "".
    Dim oPicker As Object, x()
    oPicker = CreateUnoService("com.sun.star.ui.dialogs.FilePicker")
    oPicker.setTitle(sTitle)
    oPicker.setDisplayDirectory(sInit)
    oPicker.setMultiSelectionMode(False)
    oPicker.appendFilter(sFilterLabel, sPattern)
    If oPicker.execute() = com.sun.star.ui.dialogs.ExecutableDialogResults.OK Then
        x() = oPicker.getFiles()
        pickFile = x(0)
    End If
End Function
```

Not sure what it means,
but if you use the “native” HTML Document (Calc) format, there should not be any problem with locales,
and the HTML produced remains pretty light.

Thanks for your answer, but I’m not trying to export from LibreOffice, but import into LibreOffice. It always uses the system locale for numbers, but I want to force the auto-detection to always assume “en_US” for my HTML, regardless of the system locale. Is that possible?

Sure.
I mentioned the HTML produced so that you can see that encoding your decimals is pretty straightforward, e.g. sdval="1.2" sdnum="4108;"
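For illustration, a table cell carrying its value this way might look like the fragment below. It is modeled on the attributes quoted above. `sdval` holds the number written with a decimal point, and the first field of `sdnum` is a language ID (4108 in the quoted export; 1033, English (USA), is an assumption used here). Whether Calc’s HTML import honors `sdval` regardless of the locale picked in the import dialog is exactly what is disputed later in this thread, so treat this as a sketch to test on your own LibreOffice version, not a guaranteed fix.

```html
<table>
  <tr>
    <!-- sdval: the cell value, written with a decimal point -->
    <!-- sdnum: language ID followed by ";" (format code left empty, as in the quoted export) -->
    <td sdval="123.456" sdnum="1033;">123.456</td>
  </tr>
</table>
```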

And for reference: Saving and Opening Sheets in HTML

Except that if I try to import this right back into LibreOffice and leave the setting at “auto-detect”, it truncates values (123.456 becomes 123 because it expects to see “123,456”).

No.
On import (when detecting numbers), it uses the locale defined in the import filter settings dialog:


And in the imported document, it uses the locale configured in LibreOffice itself:

Right, when I select “Automatic” on the import dialog, it uses whatever preferences the user has configured.

This is stupid, however, because it means a document might import well for one user (with the correct configuration) and might not work for another. I could obviously force users to always select “Custom: United States”, but that is inconvenient.

Hence my question whether there is any way to specify this within the HTML document itself, where it makes the most sense, i.e., the document declares itself: “Hey, I’m en_US, so the decimal point looks like ‘.’”. To be honest, I do not grasp why this should be a configuration-specific option in the first place.

Something like the way an HTML document specifies its character encoding (e.g., UTF-8 or ISO-8859-1) in a `<meta charset="UTF-8">` tag is what I was looking for.

:thinking:
That’s not what I see with these attributes: sdval="123.456" sdnum="4108;"

Do you have a German system locale set?


(same with French, or English)