The content is HTML but the file's extension is .xls

asked 2015-07-11 06:21:09 +0200

lolax

updated 2015-08-24 17:17:10 +0200

Alex Kemp

LibreOffice 4.3.7, Windows 7

The file's content is HTML but its extension is .xls. When I open the file in Calc, I get garbled characters. If I open the file in Notepad, change the encoding from UTF-8 to Unicode, save it, and then open it in Calc again, all characters are correct. Could you please tell me what the root cause is? MS Excel opens the file correctly without any change of encoding.

BTW, that xls file was generated by MS visual studio. Thanks.

C:\fakepath\testutf8.xls C:\fakepath\testunicode.xls


Comments

It doesn't matter where you open the file from: Calc, Writer, or any other module. What matters is which filter you select. So with the original encoding, try "Web Page Query (Calc)" or "Html Document (Calc)". Select the file first and then the filter; otherwise you will not see the file. In general, a file may declare a "charset" or "encoding" that differs from its actual encoding. Can you provide the file?

Regina ( 2015-07-11 13:07:48 +0200 )

If I change the extension from xls to html, the file is opened by Writer; it has table style and shows the correct characters.

lolax ( 2015-07-11 15:28:56 +0200 )

2 Answers


answered 2015-07-11 14:07:25 +0200

Alex Kemp

updated 2015-07-11 21:07:24 +0200

The 3-letter filename extension (eg .exe, .htm, .xls) is advisory rather than enforced. In Windows™, the system maintains a database of such sets of 3 letters & uses that database to know what program to use to open the file. That facility has been used extensively by hackers (as one example, renaming an EXE file to ZIP; the user part of the OS will report it as a zip archive, but the system knows that it is an executable file and will allow it to hack your system if it is double-clicked).

The previous paragraph also explains why you are experiencing such encoding problems: it is a question of how much intelligence is, or is not, built into the program that you are using. As a good example, your web browser does not care what the images within your web page are called; in other words, it does NOT use a database of file extensions in the Windows™ user-level manner, but rather "sniffs" the file to discover its MIME type. That means, as just one example, that you could call a JPEG image "img.gif" and your browser would still open it correctly.
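To make the idea of "sniffing" concrete, here is a minimal Python sketch. It is not how any browser or office suite actually does it; the function name `sniff` and the short signature table are assumptions chosen for illustration, covering only a few well-known file signatures.

```python
def sniff(data: bytes) -> str:
    """Guess a file's type from its leading bytes, ignoring the extension.

    Hypothetical helper: the signature list is a small illustrative subset.
    """
    if data.startswith(b"PK\x03\x04"):
        return "zip"    # zip archive (also used by .xlsx and .ods)
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"   # OLE compound file (binary .xls, .doc)
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"

    # Drop a UTF-8 byte-order mark and leading whitespace before
    # looking for markup.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    text = data.lstrip().lower()
    if text.startswith(b"<?xml"):
        return "xml"
    if text.startswith((b"<!doctype html", b"<html", b"<table", b"<meta")):
        return "html"
    return "unknown"


# A JPEG stays a JPEG whatever the file is called.
print(sniff(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
# An HTML table saved as "something.xls" is still HTML.
print(sniff(b"<table><tr><td>cell</td></tr></table>"))
```

A program that only looks at the extension would treat the second example as a spreadsheet; one that sniffs the content can pick an HTML import path instead.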

PS
You say that:-

The file's content is html

That is almost certainly wrong. I would expect that the mime-type is XML rather than HTML.

PPS
I'm glad that you got it all sorted, lolax, but it is almost impossible to say why you are having such problems. There are too many variables, and the language does not help. As one example, you say:

lolax: I open that file from notepad and choose the encode from UTF-8 into Unicode and save it

UTF-8 is a variable-width encoding for Unicode code points that uses 8-bit code units (one to four bytes per code point). It is the default text encoding of almost all Linux distributions, and the W3C chose it as the default for XML and HTML5. Microsoft™ adopted a 16-bit encoding for Windows NT™ and insists on calling it "Unicode", as if it were the only one; "Unicode" is the standard for the code points, not the encoding mechanism.

There are many encoding mechanisms for Unicode code points (e.g. UCS-2, UTF-7, UTF-8, UTF-16, UTF-32, and so on); "Windows™ Unicode" is actually UTF-16 internally.
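The difference between the code points and their encodings can be shown in a few lines of Python. This is only a sketch: the sample text and the helper name `to_notepad_unicode` are assumptions made for illustration, and it assumes that Notepad's "Unicode" option writes UTF-16 little-endian preceded by a byte-order mark.

```python
import codecs


def to_notepad_unicode(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 bytes as UTF-16 LE with a byte-order mark.

    Hypothetical helper mimicking the manual "save as Unicode" step.
    """
    text = utf8_bytes.decode("utf-8-sig")  # also drops a UTF-8 BOM if present
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


sample = "早班"  # two CJK characters, chosen for illustration

# Same two code points, three different byte sequences:
# 6, 4 and 8 bytes respectively.
for name in ("utf-8", "utf-16-le", "utf-32-le"):
    encoded = sample.encode(name)
    print(name, len(encoded), "bytes:", encoded.hex(" "))

converted = to_notepad_unicode(sample.encode("utf-8"))
print(converted[:2].hex(" "))  # ff fe, the UTF-16 LE byte-order mark
```

The byte-order mark at the start of the converted file is an unambiguous hint about the encoding, which a plain UTF-8 file without a mark or declaration does not give.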

A modern Excel file (XLSX), like an ODS file, is actually a zip archive; the older binary XLS format is an OLE compound file instead. Inside the archive are many files, most of them in XML format. An XML file may begin with a declaration that states the character encoding of the text within; in the absence of such a declaration or a byte-order mark, the default encoding for XML is UTF-8.

Finally, it may well be possible to save an XLS/ODS file as a `web-page document', but it will no longer be a spreadsheet in the sense of an Excel/Calc spreadsheet. I hope that that is self-evident.

If this helps then please tick the answer (✔).


Comments

Let me explain in more detail. When I open the file in MS Excel, it pops up the message "The file you are trying to open 'blabla.xls' is in a different format than specified by the file extension...... blahblahblah..... Do you want to open the file now ?". I clicked Yes and got the correct characters. Then I chose Save As and saw that the "Save as type" was Web Page (.htm;.html). I changed it to Excel 97-2003 Workbook (*.xls), then opened the result in Calc, and it showed the correct characters.

lolax ( 2015-07-11 15:24:25 +0200 )

BTW, the data has Asian characters.

lolax ( 2015-07-11 15:36:37 +0200 )

Actually, I also tried opening that file in WPS Office 10 and got the correct characters without changing anything.

lolax ( 2015-07-12 05:53:27 +0200 )

answered 2015-07-11 18:28:19 +0200

Regina

updated 2015-07-13 17:28:39 +0200

I did not write anything about changing the extension. Start LibreOffice, then use File > Open. First select the file, so that it appears in the file name field. Then click the "All files" button, which opens the filter drop-down list. Select "Web Page Query (Calc)" or "Html Document (Calc)". One of them will likely work. If not, please provide an example file.

I have now examined the files. They are neither valid HTML nor valid XML; they contain only a fragment of an HTML body. You cannot use the "Html Document (Calc)" filter, because the file is not a complete HTML file. You would need to embed the content in a complete document, like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>Dummy Title</title>
</head>
<body>
             Here your file content
</body>
</html>
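If the files are generated regularly, the wrapping could be scripted rather than done by hand. The following Python sketch is one possible way, under the assumption that the fragment is UTF-8 text; the names `wrap_fragment` and `wrap_file` are hypothetical, and the template is simply the skeleton shown above.

```python
import html
from pathlib import Path

# Skeleton document from the answer above, with two placeholders.
HTML_TEMPLATE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8">
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def wrap_fragment(fragment: str, title: str = "Dummy Title") -> str:
    """Embed an HTML body fragment in a complete document declaring UTF-8."""
    return HTML_TEMPLATE.format(title=html.escape(title), body=fragment)


def wrap_file(src: str, dst: str) -> None:
    """Read a UTF-8 fragment from src and write the wrapped document to dst."""
    fragment = Path(src).read_text(encoding="utf-8-sig")  # tolerate a BOM
    Path(dst).write_text(wrap_fragment(fragment), encoding="utf-8")


print(wrap_fragment("<table><tr><td>早班</td></tr></table>"))
```

The wrapped output is a complete HTML file with an explicit charset, so it should be suitable for the "Html Document (Calc)" filter; whether Calc then renders the table as wanted still needs to be checked with the real data.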

It is a missing feature of Calc that it cannot handle HTML fragments. I suggest you file a feature request asking that files containing HTML tables be imported into Calc when the appropriate filter is chosen, even if they are not complete HTML documents.

Writer gets the encoding right in both cases, whether you use the HTML filter for Writer or for Writer/Web. So in Writer there is no need to change any encoding.


Comments

I still got garbled characters with "Web Page Query (Calc)" and "Html Document (Calc)". Below are some cells pasted from Calc:

            æ—©çマ­(8:00 ~ 16:00)                中çマ­(16:00 ~ 24:00)               晚çマ­(24:00 ~ 8:00)

編號 æœヘå‹™ å­ミæœヘå‹™ æœヘå‹™åヘ€åŸŸ 工程師 組 級 課 級 經 çミ† 工程師 組 級 課 級 經 çミ† 工程師

lolax ( 2015-07-12 05:48:15 +0200 )
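Garbling of this kind is what typically appears when UTF-8 bytes are decoded as a single-byte Western code page. The Python sketch below reproduces the first three garbled characters of the paste above; it assumes the wrong decoder was Windows-1252 (`cp1252`), which is a guess based on the pattern and not something confirmed in this thread.

```python
original = "早"  # one CJK character, chosen for illustration

# Encode correctly as UTF-8, then decode with the wrong single-byte codec.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # æ—©

# As long as no bytes were lost, the damage can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

Not every string survives this round trip: a few byte values (0x8F, for example) have no character in Python's `cp1252` codec, so decoding a longer sample this way may raise `UnicodeDecodeError`.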

Sorry, I couldn't upload the file, since it showed ">3 points required to upload files"

lolax ( 2015-07-13 06:44:40 +0200 )

I have upvoted your question. Now you should be able to upload the file.

Regina ( 2015-07-13 14:08:49 +0200 )

I have now uploaded two files; testutf8 has the garbled characters.

lolax ( 2015-07-13 15:16:12 +0200 )

I have now added a statement to the program so that the xls file includes <meta http-equiv="content-type" content="application/ms-excel; charset=utf-8">, and I get the correct characters. Thanks.

lolax ( 2015-07-19 12:23:03 +0200 )
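The fix above works because the file now states its own encoding, so the importer no longer has to guess. As a rough illustration (not LibreOffice's actual detection logic), the Python sketch below checks whether a file carries any encoding hint at all; the name `declared_charset` and the 1024-byte scan window are assumptions.

```python
import re

# Matches both <meta charset="..."> and the http-equiv form with
# content="...; charset=...".
META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def declared_charset(data: bytes):
    """Return the encoding a file declares for itself, or None.

    Hypothetical helper: looks for a byte-order mark first, then for a
    <meta ... charset=...> declaration near the start of the file.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf-16"
    match = META_CHARSET.search(data[:1024])
    return match.group(1).decode("ascii").lower() if match else None


fixed = (b'<meta http-equiv="content-type" '
         b'content="application/ms-excel; charset=utf-8"><table></table>')
print(declared_charset(fixed))                      # utf-8
print(declared_charset(b"<table><tr></tr></table>"))  # None
```

A bare fragment like the second example gives an importer nothing to go on, which is consistent with the garbled result seen earlier, while the Notepad-converted file would be recognised by its byte-order mark.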