Import a very large .xls file that seems to contain mainly XML tags

Hi Folks!
I have a 19 meg file, supposedly of prices for a bond fund, downloaded from here:
https://www.ishares.com/us/products/239458/ishares-core-total-us-bond-market-etf#/

The file has a .xls extension. When I try to open it with CALC, I see an XLS header, then what appears to be formatting information, and then about 72 thousand lines that are mainly variants of this:

  ss:Cell ss:StyleID="Left"> <ss:Data
  ss:Type="String">Jul 31 </ss:Cell>
  ss:Cell ss:StyleID="Right"> <ss:Data
  ss:Type="Number">1.88</ss:Data>
  /ss:Cell>

Note that I have removed the leading “<” from each of the lines above, as they are otherwise rendered invisible by the markup interface. My understanding is that newer versions of Excel use an XML format internally, so I tried changing the extension to .xlsx, but I still didn’t get anything resembling a spreadsheet. Does anyone know what is going on here? And is there a way to import this into CALC to get a spreadsheet-like file? I don’t care if it is pretty. I’ll be re-exporting it to a .csv file in any case.

I enabled the experimental Data → XML Source feature and tried again with that. I successfully pointed it at the file, and then it gave me a big blank box labeled “map to document” and a blank line labeled “mapped cell”. I don’t know what to do with these fields. I am hoping this does not mean that I need to figure out what all these XML tags do and then translate them into something CALC likes better. If I have to do that much work, I think I’d rather leave the LibreOffice world and try to do it with an R package or something.

Hello,

(assuming you are talking about file iShares-Core-US-Aggregate-Bond-ETF_fund.xls, which is neither a .xls file [binary format] nor a .xlsx file [zip format])
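
As a side note, a quick way to check what such a file really is, regardless of its extension, is to look at its first bytes. Here is a minimal sketch in Python, assuming the usual signatures: an OLE2 compound file header for binary .xls, a ZIP header for .xlsx, and an optional UTF-8 byte order mark in front of XML text. The function name `sniff_spreadsheet_format` and the returned labels are only illustrative.

```python
import codecs

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # binary .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (and .ods) are ZIP archives


def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "binary xls"
    if head.startswith(ZIP_MAGIC):
        return "zip (xlsx/ods)"
    # Plain-text XML may be preceded by a UTF-8 byte order mark (EF BB BF).
    has_bom = head.startswith(codecs.BOM_UTF8)
    text = head[len(codecs.BOM_UTF8):] if has_bom else head
    if text.lstrip().startswith(b"<"):
        return "xml with BOM" if has_bom else "xml"
    return "unknown"


# Usage (the path is a placeholder):
# with open("some_file.xls", "rb") as f:
#     print(sniff_spreadsheet_format(f.read(16)))
```

A result of "xml" or "xml with BOM" for a file named .xls would point to the Excel 2003 XML format rather than a real binary workbook.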

The first character (invisible in standard editors) is <feff>, a byte order mark (BOM), which causes:

malformed_xml_error: unsupported encoding. only 8 bit encodings are supported (offset=3)

Remove this first character and you can import the file using the Microsoft Excel 2003 XML (*.xml;*.xls) import filter. In a second test, it also worked by simply using File -> Open and selecting the file.
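
If you would rather not edit a 19 MB file by hand, the removal can be scripted. This is a rough sketch in Python, assuming the mark is the three-byte UTF-8 BOM (EF BB BF), which would match the offset=3 in the error message; a differently encoded file would need a different check. The helper names `strip_utf8_bom` and `write_bom_free_copy` are made up for this example.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def write_bom_free_copy(src: str, dst: str) -> bool:
    """Copy src to dst without a leading BOM; return True if one was removed."""
    raw = Path(src).read_bytes()  # reads the whole file; fine for ~19 MB
    cleaned = strip_utf8_bom(raw)
    Path(dst).write_bytes(cleaned)
    return len(cleaned) != len(raw)


# Usage (file names are placeholders):
# write_bom_free_copy("fund.xls", "fund_nobom.xls")
```

Writing to a new file instead of overwriting the download keeps the original intact in case the assumption about the encoding turns out to be wrong.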

[Screenshot from the vi editor]


Tested using LibreOffice:
Version: 7.0.2.2, Build ID: 8349ace3c3162073abd90d81fd06dcfb6b36b994  
CPU threads: 8; OS: Linux 5.3; UI render: default; VCL: kf5
Locale: en-US (en_US.UTF-8); UI: en-US,Calc: threaded

Hope that helps.

Wow. Just wow. I would never have found that in a million years. It really makes me appreciate the expertise of the contributors to this community. Thanks Opaaque!