Odd QueryEmptyCells behavior on pasted columns with floats

osxtra · May 20, 2022, 8:03pm

LO: 7.3.3.2 Community (Build d1d0ea68f081ee2800a922cac8f79445e4603348)
OS: OSX 10.15.7

Am having difficulty with data copied and pasted programatically from one Calc doc to another, where one of the source columns contains floats.

Normally I paste as “SV” (strings and values) for an entire range, but have found that with the problem column, unless I paste as “V” (values only), running the QueryEmptyCells method on an object produces an incorrect array of RangeAddress elements.

String and integer data do not seem to be affected; only floats (the column is formatted to two decimals with one leading zero, and the source cells contain either no data, or a number such as 0.13, etc.)

Am thinking it’s something in the source file but don’t see anything “wrong” with the data. The extension says it’s an XLS, but it’s actually in HTML format. Saving that doc as an ODS then re-opening and using the native Libre doc as the source produces the same effect. Examining the file does show the correct float value (“0.13”, etc.) Copying from a cell in the ODS saved version to a text editor shows the same value. Libre opens the “fake” XLS just fine, but I do have to use a two-element com.sun.star.beans.PropertyValue array:
aProps(0).Name = “FilterName”
aProps(0).Value = “HTML (Starcalc)”
aProps(1).Name = “FilterOptions”
aProps(1).Value = “59,34,0,1”

Values for element 1 of the PropertyValue item were gleaned from an ( OO Forum Post. Neither OO or LO docs seem to have much info on how to construct these filters, and wading into the LO source didn’t help much either. If this is the source of the trouble, and there is such a resource out there, please pass it along!)

In any event, am not thinking it’s the data per se, rather how Libre is interpreting it.

If I manually enter integer or string data into any row of the “problem” column, or delete the float value so there is no data in that cell, QueryEmptyCells seems to correctly evaluate the column. Otherwise, it shows shows just one array element, from 1 before the highest row that was pasted (row indices are zero-based) to the max of 1048575, and CMD-Down in that column from row 1 just goes straight to the end of the tab, ignoring any rows with data. (This makes sense, as internally CMD-Down is most likely employing QueryEmptyCells to know where to position the cursor.)

I can also get QueryEmptyCells to work properly by clearing direct formatting for the column, then manually entering float values.

Without manually entering the floats, I can get around the issue by pasting the float columns as just “V”, rather than “SV”, but that means hard-coding which columns to be treated as “V” and which as “SV”.

FWIW, am copying with
myDispatcher.executeDispatch(sourceFrame, “.uno:Copy”, “”, 0, Array())
and pasting with
myDispatcher.executeDispatch(targetFrame, “.uno:InsertContents”, “”, 0, pasteFlags)
where
myDispatcher → an object created by: createUnoService(“com.sun.star.frame.DispatchHelper”)
sourceFrame → the Frame of the source document (a com.sun.star.comp.framework.Frame object)
targetFrame → the Frame of the target document (same type as sourceFrame, just pointing to a different object)
pasteFlags → normally “SV” for Strings and Values, unless I use the workaround for the “float” columns and employ multiple copy’n’paste operations.

I can also tell it’s definitely something in the source data, as CMD-Down acts the same way on it, but since the source “reads” OK, can’t put my finger on why. If I manually copy the float column and paste it as values, things work as expected.

Has anyone else run across this issue, or have any insight on why these float values are being interpreted incorrectly, but String and Integers would not?

Please advise. Thanks!

Carl

joshua4 · May 20, 2022, 9:28pm

Can you upload an example pair of ODS’s? Perhaps just attach it to your original post?
.
Does the macro work in such a way that instead of pasting you could assign to the cell? I’m thinking about avoiding Dispatcher, picking your MIME type from the clipboard, and using Enum TypeClass then assigning that directly to the cell .String or .Value. You might look at Getting unformatted text from clipboard - The Document Foundation Wiki.

osxtra · May 21, 2022, 11:37am

Thanks for the clipboard idea. May use that down the road.

Going to forget about this issue though, as after playing with the data a little more, have determined it’s just something squirrelley with what’s getting copied.

Again, I’m having the code open source files that have an XLS extension, but are actually HTML. They’re downloaded from a provider’s website, so I have no control over file format.

A couple of the columns are hyperlinks, such as:
803752

Weird thing is, any time I munged those columns (to anonymize the data before posting), the problem would go away.

Also weird, I had some code in place to remove the hyperlinks before performing the copy’n’paste:

myDispatcher = createUnoService("com.sun.star.frame.DispatchHelper")
myDispatcher.executeDispatch(sourceFrame, ".uno:SelectAll", "", 0, Array())
myDispatcher.executeDispatch(sourceFrame, ".uno:ResetAttributes", "", 0, Array())
myDispatcher.executeDispatch(sourceFrame, ".uno:Deselect", "", 0, array())

Same as before, sourceFrame is the Frame of the source document object’s CurrentController, which is instantiated by iterating the hasMoreElements property after assigning an object to the createEnumeration method of StarDesktop.getComponents(), looking for the title of the desired doc.

So, it’s unclear how the supposedly now plain text could cause an issue when pasting.

Even though I can’t provide the original source file, did manage to create an ODS containing a copy of the “invisible” numeric data; it’s attached here.

Bad Float Data.xls (270.5 KB)

In doing that, noticed the Int columns were messed up too, hadn’t snapped to that before.

But like I said, not going to worry about it anymore. It’s just something off in the source file, nothing wrong with LO, nothing I can’t work around.

Will just paste as values and move on…

joshua4 · May 21, 2022, 4:05pm

I cannot replicate the issue with manual cut/paste, for sure. I think you are correct to find a workaround and keep moving on. It still may have to do with the information type LO is picking from the clipboard, so the examples I gave just might be salient. (if anyone else comes to this thread and needs a more specific fix.)

However, I would implore you to take your obvious skill set and start using MRI or XRay to get even more familiar with the API itself and get away from the DispatchHelper.

Villeroy · May 21, 2022, 4:33pm

May be this helps somehow. At least it detects all types of cells correctly as far as I can tell.

ms777 · May 22, 2022, 10:28am

You may want to have a look at Determine csv filter options. This allows to get the FilterOptions by going once through a manual import.

Of course you have to replace
aProps(2).Value = "Text - txt - csv (StarCalc)"
by
aProps(2).Value = "HTML (Starcalc)"

Good luck,

ms777

osxtra · May 22, 2022, 9:32pm

You’re absolutely right. I don’t use dispatcher for much - clipboard copy & paste, removing hyperlinks, and deselecting. I do use xRay and am familiar with the API, but have had those uno calls in my “functions” helper module for a while now. This would be a good opportunity to refactor, or, another item on the never-ending to-do list, just bite the bullet and rewrite everything using Python instead of Basic. Only going on 200 subs and functions to convert, should be a breeze…

Lupp · May 29, 2022, 3:16pm

I would avoid dispatcher commands in such a case, and also the get/setTransferable, but transfer data by the get- and set- methods for the DataArray properties.
In addition I wouldn’t use a specialized filter to load a component from the “strange” file, but leave the decisions to the
clever automatisms of StarDesktop.loadComponentFromURL() like in

loadArgs(0).Name  = "Hidden"                     REM Only used to suppress prompts.
loadArgs(0).Value = True
sourceDoc   = StarDesktop.loadComponentFromURL(url, "_blank", 0, loadArgs)
sourceCtrl  = sourceDoc.CurrentController
sourceCtrl.Frame.ContainerWindow.Visible = True  REM See above.

Where url stands for a complete filepath (in URL notation) of the file to import
(your Bad Float Data.xls e.g. - file names without spaces are better).

I’m not very familiar with Html / XML,. but would like to state that the attached file contains two opening tags , but only one closing tag . From my limited point of view there were additional strange things. Buit this may simply be MS gibberish. The W3 verifier doesn’t accept the file.

Villeroy · May 21, 2022, 5:58pm

When you open the html file, LO shows an import dialog. Choose Englsh(Any) as import language and your floats will be floats actually. The problem does not occur when the default locale is English anyway.

Villeroy · May 21, 2022, 7:14pm

Just in case you are trying to normalize this mess:
normalize.odb (41.2 KB)

osxtra · May 22, 2022, 9:40pm

Thanks, am doing this programmatically, so not pressing anything but a push button on the “Control Panel” tab…

Villeroy · May 23, 2022, 11:38am

How can you program any macros without knowing the underlying application?

osxtra · May 23, 2022, 12:49pm

Few can really understand all of a language, and I sure don’t claim to be an expert in any one of them, but do have a fair smattering of several, and am always interested in bettering my understanding.

Am non-denominational when it comes to programming. Use the right tool for the task at hand.

Didn’t understand, after taking a battery of tests upon enlisting, why they said I had a good aptitude for language. I speak, read and write English (well, American). Hated Spanish class in high school. My mate is German, and I know just enough of that to make her laugh or get cranky that I don’t know more. (My Grandma and Grampa were German and Dutch. They used to pun back and forth in both languages, simultaneously. Ugh, no wonder I never figured either one out!)

It took awhile for me to realize that programming is a language, too. Just not a human one. And, like any language, there are many dialects.

Have been hacking on Basic since the Trash-80 & Commodore days, so while I might not know all the ins & outs of LO’s API, I can get by, and thanks to some decades of experience do know there are only so many concepts: Variables; Functions and Subroutines (now called Properties and Methods in class-based coding); Iteration; Instantiation; Etc.

It’s just a matter of learning how to speak in whatever language you’re using.
I do use xRay and keep the Props and Methods of often-used objects as text files for reference. It comes in handy.

Never was a C or Java coder, but thanks to PHP and other similar languages can keep from getting totally lost when reading code of that “flavor”. Have been playing with Python. Resisted for years, but have grown to appreciate it. Comprehensions are awesome, but I wish they had a better word for them. Am super-excited they’ve finally added a Switch statement (sort of. See PEP 634). Can’t count the number of Bash helper scripts I’ve written over the years. Main platform these days is Mac, but I have various 'doze and *nix machines too. Again, best tool for the job.

I mainly use LO Basic for automation. The app that started this post imports files downloaded from the web via a Python script using Selenium, massages that data a bit, and finally creates an output spreadsheet for others to view. Look ma, almost no hands!

Was thinking about the C64 recently. I still have my old monitor, and it works! Wish I had the main unit and floppy drive, but they’re long gone. These days, not sure I’d be up to keeping track of all those two-character variable names that implementation of Basic employed. At least, not in my head. Would probably have a text file or spreadsheet helper to keep track of them for me…

Villeroy · May 23, 2022, 3:19pm

You should not use any “automation” before you are very familiar with the program you try to automate. Regarding Calc, you should know the most fundamental spreadsheet facts which are the same since the early 90ies.

Data types (text vs. number).
Appearance (values vs. formatting).
Relative vs. absolute references.
Using functions and their positional parameters.
Quirky LOOKUP function and derivates (same LOOKUP since Visicalc of 1979).

Without some most fundamental basics all your Basic code will fail, no matter how proficient you are in writing Basic or Python or any other macro langauge.
In this case, you are unaware of Calc’s locale settings when importing HTML.

osxtra · May 23, 2022, 5:44pm

Well, putting aside the fact that experimentation is a valuable part of becoming “very familiar” with something, I do happen to be comfortable with all items on your laundry list of basic knowledge (though not sure how the VisiCalc reference is relevant).

Thanks, though, haven’t thought about that program in years!

So far me being ignorant of locale issues, that’s not true, but I don’t believe this is one, so didn’t mention it in the original post.

According to Online Libre Help
(for v6, but doubt it’s changed for v7)

If “Use English” in a part of settings is NOT checked, it’ll use the default locale.

For my settings:
While Load/Save → HTML Compatibility → “Use English…”: NOT Checked,
Language Settings → Language → Locale setting: Default - English (USA)

The values are clearly (at least, according to US) floats:

 <td class='right'>0.06</td>

Even were this a locale issue, why running QueryEmptyAddresses on the column object would indicate cells that clearly have data in them is a separate issue, and why I posted this to begin with. The format of the data doesn't matter in this context. It shouldn't matter *what* the data is, so long as it is present. If it's there, QueryEmptyCells should find it. (Or, actually, not find it, since it's looking for cells without data).

So far as opening the document itself, yes, did have to pass
aProps(0).Name = "FilterName"

aProps(0).Value = “HTML (Starcalc)”

aProps(1).Name = “FilterOptions”

aProps(1).Value = “59,34,0,1”

in the call to stardesktop.LoadComponentFromURL

which I got from reading one of the few sources I could track down, namely

An OO Forum post

Adding this prevented the import popup from occurring. (Hard to automate when you have to click things.)

AFAIK, the only relevant number in element 1’s value string is the zero, which indicates the character set. If you believe that value string is the culprit, I’d like to know why!

(More importantly, I’d like to know where it’s documented.)

In any event, there’s nothing in the HTML which indicates locale, and when manually importing the sheet it’s OK, so am not suspecting the data itself to be part of the problem.

Am sure there is some obscure bug being triggered due to the import filter, just not sure where to look or how to correct it. If you have a glimmering, am all ears.

Sure, I could probably sed all the html out and turn it into a regular CSV before opening the source doc. Pasting as values from source to destination fixes the problem though.

Thanks for your input!

Villeroy · May 23, 2022, 6:07pm

Well, you did not notice that your decimals are text. Visicalc also had 2 data types number and text (they called all text “label”). It had the same LOOKUP function that people can not use properly until today. Visicalc had relative and absolute references and functions with positional parameters.

P.S. running my “showFilterOptions” macro against your HTML, I get “3081 1” describing English(Australia) and “Detect Special Nummbers”. The latter is does not matter in this particular case but it is the right option in 99% of all practical use cases where currencies, dates, times are involved. Your FilterOptions look like the filter options for a csv file.
After pasting this into the table of my database example, all I have to do is loading the view without any stupid Basic code.

osxtra · May 29, 2022, 10:32am

Well, thanks for that info, have had trouble finding docs on determining the correct filter options (which I’m only using to avoid the popup, as the source file isn’t really a spreadsheet). The API talks about them but doesn’t enumerate the various options, and even grepping through the source hasn’t produced any useful info on what all values are available, and what they do.

Not asking you to share your code, but could you give me a clue as to how your “showFilterOptions” macro works? What objects is it referencing to extract that info?

There’s still the issue of QueryEmptyCells incorrectly populating its RangeAddress array, but not going to worry about that anymore…

Villeroy · May 29, 2022, 6:39pm

You linked to it.

And this forum’s search box finds it as well.

osxtra · May 30, 2022, 1:19pm

That’s true, it mentions it, but the link goes to a 404.

Took another look at the online wiki and found the document to which that post was referring.
(The online version of the WiKi does not seem to contain all the info as the PDF. Specifically, the section about “Filter Options for the CSV filter” is not present online. Probably something to do with the PDF referencing v3 of OO, while they’re on v4 now.)

Thanks for getting me to take another look!

Though referencing a previous version, the doc still contains valuable info. Would be nice if TDF would have a reference section on their site containing links such as the above. Perhaps for v8 of LO the documentation team could refactor so everything would stay “in house” rather than having to link back to OO. That’d be a lot of work though…