Text to column solving hour filter problem : how is it working?

Hello everyone,

I solved my problem, but I’m unable to understand why, and I’m curious about it. Also wondering if there is another way to solve it.

So, I have this column filled with times. The format is an hour and displayed as “HH:MM:SS”. When adding a filter to the column with condition “<” and value “00:07:12” nothing is displayed, while I know there are cells that should be.

Investigating the problem, I notice that it works for an “=”. When I change the format to a classic numerical value, it changes to numbers. So, I start to think the format might be a problem, but it actually works in functions : D2 & D4 being two of my hours, “IF(D2<D4;“True”;“False”)” works. Therefore, the software is able to understand the format as a numerical value. Hence, I believe the problem must be somewhere else. :thinking:

Looking for answers on the internet, I find this post. I don’t understand everything, but I try out “Data > Text to columns…” and I change the column type from Standard to Text. Applying the filter : hurrah! It’s working! But I absolutely don’t know why. On the previous post, Bungler says that since it’s text, it cannot be filtered properly. But my tests let me think it is a proper numerical value. And anyway, the new column type is… Text. :exploding_head:

My brain started to melt, and here I am. What did I get wrong? Is there any other way to solve it? I tried to find a documentation on “Text to column”, “Column type” but I found nothing relevant.

Thanks for any explanation you might come with!

time_pivot.ods (47.1 KB)

Text to column splits text values. As a side effect, this feature can convert text to true spreadsheet values. If you already have correct spreadsheet values (formatted numbers), text to columns is useless. An arithmetic calculator such as Calc has more powerful tools to deal with numbers.

1 Like

The evil side of spreadsheets is that there are no column types. Each single cell can take both possible types of values. Each cell can have either text or number even if all other cells in the same column are the other type. Formulas return either text or number or errors as a special third type.

  • A spreadsheet has no column types (no fields).
  • A spreadsheet has no tables. Any rectangle of cells may be interpreted as a “table” or not. You can clutter all relevant data of a fully functional calculation model across completely uncoherent cells.
  • A spreadsheet has no records. With a single inadvertant click on a sort button you may distort any records you have typed or imported into a rectangle of cells. Wrong references or wrong usage of lookup functions makes completely unrelated data appear related.

A spreadsheets is a simplified programming language mostly for arithmetics, but most commonly misused as a database replacement.
You get the hang of it once you learned to distinguish the following:

  • Text vs. Number (fundamental with any programming language)
  • Relative vs. Absolute referencing
  • Data vs. Formatting (beginners waste hours with completely useless formatting details)
  • The “special numbers” derived from data type “Number” (percent, date, time etc)
  • The idiosyncrasies of lookup functions (which by default assume a calculation model rather than a record set).
  • If you misuse a spreadsheet in order to evade a database program, you can ignore all rules of database design at the cost of a horrible unmanagable ooze of data. Better you learn some fundamental design rules in order to make use of the spreadsheet’s limited database functionality. Better use a database right from the start, may be with a spreadsheet as prototype.
1 Like

Thanks for the answer! This is what I was looking for :slight_smile:

So, if I get it right concerning the problem I had with my hours:

  • as long as it is a number, there is no problem
  • when I format it as hours, it’s a “special number”: my sort won’t work but my IF function will, because functions (I guess it converts automatically special numbers to classic numbers for the sake of executing the function)

Then, is there a way to convert my hours to a true spreadsheet value without using text to column side effect? My guess is it’s related to the Data vs. Formatting difference you talked about

Also, I’m not sure what you wanted to show with your time_pivot spreadsheet in your previous message :thinking: Is it just an example of the “powerful tools” you talked about? Actually interesting, but not the result I’m looking for. Thanks though!

Note that you never provided a sample spreadsheet with data and examples what worked, and what didn’t. With that, it would be really simple to point to specific things, and understand and explain what happened really clearly. Otherwise, it’s just a matter of guessing.

Yes indeed, my bad. I didn’t think of that at first.

Here’s the file I’m working on.
musics.ods (46.2 KB)

When I add a filter, I can sort by ascending or descending order on the duration. However, when I add a condition to display only the duration > 02:00:00 nothing is displayed. Here’s the file with the condition.
musics condition.ods (32.7 KB)

As I said, I am able to solve this problem by using “Text to column” on the whole column, but I wanted to understand how it’s working. Now, I’m looking for a proper way to convert the hours to a usable data without relying on the side effect of “Text to column”, while keeping the HH:MM:SS display/format.

No. When you format your cells containing numbers, they still contain ordinary numbers - e.g., cell E185 contains 0,00138888888888889.

The magic happens at two stages:

  1. When you enter values to cells. You type your keyboard, and characters are passed to the edit box; when you are finished, Calc does its magic to convert the string that you typed into a cell value; at this point, it tries to disambiguate formulas, numbers, and text (the three possible different types of the content). And if the string turned out to be convertible to a number (using one of many conversion rules that it knows, including treating it as time), the cell gets that number as the value (and as a side effect, the recognized conversion rule sometimes also sets the cell’s format - as happens when you enter times).

  2. When Calc needs to show you the contents of the cell, it uses the value and the format to create a display string.

But unfortunately, the box in the Standard Filter does not do that magic, and only checks if the entered string matches simple number - and if not (e.g., string 00:02:00), it treats the entered string as not a number. Then your condition looks like this: you ask Calc to only show you strings where values like 0,00233796296296296, 0,0015625, 0,00195601851851852
… are “smaller” than string "00:02:00". And that string is not a number 0,00138888888888889.

Maybe @erAck could explain why that edit box in Standard Filter does not behave smarter - likely, there would be some compatibility concerns?

It remains the exact same value when you apply any format code such as HH to display 2-digit hours. The remaining minutes and seconds are left out. You just cheat yourself about the actual cell value.
Formatting must never change the value of a cell. You see the actual decimal cell value when you remove the number format. Try out now. It won’t change a single value.

A spreadsheet has not only no tables, records, fields, it has no dates and times neither.
Times are fractions of days.
Value → Formatted as time
0 → 00:00
0.25 → 06:00
0.5 → 12:00
0.75 → 18:00

Cell A1 = 0.5
=HOUR(A1) → 12, no matter if the cell is formatted as currency, percent, decimal, time or whatever. The hour of 50% is 12 and the hour of 50 Cent is 12 because both values are the same 0.5 as 12:00, 12pm or 0.5 formatted as HH which displays the true value 0.5 as 12.

Dates in spreadsheets are day numbers:
Value → Formatted as date
0 → 1899-12-30
1 → 1899-12-31
2 → 1900-01-01

44942 → 2023-01-16
44942.5 → 2023-01-16 12:00
When you subtract 2 “date values” you subtract 2 day numbers, The result is the difference in days.

1 Like

For reference. In Excel, in a similar situation, the standard filter will replace the string “01:00:00” (or the cell value from the drop-down list) with the corresponding number. When we open the filter again, we will see “greater than” 0.0416666666666667. Related tag from *.xlsx\...\sheet1.xml:

<customFilter operator="greaterThan" val="4.1666666666666664E-2"/>
1 Like

Formatting as ‘YYYY-MM-DD HH:MM:SS’, it’s possible to use the standard filter, selecting date-time.

Seems it only doesn’t work for time values, but it does for dates.

1 Like

Works for me anyway. . The filter dialog displays the times but does not filter properly. This bug affects Base too. As a work-around, Calc can read the correct filter criteria from cell range (“advanced filtering”).

Sorry, but seems the filtered values have nothing to do with the filter, I think.

musics.ods (63.3 KB)

1 Like

I know that it worked for someone else on Excel indeed, you can sort it and only display the cells according to the condition.

I’m not sure why you are trying to show Villeroy. The equal condition works indeed, it displays properly the value in the drop-down menu, but the execution isn’t correct. Here’s what happens for me :

From what I gather from your conversations, the data in the cells are either number or strings, format being the way to display, it doesn’t change anything deep beneath. So, the software should be able to read and use it as such, except when it’s the Standard Filter as mikekaganski explained. However, it seems to work for Villeroy, so there might be a global parameter of Calc involved?

I’m not aware of any such parameter. Even for the most simple collections I use databases where all this simply works as expected. Databases have tables with records and fields. Fields have types such as dates, times, date-times, integer numbers, decimals of fixed and floating precisions. Column names are separated from data, calculations with empty values always return empty.

There was never a sign that it worked for @Villeroy. There was a screenshot that the drop-down list has the values.

Without having dug into I’d guess just because it’s not implemented…

I’m coming late, and I didn’t read everything thoroughly. I also rarely use the standard filter. However:

The filtering UI doesn’t offer options concerning the type of data - and here we have a case where this even wouldn’t suffice because the issue is concerning data treated internally as numbers, but displayed to the user - depending on his settings- in some stubborn format.

“Our” user wants to enter the limits for durations in a specific format.
That format is made for TOD and there are zillions of variants of formatting date and time related data actually used. It seems the only formats actually not used in the context are those standardized by proper authority (ISO).
The colon-separated formats e.g. are exclusively specified for TOD, but are all the time misused for durations. So far the issue is clearly “human”.

On the other hand the recommended ways to express (format) durations are made regarding the needs of science and engineering, an use generally only one unit for one value. That’s not actually convenient in the music field e.g. where we may have to wait >100 years for the acceptance of standardized communication of values. In other words: The deprecated misuse is the actual standard.

And then there may be software simply disregarding the problems and the risks related to misunderstandings, and applying a kind of “uneducated guess” in the hope people will find that “great”.

Is there a solution?
Yes: Abandon the idea to sort and to filter data represented by numbers regarding the current formatting. Convert explicitly to textual representations regarding a few needs:

  1. Choose the “format” in a way that the lexicographical sort is the same as the numerical.
    This requires to give room for as many leading zeros as may at maximum occur… In the given case I think (e.g.) that we can ignore titles oif 100 h or more.
  2. Enter limits in exactly the same textual representation not adding or removing any spaces e.g.
  3. Concerning filters prefer the AdvancedFilter then. It allows for calculated conversion in dedicated cells. You don’t need to enter long strings manually into the filter dialog.

See the demo:
disask86473FilteringByDurations_Was_musics.ods (74.7 KB)

Sorry if I told something again already posted in better words by others.

1 Like

https://bugs.documentfoundation.org/show_bug.cgi?id=153057

musics.odb (38.6 KB) (fixed rounding issues and number formats)

This database is a workaround for the broken filtering functionality.
Form “Filtered Titles” takes separate minutes and hours instead of time values. This is even easier to enter since you don’t need the 0: prefix for the hours.
The white boxes on top take your filter criteria. Title and Artist search case-insensitively for given substrings. The nueric values for year, minute and seconds take start values and end values. Genres and flags are selectable from listboxes. You may enter descriptions in the “FLAGS” table. They will be displayed instead of the A,B,C-Flags.
I added a boolean column labeled “Check” where you can mark arbitrary titles. The white check box on top is a triple-state check box (on/off/empty). All empty search criteria are ignored. If you clear all white boxes, you get the whole record set of the titles table.
On top of the time columns, there is a box showing the overall time for all displayed title in HH:MM:SS format.

In the report container you find a simple report, printing the title list as filtered in the form.