How to crosstabulate numeric time series data?

galaxian · April 1, 2017, 9:12am

I have some time series data, with the date in column A, and other variables in columns B, C, D…etc. So each row has data specific to that date. All the data, apart from the dates, are numeric.

I want to create a two (or more) column list where each row has the range of values from B in the first column and the average of the corresponding values in C in the second column. So for example a cell in the first column would be for values of B that are equal to or more than 7 and less than 8, and the adjacent cell in the second column would be the average of all the cells in column C where the value in B in the same row is equal to or more than 7 and less than 8. As well as the average I would also like to have columns for the standard deviation, max, min, etc.

What is the easiest or quickest way of doing that please?

So far I have made a column for each range of values in column B, and then the values from C are shown in the column only if B is within a particular range. The columns are then averaged, etc. But this creates a very large table and is tedious to set up.

I have been reading about Pivot Tables, but have seen nothing about sorting numeric values.

Lupp · April 1, 2017, 10:08am

“… where each row has the range of values from B in the first column and the average of the corresponding values in C …”
How should a “range of values” - how defined? - be represented in one row of one coulmn which is a single cell?
If you actually mean that the first column should contain the range of lead values one by one, you also have to specify this range to more detail. Otherwise you would need next to infinitely many rows.

Lupp · April 1, 2017, 10:16am

If you mean the actually occurring values of the range:

There currently are no standard functions allowing to create the contents of this firtst column by simple formulas.
Available means would either require helper columns and rather inefficient and complicated formulas - or a certain amount of user programming. This the more if you need the result in a specific order.

If you can provide a list of allowed values in advance the task is simple.

Lupp · April 1, 2017, 10:35am

I did not understand for what reason you did not achieve your goals using the Pivot-Table tool. This assuming “…If you actually mean that the first column should contain the range of lead values one by one, …” from my first comment on the question is addressing your actual needs.

See this demo.

(Edited with respect to the update by @galaxian of his own explanations posted as an answer)
Since you seem to be familiar with descriptive statistics to some degree, you surely won’t expect me to know a general procedure to create "the" appropriate partition of the range of lead values for your grouping of collateral values. You may define such a partition arbitrarily and try to adapt it to your needs in steps. You may also use some helper columns and whatever for a density analysis or whatever …
Also see this alternative demo.

(Editing again:)
( @galaxian: )“The sizes of the bins are arbitrary and are my choice. In practice I want each bin to have at least 5 instances in it to reduce chance results.”
That’s a bid first of all. However, I doubt I would get happy myself doing it this way. Applied statistics might carry enough scars caused by willfulness, anyway. Might not a kind of zoom including a changeably sized neighbourhood of a central value be a choice? This if there is not a multi-centered density distribution (Analysis required! The FREQUENCY function may be helpful.).
If you provide me with a realistic dataset and with some additional background information, I could try to find a suggestion. However, I have to emphasize again that I am not an expert in descriptive statistics, but just a curious amateur.

galaxian · April 2, 2017, 9:13am

Thank you Lupp for your work, but none of the numbers are integers. They are all different as well. They are all multi-digit decimal numbers, eg. 0.972357063, 13.32698436. So unless pivot tables can categorise numbers into ranges, there would be as many rows as there are numbers.

(Since there are hundreds of numbers then one or two of them might be integers or have the same value by chance).

Edit/update: Thanks again Lupp. The numbers from column B can be grouped as whatever seems most useful for that particular data, eg example rows could be >=0.5 AND <0.6, >=0.6 AND <0.7, >=0.7 AND <0.8 and so on. That example puts the data in 0.1 “bins”, but other sizes of bin could be used instead.

The data is non-linear and almost certainly non-Gaussian, and there is also no prior suggestion that they will form a nice curve or line. Parametric statistics like a regression equation would give misleading results.

So what’s the easiest way without using pivot tables please?

Another edit/update (3April17): Thanks Lupp, will give you some sort of bonus point if I can find how to do that. The sizes of the bins are arbitrary and are my choice. In practice I want each bin to have at least 5 instances in it to reduce chance results. The way you calculated in your latest spreadsheet was similar to how I had been doing it. I had hoped there might be a quicker simpler way. I have this morning been finding conditional functions that suit looking things up in long unsorted lists of non-integers. The ones I have found are:
averageif(
averageifs(
choose(
countif(
countifs(
hlookup(
if(
index(
lookup(
match(
offset(
prob(
sumif(
sumifs(
vlookup(
Are there any others please? Some of the array functions also look interesting. Unfortunately there is not any conditional function for standard deviation. I found most of these in the OpenOffice spreadsheet documentation, where they are arranged more clearly.

Is there any way of calculating a conditional standard deviation with a similar “if” to averageif(? Or for that matter a conditional max, or min?

Lupp · April 2, 2017, 9:34am

It’s not about the numbers being integers, but about a precise specification of how to group. Formuals can do this better than Pivot tables, but how to do it in any specific case might not be a matter of a few words.Depending on the details there will be some complications to handle.
Please regard my first comment on your original question (OQ): How should a “range of values” (how defined?) be represented in one row of one coulmn which is a single cell?
(Two cells needed?)

Lupp · April 3, 2017, 1:38pm

The SomewhatIFS functions imo suffer from the disadvantages of the ‘Criteria’ concept. I, though, use them myself. Where I deprecate criteria for their depending on automatic conversions without my explicit control, I am able to find a workaround.
I would doubt if a DEVIATIONIFS function would be usable under strict regulations in the sense of well justified statistics, but I am not a “senior expert” with this, and would first of all need to know the details of the case before judging.

Lupp · April 3, 2017, 1:48pm

LOOKUP I never use and HLOOKUP rarely is of much value. VLOOKUP may do now and then but generally a combination of MATCH with INDEX or OFFSET is more flexible.
The sophisticated statistical functions I use next to never. I had to experience so many bugs, very annoying and though long-living ones among them, that I don’t trust in hidden (to me) code whose reliablity I did not analyse and cross-check.
I don’t use specific array functions too much, but often rely on forced array-evaluation.

AlexKemp · September 28, 2020, 9:25pm