Calc: how to chart series "slices"?

ajlittoz · March 5, 2017, 4:43pm

This is my challenge: I have several very long time-series in a sheet. Column A is date and columns B-… are samples from sensors. Of course, I can display the raw time-series under various formats and styles.

However, I’d like to see “de-seasonalised” trends in these data. By “de-seasonalised” I mean, e.g., all samples taken at 8:00, 9:00, 10:00 … as 24 independent series for hourly samples, or all samples for January, … as 12 independent series for monthly samples.

AFAIK, there is no way to select a range of cells with a step to say “take every other n cells”. All I can do is chart B1:B999 and any fancy addition like A1:B999:12 is immediately flagged as an error.

I looked at pivot tables but they can’t spread a 1080-element vector into a 12x90-matrix, which would do the trick.

The nearest I got is to manually enumerate which cells are in the series as B1;B13;B25 … However this is impractical when the series contains more than, say, 10 elements. As I mentionned, I have many series and they are very long.

I’d prefer not to dump the series into a text file and use awk, perl or other macro processor to group the samples differently. I’d like to do it all from inside LO.

Is there a way to define a subset of regularly spaced cells within a range like B1:B10, taking 1 every other n cells, so that this reference can be directly input into the “data range” for a chart?

EDIT: Following @karolus question, my problem is akin to subsampling.

EDIT 2017-03-07: Lupp’s updated.ods for @Lupp examination

Lupp · March 7, 2017, 2:43pm

Sorry. I cannot read this chart. The info added by Max, Average, Min may be of value, and three timeseries sufficiently distinct, I can read. The 24 “old” series ar more like a noise to my eye. They roughly display the “degree of chaos” if anything. Probably much ado about little information.
This is, of course, only an opinion formed by my specific experiences.
In a case of neatly stacked series things may be readable.
I will add a reworked demo derived from yours to my answer.

ajlittoz · March 7, 2017, 5:22pm

The intent was not to be able to read the 24 series. As you pointed out earlier, beyond 3-5 series any chart is too “dense” to be of value, all the more as I dimmed the lines. The “degree of chaos” might be the right description for what I did. The only information derived (and it can be derived independently from “chaos” display) is: samples above the average are more widely scattered and less numerous than those below because the interval avg-max is wider than min-avg.

Lupp · March 5, 2017, 5:41pm

You can do this best (imo) by slecting the subseries in helper columns.
I would prefer to demonstrate this in an example over writing 100 lines.

Look into this demonstration.

(Editing with respect to the comments below:)
“Any other solution without intermediate table?” (@ajlittoz)
No.
I am afraid you would better accept an extra column for each series of your interleaved scheme.
However, you can get it much more compact once assumed you can assure a few specialisations - and no completely missing sample rows as the most relevant assurance. In this case you don’t need to select matches or to search for matches concerning the TOD, but can calculate the indices of rows to access by very simple formulae.

Anyway 24 series put in one chart are by far too many to display even if there is much less noise than in my example. I did not use charts too extensively, but I did. From my experience I would judge 3 series as the maximum for readability if there are the slightest complications. A few more may be acceptable if they are neatly stacked.

See this new demo.
There are a few typos in the chart titles, and probably elswhere, too. I lacked the patience to rectify that. Sorry.

(Editing again: ) The additional demo announced in my comment to the OQ.

ajlittoz · March 5, 2017, 8:47pm

Thanks for the demo, I had somewhat an idea of this sort. However, I find it impractical: to get 24 time lines in the same chart from one sample column, your scheme needs 48 helper columns which I try to avoid. Nevertheless, your solution allows for missing samples in the original column and this is valuable in many circumstances.

Any other solution without intermediate table?

Lupp · March 5, 2017, 9:40pm

There is a general shortcoming of charts: They do not accept calculated ranges. (Validation ranges can be formulae!). But even if a version of LibO to come should support calculated ranges for charts, the formulae to get selections would be rather complicated. This may be significantly different if you can assure an absolutely constant time step from one sample to the next one. In this case the selection needn’t match the time but can generate the indices. Gaps forbidden!

ajlittoz · March 6, 2017, 6:45am

Never mind for typos, the idea is the important thing, not the details, which anyway are different in every application.

Your intermediate table with calculated indexes is exactly what I’m looking for: it is a “constant” sized table with automatic update whenever the original data changes. I did not know the use cases of OFFSET(). Thanks.

ajlittoz · March 6, 2017, 6:46am

Of course, I agree with you about chart readibility concerning number of series.

But even your overcrowded cases are interesting: they show the same global behaviour over time and can be the “early bird” for further study.

ajlittoz · March 6, 2017, 6:50am

My remark about missing samples was triggered by your first implementation. I keep the tip in an obscure corner of my brain.

My original concern is simply constant-step subsampling and your second suggestion fully fits my need.

ajlittoz · March 6, 2017, 9:42pm

I have played a bit with your updated demo. Your “incredibly overcrowded” chart has no readability issue as long as the lines do not bear information: I have dimmed them using very thin lines with transparency 50%. I added Min and Max lines with medium width and 0% transparency and an Average think line with 0% transparency again. Color has been selected to give maximum contrast (thus highlight) against raw data. Now, information is in the 3 added lines (with background references).

Lupp · March 6, 2017, 11:10pm

@ajlittoz: You leave me curious. Would you mind to attach a realistic example of how you use the concept?

ajlittoz · March 7, 2017, 7:16am

I’ve attached the file to the initial question. The idea is to dim enough the clutter so as not to distract attention from the main focus, but to leave it visible enough so that you can grab other properties, such as standard deviation, difference between average and median, …

karolus · March 5, 2017, 8:52pm

Hallo

@ajlittoz: Are you really after slicing (eg. take exactly one Dataset every Hour|month?

Personally I would prefer aggregating data by Hour or Month
Given the Example-Data by @Lupp, I came up with:

Exampledata.ods

Edit: Another Try with calculated Hour and Filter instead grouping:
Exampledata2.ods

ajlittoz · March 5, 2017, 9:02pm

Thanks but I rejected the pivot table idea because it computes a single value from the time-stamped extracted sub-series: average, standard deviation, min, max, … whereas I want to see how the sample evolves over the days.

ajlittoz · March 6, 2017, 8:07am