Pivot table for distinct count?

Consider the following table in Calc (The table has 25,000 entries):

|Vehicle|Colour|

|Car | Red |

|Car | Red |

|Car | Blue |

|Car | Green |

|Boat | Red |

|Boat | White |

I want to count how many distinct colours there are for each vehicle:

|Vehicle|Colour count|

| Car | 3 |

| Boat | 2 |

What is the best way to achieve this?

I’ve tried a pivot table with:

Row fields: Car, Colour

Data fields: Colour(count)

This yields:
|Car | Red | 2|

|Car | Blue | 1|

|Car | Green | 1|

|Boat | Red | 1|

|Boat | White | 1|

Do you have any hint on how to transform this into what I need?

I think, that is a task for a database. Generate a new database file with the embedded HSQLDB. Mark your data in Calc and drag it to the table container in the Database UI. Check to generate a primary key and to use the first line as column names in the import wizard. Import all columns.

Generate a query in SQL view using the SQL command

SELECT "Vehicel", COUNT( DISTINCT "Colour" ) FROM "Table1" GROUP BY "Vehicel"

I have assumed, that you use the default table name Table1.

Unfortunately the function COUNT(DISTINCT …) is not available for a spreadsheet as database, so you need to convert your data to the HSQLDB format. Do you really want to calculate on that data? It is likely, that the things you want to do can be better done in a database anyway.


And here a suggestion to do it using a pivot table. Ask_PivotCountDistinct.ods.

Main idea: Put the category as column field and the characteristic attributes as row field. Put characteristic attributes as data field with aggregate function COUNT. Show row total. If the category does not has the attribute, the field is empty, but the other fields contain numbers. Therefore a simple count of the data area will give the number of distinct attributes. To get the range of the data area I search for the text “Total Result”. This way the pivot table may shrink or grow depending on the data. The top-left field of the pivot table is named, so that it can easily be used in the pivot table layout dialog and in my range calculation. For other details look at the used formulas.

Yeah, thanks for that Regina. Importing into a DBMS will allow for much more flexibility. If I don’t find any other answer within Calc I might go down that route.

I did never design databases of large scale but I HAD TO use some, alas.
Form my personal experience: I got much more flexibility and better productivity using spreadsheets - but I had, of course to accept some disadvantages. Mostly I was one of very few users and I always had the structure in mind.
The main advantage for me was easy PREPARATION of data for the “base tables/columns” by formulae and the (sometimes short term, “ad hoc”) evaluation of reports without being restricted to SQL or something similar.

I did never design large scale databases - but I HAD TO use some, alas.
From my personal experience: I got much more flexibility and productivity using spreadsheets. I had to accept some disadvantages, of course. In most cases I was the only or one of very few users and we always had the structure in mind. A main advantage was being able to easily PREPARE data for the actual base data to be queried and reported about. The other one was not being restricted to SQL or something similar and being able to easily implement ad-hoc evaluation of reported data (“second step reports”).

With pivot tables others are much better than I am and Regina’s database suggestion is well worth considering. Nevertheless I personally might be tempted to avoid designing and maintaining a database - and to avoid interactive means as well. I can show you how I would presumedly do it: by formulae accepting the need for some helpers. You may have a look into the attached example document ask34668CountDistinct001.ods. (I expect the suggestion to be doomed by some professionals.)

EDITING:

I have to add an alert. It wasn’t the first time that I came over a problem with filling formulae containing range addresses into adjacent cells. I missed regarding!

What Problem? Open the newly attached file and read, please, if interested. ask34668CountDistinct001c.ods

Nice hack. Has the advantage of keeping things within Calc. I have to try the performance on a 25k entry table! :slight_smile:

I sometimes experimented with sorting by formulae on that scale and cannot recommend it. If you know the complete list of ‘Types’ (Car/Boat/…) in advance it (Sheet1) should work with acceptable speed for 25 kRow. It’s a pity that spreadsheet software generally (as I see it) does not provide effective sorting by formulae while sorting by ‘Data’>‘Sort …’ is performing well (25E3 alpha keys < 1/2 s on my system just measured).

IMO it’s desirable getting the same algorithm at hand implemented as an array function MSORT() with some controlling parameters. Alphanumeric lexicographical sorting might even be done with linear time complexity by a function.

Hi Lupp, maybe this fantastic extension to sort in the OpenOffice forum it’s what you are looking for?
[Python] Array Sort add-in function.

Hi mariosv, many thanks for the hint, but seems I am no longer capable of …?..
I installed the add-on function Pysort V1 from XPySort.zip (hidden oxt) and “tried to try” it on an example but did not succeed. I seem to stubbornly misinterpret some explanation about the parameters. Could you provide a simple ods example using PYSORT()?

Try this

Row fields: Car

Data fields: Colour(count)