Calc: How to find anomalies

By chance, I found two anomalies in my numerical data. Numbers prefixed with ticks, making them text.
.
They became apparent when I used the autosum Sigma after clicking on the cell below the last data row. The range in the resulting formula did not reach to the top of the column. It stopped below the cells with the ticks.

Also, again by chance, I found ticks prefixing dates when I sorted on that column. Those I fixed with @newbie-02 regex solution from 2020:

What other anomalies are common / might exist? and how can they be detected?

How is the data obtained, by receiving the file as is, importing a CSV or another type of file?

Lets suppose your first source, an ods I received.

View > Value Highlighting will change the color of numeric fields

Value Highlighting Help

2 Likes

@lodf2023
Ha! just came across that myself, @RoryOF in a 2023 thread here:
.
https://forum.openoffice.org/en/forum/viewtopic.php?p=539924#p539924
.
It’ll help with text ticks, so its a start, not sure about other anomalies.

I hear ya. I think try to use Styles more and Direct Formatting less. And, actively check for errors or reasonable values with functions or conditional formatting.

I’m using a additional sheet named checksum with various checksums.
.
For example, using known-to-be-good data calculate various SUM SUMIF SUMIFS, duplicate them, turn one set into literals with Data > Calculate > Formula to Value, then subtract the pairs. All the results will be zero.
.
Sum the absolute values of the results into a tally cell.
.
Going forward, changes in the monitored data will cause one or more of the checksums to be nonzero, and the tally to be nonzero.
.
In the data sheet, conditionally format background color of the header row so as to alert for nonzero values of the tally cell.

Other anomalies.

  • anomaly (how discovered) (how fixed)
  • mix of vertical and angled double quotes (save as CSV) (delete from autocorrect, then find-replace)
  • mix of vertical and angled single quotes (save as CSV) (delete from autocorrect, then find replace)
  • trailing spaces (Data > More Filters > Standard filter > Options > No Duplications) (regex [ ]+$ find replace)
  • multiple spaces (Data > More Filters > Standard filter > Options > No Duplications) (regex [ ]+ find replace)
  • mix of borders, 6-6-2-2 and 5-5-2-2 (columns not precisely aligned) (select all, then set border)
  • In columns of text, some cells were formatted as Text, some as Number (inconsistent quoting in CSVs) (Select columns, then format as Text @)

In a true database

  • you can not store text in numeric columns
  • you can not store numbers in date columns
  • there are many more distinct types of data where spreadsheets have text and number only
  • you can not store records (rows) if values are missing where they are not allowed to be missing
  • you can not store any duplicate records (rows) that you have defined as duplicates
  • you can not store any pointer (reference) to another table’s record if that record does not exist in the other table.
  • Every text (name, ID, address, description) is stored only once. Any typos can be fixed in one place.
  • formatting is meaningless; it’s all about data

When importing database data into Calc through linked import ranges:

  • you can rely on all the above rules. There can not be inconsistent nor incomplete data.
  • your record set will be complete (there are no references like A1:A90 that should be A1:A100)
  • your formulas adjacent to the linked import ranges will expand and shrink with the imported record set

I made a template to analyse spreadsheet data if and how they can be copied into a database. In the attached document derived from that template, I imported 3 database columns on the first sheet, added some errors and pushed the macro button on sheet2 to expand the formula ranges.
The simple formulas in rows 11 through 20 apply some most simple sheet functions to find the right column type for the database table. Finally the sheet concatenates a SQL statement to generate the right table for the data to be copied from sheet into the database without violating database rules
ask132245.ods (69.6 KB)
.