Calc: automatically count how many characters there are in all sheets (within cells and comments)?

alfons11 · March 30, 2025, 4:02pm

Let´s assume I have an .ods / .xlsx (.odt can be exported) file opened and edited in Calc. I need to edit it on my phone and then on my pc. (It is not very comfortable to simply copy the changes to the original documents.) If I have an .odt, I export it to .xlsx for practical reasons. Than, I edit within WPS Office and transfer to my computer (here I work in Calc). There are some concerns about compatibility. I need to ensure that no UNINTENTIONAL changes of text (of cells or comments) have been made.

The changes of formatting are perhaps inevitable but that´s not what I am concerned about. I would need to have f.e. some macro that counts all characters within all cells and comments (invisible characters should be excluded). As I edit my file on phone, I would add some condition, f. e.: exclude all characters after some unusually sequence (f. e. „!q“) which would also not been included. So, if I would write all new content after this sequence, I would be able to edit the file and the character counter in both files (the previous computer version and the later version from my cellphone) should display the same number. If the numbers are different, I should be told what cells exactly are different.

I try to write some macro with AI but it fails all the time. Can my task be accomplished? Should I try harder or give up?

Thank you.

LO 24.8.5.2 (X86_64)

CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: cs-CZ (cs_CZ); UI: cs-CZ

Wanderer · March 31, 2025, 5:32am

Party. I see some exeptions:

A simple counter can not tell you where a change occured. So for this you would need to store a full grid of counters A1:7, A2:14…

Another problem is : character-count can not find all changes.
=40*A1" equals A1+B1`, so programmers created crc, md5, sha256… as much better alternatives.

Hard to judge. IMHO AI can help, if you have some skills with programming and the problems are solved elsewhere.

My first attempt would usually be to check, if “track changes” activated helps, but never tried with conversions to excel-formats and back.
.
Another option would be to compare exported .csv- versions of the files. There are existing programs like diff/meld for text.
.
I once wrote a program for specific datasets to compare and remember there is a lot of minor detail to be handled to differ between minor changes like “Test” vs "Test" and real edits.

EarnestAl · March 31, 2025, 8:30am

Have you considered using Collabora Office on your phone instead of WPS Office? Collabora shares a lot of code with LibreOffice and uses .odf as default format.

alfons11 · April 16, 2025, 7:12pm

Thank you for your tip. I tried it, but it’s too slow on my phone, and the comments aren’t displayed correctly (comments are crucial for me)."

alfons11 · April 16, 2025, 7:23pm

Thank you for all the advice. I approached it differently. ChatGPT wrote a Python script that compares both versions of the file. For each sheet, I can see how many cells have changed (I´m interested only in text), and then I get a detailed report with specific changes that I can scan if I’m suspicious about any of them. I don’t use any sequence that would exclude the row from the count.

The Python script:

import openpyxl
from copy import copy
import re
import os

# === Cesty ===
BASE_PATH = r"xxxxxx"
PUVODNI_SOUBOR = os.path.join(BASE_PATH, "1_KOSTKA.xlsx")
UPRAVENY_SOUBOR = os.path.join(BASE_PATH, "1_KOSTKA_OPRAVENO.xlsx")
VYSTUPNI_SOUBOR = os.path.join(BASE_PATH, "1_KOSTKA_KONTROLA_ZNAKY.xlsx")
TXT_SOUBOR = os.path.join(BASE_PATH, "kontrola_znaku_souhrn.txt")

# === Seznam zkratek ===
ZKRATKY = [
    "cer.", "sed.", "sedtm", "modsv", "ruz.", "fia.", "modstr", "zelkr",
    "zlutozel", "zlu.", "hne.", "ora.", "modtm", "fiatm", "zeltm", "pur.", "mag."
]

def normalize(text):
    """Znormalizuje text pro srovnání – odstraní mezery, převede na malá písmena."""
    return re.sub(r"\s+", "", str(text).lower())

def pocet_znaku(text):
    """Spočítá počet znaků bez neviditelných a nadbytečných znaků."""
    return len(re.sub(r"[\s\u200b-\u200f\u202a-\u202e]", "", str(text or "").strip()))

# === Načti sešity ===
print("🔄 Načítám původní soubor...")
wb_orig = openpyxl.load_workbook(PUVODNI_SOUBOR)
print("🔄 Načítám upravený soubor...")
wb_new = openpyxl.load_workbook(UPRAVENY_SOUBOR)
wb_out = openpyxl.Workbook()
wb_out.remove(wb_out.active)

# === Inicializace ===
souhrn = {}
zmeny_detail = []
komentar_souhrn = {}
celkem_puv = celkem_novy = 0
celkem_kpuv = celkem_knovy = 0

# === Projdi listy ===
for sheetname in wb_new.sheetnames:
    print(f"📄 {sheetname}")
    s_orig = wb_orig[sheetname] if sheetname in wb_orig.sheetnames else None
    s_new = wb_new[sheetname]
    s_out = wb_out.create_sheet(sheetname)

    max_r = max(s_new.max_row, s_orig.max_row if s_orig else 0)
    max_c = max(s_new.max_column, s_orig.max_column if s_orig else 0)

    zn_puv = zn_novy = kom_puv = kom_novy = zmeneno = 0

    for r in range(1, max_r + 1):
        for c in range(1, max_c + 1):
            co = s_orig.cell(r, c) if s_orig else None
            cn = s_new.cell(r, c)
            cout = s_out.cell(r, c)

            v1 = co.value if co else ""
            v2 = cn.value if cn else ""
            t1, t2 = str(v1), str(v2)

            zn1 = pocet_znaku(t1)
            zn2 = pocet_znaku(t2)

            zn_puv += zn1
            zn_novy += zn2

            # === Porovnání obsahu buněk ===
            if normalize(t1) != normalize(t2):
                zmeny_detail.append(f"{sheetname} [{r},{c}]: '{t1}' → '{t2}'")
                text = t2 + " !změna!"
                zmeneno += 1
            else:
                text = t2

            cout.value = text

            # === Kopírování formátu ===
            if cn.has_style:
                cout.font = copy(cn.font)
                cout.alignment = copy(cn.alignment)
                cout.border = copy(cn.border)
                cout.fill = copy(cn.fill)
                cout.number_format = cn.number_format

            # === Komentáře ===
            k1 = co.comment.text if co and co.comment else ""
            k2 = cn.comment.text if cn and cn.comment else ""
            kom_puv += pocet_znaku(k1)
            kom_novy += pocet_znaku(k2)
            if k1 != k2:
                zmeny_detail.append(f"{sheetname} [{r},{c}] KOMENTÁŘ:\n  Původní: {k1}\n  Nový:     {k2}")

    celkem_puv += zn_puv
    celkem_novy += zn_novy
    celkem_kpuv += kom_puv
    celkem_knovy += kom_novy
    souhrn[sheetname] = (zn_puv, zn_novy, zmeneno)
    komentar_souhrn[sheetname] = (kom_puv, kom_novy)

# === Ulož výstupní soubor ===
wb_out.save(VYSTUPNI_SOUBOR)

# === Ulož textový výstup ===
with open(TXT_SOUBOR, "w", encoding="utf-8") as f:
    for sheet in souhrn:
        p, n, z = souhrn[sheet]
        kp, kn = komentar_souhrn[sheet]
        f.write(f"{sheet}:\n")
        f.write(f"  - Buňky:     {p} → {n} (Δ {n - p}), změněno: {z}\n")
        f.write(f"  - Komentáře: {kp} → {kn} (Δ {kn - kp})\n\n")
    f.write(f"CELKEM:\n  - Buňky: {celkem_puv} → {celkem_novy} (Δ {celkem_novy - celkem_puv})\n")
    f.write(f"  - Komentáře: {celkem_kpuv} → {celkem_knovy} (Δ {celkem_knovy - celkem_kpuv})\n\n")
    f.write("🔍 Změny:\n")
    for z in zmeny_detail:
        f.write("- " + z + "\n")

print("✅ Hotovo. Výstupní soubor i textový přehled vytvořen.")

[erAck: edited to format code as code, see This is the guide - How to use the Ask site? - #6 by erAck]

fpy · March 31, 2025, 5:39am

see the .csv approach : How to show word & character count in Calc?