Formatting handling problem in a Python script

petrix · October 4, 2024, 8:52am

Hello!

I’m building a LibreOffice extension to sort Sanskrit wordlists (in IAST transliteration) alphabetically. I’m not a professional programmer (just a Linux enthusiast involved in digital humanities), and I have roughly intermediate knowledge of Python.

I managed to build a word-sorting macro (with an alphabetical mapping of the transliteration signs and some exception handling for certain phonetic rules of Sanskrit). After that, I somehow built an extension around it. Everything seems to work quite smoothly. So far so good!

The problem is that when I sort list elements that carry formatting attributes (such as italicized stems or otherwise marked chunks of words), their formatting disappears right after the sorting function runs, and that makes all my building efforts useless.
AI tools, predictably, provide faulty solutions. Can I ask for help?
Thanks in advance

My main.py is:

import uno
import unohelper
from com.sun.star.task import XJobExecutor

ID_EXTENSION = 'org.examples.sskr_wl_sorter.000'
SERVICE = ('com.sun.star.task.Job',)

# The new sorting function for both Calc and Writer documents
def sort_sanskrit_in_document():
    """Sorts Sanskrit words in LibreOffice Calc or Writer. In Calc, it sorts the whole sheet, ignoring selection."""
    
    try:
        # Connect to LibreOffice
        localContext = uno.getComponentContext()
        smgr = localContext.ServiceManager
        desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", localContext)

        # Get the active document
        model = desktop.getCurrentComponent()

        # Check if the document is a Calc sheet or Writer document
        if hasattr(model, "Sheets"):
            # If it's a Calc spreadsheet, sort the whole sheet, ignoring the selection
            sheet = model.CurrentController.ActiveSheet

            # Find the last non-empty row
            last_row = 0
            for i in range(sheet.Rows.getCount()):
                cell = sheet.getCellByPosition(0, i)  # First column
                if cell.getString().strip() == "":  # If empty, stop
                    break
                last_row = i

            # Find the last non-empty column
            last_col = 0
            for j in range(sheet.Columns.getCount()):
                cell = sheet.getCellByPosition(j, 0)
                if cell.getString().strip() == "":
                    break
                last_col = j

            # Select the entire range
            cell_range = sheet.getCellRangeByPosition(0, 0, last_col, last_row)

            # Read all rows
            rows = []
            for i in range(cell_range.Rows.getCount()):
                row = []
                for j in range(cell_range.Columns.getCount()):
                    cell = cell_range.getCellByPosition(j, i)
                    row.append(cell.getString().strip())
                rows.append(row)

            # Sort the rows using the first column
            sorted_rows = sorted(rows, key=lambda row: custom_sort(row[0], mapping))

            # Write the sorted rows back to the spreadsheet
            for i, row in enumerate(sorted_rows):
                for j, value in enumerate(row):
                    cell = sheet.getCellByPosition(j, i)
                    cell.setString(value)

        elif hasattr(model, "Text"):
            # If it's a Writer document
            text = model.Text
            cursor = model.CurrentController.getViewCursor()

            # Get the selected text or the entire document if nothing is selected
            if cursor.isCollapsed():
                cursor.gotoStart(False)
                cursor.gotoEnd(True)

            # Read the selected text as lines
            selected_text = cursor.getString().strip().splitlines()

            # Sort the existing lines
            sorted_lines = sorted(selected_text, key=lambda line: custom_sort(line.split()[0], mapping) if line.strip() else "")

            # Clear the selected text
            cursor.setString("")  # Replace the selected text with an empty string

            # Insert the sorted text at the current cursor position
            text.insertString(cursor, "\n".join(sorted_lines), False)  # Insert the sorted text

        else:
            raise Exception("This document is neither a spreadsheet nor a Writer document.")

    except Exception as e:
        print(f"Error during sorting: {e}")


# Custom sorting functions and mapping
def custom_sort(word, mapping):
    if not word:
        return word  # Return the empty word if there's nothing to sort

    word_for_sorting = word.replace('√', '')  # Remove the root symbol for sorting
    preprocessed_word = preprocess_word(word_for_sorting, mapping)
    transformed_word = "".join(mapping.get(char, char) for char in preprocessed_word)

    # Handle hierarchy between hyphen (-) and equals (=)
    hierarchy_weight = ''
    if '-' in word and '=' in word:
        hierarchy_weight = '1' if word.index('-') < word.index('=') else '2'
    elif '-' in word:
        hierarchy_weight = '1'
    elif '=' in word:
        hierarchy_weight = '2'

    return transformed_word + hierarchy_weight


def preprocess_word(word, mapping):
    """Pre-processes the word for sorting: drops the root symbol and assimilates ṃ and ḥ to the following consonant."""
    preprocessed_word = ""
    i = 0
    while i < len(word):
        if word[i] == '√':
            i += 1
            continue
        if word[i] == 'ṃ':
            nasal_replacements = {'k': 'ṅ', 'g': 'ṅ', 'c': 'ñ', 'j': 'ñ', 'ṭ': 'ṇ', 'ḍ': 'ṇ', 't': 'n', 'd': 'n', 'p': 'm', 'b': 'm'}
            # Test against the dict keys so that 'j' and 'ḍ' are not skipped
            if i < len(word) - 1 and word[i + 1] in nasal_replacements:
                preprocessed_word += nasal_replacements[word[i + 1]]
            else:
                preprocessed_word += 'ṃ'
        elif word[i] == 'ḥ':
            if i < len(word) - 1 and word[i + 1] in 'śṣs':
                preprocessed_word += word[i + 1]
            else:
                preprocessed_word += 'ḥ'
        else:
            preprocessed_word += word[i]
        i += 1
    return preprocessed_word


# Mapping for sorting Sanskrit characters
mapping = {
    'a': '01', 'ā': '02', 'i': '03', 'ī': '04', 'u': '05', 'ū': '06',
    'ṛ': '07', 'ṝ': '08', 'ḷ': '09', 'e': '10', 'ai': '11', 'o': '12',
    'au': '13', 'ṃ': '14', 'ḥ': '15', 'k': '16', 'kh': '17', 'g': '18',
    'gh': '19', 'ṅ': '20', 'c': '21', 'ch': '22', 'j': '23', 'jh': '24',
    'ñ': '25', 'ṭ': '26', 'ṭh': '27', 'ḍ': '28', 'ḍh': '29', 'ṇ': '30',
    't': '31', 'th': '32', 'd': '33', 'dh': '34', 'n': '35', 'p': '36',
    'ph': '37', 'b': '38', 'bh': '39', 'm': '40', 'y': '41', 'r': '42',
    'l': '43', 'v': '44', 'ś': '45', 'ṣ': '46', 's': '47', 'h': '48',
}

# Main Extension class
class MyFirstExtension(unohelper.Base, XJobExecutor):
    def __init__(self, ctx):
        self.ctx = ctx
        print("MyFirstExtension initialized")  # Debugging output

    # XJobExecutor trigger function
    def trigger(self, event):
        print(f"Triggered by event: {event}")  # Debugging output
        sort_sanskrit_in_document()  # Call the Sanskrit sorting function for both Calc and Writer
        return

# Register the implementation
g_ImplementationHelper = unohelper.ImplementationHelper()
g_ImplementationHelper.addImplementation(MyFirstExtension, ID_EXTENSION, SERVICE)
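
A side note on the mapping above: custom_sort looks up one character at a time, so two-letter keys such as 'ai' or 'kh' are never matched, and a word beginning with ai is sorted as a + i rather than after the plain a words. Below is a minimal sketch of one possible fix, assuming a greedy two-character lookup is acceptable. The names tokenize and sort_key are hypothetical, and MAPPING is only a subset of the mapping in the post.

```python
# Subset of the mapping from the post, just enough for the demonstration.
MAPPING = {
    'a': '01', 'i': '03', 'u': '05', 'ai': '11', 'au': '13',
    'k': '16', 'kh': '17', 'g': '18', 'h': '48',
}


def tokenize(word, mapping):
    """Split a word into mapping keys, preferring two-character keys."""
    tokens = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in mapping:
            tokens.append(pair)      # digraph such as 'ai' or 'kh'
            i += 2
        else:
            tokens.append(word[i])   # single sign, or a character not in the mapping
            i += 1
    return tokens


def sort_key(word, mapping):
    """Build the sort key from tokens instead of single characters."""
    return "".join(mapping.get(tok, tok) for tok in tokenize(word, mapping))


words = ["aika", "aka", "kha", "kaka"]
print(sorted(words, key=lambda w: sort_key(w, MAPPING)))
# ['aka', 'aika', 'kaka', 'kha']
```

With the per-character lookup, "aika" would come before "aka". Note that the greedy match also treats a genuine a + i hiatus as the diphthong ai, so this is a trade-off rather than a complete solution.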

elmau · October 4, 2024, 4:56pm

“Their formatting disappears”

What format? Show an example from before and after running your code.

petrix · October 4, 2024, 5:29pm

Hello there! Thank you deeply for your answer! I really appreciate it! I’m attaching two screenshots, one before the sorting, with random formatting applied to words, and the second one after the sorting process.

There’s also something strange: after sorting, all the paragraph breaks have become line breaks. I guess it’s relevant, but I don’t know how to handle that either…
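
On the line-break symptom: the script joins the sorted lines with "\n", and as far as I know insertString treats a line feed as a line break and a carriage return as a paragraph break. A minimal sketch, assuming text and cursor are the same objects used in the script above (insert_as_paragraphs is a hypothetical helper name):

```python
def insert_as_paragraphs(text, cursor, lines):
    """Insert the given lines so that each one becomes its own paragraph.

    Assumption: in Writer's insertString, "\r" produces a paragraph break
    while "\n" produces a line break inside the same paragraph.
    """
    text.insertString(cursor, "\r".join(lines), False)
```

This only addresses the paragraph breaks. The inserted string is still plain text, so character formatting is not restored by it.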

karolus · October 4, 2024, 6:01pm

Please upload the Writer document instead of a screenshot.
As a first hint: replace the direct formatting of the paragraphs with paragraph styles to simplify the task.
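
To illustrate why the formatting is lost in the first place: getString() returns a plain string, so anything sorted and written back as strings carries no character attributes. One general idea (not the solution the poster later arrived at, which is in the linked topic) is to let the formatting travel with each paragraph while sorting. A minimal sketch under that assumption, using a hypothetical data model in which a paragraph is a list of (text, attributes) runs. In a real macro the runs would have to be read from and written back to the document through the UNO API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data model: a run is a piece of text plus its attributes.
Run = Tuple[str, Dict[str, object]]
Paragraph = List[Run]


def plain_text(paragraph: Paragraph) -> str:
    """Concatenate the text of all runs; used only to build the sort key."""
    return "".join(text for text, _attrs in paragraph)


def sort_paragraphs(paragraphs: List[Paragraph],
                    key: Callable[[str], str]) -> List[Paragraph]:
    """Sort whole paragraphs by their plain text; the runs stay untouched."""
    return sorted(paragraphs, key=lambda p: key(plain_text(p)))


doc = [
    [("deva", {"italic": True}), ("-datta", {})],
    [("agni", {})],
]
for para in sort_paragraphs(doc, key=str.lower):
    print(para)
```

Here str.lower stands in for the real key, which in the script above would be something like lambda s: custom_sort(s, mapping).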

petrix · October 4, 2024, 6:42pm

skr_sample_list.odt (13.9 KB)

That’s the sample .odt I used in the screenshots. The formatting in the first one was random.
I’ve made a first attempt at replacing direct formatting with paragraph styles, but maybe I’m not doing it well…

Thanks for your kindness, really!

petrix · October 7, 2024, 2:11pm

I solved the problem in the question. But I have another issue (solving one problem created another?), for which I’ll ask a separate question, sharing a couple of more complex scripts… I don’t know whether it’s possible to delete this question; I can’t find a delete option.

Wanderer · October 7, 2024, 5:13pm

Only for moderators. But, if you think of it: Why delete? It may help somebody else sometime later. So you are welcome to present your own solution here…

petrix · October 7, 2024, 5:32pm

Because this was very basic code. In the new topic I shared two more complex scripts that evolved from this one; one of them implements what I was looking for, a more complex format handling.

The question can stay, I was just asking. Maybe I can add a link to the new topic here so that anyone interested can follow on to the next episode.

Why not? Here it is!

https://ask.libreoffice.org/t/how-to-implement-limited-text-range-in-my-python-script-of-an-alphabetical-sorting-libreoffice-extension/112266/2

Thanks for the input