Sort blocks of text year-wise

AniruddhaMohod · March 14, 2025, 10:05am

I am using Ubuntu 22.04 and LibreOffice Version: 25.2.1.2

I have a document file with blocks. Previously, with the help of Ask LibreOffice and @KamilLanda, I managed to extract the blocks containing a phrase in the table. This time the phrase is a year, which is in the block but outside the table. I would like the blocks to be extracted and sorted by the year of the case number. The file is attached.
Exp.odt (41.8 KB)

KamilLanda · March 14, 2025, 10:21am

So is the sorting criterion the red-marked numbers?

Lupp · March 14, 2025, 11:00am

I would assume the intended order is as shown in the attached .ods.
I’m not sufficiently familiar with Writer to suggest a method for extracting the related text parts and re-joining them in sorted order.
(Yes. I would use a helper sheet created by the routine for sorting. Such helpers are cheap in monolithic LibreOffice.)
disask119477_Exp.ods (23.5 KB)

Lupp · March 14, 2025, 11:02am

Always post the link in such a case!

AniruddhaMohod · March 14, 2025, 11:35am

Yes, the year alone is enough. But if we can also sort by case number, that is well and good.

AniruddhaMohod · March 14, 2025, 11:37am

Lupp · March 14, 2025, 12:01pm

The example sheet I attached uses the SORTBY() function, which was introduced with V 24.8, so you won’t see the results with your version.
However, in a helper sheet created automatically, the sorting could also be done “in situ” using a SortDescriptor.

Villeroy · March 18, 2025, 2:13pm

This looks like a perfect use case for a serial letter (mail merge) or a database report.

Lupp · March 18, 2025, 8:10pm

I’ve been following this thread pretty much from the beginning, and have been thinking about it for some hours. I also created and verified a reasonable, extendable alternative solution for the first subtask.
As expected, the next task was to sort by other elements…
That is how things tend to go.
Now I feel compelled to say something very general: texts that purport to contain data as embedded pieces are fundamentally unsuitable as a data source. In this case, they perpetuate a way of thinking that belongs to the 18th century.
Someone who knows the context and has time to read can recognize what the texts are about.
An IT-based solution must be designed the other way around:

Define the required data abstractly (NOT by a few examples).
Design an appropriately structured environment for the maintenance of the data.
Create a format (or several) for the output as “pretty print”.
Organise a technical way to fill the needed data into the forms.
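A minimal Python sketch of that separation: the data lives in structured records, and the "pretty print" block is generated from them, so sorting is a one-liner instead of a text-surgery job. The record fields below are hypothetical, chosen only for illustration; a real schema has to be defined from the actual requirements, as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CaseRecord:
    # Hypothetical fields, not taken from the attached file.
    court: str
    case_no: int
    year: int
    order_text: str

def render(record: CaseRecord) -> str:
    """Produce the printable block for one record."""
    return (f"IN THE COURT OF {record.court}\n"
            f"Case No. {record.case_no} / {record.year}\n"
            f"{record.order_text}")

def render_sorted(records: list) -> str:
    """Sort by year, then case number, and join the blocks with a dashed line."""
    ordered = sorted(records, key=lambda r: (r.year, r.case_no))
    separator = "\n" + "-" * 40 + "\n"
    return separator.join(render(r) for r in ordered)
```

Any other order (by serial number, by court) is just another key function; the printed form never has to be parsed again.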

Villeroy is right that this is basically a task for a database.
However, a solution that uses spreadsheets for both data storage and output can also have certain advantages, among them more flexibility for the authorised user.
Extracting data from letter-shaped documents in outdated forms to allow reorganization is not a suitable approach for IT-supported work.

Extract everything needed in the future now once and shift to a solution as described above.

Created with the support of DeepL.com (free version). The errors are mine.

karolus · March 15, 2025, 1:42pm

Some rough code for the whole job:

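```python
import re
from com.sun.star.text.ControlCharacter import PARAGRAPH_BREAK

def new_doc(which):
    """Open a new, empty document, e.g. which='writer' -> private:factory/swriter."""
    desktop = XSCRIPTCONTEXT.getDesktop()
    return desktop.loadComponentFromURL(f"private:factory/s{which}", "_blank", 0, ())

def clone_para(cursor, paragraph):
    """Append the portions of `paragraph` at `cursor`, copying text, weight and alignment."""
    for portion in paragraph:
        cursor.String = portion.String          # the inserted text becomes the cursor's selection
        cursor.CharWeight = portion.CharWeight
        cursor.ParaAdjust = portion.ParaAdjust
        cursor.gotoEnd(False)                   # collapse the cursor behind the new text

def get_sorted_source(doc):
    """Pair each (year, case number) key with its heading paragraph and table, sorted by key."""
    # " / <case number> / <4-digit year> " somewhere in the document text
    rex = re.compile(r'\s/\s(\d+)\s*/\s*(\d{4})\s')
    paragraphs = []
    for paragraph in doc.Text:
        try:
            text = paragraph.String
        except AttributeError:                  # text tables in the enumeration have no String
            continue
        if text and not text.startswith('--------'):   # skip empty and dashed separator lines
            paragraphs.append(paragraph)
    sort_keys = [(int(year), int(case_no))
                 for case_no, year in rex.findall(doc.Text.String)]
    tables = list(doc.TextTables)
    # Rough assumption: exactly one key, one non-empty paragraph and one table per block,
    # all in document order; otherwise zip() pairs the wrong items.
    return sorted(zip(sort_keys, paragraphs, tables), key=lambda item: item[0])

def create_sorted_file(content):
    """Write the sorted blocks into a new Writer document."""
    doc = new_doc('writer')
    text = doc.Text
    dc = text.createTextCursor()
    for _, paragraph, table in content:
        # 1x1 yellow "frame" table holding the heading paragraph and a nested copy of the table
        frame = doc.createInstance("com.sun.star.text.TextTable")
        frame.KeepTogether = True
        frame.BackColor = 0xFFFF88
        frame.initialize(1, 1)
        text.insertTextContent(dc.End, frame, False)
        cell = frame.getCellByPosition(0, 0)
        cursor = cell.Text.createTextCursor()
        cursor.gotoEnd(False)
        clone_para(cursor, paragraph)

        # Nested table with the same size and column layout as the source table
        new_table = doc.createInstance("com.sun.star.text.TextTable")
        new_table.KeepTogether = True
        new_table.BackTransparent = False
        new_table.BackColor = -1
        new_table.initialize(table.Rows.Count, table.Columns.Count)
        cell.Text.insertTextContent(cursor, new_table, False)
        for name in table.CellNames:
            source = table.getCellByName(name)
            target = new_table.getCellByName(name)
            c_cursor = target.Text.createTextCursor()
            paras = list(source.Text)
            for i, para in enumerate(paras):
                clone_para(c_cursor, para)
                # clone_para leaves the cursor at the end of the paragraph, so a check for
                # isEndOfParagraph() would never insert a break; insert it between paragraphs.
                if i < len(paras) - 1:
                    target.Text.insertControlCharacter(c_cursor, PARAGRAPH_BREAK, False)
                    c_cursor.gotoEnd(False)
        new_table.TableColumnSeparators = table.TableColumnSeparators
        # Dashed separator line after each block
        dc.gotoEnd(False)
        dc.String = '-' * 120
        text.insertControlCharacter(dc.End, PARAGRAPH_BREAK, False)

def main(*args):
    """Entry point: copy the blocks of the current document, sorted, into a new document."""
    doc = XSCRIPTCONTEXT.getDocument()
    create_sorted_file(get_sorted_source(doc))

g_exportedScripts = (main,)
```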

See the attached file with the embedded Python macro:
ask_119477.ods.odt (36.1 KB)
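The key extraction of the macro can be tried outside LibreOffice. The sample lines below are hypothetical (the exact wording of the attached document is not reproduced here). The pattern only matches when the slashes are surrounded by whitespace, and Python compares the (year, case number) tuples year first, then numerically by case number:

```python
import re

# Same pattern as in the macro: " / <case number> / <4-digit year> "
CASE_RE = re.compile(r'\s/\s(\d+)\s*/\s*(\d{4})\s')

def sort_keys(text):
    """One (year, case number) key per match, in document order."""
    return [(int(year), int(case)) for case, year in CASE_RE.findall(text)]

# Hypothetical sample lines
sample = ("Case No. R.C.C. / 1265 / 2024 \n"
          "Case No. R.C.C. / 59 / 2024 \n"
          "Case No. S.C.C. / 311 / 2019 \n")
keys = sort_keys(sample)
print(keys)          # [(2024, 1265), (2024, 59), (2019, 311)]
print(sorted(keys))  # [(2019, 311), (2024, 59), (2024, 1265)]
```

Tuples need no padding trick, because the case number is compared as an integer in its own position. A number written without spaces, such as R.C.C./12/2024, is not matched by this pattern.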

KamilLanda · March 15, 2025, 2:13pm

The trick for sorting is to transform the two integers CaseNo / Year into one double: Year.CaseNo. There is a test for the maximum length of CaseNo (variable dMax) so that the transformation to double works properly.
For example, for 1265 / 2024 and 59 / 2024 the correct keys are 2024.1265 and 2024.0059, and NOT 2024.59!
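The padding idea can be sketched in Python (this is not the Basic code from the attachment). Here width plays the role of dMax: the number of digits of the longest case number, so every case number is scaled by the same power of ten. The integer key year * 10**width + case gives the same order without any floating-point rounding:

```python
def max_width(cases):
    """Digits of the longest case number (the role of dMax)."""
    return max(len(str(c)) for c in cases)

def double_key(year, case, width):
    """Year.CaseNo with the case number zero-padded to `width` digits."""
    return year + case / 10 ** width

def naive_key(year, case):
    """Wrong variant without padding: 59 becomes .59 and sorts after .1265."""
    return float(f"{year}.{case}")

def int_key(year, case, width):
    """Integer equivalent of double_key: same order, no rounding issues."""
    return year * 10 ** width + case

pairs = [(2024, 1265), (2024, 59), (2023, 900)]
w = max_width([case for _, case in pairs])
print(sorted(pairs, key=lambda p: double_key(p[0], p[1], w)))
print(naive_key(2024, 59) > naive_key(2024, 1265))  # True, i.e. the wrong order
```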

Sorting is done with QuickSort.
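For reference, a generic in-place QuickSort (Lomuto partition) over (key, block) pairs, sketched in Python; it is not a translation of the Basic routine in the attachment:

```python
def partition(items, lo, hi):
    """Lomuto partition of items[lo..hi] around the key of items[hi].

    Returns the pivot's final index; smaller keys end up to its left."""
    pivot = items[hi][0]
    i = lo
    for j in range(lo, hi):
        if items[j][0] < pivot:
            items[i], items[j] = items[j], items[i]
            i += 1
    items[i], items[hi] = items[hi], items[i]
    return i

def quicksort(items, lo=0, hi=None):
    """Sort a list of (key, block) pairs in place by key (not stable)."""
    if hi is None:
        hi = len(items) - 1
    while lo < hi:
        p = partition(items, lo, hi)
        # Recurse into the smaller part and loop on the larger one to bound recursion depth.
        if p - lo < hi - p:
            quicksort(items, lo, p - 1)
            lo = p + 1
        else:
            quicksort(items, p + 1, hi)
            hi = p - 1

blocks = [(2024.1265, "block A"), (2024.0059, "block B"), (2019.0311, "block C")]
quicksort(blocks)
print([name for _, name in blocks])  # ['block C', 'block B', 'block A']
```

Only the keys are compared, so the blocks themselves (in the macro: text ranges in the document) can be arbitrary objects.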

Because the blocks of text are swapped within the document, there are also some checks: the CaseNo must lie between IN THE COURT OF and the table; dashed lines inside the table are ignored; and a missing dashed line at the end of a table is detected (so that really only one block is selected).

After swapping, the procedure deleteUselessDashedLines deletes the dashed lines at the start of pages and the final dashed line.

If an error occurs during swapping, the UndoManager automatically reverts the changes made so far, so in case of error the document should remain unchanged.
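In a Python macro, a similar safety net could look like the sketch below (it is not KamilLanda's Basic code). It uses the document's XUndoManager methods enterUndoContext, leaveUndoContext, isUndoPossible, getCurrentUndoActionTitle and undo. The unique context title is an added assumption: if the failing action changed nothing, the empty context is not placed on the Undo stack, and the title check prevents undoing an earlier, unrelated edit.

```python
import uuid

def run_undoable(doc, action, base_title="Sort blocks"):
    """Run action(doc) as one Undo step and roll it back if action raises."""
    undo = doc.getUndoManager()
    # Unique title so our own context can be recognised on the Undo stack.
    title = f"{base_title} [{uuid.uuid4().hex[:8]}]"
    undo.enterUndoContext(title)
    try:
        action(doc)
    except Exception:
        undo.leaveUndoContext()
        # Undo only if our context actually landed on top of the stack.
        if undo.isUndoPossible() and undo.getCurrentUndoActionTitle() == title:
            undo.undo()
        raise
    undo.leaveUndoContext()
```

Inside LibreOffice, doc would be XSCRIPTCONTEXT.getDocument() and action a function that swaps the blocks.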

It wasn’t as easy as I thought, so I hope it works.

@AniruddhaMohod, update:

@JohnSUN noticed the bug dim p(o to 30000) (letter o instead of zero) → dim p(0 to 30000).
I also made two “cosmetic” changes:

the faster while…wend instead of do until (there is no reason to exit the loop early).
oDoc.unlockControllers without the surrounding if oDoc.hasControllersLocked() then, for a successful run of the macro.
sort-the-blocks-in-text-QuickSort.odt (41.3 kB)

karolus · March 15, 2025, 2:29pm

I agree with you on that!
It looks like you have done a better job of keeping the original structure and formatting of this messy source!!

AniruddhaMohod · March 15, 2025, 3:56pm

You are simply great.

AniruddhaMohod · March 18, 2025, 6:17am

In India, LibreOffice is used in every court. In my court I used to type the order sheet in English; in some courts the order sheet is typed in Marathi. In a Marathi Roznama (order sheet), the block begins with [:print:]{1,100} यांचे न्यायालयात (roughly "in the court of") and प्रकरण क्रमांक: ("Case No.:") is typed for the case number. I tried to replace your IN THE COURT OF with [:print:]{1,100} यांचे न्यायालयात and Case No. with प्रकरण क्रमांक:, but this time I did not succeed. Kindly guide me so that the blocks in Marathi can also be sorted year-wise.
Marathi Roznama.odt (58.4 KB)

AniruddhaMohod · March 18, 2025, 6:29am

Previously, with your help, I managed to add a Sr. No. before the case number. Some people will also need the blocks to be sorted by Sr. No. A file with numbered blocks is attached.
Sort Roznama Serial Number Wise.odt (62.8 KB)
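Sorting by the serial number only needs a different key. A Python sketch, assuming blocks are lists of lines and the Sr. No. is written as a number followed by ")" at the start of a line (e.g. 19) Case No. …). The key must be numeric; as text, 10) would sort before 9). Blocks without a number are returned separately:

```python
import re

SERIAL_RE = re.compile(r"^\s*(\d+)\)")   # "19) Case No. ..." -> 19

def serial_no(line):
    """Sr. No. at the start of a line, or None."""
    m = SERIAL_RE.match(line)
    return int(m.group(1)) if m else None

def block_serial(block):
    """Sr. No. of the first numbered line in a block, or None."""
    for line in block:
        n = serial_no(line)
        if n is not None:
            return n
    return None

def sort_by_serial(blocks):
    """Return (numbered blocks sorted by Sr. No., blocks without a Sr. No.)."""
    numbered = [b for b in blocks if block_serial(b) is not None]
    unnumbered = [b for b in blocks if block_serial(b) is None]
    return sorted(numbered, key=block_serial), unnumbered
```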

KamilLanda · March 18, 2025, 1:46pm

for Marathi Roznama:
Unfortunately I can’t read Marathi script at all, but I only changed the constants str1 and strNum and it seems to work.
edit1KL - Marathi Roznama.odt (60.8 kB)

I see only squares on screen instead of Marathi characters (I don’t know why; I have Devanagari fonts installed), so my changes are in the ODT.
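One possible reason the first attempt failed: if the macro compares plain strings (an assumption; the Basic code is only in the attachment), a constant like [:print:]{1,100} यांचे न्यायालयात is searched literally, regex syntax included. The Python sketch below keeps the markers per language and uses a real regular expression for the Marathi heading. The Marathi strings are copied from the question above; the dictionary layout and function names are illustrative only:

```python
import re

MR_COURT = "यांचे न्यायालयात"        # "in the court of" (follows the judge's designation)
MR_CASE_LABEL = "प्रकरण क्रमांक:"      # "Case No.:"

MARKERS = {
    "en": (re.compile(r"^\s*IN THE COURT OF\b"), "Case No."),
    # Python counterpart of [:print:]{1,100} followed by the court phrase
    "mr": (re.compile(r"^.{1,100}?" + re.escape(MR_COURT)), MR_CASE_LABEL),
}
CASE_RE = re.compile(r"(\d+)\s*/\s*(\d{4})")   # \d also matches Devanagari digits

def is_heading(line, lang):
    """True if the line starts a block in the given language."""
    return MARKERS[lang][0].search(line) is not None

def case_key(line, lang):
    """(year, case number) following the language's case label, or None."""
    label = MARKERS[lang][1]
    if label not in line:
        return None
    m = CASE_RE.search(line.split(label, 1)[1])
    return (int(m.group(2)), int(m.group(1))) if m else None
```

In Python, int() also converts Devanagari digits, so a case number typed as १२/२०२३ yields the key (2023, 12).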

KamilLanda · March 18, 2025, 1:49pm

for Sort Roznama Serial Number Wise:
What is Sr. No.? Is it a code like MHAM180002292017 or something else?

AniruddhaMohod · March 18, 2025, 3:05pm

The number before the case no. I managed to put the serial number before the case number thanks to you.

AniruddhaMohod · March 18, 2025, 3:08pm

The 1) before the case no.

KamilLanda · March 18, 2025, 4:24pm

And what if a Sr. No. is missing? (On the 1st page, it is missing for the block between 19) and 20).)

Show a msgbox for the missing one?
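Building the text for such a message can be sketched in Python, given the Sr. No. of each block in document order (None where it is missing). It reports the numbered neighbours of every unnumbered block, like the one between 19) and 20), plus numbers that never occur, assuming the numbering starts at 1. How the text is then shown (msgbox or otherwise) is left open:

```python
def unnumbered_blocks(serials):
    """(previous, next) Sr. No. around every block whose Sr. No. is None."""
    result = []
    for i, s in enumerate(serials):
        if s is None:
            before = next((x for x in reversed(serials[:i]) if x is not None), None)
            after = next((x for x in serials[i + 1:] if x is not None), None)
            result.append((before, after))
    return result

def skipped_numbers(serials):
    """Numbers absent from 1..max (assumes the numbering starts at 1)."""
    present = {s for s in serials if s is not None}
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]

def warning_text(serials):
    """Message text a macro could display; empty if everything is numbered."""
    lines = [f"Block between {b}) and {a}) has no Sr. No."
             for b, a in unnumbered_blocks(serials)]
    lines += [f"Sr. No. {n}) does not occur." for n in skipped_numbers(serials)]
    return "\n".join(lines)

print(warning_text(list(range(1, 20)) + [None, 20, 21]))
# Block between 19) and 20) has no Sr. No.
```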