Yearwise sort a block of texts

I am using Ubuntu 22.04 and LibreOffice Version: 25.2.1.2

I have a document file with blocks. Previously, with the help of asklibreoffice and @KamilLanda, I managed to extract the blocks containing a phrase in the table. Now the phrase is actually a year that appears in the block but outside the table. I would like the blocks to be extracted and sorted year-wise by case number. The file is attached.
Exp.odt (41.8 KB)

So is the criterion for sorting the red marked numbers?

I would assume the intended order is as shown in the attached .ods.
I’m not sufficiently familiar with Writer to suggest a method for extracting the related text parts and re-joining them sorted.
(Yes. I would use a helper sheet created by the routine for sorting. Such helpers are cheap in monolithic LibreOffice.)
disask119477_Exp.ods (23.5 KB)

Always post the link in such a case!

Yes, only the year is enough. But if we can also sort by the case number, so much the better.

The example sheet I attached uses the SORTBY() function, which was introduced with V 24.8. You therefore won’t see the results with your versions.
However, in an automatically created helper sheet the sorting could also be done “in situ” using a SortDescriptor.

This looks like a perfect use case for a serial letter or database report.

I’ve been following this thread pretty much from the beginning, and have been thinking about it for some hours. I also created and verified a quite reasonable alternative and extendable solution for the first subtask.
As expected, the next task was to sort by other elements…
That way things tend to move on.
Now I feel compelled to say something very general: texts that purport to contain data in embedded pieces are fundamentally unsuitable. In this case, they perpetuate a way of thinking that belongs to the 18th century.
Someone who knows the context and has time to read can recognize what the texts are about.
An IT-based solution must be designed the other way around:

  • Define the required data abstractly (NOT by a few examples).
  • Design an appropriately structured environment for the maintenance of the data.
  • Create a format (or several) for the output as “pretty print”.
  • Organise a technical way to fill the needed data into the forms.

Villeroy is right that this is basically a task for a database.
However, a solution that uses spreadsheets for both - data storage and output - can also have certain advantages, among them more flexibility for the authorised user.
Extracting data from letter-shaped documents of outdated forms to allow reorganization is not an approach for IT-supported work.

Extract everything that will be needed in the future once, now, and then shift to a solution as described above.

Created with support by DeepL.com (free version). The errors are mine.

Some rough code for the whole job:

python

import re
from com.sun.star.text.ControlCharacter import PARAGRAPH_BREAK 

def new_doc(which):
    desktop = XSCRIPTCONTEXT.getDesktop()
    return desktop.loadComponentFromURL(f"private:factory/s{which}",
                                       "_blank",
                                       0,
                                       (),)

def clone_para(cursor, paragraph):
    for portion in paragraph:
        cursor.String = portion.String
        cursor.CharWeight = portion.CharWeight
        cursor.ParaAdjust = portion.ParaAdjust
        cursor.gotoEnd(False)

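def get_sorted_source(doc):
    # Matches " / <case no> / <year> " and captures both numbers.
    rex = re.compile(r'\s/\s(\d+)\s*/\s*(\d{4})\s')
    paragraphs = []
    for paragraph in doc.Text:
        try:
            # Skip the dashed separator lines between blocks.
            if paragraph.String.startswith('--------'):
                continue
            if paragraph.String:
                paragraphs.append(paragraph)
        except Exception:
            # Tables are enumerated too, but have no String property.
            continue
    # Sort by year first, then by case number.
    sort_keys = [(int(year), int(ids))
                  for ids, year
                  in rex.findall(doc.Text.String)]
    tables = [table for table in doc.TextTables]
    # Assumes one heading paragraph, one key and one table per block.
    return sorted(zip(sort_keys, paragraphs, tables), key=lambda p: p[0])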

def create_sorted_file(content):    
    doc = new_doc('writer')
    text = doc.Text    
    dc = text.createTextCursor()
    for _, paragraph, table in content:
        frame = doc.createInstance("com.sun.star.text.TextTable")
        frame.KeepTogether = True
        frame.BackColor = int("ffff88",16)
        frame.initialize(1,1)
        text.insertTextContent(dc.End, frame, 0)
        cell = frame.getCellByPosition(0,0)
        cursor = cell.Text.createTextCursor()
        cursor.gotoEnd(False)    
        clone_para(cursor, paragraph)
            
        new_table = doc.createInstance("com.sun.star.text.TextTable")
        new_table.KeepTogether = True
        new_table.BackTransparent = False
        new_table.BackColor = -1
        new_table.initialize(table.Rows.Count, table.Columns.Count)
        
        cell.Text.insertTextContent(cursor, new_table, 0)
        for name in table.CellNames:
            source = table.getCellByName(name)
            target = new_table.getCellByName(name)
            c_cursor = target.Text.createTextCursor()
            for para in source.Text:
                clone_para(c_cursor, para)
                if not c_cursor.isEndOfParagraph():
                    target.Text.insertControlCharacter(c_cursor, PARAGRAPH_BREAK, False)
        new_table.TableColumnSeparators = table.TableColumnSeparators
        dc.gotoEnd(False)    
        dc.String = f"{'-'*120}"
        text.insertControlCharacter(dc.End, PARAGRAPH_BREAK, False)

def main():
    doc = XSCRIPTCONTEXT.getDocument()
    content = get_sorted_source(doc)
    create_sorted_file(content)    

See the attached file with the embedded Python macro.
ask_119477.ods.odt (36.1 KB)
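
For anyone who wants to check the sort key without opening the document, here is a minimal sketch of the key extraction alone. It uses the same regular expression as get_sorted_source; the name extract_keys and the sample heading lines are made up for illustration, and the real headings in Exp.odt may look different.

```python
import re

# Same pattern as in get_sorted_source: " / <case no> / <year> ".
REX = re.compile(r'\s/\s(\d+)\s*/\s*(\d{4})\s')


def extract_keys(text):
    """Return (year, case_no) tuples in document order."""
    return [(int(year), int(case_no)) for case_no, year in REX.findall(text)]


# Hypothetical heading lines, not taken from Exp.odt.
sample = (
    "Case No. R.C.C. / 1265 / 2024 \n"
    "Case No. R.C.C. / 59 / 2024 \n"
    "Case No. S.C.C. / 7 / 2019 \n"
)

keys = extract_keys(sample)
print(sorted(keys))
```

Tuples compare element by element, so sorting them orders by year first and by case number within a year.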

The trick for sorting is to transform the two integers from CaseNo / Year into one double → Year.CaseNo. The macro tests for the maximum length of CaseNo (variable dMax) so that the transformation to a double is done properly.
For example, for 1265 / 2024 and 59 / 2024 the correct values are 2024.1265 and 2024.0059, and not 2024.59.
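
To illustrate why the padding matters, here is a small Python sketch of the same idea (the attached macro itself is Basic). The function name year_case_key and the sample list are assumptions for illustration; width plays the role that dMax has in the macro.

```python
def year_case_key(year, case_no, width):
    """Combine year and case number into one float, e.g. 2024.0059."""
    return year + case_no / 10 ** width


# Hypothetical (case_no, year) pairs.
cases = [(1265, 2024), (59, 2024), (7, 2025)]

# Number of digits of the longest case number (the role of dMax).
width = max(len(str(case_no)) for case_no, _ in cases)

# Padded key: 59 / 2024 becomes 2024.0059 and sorts before 2024.1265.
padded = sorted(cases, key=lambda c: year_case_key(c[1], c[0], width))

# Naive key without padding: 59 / 2024 becomes 2024.59 and sorts too late.
naive = sorted(cases, key=lambda c: float(f"{c[1]}.{c[0]}"))

print(padded)
print(naive)
```

In Python a plain tuple (year, case_no) would avoid the padding altogether; the float form is shown only because it mirrors the macro.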

Sorting is via QuickSort.
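
For readers who have not seen it, a minimal in-place QuickSort in Python looks roughly like this. It is a generic textbook variant (last element as pivot), not a translation of the Basic routine in the attachment, and the sample keys are made up.

```python
def quicksort(items, key=lambda x: x, lo=0, hi=None):
    """Sort items[lo..hi] in place by key, using the last element as pivot."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = key(items[hi])
    i = lo  # items[lo..i-1] hold the elements smaller than the pivot
    for j in range(lo, hi):
        if key(items[j]) < pivot:
            items[i], items[j] = items[j], items[i]
            i += 1
    # Put the pivot between the smaller and the larger elements.
    items[i], items[hi] = items[hi], items[i]
    quicksort(items, key, lo, i - 1)
    quicksort(items, key, i + 1, hi)


# Hypothetical Year.CaseNo keys.
keys = [2024.1265, 2019.0007, 2024.0059, 2021.0300]
quicksort(keys)
print(keys)
```

In a Python macro the built-in sorted() would normally be used instead; a hand-written sort is mainly needed in Basic, which has no built-in array sort.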

Because the blocks of text are swapped within the document, there are also some tests: the CaseNo must lie between IN THE COURT OF and the table; dashed lines inside a table are ignored; and a missing dashed line at the end of a table is detected (so that really only one block is selected).

After swapping, the procedure deleteUselessDashedLines deletes the dashed lines at the start of pages and the final dashed line.

If an error occurs during swapping, the UndoManager automatically reverts the changes already made, so in case of an error the document should be unchanged.
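
A rough Python sketch of that rollback idea, not the code from the attached Basic macro. The helper name run_with_rollback and the action callback are made up. The UNO calls (getUndoManager, enterUndoContext, leaveUndoContext, isUndoPossible, getCurrentUndoActionTitle, undo, lockControllers, unlockControllers) are the ones I would expect to need, but the sketch has not been checked against a real Writer document.

```python
def run_with_rollback(doc, title, action):
    """Run action(doc) as one undo step; undo that step if it fails."""
    undo = doc.getUndoManager()
    doc.lockControllers()           # suppress screen updates while editing
    undo.enterUndoContext(title)    # group all changes under one title
    succeeded = True
    try:
        action(doc)
    except Exception:
        succeeded = False
    undo.leaveUndoContext()
    # Only undo when the top undo step is ours. An empty context is
    # dropped by the undo manager, and an older user action must stay.
    if (not succeeded
            and undo.isUndoPossible()
            and undo.getCurrentUndoActionTitle() == title):
        undo.undo()
    doc.unlockControllers()
    return succeeded
```

Inside a macro it could be called as run_with_rollback(XSCRIPTCONTEXT.getDocument(), "Sort blocks", swap_blocks), where swap_blocks is a hypothetical function that does the actual swapping.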

It wasn’t as easy as I thought, so I hope it will be functional :slight_smile:


@AniruddhaMohod, update:

@JohnSUN noticed the bug: dim p(o to 30000) should be dim p(0 to 30000).
I also made two “cosmetic” changes:

  1. A faster while…wend instead of do until (there is no reason to jump out of the loop).
  2. oDoc.unlockControllers is called without if oDoc.hasControllersLocked() then for a successful run of the macro.

sort-the-blocks-in-text-QuickSort.odt (41.3 kB)

I agree with you on that! :scream:
It looks like you have done a better job of keeping the original structure and formatting of this messy source!

You are simply great.

In India, LibreOffice is used in every court. In my court I type the order sheet in English. In some courts the order sheet is typed in Marathi. In a Marathi Roznama (order sheet), the block begins with [:print:]{1,100} यांचे न्यायालयात, and the case number is introduced by प्रकरण क्रमांक:. I tried to replace your IN THE COURT OF with [:print:]{1,100} यांचे न्यायालयात and Case No. with प्रकरण क्रमांक:, but this time I did not succeed. Kindly guide me so that the blocks in Marathi can be sorted year-wise.
Marathi Roznama.odt (58.4 KB)

Previously, with your help, I managed to put a Sr. No. before the case no. Some people will also need the blocks to be sorted by Sr. No. A file with blocks that have a Sr. No. is attached.
Sort Roznama Serial Number Wise.odt (62.8 KB)

for Marathi Roznama:
Unfortunately I am not able to read the words in Marathi script at all, but I only changed the constants str1 and strNum, and it seems to work.
edit1KL - Marathi Roznama.odt (60.8 kB)

I see only squares on screen instead of Marathi characters (I don’t know why, since I have Devanagari fonts installed), so my changes are in the ODT.

for Sort Roznama Serial Number Wise:
What is Sr. No.? Is it the code like MHAM180002292017 or something else?

The number before the case no. I managed to put the serial number before the case number thanks to you.

Sr. No. before case no
The 1) before the case no.

And if the Sr. No. is missing? (On the 1st page, the one between 19) and 20) is missing.)


Show a msgbox for a missing one?