Converted PDF to both ODT and DOCX file formats, unable to apply Default Page Style without deleting white space

I converted a PDF to both ODT and DOCX, and I am unable to apply the Default Page Style without deleting white space first. By that I mean the last text on a page may be followed by a stretch of white space, but there don’t appear to be any line breaks after it.

If I press Delete after the last text on the page, the text that starts on the following page is pulled up into that paragraph, and I can then press Enter to start a new paragraph.

Before doing this for each page, however, if I select all the pages in the document and try to apply the Default Page Style with the Navigator, the style isn’t applied to those pages. And of course I can’t delete any of the hundreds of custom page styles until I apply the new page style, in this case the Default Page Style.

I tried Find & Replace with \n and also enabled View > Formatting Marks, and I don’t see any page breaks, so I don’t think that’s the cause, though I’m not sure. I can’t figure out what I am “deleting” when I press Delete and pull up the text from the following page.

As always, OS name, LO version and save format (here both odt and docx) are valuable bits of information.
Have you drastically clipped your screenshot? It looks as if you have no margins in your pages.

Tell us more about the conversion process from PDF. Is DOCX the primary output format (from which you create an odt document)? Which program did the conversion?

For best assessment of the problem, attach a 5-page sample (not more) of the converted document.

This is to be expected if you work on an .odt created from DOCX because DOCX has no notion of page style. Depending on the PDF conversion process, pages can be considered as independent from each other (highly likely because PDF is not a flow-oriented format but a display one) and a hard page break is imposed between pages.
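To check whether this is what happened, the .odt can be inspected directly, since it is a ZIP archive containing content.xml. Below is a minimal sketch, assuming the page changes are stored in the usual ODF way: as a `style:master-page-name` attribute on a paragraph style (a break that switches page style) or as `fo:break-before="page"` in its paragraph properties (a plain page break). The function names `find_page_switches` and `scan_odt` are made up for this illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF namespaces used by the attributes we look for.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"


def find_page_switches(content_xml):
    """List (style name, master page, break-before) for paragraph styles
    that force a new page or switch to another page style."""
    root = ET.fromstring(content_xml)
    hits = []
    for style in root.iter(f"{{{STYLE_NS}}}style"):
        # A non-empty master-page-name means "break and use this page style".
        master = style.get(f"{{{STYLE_NS}}}master-page-name")
        props = style.find(f"{{{STYLE_NS}}}paragraph-properties")
        brk = props.get(f"{{{FO_NS}}}break-before") if props is not None else None
        if master or brk == "page":
            hits.append((style.get(f"{{{STYLE_NS}}}name"), master, brk))
    return hits


def scan_odt(path):
    """Read content.xml out of an .odt file and scan it."""
    with zipfile.ZipFile(path) as odt:
        return find_page_switches(odt.read("content.xml"))
```

Calling `scan_odt("converted.odt")` (a placeholder file name) on a copy of the document should return one entry per paragraph style that carries a page change. If nearly every page shows up with its own master page name, that would explain why a single page style cannot be applied across the document.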

… and share a sample PDF file and a sample ODT file after conversion. How/Where did you convert it (online? locally?).

Thanks!

I don’t remember how I converted it, since I tried a few different tools as I recall. I believe it might have been Adobe’s online convert to Doc tool. I saved it as an ODT directly from a DOCX file. Unfortunately I have not found a free PDF to ODT converter that does as good a job converting as Adobe’s online tool does - can you recommend one?

I am using Windows 11 and LibreOffice 7.5.0.3 (x86_64).

The Left and Right Margins were set to 0", which is why I am trying to apply the “Default Page Style” to the entire document - so I can fix the margins, lose the old custom styles and be able to delete them.

The Screenshot is 1 full page and the first few lines of the next page.

I suspected something along the lines of a hard break having been added between pages. Is there any way to delete that without having to go to each page and move text around manually?

When I convert from pdf I always copy the text from the output document and Paste as unformatted text into a new document and apply Text Body to everything. I then go through the new document and apply Page styles, Heading styles, character styles, etc. based on the appearance of the original or as I prefer.

The fly in the ointment is lists, where the numbering is replaced with a simple number and spaces. There are some workarounds, such as pasting the formatted list over the unformatted list and then replacing the imported style. I still find it quicker and easier to apply formats to simple text than to try to repair hundreds of different formats.

Sometimes doubled unusual spaces are inserted, so I also do a Find and Replace with Regular Expressions ticked: Find \s\s and Replace with a single space.
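For anyone who cleans the extracted text outside Writer, here is a rough Python equivalent of that Find and Replace. It is only a sketch: the helper name `collapse_spaces` is made up, and the pattern differs slightly from \s\s in that it collapses a run of any length in one pass and leaves line breaks alone, so paragraph boundaries survive.

```python
import re

# Two or more whitespace characters in a row, excluding line breaks.
# In Python 3, \s on str also covers non-breaking spaces (U+00A0).
_SPACE_RUN = re.compile(r"[^\S\r\n]{2,}")


def collapse_spaces(text):
    """Replace each run of doubled or unusual spaces with one plain space."""
    return _SPACE_RUN.sub(" ", text)
```

With a plain \s\s pattern, a run of three spaces would still leave two after one pass, so the Find and Replace in Writer may need repeating until nothing more is found.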

Thanks! So when I pasted as unformatted text I lost all the images, but I did cut and paste with RTF and that enabled me to delete all the custom page styles and just apply a default page style. I am left wondering though what “element” or “object” was lost when I pasted to RTF that was preventing me from applying a singular default page style and deleting those other custom page styles to begin with.

Who knows what vagaries OCR software gets up to? I have a very accurate OCR program, but it is 20 or so years old now. I assume the new version would be better, but maybe not better enough.
Here are just some of the styles it generated from 11 typewritten pages, so all the same font exactly, except some double-strike and underlined characters. This was without saving the page layouts which I just won’t do.

I prefer to get images from the original document rather than the OCR processed version as I think there might be some processing of images during the OCR. Also, I can get the document tidied up before trying to introduce images because images need thought when it comes to layout.