Is there a "fool-proof" way to detect a new page in content.xml?

We need to parse the content.xml of a document, and for our purposes we would like to be able to programmatically know when a page starts and when a page ends.

Through trial and error we have deduced five separate ways that a page may be flagged as a “new page”:

  1. A node with a “text:style-name” attribute whose value is “body”, and with no “text:span”

  2. A node named “text:soft-page-break”

  3. A node with an attribute named fo:break-before

  4. A node with an attribute named fo:break-after

  5. A node with an attribute named style:page-number
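
To make the five checks above concrete, here is a rough Python sketch of how we scan for them. It is only an illustration, not a reliable page detector: the function name `find_page_markers` and the marker labels are made up for this example, and reading "with no text:span" as "has no direct text:span child" is our own interpretation of criterion 1. The namespace URIs are the standard ODF ones.

```python
import xml.etree.ElementTree as ET

# Standard ODF namespace URIs for the prefixes used in content.xml.
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
}

# Criteria 3-5: attributes whose mere presence we treat as a marker.
MARKER_ATTRS = ("fo:break-before", "fo:break-after", "style:page-number")


def qname(prefixed):
    """Turn 'text:p' into the '{uri}p' form that ElementTree uses."""
    prefix, local = prefixed.split(":")
    return "{%s}%s" % (NS[prefix], local)


def find_page_markers(xml_text):
    """Return (marker, value) tuples in document order (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    hits = []
    for node in root.iter():
        # Criterion 1 (our interpretation): style name "body" and
        # no direct text:span child.
        if (node.get(qname("text:style-name")) == "body"
                and node.find(qname("text:span")) is None):
            hits.append(("text:style-name=body", None))
        # Criterion 2: an explicit soft page break element.
        if node.tag == qname("text:soft-page-break"):
            hits.append(("text:soft-page-break", None))
        # Criteria 3-5: record the attribute together with its value.
        for name in MARKER_ATTRS:
            value = node.get(qname(name))
            if value is not None:
                hits.append((name, value))
    return hits
```

The input can be the bytes returned by `zipfile.ZipFile(path).read("content.xml")`. One thing we noticed while writing this: in our files the break and page-number attributes tend to sit on style definitions (for example on style:paragraph-properties) rather than on the paragraphs themselves, so a hit from criteria 3-5 still has to be matched back to the paragraphs that use that style.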

Yet this does not seem to be fool-proof: occasionally a page does not fit any of these criteria, or it fits only one of the five.

Is there some universal method that we could use to detect a new page? I’m sure there has to be something because LibreOffice and other programs seem to be able to do it.

Is there something special that we should be doing, such as pressing Ctrl+Enter in the documents to insert a manual page break? That would be the least desirable solution, as we would have to go back and edit many documents by hand.

Likewise, I have not really been able to find any LibreOffice documentation that might shed some light on this.

Anyway, if you guys have any ideas let me know!

For what purpose do you want to parse the XML? There may be other solutions if you describe the ultimate goal.