Writer: Unwanted Page Breaks before Linked HTML Sections

Hi, folks,

I am fairly new to LibreOffice but have a complex problem that
I have not been able to resolve with net-research alone. I am
creating a Writer document with generated regions. This is a
print document, not a web page, but the generated regions are
in HTML. They are created by C# software that I have written.

Writer’s “Insert Section” feature is incredibly useful for
this. I create a link to the HTML file for each generated
region, and Writer can update them whenever the HTML files
change. This is exactly the behavior I need.

The portion of the document containing the generated regions
has two columns. I want the text in the generated sections to
flow with the text entered in Writer, but instead Writer begins
each HTML Link Section with a page break. As each generated
region is only a paragraph or two long, this is unworkable.

The page break is part of the read-only section. Strangely,
though, I can’t delete it by pressing Delete just before it,
but I can delete it by clicking on the dotted line and selecting
“Delete Page Break” from the context menu. If I disable the
read-only protection on the section and select “Edit Page Break,”
I can see that it is inserted as a result of a paragraph style
that has “Insert Page Break Before” marked:

Screencap

I can unmark it and the page break goes away, but it comes
back whenever the section-link is updated. Investigating the
content.xml file in the ODT archive, I find that the offending
paragraph style is called “Sect5”, but that doesn’t show anywhere
that I can find in the Writer GUI – I think it’s auto-generated
by the “HTML (StarOffice)” import filter:

XML Screencap
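
For anyone who wants to repeat this inspection without unzipping the ODT by hand, here is a minimal Python sketch. It rests on two assumptions about how the break is stored: either as `fo:break-before="page"` in the style's paragraph properties, or as a `style:master-page-name` attribute on the style (assigning a page style to a paragraph style also forces a break). The function name `find_page_break_styles` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}
STYLE_NAME = f"{{{NS['style']}}}name"
MASTER_PAGE = f"{{{NS['style']}}}master-page-name"
BREAK_BEFORE = f"{{{NS['fo']}}}break-before"


def find_page_break_styles(odt):
    """Map automatic style names in content.xml to the reasons they break pages.

    `odt` is a path or a binary file-like object holding an ODT archive.
    """
    with zipfile.ZipFile(odt) as archive:
        root = ET.fromstring(archive.read("content.xml"))

    hits = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        reasons = []
        # Assumption: a non-empty master page name implies a break before.
        master = style.get(MASTER_PAGE)
        if master:
            reasons.append(f"master-page-name={master}")
        # Explicit "Insert Page Break Before" setting.
        props = style.find("style:paragraph-properties", NS)
        if props is not None and props.get(BREAK_BEFORE) == "page":
            reasons.append("break-before=page")
        if reasons:
            hits[style.get(STYLE_NAME)] = reasons
    return hits
```

Calling `find_page_break_styles("document.odt")` (a placeholder filename) returns a dictionary of suspect styles; an empty result would suggest the break comes from somewhere other than these two attributes.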

I noted that imported HTML is hardcoded to the “HTML” page
style, so I made sure that the part of my document with the
generated regions has this page style as well. Despite this
matching, I still get the page break.

I have tried using “soffice --headless --convert-to” to
change the mini-HTML files into other formats such as RTF or
DOC before linking them in the hopes this would fix the issue,
but it doesn’t. Strangely, Writer still uses the “HTML” page
style for the linked section despite it being RTF/DOC.
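
The conversion step described above can be scripted along these lines. This is only a sketch of the batch step, not a fix (as noted, converting did not remove the page break). It assumes `soffice` is on the PATH; the helper names and the example filenames are made up for illustration.

```python
import subprocess


def build_convert_command(source, target_format, outdir):
    """Build the soffice command line that converts one file headlessly."""
    return [
        "soffice",
        "--headless",
        "--convert-to", target_format,
        "--outdir", str(outdir),
        str(source),
    ]


def convert(source, target_format="rtf", outdir="."):
    """Run the conversion; raises CalledProcessError if soffice fails."""
    subprocess.run(build_convert_command(source, target_format, outdir), check=True)
```

For example, `convert("region1.html", "rtf", "converted")` would write the converted file for the hypothetical `region1.html` into the `converted` directory.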

I have complete control over the HTML generated thanks to
it coming from C# code I write, but I don’t know anything
about the internals of how tags are imported. Are there any
tags or techniques I can put in the HTML that will prevent
this behavior?

A tangential question: is there any “backdoor” way to access
Writer’s styles or add index entries to HTML destined to be
imported into Writer? I know that “<h3>” maps to “Heading 3”,
and believe that’s hard-coded, but sadly CSS classes do not
seem to map to styles with the same name (which would sure be
useful).

More broadly, can anyone with more experience suggest a better
approach for this general desire to include generated regions
that flow with the text entered in Writer but have markup
simpler than trying to generate ODT files with C#? I don’t need
to generate HTML specifically, but I need to know something
else is going to work better before I put in the effort to code
for that.

EDIT: Loaded linked images onto AskLO server and made links local.

I don’t have an answer for this yet. As you indicate, there do appear to be a number of problems with the way linked sections are handled when the source files are HTML. Whether this is due to the import filter you mention or to the handling of the HTML page style, I am uncertain.

I don’t have an answer either, but I had a similar problem linking documents in a Master document (where they are inserted as sections). Some sections were inserted on the line beneath the cursor position, and some forced a page break, with no apparent cause. It was resolved by exhaustively cleaning the stray formatting out of the text, but if your text comes from your own C# routines, it’s supposed to be exactly formatted, isn’t it?

I have added a bug for a similar question at

https://bugs.freedesktop.org/show_bug.cgi?id=71098

I now believe that this issue is a hardcoded rendering one. There is nothing that I can see in the XML to suggest that the inserted sections (HTML or otherwise) should include a prior page break, yet they appear to.

Investigating the content.xml file in the ODT archive, I find that the
offending paragraph style is called “Sect5”, but that doesn’t show
anywhere that I can find in the Writer GUI – I think it’s auto-generated by
the “HTML (StarOffice)” import filter

Yes, this is correct. It is similar to a direct formatting style, i.e., there is no corresponding entry in styles.xml.
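
One way to check this for a given style name is sketched below in Python. It assumes that styles visible in the Writer GUI live under `office:styles` in styles.xml, while automatic ones live under `office:automatic-styles` in content.xml; the function name `classify_style` is made up for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OpenDocument format.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
}
STYLE_NAME = f"{{{NS['style']}}}name"


def _style_names(root, container):
    """Collect style:name values below the given office:* container element."""
    path = f"office:{container}/style:style"
    return {s.get(STYLE_NAME) for s in root.iterfind(path, NS)}


def classify_style(odt, style_name):
    """Return 'common', 'automatic', or 'not found' for a style name."""
    with zipfile.ZipFile(odt) as archive:
        content = ET.fromstring(archive.read("content.xml"))
        styles = ET.fromstring(archive.read("styles.xml"))

    if style_name in _style_names(styles, "styles"):
        return "common"  # named style, listed in the GUI
    if style_name in _style_names(content, "automatic-styles"):
        return "automatic"  # generated style, not listed in the GUI
    return "not found"
```

If the statement above holds for the document in question, `classify_style("document.odt", "Sect5")` (placeholder filename) should report the style as automatic.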