Pasting HTML fails after one page

LibreOffice Writer, current version. Just like it says in the title: when I try to paste HTML as HTML, the paste fails, and no content is actually pasted after the first page break. I tried both options.

Plain text works, although that’s rather useless, unless I want to spend hours reformatting text as a document.

What the heck?

EDIT: Current as in… when you click the update button, it says ‘You are current’.
Win7, x64

HTML as in… pasting in HTML from an HTML page. Specifically, the central column of this page:

…but there were about 3 other random pages that I tested that all did the same thing.

EDIT 2:
Version: 7.0.3.1 (x64)
Build ID: d7547858d014d4cf69878db179d326fc3483e082
CPU threads: 8; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL

EDIT 3:
I don’t think there is a solution. I tried to tidy up the code with HTML Tidy, W3C validation, and Nvu’s grandchild, BlueGriffon. I also discovered that the Chrome developer tools no longer have any trace of those tools left in them (that I could find). Absolutely nothing automagically repairs the HTML so that Writer renders it correctly.

We either need to A) aggressively attempt to import HTML and render it properly, or B) aggressively attempt to get everybody on the planet to generate compliant code.

I don’t see either one happening.

Perhaps we could find some way to get the rendered output out of a browser or HTML editor that renders it properly, and import THAT instead of the raw HTML? Didn’t there used to be a tool that did that back in the day?

Oh, additionally, I have absolutely no CLUE why Hagar’s machines import more of the page than mine. That’s a head scratcher right there.

What is “current version”? LibreOffice has been ported to many OSes, and the ports don’t all progress at the same pace. So, what is your OS? Your LO version?

“paste html”: do you mean you paste a fully functional block recognized as HTML or an HTML-looking text sequence? How and from where did you copy the sequence? What do you get after pasting? Some text saying <h1>Heading</h1><p>paragraph</p> or fully formatted paragraphs with styles attached?

To provide this information, edit your question. On this site, answers are reserved for solutions.

EDIT: Current as in… when you click the update button, it says ‘You are current’

This does not provide additional information. Please paste Help -> About LibreOffice information.

Don’t try to paste HTML pages from web sites: you won’t reliably copy the formatting, all the more so when the original formatting does not follow recommended HTML practice.

  • In the example link, the page is erroneously formatted exclusively with <br> tags instead of subdividing text into paragraphs with <p></p>.

  • The list is manually numbered instead of using <ol> <li> … </li> … </ol>.

  • I could also mention the invalid use of a <table> to display the pictures beside a block of text which logically forms a single flow with the text below this “2-column part”.

  • There are also strange uses of <h2>.

With such a badly formatted HTML page, don’t expect Writer to sort things out for you. The only thing it does correctly is render bold words, unfortunately with direct formatting, which is never recommended. The best you can do is paste as Unformatted text and reformat manually to restore a “normal” text (see the very important first item I mention).
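If you would rather repair the source before pasting, the first item (line breaks instead of paragraphs) can be partly automated. Here is a minimal Python sketch, assuming the text block has been saved as a string and contains only inline markup separated by <br> tags. The function name br_to_paragraphs is made up for this illustration, and real pages will likely need more care than a regular expression can give.

```python
import re

# Matches one or more consecutive <br>, <br/> or <br /> tags, in any letter case,
# together with the whitespace around them.
BR_RUN = re.compile(r"\s*(?:<br\s*/?>\s*)+", flags=re.IGNORECASE)


def br_to_paragraphs(html_block: str) -> str:
    """Turn a <br>-separated block into a sequence of <p> paragraphs.

    Hypothetical helper: it assumes the block holds no nested block-level
    elements (tables, lists, headings), only text and inline markup.
    """
    pieces = BR_RUN.split(html_block)
    # Drop empty pieces produced by leading or trailing <br> tags.
    paragraphs = [piece.strip() for piece in pieces if piece.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


if __name__ == "__main__":
    sample = "First line<br>Second line<br /><BR>Third line<br>"
    print(br_to_paragraphs(sample))
```

Saving the result as an .html file and opening or pasting that gives Writer real paragraphs to work with, so paragraph styles can be applied afterwards instead of fighting one giant paragraph.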

What is the “first page break”? One you added yourself? There is no page break in HTML. Do you mean the limit of the page in Writer?

In this case, depending on the limit of the copy, you may have included the infamous table in the web page. By default, Writer table cells do not cross page limits. You must enable this manually with Table>Properties after clicking on the cell.

Once again this will require manual action.

Don’t expect visual fidelity. HTML and ODF (the standard upon which Writer is based) are founded on different principles. They allow roughly the same formatting appearance (with more typographical features on the Writer side), but this needs manual customization.

To show the community your question has been answered, click the ✓ next to the correct answer, and “upvote” by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

In case you need clarification, edit your question (not an answer, which is reserved for solutions) or comment on the relevant answer.

PS: How do you expect us to guess your LO version? When you click the update button, a lot of “local” information is sent to the server, notably your OS version, and the server then selects what it thinks is the “best” answer. Therefore what is current for one user’s computer will not necessarily be current for another’s.

Pasting HTML doesn’t work.

You’re not pasting the RIGHT html.

Seriously??

For the record, I was using a ‘reading mode’ extension to get a ‘pretty print’ copy, then copying and pasting the HTML. Still, HTML is pretty long in the tooth; LibreOffice ought to do better than THAT when trying to paste it.

Also, I tested it with a few other pages as well. Same results.

Yes, I meant the end of the LibreOffice first ‘page’. I copy and paste perhaps 8 pages of content, and LibreOffice chokes after page one. On page two I get a residual blank column or some such thing, and nothing else.

Apologies, I figured ‘current’ was current. I didn’t know that it was a multiple choice question.

Have you fully read my explanation and pondered the consequences?

  1. The source HTML is buggy IMHO because it does not follow the recommendations in HTML 5 or even HTML 4.01. HTML recommends semantic markup. Here we have a single text block with line breaks <br> and list simulation with manual numbering and <strong> markup (terminated with <br>).

    With such botched HTML, Writer or any other “smart” app can do nothing but what is presently done: render this as a single paragraph, because this is what it is in HTML.

  2. Images which are supposed to illustrate some text should not be inserted via tables but should use the common <img> element, whose position is controlled by CSS properties, notably float. This can then be translated as a frame in Writer and be anchored properly.

  3. Using an HTML table for large cell content will, by default, cause the content to be clipped at the page boundary (which is what you are experiencing). It needs manual fixes to allow the content to cross the boundary.

Of course, when you switch to Web mode, the last inconvenience disappears. But I assume that you really want to make some memo for yourself which can ultimately be printed. In this case, you must get a correct “usual” text document, i.e. make some manual adjustments. Unfortunately, your source HTML is rather buggy.
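Part of those manual adjustments, namely the hand-numbered list from item 1, could be scripted before pasting. The following Python sketch assumes the text has already been split into plain lines and that list items start with a number followed by "." or ")". Both the pattern and the function name lines_to_semantic_html are assumptions made for this illustration, not anything Writer provides.

```python
import re

# A line such as "1. Do this" or "2) Do that" is treated as a list item.
NUMBERED = re.compile(r"^\s*\d+[.)]\s+(.*)$")


def lines_to_semantic_html(lines):
    """Wrap runs of manually numbered lines in <ol><li>, other lines in <p>.

    Hypothetical helper: a blank or unnumbered line ends the current list.
    """
    out = []
    in_list = False
    for line in lines:
        match = NUMBERED.match(line)
        if match:
            if not in_list:
                out.append("<ol>")
                in_list = True
            # Keep only the item text; the browser or Writer numbers the list.
            out.append(f"  <li>{match.group(1).strip()}</li>")
        else:
            if in_list:
                out.append("</ol>")
                in_list = False
            if line.strip():
                out.append(f"<p>{line.strip()}</p>")
    if in_list:
        out.append("</ol>")
    return "\n".join(out)


if __name__ == "__main__":
    print(lines_to_semantic_html(["Intro", "1. First", "2) Second", "Outro"]))
```

With real <ol> and <p> elements in the source, the pasted result should map to list and paragraph styles rather than one block with line breaks, though the images-in-a-table problem from items 2 and 3 still has to be handled by hand.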

No, I didn’t consider it. You’re making excuses, ajlittoz. I installed five different ‘Reader Mode’ extensions from the Chrome store, and all of them made the page look cleaner and better.

NONE of them made the content look like a webpage got sick and barfed all over the first page.

Plus, like I said, I tested it with three lengthy blog-style websites, and all three of them choked on the first page, in both HTML pasting modes. I was PRETTY sure I’d found a bug. Seems like a bug to me. I didn’t realize that it was a ‘feature’.

Honestly, I expected to have to manually re-arrange the text to columns beside the images in the first blog post. That’s why I was using the ‘Reader Mode’ extensions. I didn’t expect the whole mess to fail THAT miserably, and I’m still losing a bunch of content when pasting, even if I check web view.

I GUESS I could download a bunch of Reader Mode extensions to try to do better? Perhaps I need to bust out some old HTML editors?

Why not save the web page locally on your computer directly from your web browser? Later, all you need to do is double-click on the saved .html file to open it unaltered in your browser. This way even poorly coded HTML is displayed correctly without pain.

Now that’s an easy question to answer! Because Microsoft SUCKS! You see, back in the day, in order to try to enforce their monopoly on web browsing, they diverged away from core HTML coding and created their own branded coding that only works in Microsoft products. W3C compliance splintered and fragmented, and the internet has never been the same. The original Firefox browsers went a LONG long way to bending the mass market back TOWARDS W3C compliance, but it has always stayed splintered, mainly because Micro$oft refuses to give up their death grip on their branded ‘features’ that keep the global efforts splintered.

SO… Any HTML page you save today in any given browser may or may not render at all in another browser today, and will probably be useless in three years or so. If you’re going to archive HTML pages, be prepared to keep everything ‘in a closet’ with offline utilities that can render the pages unassisted.

Also, I specifically need to form these into PDF documents. to use