Pilcrow 'open' changed my spacing? or What causes this bug i'm having?

You mentioned in a previous post you “imported” (pasted) bits from a DOC(X) document and Word styles were added. This contradicts your statement about pasting as unformatted, which would have inserted text only, without formatting. Formatting in DOC(X) is based on different technical primitives, which require conversion to be handled by Writer. This generally results in overlaid direct formatting for the primitives without a basic equivalent.

“Importing” from alien sources without precaution is the beginning of document damage, or “pollution” in the benign case. You should look in this direction. If the “pollution” is rather light, it can be fixed and the spurious styles removed. However, if the “pollution density” is too high, the only cleaning solution is to paste unformatted into a blank document and restyle everything.

if so it’s a typo. i usually import unformatted, but this time i just copy-pasted ‘as is’ to keep the subscripts and superscripts intact and then i changed the style in the current document. sorry in case i made a typo and i am confusing things more (i redrafted my response several times in order to reply to everyone lol. what is stated here is the correct version anyway :point_up:)

this is my suspicion. that’s why i said that the previous formatting (from the template i was using in a previous project) was still visible on the side bar even though i am technically not using it.

yeah, this i will avoid until finished. so far the bug hasn’t reappeared at all since you asked me to test Ctrl+M and i opened a different file etc. i fear the issue must be the ‘Default Paragraph Style’ i use, because in between writing sessions i may open older projects or different ones (odt and doc), so if the default there is different from the default i use then they would impact one another as you said, correct? the only thing i am not sure about is whether this is triggering the bug or not (i cannot reproduce it and frankly i don’t wish to try to trigger it on purpose). All this combined with the large file, the mess from previous copy paste and the zotero direct formatting. :smiling_face_with_tear:


update: the bug just happened again!

as i was writing my response above the power went out and my computer shut down. when i restarted the computer the LO prompt asked me to recover the file. after seeing that unfortunately i lost about 0.5-1 hour of work, i noticed the open pilcrow again! so this time i closed the file and opened a previous version which was fine. then i reopened the current version and the pilcrow was normal again.

since i am experiencing power cuts this month, could it be related? i haven’t opened any other file on purpose, so i guess the ‘default paragraph style’ hypothesis is less strong now, or at least not as evident to me now.

so the bug happened and was fixed by closing and opening; the only thing that changed was the sudden shutdown. as long as the temporary fix works it’s fine for now until i finish this project. any ideas on what may be causing it after this new info are welcome.

In Word, there is a feature of “paragraph mark formatting”, originally missing in Writer. That feature affects many Word-specific features in very complicated ways; e.g., a paragraph mark may be made hidden, to effectively create a “merged” paragraph consisting of two others, which themselves can have, e.g., their own numbering.

In 2019 (version 6.4), commit 5ba30f588d6e41a13d68b1461345fca7a7ca61ac implemented initial, run-time support for that formatting - which meant that we read it, stored it internally, used it where needed to correctly display the document, and wrote it back to DOCX - but we didn’t store it in ODT.

In 2022 (version 7.2), commit 6249858a8972aef077e0249bd93cfe8f01bce4d6 implemented saving of the marker formatting into ODT. That was done using a special empty text span at the end of the paragraph, which has its own autostyle representing the marker formatting.

(In 2023 (version 7.6), commit 1a88efa8e02a6d765dab13c7110443bb9e6acecf changed how it’s done - but that’s uninteresting for your 7.4.)


Why am I writing this? My suspicion is that the feature may play some role in your case. There still is no UI to manage the stored paragraph mark formatting (even in the latest version) - meaning that you can’t add it anew (the only way to get it is from a DOCX), you can’t edit its properties, and you can’t delete it using the program dialogs / controls, so the only way to get rid of it is to edit the XML.

If you feel like testing this, you may edit the ODT’s content.xml, to replace the following regex:

<text:span text:style-name="T\d+"/></text:p>

with

</text:p>

This removes those empty trailing spans from the paragraphs. I don’t know if it makes any difference for you, just a blind shot.
Of course, such changes must only be made when you have a backup of your document.
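
For anyone who prefers a script to an editor's find-and-replace dialog, the same replacement could be sketched in Python roughly as below. This is only an illustration under assumptions: the helper name `strip_trailing_spans` and the sample XML fragment are made up for the example, and only the regular expression itself comes from the post above.

```python
import re

# Regex from the post: an empty span carrying an automatic text style
# ("T" followed by digits) immediately before the paragraph's closing tag.
TRAILING_SPAN = re.compile(r'<text:span text:style-name="T\d+"/></text:p>')


def strip_trailing_spans(xml_text):
    """Return (cleaned_text, number_of_spans_removed)."""
    # subn() performs the substitution and also reports how many
    # replacements were made.
    return TRAILING_SPAN.subn("</text:p>", xml_text)


# Made-up fragment: the first paragraph ends in an empty span, the second
# contains a normal (non-empty) span that must be left alone.
sample = (
    '<text:p text:style-name="P1">Some text'
    '<text:span text:style-name="T12"/></text:p>'
    '<text:p text:style-name="P2">Kept '
    '<text:span text:style-name="T3">as is</text:span></text:p>'
)

cleaned, removed = strip_trailing_spans(sample)
print(removed)
print(cleaned)
```

Only self-closed spans that sit directly before `</text:p>` are touched; spans that actually wrap text do not match the pattern.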


thank you deeply for the detailed comprehensive response.

unfortunately i know little to nothing about this. thus i don’t know where it is or how to access it, and i don’t know if this is a system-wide / LO installation file or a document-specific file. meaning i am not sure i can tinker with it safely mid-project without causing further issues (due to incompetence or mis-clicking on my part)

if i understand correctly that means the copy-paste from the formatted docx i did months ago possibly “contaminated” my document with “paragraph mark formatting” from the source microsoft template, correct?

i am positive on trying this but i have some questions.

  1. if i do this in a back-up file it won’t affect my LO installation or other documents, correct?
  2. i don’t know where to find or how to access the xml. if it’s something easy to find online i will look it up but if you have a comprehensive guide handy i will prefer your guide over any other i would find on my own (is this thread relevant?)
  3. i guess after opening the xml i should first check if the document is “contaminated” with the “paragraph mark formatting” line before trying to replace anything? does this make sense, and if so should i just search for the <text:span text:style-name="T\d+"/></text:p> ? or is this expected to be present in the xml either way so there is no point of first checking and reporting back here before moving on replacing?
  4. finally depending on the answers above i would like to also understand what the replace does, because from my limited understanding i get that <text opens the command and </text closes it. so removing the opening part (with all the formatting inside) would be ok leaving the closing </ on its own, or i guess the replacement does something more? (i understand my confusion is a product of my partial or even distorted knowledge, so if it’s too technical or irrelevant based on the responses in points 1-3, then please do not attempt to explain point 4 to me. i trust your word as a dev and it’s not that important for troubleshooting, but mostly a curiosity on my behalf ^^)

thanks again for your time :v:

Yes.

That thread is not good for you. Using a Flat ODT would actually look like a nice option - but (1) FODT has its own set of bugs, so it is possible to lose something when converting to FODT; and (2) the resulting FODT from your 100MB-ODT would be really huge, so many tools would fail on it, or at least be painfully slow.

An ODT is in fact a ZIP file. Having your File1.odt, you may simply rename the file to File1.odt.zip, and then open it using any ZIP archive manager. That will allow you to see the content.xml right in the root of the archive. The idea is to extract it, edit it, and then replace the old version in the archive with the edited one. After your modifications, the updated File1.odt.zip should be renamed back to File1.odt, to be usable by Writer.
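
The extract / edit / put-back round trip described above can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module, not an official LibreOffice tool; the function name `replace_content_xml`, the file names, and the tiny stand-in archive are assumptions made for the example. It writes a new file instead of modifying the original, so the original doubles as the backup.

```python
import os
import tempfile
import zipfile


def replace_content_xml(src_odt, dst_odt, new_content):
    """Copy src_odt to dst_odt, swapping in new_content (bytes) for content.xml."""
    with zipfile.ZipFile(src_odt) as src, zipfile.ZipFile(dst_odt, "w") as dst:
        for info in src.infolist():
            if info.filename == "content.xml":
                data = new_content
            else:
                data = src.read(info.filename)
            # Passing the original ZipInfo keeps the entry order and each
            # entry's compression type (ODF packages expect "mimetype" to
            # come first and to be stored uncompressed).
            dst.writestr(info, data)


# Demo on a tiny stand-in archive (not a real document).
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "File1.odt")
dst_path = os.path.join(tmp, "File1_cleaned.odt")

with zipfile.ZipFile(src_path, "w") as z:
    z.writestr("mimetype", "application/vnd.oasis.opendocument.text",
               compress_type=zipfile.ZIP_STORED)
    z.writestr("content.xml", "<old/>", compress_type=zipfile.ZIP_DEFLATED)

replace_content_xml(src_path, dst_path, b"<new/>")

with zipfile.ZipFile(dst_path) as z:
    print(z.namelist())
    print(z.read("content.xml"))
```

With a real document, `new_content` would be the edited content.xml encoded as UTF-8 bytes.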

Why not check first? That would be a reasonable curiosity. The “free from paragraph marks formatting” documents are not expected to contain those. Open the XML using a plain text editor (like e.g. Notepad++), and search for that string, but make sure that you check “use regular expressions” (or similar wording) option in the search dialog.
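
If you would rather not extract anything just to check, a read-only count can be sketched like this. It assumes Python with the standard `zipfile` and `re` modules; the function name `count_trailing_spans` and the in-memory stand-in archive are invented for the illustration. Nothing is written to the document.

```python
import io
import re
import zipfile

PATTERN = re.compile(r'<text:span text:style-name="T\d+"/></text:p>')


def count_trailing_spans(odt):
    """Count pattern matches in content.xml.

    `odt` may be a path to an .odt file or a binary file-like object.
    The archive is only read, never modified.
    """
    with zipfile.ZipFile(odt) as z:
        xml_text = z.read("content.xml").decode("utf-8")
    return len(PATTERN.findall(xml_text))


# Stand-in archive built in memory: one paragraph with a trailing empty
# span, one without.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(
        "content.xml",
        '<text:p>a<text:span text:style-name="T1"/></text:p>'
        '<text:p>b</text:p>',
    )
buf.seek(0)

print(count_trailing_spans(buf))
```

A result of 0 on a real file would suggest there is nothing for the replacement to remove.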

The <tag> ... </tag> pair is only required when there is something in place of the "...". When the element is empty, i.e., it has neither content nor sub-elements, it may close itself right away, which is done using the trailing />. That is the case discussed here: the whole element closes itself and needs no separate closing tag (adding one would be a hard XML error), so it should look like

<text:span text:style-name="T1"/>

immediately prior to the closing tag of the enclosing paragraph (text:p element).
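
To see the self-closing rule in action, here is a small sketch with Python's standard `xml.etree.ElementTree` parser. The helper names `describe` and `is_well_formed` and the sample strings are made up for the illustration; the namespace URI is the ODF text namespace.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# The same empty span written in self-closing form and as an explicit pair.
short_form = f'<text:p xmlns:text="{NS}">abc<text:span text:style-name="T1"/></text:p>'
long_form = f'<text:p xmlns:text="{NS}">abc<text:span text:style-name="T1"></text:span></text:p>'

# A self-closed span followed by a stray closing tag: not well-formed XML.
broken = f'<text:p xmlns:text="{NS}">abc<text:span text:style-name="T1"/></text:span></text:p>'


def describe(xml_text):
    """Return (tag, attributes, text, child count) of the first child element."""
    span = ET.fromstring(xml_text)[0]
    return span.tag, span.attrib, span.text, len(span)


def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(describe(short_form) == describe(long_form))
print(is_well_formed(broken))
```

Both spellings of the empty span parse to the same element, while the variant with an extra closing tag is rejected by the parser.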


it seems my file is not contaminated, am i reading this right?

step-by-step process i did: renamed with .zip → extracted the content.xml → opened with mousepad (a simple txt editor) → ctrl+F the expression <text:span text:style-name="T\d+"/></text:p> → the result was 0 occurrences (pictured above) :thinking: :person_shrugging:


PS: if you wish and the content.xml is not privacy sensitive i can link it/upload it here or in PM for you to check (file size 4.9 MB)?

No. You didn’t enable regular expressions.
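
The difference between the two search modes can be illustrated with a short Python sketch (the sample fragment is made up). In a plain search, `\d+` is looked for as those literal characters, which never occur in the file; in a regex search it means "one or more digits".

```python
import re

needle = r'<text:span text:style-name="T\d+"/></text:p>'
xml_text = '<text:p>x<text:span text:style-name="T7"/></text:p>'

# Plain (literal) search: looks for a backslash, "d" and "+" verbatim.
literal_hits = xml_text.count(needle)

# Regex search: \d+ matches the digits of the style name.
regex_hits = len(re.findall(needle, xml_text))

print(literal_hits, regex_hits)
```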

content.xml contains most of the text in your document (not images, not headers/footers, but all the rest).

don’t know how i missed this :sweat: thanks a ton for your patience lol so there are 111 occurrences! my document is polluted for sure hehe. they become 0 after replacing successfully

– 111 regex occurrences –

– find and replace successful –

replaced the content.xml in the zip and renamed to odt. the cleaned document opens okay, zotero references work correctly. it seems everything is fine and the document is fine (or until it isn’t again? lol) since i cannot replicate the issue only time will tell. the title links, image captions etc all seem fine so if this was the source of the issue then it’s unaffected by zotero and related to the microsoft template copy-paste i did early on :sweat_smile:


Meanwhile a new clue/another thing i noticed today, that may or may not be relevant:

when writing in the file, after saving by ctrl+S if i click back to any line of the main text and check on the side bar the spacing is 0.25 instead of 0.14, and stays like this until i click on a title (which has different spacing). then when clicking back on the main text the spacing on the side bar says 0.14 alright. this happened consistently on the ‘bugged document’ even without the “pilcrow bug” happening as well, i.e., the pilcrow was ‘normal’ and the bugged document length/visible formatting was as expected when the side bar displayed the 0.25 spacing and after ‘fixing’ it to 0.14 as well. interestingly this also happened in the current cleaned document (where i removed the regex matches as mentioned). so is this a bug or just a normal thing, e.g. a LO “lag” after save? if it’s a bug could it be related to my other issue? i can reproduce this anytime by saving the file, so if it’s at all relevant i can send image proof and/or more info, or file a bug report if it isn’t something normal/known already.


PS.1: i won’t mark your response as the solution yet, until i work a bit on the new document, close reopen restart the pc, open with other documents, etc, so that we make sure that this was it :v: again thanks for your patience and help


PS.2:

thanks a lot for your reply. didn’t notice the / at the end of the expression since didn’t know they can self-close, so that explains it clearly :+1:

It’s easy: I added it in an edit.


lol thanks for clarifying :laughing:


:exclamation: Anyway i got cocky and fucked around to find out :innocent: and managed to reproduce the bug!

I opened the cleaned odt document and then i opened a different docx project file from a colleague. i typed a space in their document and went back to check on my own cleaned odt. and behold, the bug is there again. at least i know how to reproduce it now, and it’s possible that it is related to other docx files in a way i cannot imagine. image/proof/info of the reproduced “pilcrow” bug:

  • the formatting bug
    image

  • our friend the ‘open’ pilcrow
    image

  • both
    image

  • here i entered a paragraph then deleted it so it displays “normally” now, but it’s still bugged in all other aspects
    before
    image
    after
    image
    notice the spacing is “0.14” based on the side bar info
    image
    yeah sure buddy… i can see it’s wider than before lol
    image
    *additional info: the doc file i opened to reproduce the bug has a 0.16 spacing so it makes sense if formatting is transferred by the ‘Default paragraph style’ from the doc to the odt as @ajlittoz implies here and i further discussed here :+1:

interestingly the bugged file’s page count is not much bigger than before the bug (now the page count is about 160-165 after working on the doc yesterday). it may have to do with the fact that the lines are cut instead of widened in spacing

also interestingly the doc file i opened to trigger the bug is unaffected (until it isn’t lol)
image


follow-up: i closed without saving both files, opened an older version of my project (which was fine - no pilcrow bug), closed it and opened the version i bugged and it was also fine. i checked its content.xml and contains 0 of the <text:span text:style-name="T\d+"/></text:p> regex, so it may not be related after all :thinking:

image