Error! Reference source not found converting doc to pdf with writer

Hi,

I am using your api to convert msword docs to pdf. The msword doc has bookmarks in it.
When I convert I get: Error! Reference source not found, on all the bookmarked fields.

Is there a flag or method call to get those exported with the conversion to pdf??

Thanks,

Eric Clarke test-doc.doc

What happens if you open the .doc manually into LO Writer? Are the same errors displayed? If so, are you sure the bookmarks are really defined? If you only have references to absent bookmarks, nothing can be done.

Does the same .doc open correctly in M$ Word?

By the way, please give your OS name (assumed to be Windows) and LO version.

The screenshot is useless. Attach the original file (reduced to 1 or 2 pages provided it still exhibits the problem).

To attach a file: edit your question (you can’t attach to a comment), enter at least 2 blank lines at end, use the “paper clip” tool to select a file.

I can’t upload because the file contains PII and HIPAA laws prevent me from sending it. If I remove the PII, then all the document references disappear. I just need to know if your API has a flag or method, that you know of, that can suppress the references or not update them when we open the file programmatically??

Can you try this file. I changed most data and kept the references. Let me know if this works

file is attached: 0A34339F-8A33-436E-84C4-039DA42C7782.doc

Updated link: test-doc.doc

I open your sample file in Writer (because I have no Word)

I get Error: Reference source not found.

The Navigator (F5 or sidepane) shows no bookmark, which is confirmed by looking at the bookmark dictionary with Insert>Bookmark.

The target bookmark is NG_MACRO. With such a name, you may have attempted to define the bookmark with a macro and the macro dictionary is also empty.

Macros will not be converted from .doc to Writer because the languages are not the same.

If you’re looking for a workaround, edit your question to describe the intent of this bookmark reference (from a user point of view, i.e. it should echo such part of the document where the data is created in this way). Give the goal of the document and how you use it. This will help to understand your workflow and to suggest an alternative.

EDIT 2020-09-14

I had a more careful look at the attached file with a hexadecimal editor. What we find in the binary .doc is something like

 Date of call # REF NG_MACRO "STANDARD" "tc_date" # 08/02/2016 #

where I use # to represent various binary bytes. The binary bytes are likely to be function encodings for the strings which follow. I am not familiar with DOC format but it is likely that the field/bookmark name is NG_MACRO containing a value of type "tc_date" with "STANDARD" formatting. Last updated value is stored here as 08/02/2016 in case it is not available or not updated.
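For illustration, here is a minimal Java sketch of how the cached value could be read out of such a run of text. It assumes the bytes shown as # above are the usual field marks of the Word binary format (0x13 field begin, 0x14 separator, 0x15 field end) and that the text has already been decoded into a String. `FieldScan` and its members are hypothetical names, not part of any LibreOffice API, and nested fields are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldScan {
    // Assumed field marks of the Word binary text stream (shown as # above).
    static final char FIELD_BEGIN = '\u0013';
    static final char FIELD_SEPARATOR = '\u0014';
    static final char FIELD_END = '\u0015';

    /** Hypothetical holder: the field instruction and the result cached at last save. */
    static final class Field {
        final String instruction;
        final String cachedResult;

        Field(String instruction, String cachedResult) {
            this.instruction = instruction;
            this.cachedResult = cachedResult;
        }
    }

    /** Collects every non-nested field found between a begin mark and an end mark. */
    static List<Field> scan(String text) {
        List<Field> fields = new ArrayList<>();
        int pos = 0;
        while (true) {
            int begin = text.indexOf(FIELD_BEGIN, pos);
            if (begin < 0) {
                break; // no more fields
            }
            int end = text.indexOf(FIELD_END, begin + 1);
            if (end < 0) {
                break; // unterminated field, stop scanning
            }
            int sep = text.indexOf(FIELD_SEPARATOR, begin + 1);
            if (sep < 0 || sep > end) {
                // The field has an instruction but no cached result.
                fields.add(new Field(text.substring(begin + 1, end).trim(), ""));
            } else {
                fields.add(new Field(
                        text.substring(begin + 1, sep).trim(),
                        text.substring(sep + 1, end).trim()));
            }
            pos = end + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Date of call \u0013 REF NG_MACRO \"STANDARD\" \"tc_date\" \u0014 08/02/2016 \u0015";
        for (Field f : scan(sample)) {
            System.out.println(f.instruction + " -> " + f.cachedResult);
        }
    }
}
```

On the line shown above, this separates the instruction REF NG_MACRO "STANDARD" "tc_date" from the cached result 08/02/2016, which is the value the posters want to keep.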

I opened the .doc file in Writer and saved it as .fodt. I examined the resulting XML with a text editor. The field is translated as:

 <text:bookmark-ref text:reference-format="text" text:ref-name="NG_MACRO">Error: Reference source not found</text:bookmark-ref>

Note that it is considered as a bookmark. A cross-reference would have been translated as text:reference-ref. The formatting code is not kept, nor the type. After all, a bookmark is a shortcut for a location in the document and can’t have a time value while a field can:

 <text:date style:data-style-name="N37" text:date-value="2020-09-14T16:43:21.129685132" text:fixed="true">09/14/20</text:date>

In this example, note that type, formatting and last used value are clearly mentioned.

What I didn’t show is how a bookmark reference is translated in XML. An ODF reference to a bookmark doesn’t cache the bookmark target (why should it because the bookmark is supposed to be defined in the same document?)
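Since nothing is cached on the ODF side, one way to spot affected files in a batch is to check the flat XML for bookmark references whose target is never defined. The following Java sketch is only a diagnostic and makes some assumptions: the document has been saved as .fodt, bookmarks appear as text:bookmark or text:bookmark-start elements carrying a text:name attribute, and `DanglingBookmarkRefs` is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DanglingBookmarkRefs {
    // ODF namespace bound to the "text:" prefix.
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    /** Returns the ref-name of every text:bookmark-ref that has no matching bookmark. */
    static List<String> find(String flatOdfXml) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            doc = dbf.newDocumentBuilder().parse(
                    new ByteArrayInputStream(flatOdfXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }

        // Names of all bookmarks actually defined in the document.
        Set<String> defined = new HashSet<>();
        for (String tag : new String[] {"bookmark", "bookmark-start"}) {
            NodeList nodes = doc.getElementsByTagNameNS(TEXT_NS, tag);
            for (int i = 0; i < nodes.getLength(); i++) {
                defined.add(((Element) nodes.item(i)).getAttributeNS(TEXT_NS, "name"));
            }
        }

        // References pointing at a name that was never defined.
        List<String> dangling = new ArrayList<>();
        NodeList refs = doc.getElementsByTagNameNS(TEXT_NS, "bookmark-ref");
        for (int i = 0; i < refs.getLength(); i++) {
            String target = ((Element) refs.item(i)).getAttributeNS(TEXT_NS, "ref-name");
            if (!defined.contains(target)) {
                dangling.add(target);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        String sample =
            "<office:document xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:text=\"" + TEXT_NS + "\"><office:body><office:text><text:p>Date of call "
            + "<text:bookmark-ref text:reference-format=\"text\" text:ref-name=\"NG_MACRO\">"
            + "Error: Reference source not found</text:bookmark-ref></text:p>"
            + "</office:text></office:body></office:document>";
        System.out.println(find(sample));
    }
}
```

This only reports which references are broken. It cannot restore the value, because the cached result from the .doc is not carried over into the ODF file.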

My opinion is the original document erroneously used the bookmark feature for a field reference (such as current date of insertion, fixed = not updatable). Writer will not transform a bookmark ref into something else, even if DOC data may suggest otherwise to us humans.

Another possibility is the original file is incomplete: some part containing the bookmark is missing. Writer cannot regenerate the missing part. Who could?

If it opens correctly in Word, you should try to convert to PDF from Word itself. If you can’t do it from Word because it has no PDF export command, the workaround is to install a print-to-PDF printer driver. My Linux box has a generic “Print to File” driver which can print to a variety of formats, including PostScript and PDF. There is a trend presently for printers to adopt PDF as a native format. Consequently, most printer drivers can also generate PDF files. All you have to do is intercept the file in the queue before it is sent to the physical device.

To show the community your question has been answered, click the ✓ next to the correct answer, and “upvote” by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

In case you need clarification, edit your question (not an answer which is reserved for solutions) or comment the relevant answer.

Thanks for the reply,

These Word docs are sent to us by a client. We have 25-30 million of these. Our intent is to use the LibreOffice API and convert each one to a PDF. Unfortunately, we get the same error. Does the LibreOffice API have anything to remove the bookmark (NG_MACRO) programmatically before we call the convert method to convert to a PDF???

I’ve never written a Writer macro, preferring to focus my attention on styles. Hope for macro gurus to come by this question.

Meanwhile, if you have Word, try to see if the document contains macros and what they do. This could give a hint about getting rid of the bookmark before converting.

I work with beachrunner and just wanted to try to clear up some confusion. we don’t care about the macro. if you open the file up in Notepad, you can see the value has already been calculated and saved in the document before it was sent to us. we want to IGNORE the reference/bookmark/macro and simply display the value that is already in the document instead of trying to reference the bookmark/macro that does not exist.

Then attach a meaningful sample file. The one provided in the question is reduced to the cross-reference to the missing bookmark. The sample file should be representative of its usage, i.e. if I understand right, contain the “value” in addition to the faulty field. Waiting for it to experiment.

the attached file does have the date.

here is a screenshot, comparing libreOffice, Word, and what the contents of the file look like in notepad:

Also, we are using the libreOffice API via Java to do these conversions. there is an option called “UpdateDocMode” that can be set to “NO_UPDATE” (i.e. a value of 0). We figured this would cause the reference to not be looked at and use the existing field data - but it does not seem to do that - hence why we came to the forums.

@ajlittoz - were you able to see the issue? any ideas?
Edit: I just noticed your edit, my apologies.
We cannot just print to PDF; I am trying to build an API that does conversions. This project covers over 50 million documents that have this issue. I was using LibreOffice to help. It looks like we may have to go with a different converter, as most converters use the existing value just fine; LibreOffice is one of the few that doesn’t. The problem with a lot of other converters, however, is that their output does not look very good (i.e. tables and other text formatting don’t convert well), which is why I was really hoping LibreOffice would work, especially with the “NO_UPDATE” option, as that appears to be its purpose, though perhaps it wasn’t extended to bookmarks. I may try to open an issue ticket to see if it’s something the devs would potentially add.

Thanks for your help.

Have you considered my suggestion to use a print-to-file driver which would produce a PDF file without a dedicated converter? You request Word to print the document “as usual” and the driver would do the conversion on the fly. This way, formatting is done by Word, guaranteeing correct positioning. Another approach, again with a print-to-file driver, would be to create a PostScript file and use a PS 2 PDF converter.

Requesting Word to print documents can be done through scripting.

It opens fine in MS Word. The attached bookmarked_page.docx is the screenshot of the bookmark. If I remove the bookmark name, then I get the same error in the Word doc.

It shows the same errors in libre writer.

The fields were updated before the .doc was saved, so we just want to open the doc without it trying to update from the references, and display the existing/current values.

Is this possible with the libre api??

Please note for the future:

  1. The answer box is reserved for answers.
  2. For communication, use the comments.
  3. To add additional information or clarification, please edit your original question.

This helps keep the site usable for everyone.

Thanks.



Ask/Getting Started - The Document Foundation Wiki
https://wiki.documentfoundation.org/Ask/Getting_Started