How can I stop - (i.e. hex 2d) being erroneously converted to hex ad on export to pdf?

asked 2021-03-12 20:05:23 +0200

Chris Austin

updated 2021-03-12 20:53:10 +0200

I am using LibreOffice Version: 4.3.3.2, Build ID: 430m0(Build:2). There seems to be a bug in Export to PDF: ordinary hyphens, i.e. - (hex 2d), are erroneously converted to hex ad on export.

They LOOK all right in the PDF document, but if you select text including such a hyphen in the PDF and copy it to the clipboard, you get hex ad instead of hex 2d. This shows up if you copy a URL containing hyphens (properly hex 2d) from the PDF and paste it into a browser address bar.

I have also checked using a hex editor, specifically KHexEdit, that the dashes which should be hex 2d are hex ad. I have done the copy and paste from the PDF in 2 different PDF viewers, namely KDE Okular and Evince, the GNOME document viewer.

I have URL Recognition UNCHECKED in Tools | AutoCorrect Options, but this is not actually relevant, since the problem also occurs for hex 2d hyphens that are not in URLs. I have tried unchecking Replace dashes in Tools | AutoCorrect Options, but that does not seem to help either.

I am using Debian GNU/Linux, Jessie, and the Trinity Desktop Environment, but I doubt this is relevant to the problem.

In response to the comment by ajlittoz below, I did actually check that if I copy and paste directly from LibreOffice, I do correctly get hex 2d for hyphens, that is both for hyphens that I type in myself, and for hyphens when I paste in a url from the clipboard. I checked that by copying and pasting from LibreOffice into KHexEdit. I have done that again just now for the url of this page, which contains lots of hex 2d hyphens. So the hex 2d hyphens are correctly hex 2d in the text in LibreOffice, the erroneous conversion to hex ad occurs when I do Export to pdf. Thank you ajlittoz for the quick reply.
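The check described above (pasting into a hex editor) can also be sketched as a short script that reports which hyphen-like characters a pasted string contains. This is only an illustrative sketch: the function name `describe_hyphens` and the sample string are made up for the example, and it assumes the clipboard text has already been decoded to a Unicode string. Note that U+00AD is the single byte ad in Latin-1 but the two bytes c2 ad in UTF-8, so what a hex editor shows depends on the clipboard encoding.

```python
import unicodedata

# Hyphen-like code points worth distinguishing (an illustrative selection).
SUSPECTS = {"\u002d", "\u00ad", "\u2010", "\u2011"}


def describe_hyphens(text):
    """Return (index, code point, Unicode name) for each hyphen-like character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ch in SUSPECTS
    ]


if __name__ == "__main__":
    # Hypothetical sample: one ordinary hyphen and one soft hyphen.
    sample = "a-b\u00adc"
    for entry in describe_hyphens(sample):
        print(entry)
```

Running this on text copied from LibreOffice and on the same text copied from the exported PDF would show whether the character differs between the two.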


Comments

LibreOffice Version: 4.3.3.2, Build ID: 430m0(Build:2).

This is from Oct 2014 (EOL of the 4.3 series of releases was May 27, 2015), so it is more than 6 years old. I doubt that you will get a fix for that.

Opaque ( 2021-03-12 20:08:41 +0200 )

U+00AD is SOFT HYPHEN. It doesn't appear magically. You must type Ctrl+- to get it. Are you sure you didn't type this way?

To make sure, enable View>Formatting Marks and View>Field Shadings (sorry, I don't remember the menu commands for 4.3.3.2). Soft hyphens will show with a gray background.

Note: 4.3.3.2 is completely outdated. You should update. But it will not solve the problem if you entered soft hyphens instead of hyphens.

Please do not use Add Answer but edit your original question to enhance the details of your question (answers are reserved for solutions to a problem on this Q&A site).

ajlittoz ( 2021-03-12 20:13:10 +0200 )

Thank you ajlittoz for the quick reply. I did reply by editing my original question, to add a reply to your comment at the end, but I found that the new paragraph I added is merged with the original paragraph, so the reply is not clearly visible. So I will add the reply again here: I did actually check that if I copy and paste directly from LibreOffice, I do correctly get hex 2d for hyphens, that is both for hyphens that I type in myself, and for hyphens when I paste in a url from the clipboard. I checked that by copying and pasting from LibreOffice into KHexEdit. I have done that again just now for the url of this page, which contains lots of hex 2d hyphens. So the hex 2d hyphens are correctly hex 2d in the text in LibreOffice, the erroneous conversion to hex ad occurs ...(plus)

Chris Austin ( 2021-03-12 20:43:29 +0200 )

Please try to make your question "breathe". It ends up in HTML where lines are concatenated together unless they are separated by double-Enter = paragraph break. This could make it easier to read.

If it occurs only when exporting to PDF, update to the current LO version in your distro (possibly updating to a newer distro release) and see if the problem persists. You won't get any serious answer with such an obsolete version.

ajlittoz ( 2021-03-12 20:44:01 +0200 )

Thank you for the tip about double-Enter = paragraph break. I have corrected the revised question to use that. I am going to see if I can fix the PDF by using pdf2ps, then doing an automated find and replace of all soft hyphens by hex 2d hyphens in the PostScript, then using ps2pdf. Unfortunately Debian 8 jessie is now oldoldstable, and it looks as though the LibreOffice Writer version for jessie is still 4.3.3.2; they call it libreoffice-writer (1:4.3.3-2+deb8u13). Thanks also for the tip that U+00AD is SOFT HYPHEN. Possibly that might mean I could do the automated find and replace in the PostScript using KWrite, rather than KHexEdit.

Chris Austin ( 2021-03-12 21:32:55 +0200 )
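The find-and-replace idea in the last comment can be sketched at the text level, i.e. on a string already copied out of the PDF, such as a URL. This is a minimal sketch under stated assumptions, not a fix for the PDF itself: the function name `fix_soft_hyphens` and the example URL are hypothetical, and nothing here establishes whether the soft hyphen appears as a literal byte in the PostScript produced by pdf2ps. It also replaces every U+00AD, so it would be wrong for text that contains intentional soft hyphens.

```python
SOFT_HYPHEN = "\u00ad"   # U+00AD SOFT HYPHEN (byte ad in Latin-1)
HYPHEN_MINUS = "\u002d"  # U+002D HYPHEN-MINUS (the ordinary "-")


def fix_soft_hyphens(text):
    """Replace every soft hyphen in text with an ordinary hyphen-minus."""
    return text.replace(SOFT_HYPHEN, HYPHEN_MINUS)


if __name__ == "__main__":
    # Hypothetical URL as it might arrive from the clipboard.
    copied = "https://example.org/some\u00adpage\u00adname"
    print(fix_soft_hyphens(copied))
```

Text that is already correct passes through unchanged, so the function can be applied to anything copied from the PDF before it is pasted elsewhere.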