How to disable the Asian Typography option "Apply spacing between Asian and non-Asian text" when converting files to PDF from the command line

fangzheng · June 11, 2024, 3:04am

I need to convert some files to PDF from the command line. They contain Chinese and English text, and I found that extra spacing is added automatically in the output PDF files because of the “Asian Typography” feature.

I want to know how to disable the option “Apply spacing between Asian and non-Asian text”. I need the output PDF to match the original file.

mikekaganski · June 11, 2024, 3:11am

If the original file didn’t contain such a feature, PDF conversion must not add it. Adding it in this case would be a bug. Please file it, attaching the sample document that you know should not contain that feature, but that exports to PDF with that feature enabled.

fangzheng · June 11, 2024, 7:02am

Thanks for your reply. The original file is an HTML file (readme.html) with the following content; sorry, I can’t upload it:

<html><body>
<h1>打包流程</h1>
<p>手工触发一个pipeline</p>
<h1>使用方法</h1>
<ol start="0">
<li>
<p>初次部署：执行initialize.sh 创建docker secret.</p>
</li>
<li>
<p>将artifacts下载下来，可以看到包含该部署包需要的images.txt以及用于下载镜像的脚本</p>
</li>
<li>
<p>在公司的某一台服务器上执行export_images.sh 会将所有的镜像导出到Docker_images目录</p>
</li>
<li>
<p>重新将安装包和镜像压缩成tar</p>
</li>
<li>
<p>将安装tar拷贝到场内</p>
</li>
<li>
<p>执行load_images.sh 导入镜像</p>
</li>
<li>
<p>执行push_images.sh &lt;docker-reigstry&gt;:&lt;registry-port&gt; push镜像到客户环境中的registry中</p>
</li>
<li>
<p>按照客户环境的具体配置，配置一下对应的参数(请查看各个模块下的values.yaml配置，添加上对应的配置)</p>
</li>
<li>
<p>helmfile中配置的namespace 是sage-gpt, 需要创建一个docker secret(假设docker-server的用户名和密码是testuser/testpassword, 默认部署脚本的docker registry的密码是testuser/testpassword)
<code>kubectl create secret docker-registry docker-secret --docker-server=&lt;docker-reigstry&gt;:&lt;registry-port&gt; --docker-username=testuser --docker-password=testpassword -n &lt;namespace&gt;</code></p>
</li>
<li>
<p>准备环境变量:
<code>export DOCKER_SECRET=docker-secret</code>
<code>export DOCKER_REGISTRY=harbor.4pd.io</code></p>
</li>
<li>
<p>执行helmfile apply 更新客户环境 或者使用helmfile sync 更新到客户环境中</p>
</li>
</ol>
</body></html>

The command line is

soffice --convert-to pdf readme.html

The output pdf is readme.pdf (50.8 KB)

You can see that a space is added before “initialize.sh” in the output PDF.
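For batch use, the conversion command above can be wrapped in a small script. The following is only a sketch: the function names `build_convert_command` and `convert_to_pdf` are made up for illustration, and it assumes `soffice` is on the PATH. It uses the `--headless`, `--convert-to` and `--outdir` options of `soffice`. It does not change the Asian Typography setting; it only reproduces the conversion step.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, outdir, soffice="soffice"):
    """Return the argument list for a headless conversion to PDF."""
    return [
        soffice,
        "--headless",            # do not open any window
        "--convert-to", "pdf",   # same target format as in the post
        "--outdir", str(outdir),
        str(source),
    ]


def convert_to_pdf(source, outdir, soffice="soffice", timeout=120):
    """Run the conversion and return the path where the PDF is expected."""
    source = Path(source)
    outdir = Path(outdir)
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice!r} was not found on PATH")
    outdir.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if soffice exits with an error.
    subprocess.run(
        build_convert_command(source, outdir, soffice),
        check=True,
        timeout=timeout,
    )
    return outdir / (source.stem + ".pdf")


if __name__ == "__main__":
    print(convert_to_pdf("readme.html", "out"))
```

Keeping the command construction in its own function makes it easy to log or inspect the exact command line before running it.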

ajlittoz · June 11, 2024, 7:22am

@mikekaganski will correct me if I’m wrong: your file is HTML, i.e. a non-native format for Writer, though there are provisions to handle it. Such a file first needs conversion to the internal representation. HTML contains absolutely no Writer metadata. Consequently, it does not inherit settings from a previous session; the settings take their default values.

You didn’t state the purpose of the file. HTML is intended for web pages. A PDF version has a totally different usage.

IMHO, you should try to “print to file” through a PDF driver directly from your browser.

PS:

I rather seem to see a larger space after initialize.sh, but your screenshot does not make it possible to see the intentionally added spaces.

Wanderer · June 11, 2024, 7:33am

Have you checked whether your document is exported the way you need when you

disable this option “Apply spacing between Asian and non-Asian text”?

If so, it may be possible to call a macro directly from the command line that applies the setting and then performs the export to PDF. Obviously, the macro cannot be inside the HTML file.

fangzheng · June 11, 2024, 7:42am

Thanks for your reply.

If I use the LibreOffice GUI, I can disable the option “Apply spacing between Asian and non-Asian text” and export a correct PDF directly. But I don’t know how to disable the option from the command line, and I also don’t know how to disable it with a macro.

Can you give me an example?
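One way the macro idea might look, as an untested sketch using LibreOffice’s Python UNO bridge instead of a Basic macro. It assumes that the GUI checkbox corresponds to the paragraph property `ParaIsCharacterDistance` (from `com.sun.star.style.ParagraphPropertiesAsian`), that LibreOffice was started separately with `soffice --headless --accept="socket,host=localhost,port=2002;urp;"`, and that the script runs with the Python shipped with LibreOffice. The function names, the port and the choice of export filter are illustrative assumptions, and nothing in this thread confirms that this removes the extra spacing for HTML input.

```python
def disable_asian_spacing(doc):
    """Switch off ParaIsCharacterDistance on paragraph styles and paragraphs.

    Only top-level paragraphs are visited; paragraphs nested in tables
    are skipped. Returns the number of objects that were changed.
    """
    changed = 0
    styles = doc.StyleFamilies.getByName("ParagraphStyles")
    for i in range(styles.getCount()):
        styles.getByIndex(i).setPropertyValue("ParaIsCharacterDistance", False)
        changed += 1
    paragraphs = doc.Text.createEnumeration()
    while paragraphs.hasMoreElements():
        par = paragraphs.nextElement()
        # The enumeration also yields tables, which are not paragraphs.
        if par.supportsService("com.sun.star.text.Paragraph"):
            par.setPropertyValue("ParaIsCharacterDistance", False)
            changed += 1
    return changed


def convert_with_uno(source_url, target_url, host="localhost", port=2002):
    """Load a file, switch off the spacing, export to PDF (file:// URLs)."""
    import uno  # only available in LibreOffice's Python, so imported lazily
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(
        f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext")
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)
    doc = desktop.loadComponentFromURL(
        source_url, "_blank", 0, (prop("Hidden", True),))
    try:
        disable_asian_spacing(doc)
        # Assumption: HTML input opens as a Writer/Web document, which
        # needs the Writer/Web variant of the PDF export filter.
        is_web = doc.supportsService("com.sun.star.text.WebDocument")
        filter_name = "writer_web_pdf_Export" if is_web else "writer_pdf_Export"
        doc.storeToURL(target_url, (prop("FilterName", filter_name),))
    finally:
        doc.close(True)
```

A call could then look like `convert_with_uno("file:///tmp/readme.html", "file:///tmp/readme.pdf")`, where both paths are placeholders.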

fangzheng · June 11, 2024, 7:53am

I rather seem to see a larger space after initialize.sh, but your screenshot does not make it possible to see the intentionally added spaces.

I used PDFBox to extract the text of the output PDF, and I can confirm that there is a space before “initialize.sh”.
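A check like this can be automated once the text has been extracted. The sketch below assumes the extracted text is already available as a string (for example from PDFBox, as described above). It looks for spaces that sit exactly between a CJK character and a printable ASCII character. The helper names and the chosen Unicode ranges are assumptions made for illustration and are deliberately narrow.

```python
import re

# CJK punctuation, Han ideographs and fullwidth forms (an assumed, narrow set).
_CJK = r"\u3000-\u303f\u3400-\u4dbf\u4e00-\u9fff\uff00-\uffef"

# Runs of spaces with CJK on one side and printable ASCII on the other.
_BOUNDARY_SPACE = re.compile(
    rf"(?<=[{_CJK}]) +(?=[!-~])|(?<=[!-~]) +(?=[{_CJK}])"
)


def find_boundary_spaces(text):
    """Return (offset, context) for each space run at a CJK/ASCII boundary."""
    return [
        (m.start(), text[max(0, m.start() - 1):m.end() + 1])
        for m in _BOUNDARY_SPACE.finditer(text)
    ]


def strip_boundary_spaces(text):
    """Remove every space run that sits at a CJK/ASCII boundary."""
    return _BOUNDARY_SPACE.sub("", text)


def differs_only_by_boundary_spaces(original, extracted):
    """True if the texts differ, but only in spaces at CJK/ASCII boundaries."""
    return (
        original != extracted
        and strip_boundary_spaces(original) == strip_boundary_spaces(extracted)
    )


if __name__ == "__main__":
    original = "初次部署：执行initialize.sh 创建docker secret."
    extracted = "初次部署：执行 initialize.sh 创建 docker secret."
    for offset, context in find_boundary_spaces(extracted):
        print(offset, repr(context))
    print(differs_only_by_boundary_spaces(original, extracted))
```

Comparing the stripped forms separates this particular spacing effect from other conversion differences, which would make the comparison return False.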

You didn’t state the purpose of the file. HTML is intended for web pages. A PDF version has a totally different usage.

My use case is that users upload documents in various formats to our system, including HTML. I need to convert them to PDF for various reasons, and the converted PDF must contain the same text as the original file.

mikekaganski · June 11, 2024, 10:09am

The correct description of the problem would be “LibreOffice imports HTML files with unexpected spaces between Asian characters and Western characters”. That would then require analyzing why the setting is applied to the imported document (is it a default that maybe shouldn’t apply to HTML? is it a decision of the filter, which maybe needs rethinking?). Anyway, no bug report means no problem that can be resolved in the end.

fangzheng · June 11, 2024, 11:08am

From your reply, I’m not sure what the next step is. Is the result described above expected? Is there a way to achieve the goal (disabling the option “Apply spacing between Asian and non-Asian text”)?

mikekaganski · June 11, 2024, 11:18am

Please see: