0

Is there a way to control the font when converting txt to docx with --convert-to?

asked 2018-11-13 22:10:45 +0200

this post is marked as community wiki

This post is a wiki. Anyone with karma >75 is welcome to improve it.

I'm running LO 6 on Manjaro Linux.

I want to programmatically create .docx files from .txt. This already works fine, except that I need a mono-spaced font so that the content will be formatted correctly. LO seems to want to use Liberation Mono, but that font does not exist in some contexts (e.g. Android, Windows), and a proportional font gets substituted. I'd like to be able to specify that something like Courier New be used. Is this possible?

Thanks,

~ray


Comments

1

Please don't post as wiki. I have never seen it turn out to be useful.

Lupp ( 2018-11-13 23:13:51 +0200 )

sry.. first time.. there was no explanation for the checkbox that I could see but I had seen this referred to as a wiki, so....

raybert ( 2018-11-13 23:38:16 +0200 )

3 Answers

1

answered 2018-11-14 22:33:27 +0200

raybert

updated 2018-11-14 22:35:06 +0200

FWIW, I came up with my own solution for this (at least until a better one is found). I wrote a script that unzips the docx file and uses xmlstarlet to change the fonts in styles.xml (then re-zips). This is working fine even if not the "cleanest" solution. Here's what it looks like (in case it's helpful to anyone):

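#!/bin/bash
# Replace the fonts in a .docx by patching word/styles.xml in place.

FN="myfile.docx"
TMP="xx"

function fail() { echo "error: $*" >&2; exit 1; }

[ -e "$FN" ] || fail "$FN not found"

WD=$PWD
# Always return to the starting directory and remove the scratch directory.
function cleanup()
{
    cd "$WD"
    rm -rf "$TMP"
}
trap cleanup EXIT
set -e

mkdir "$TMP"
cd "$TMP"
unzip -q "../$FN"
cd word
# Set every w:ascii and w:hAnsi font attribute to the replacement font.
xmlstarlet ed -u '//@w:ascii|//@w:hAnsi' -v 'Courier New' styles.xml > newstyles.xml
mv newstyles.xml styles.xml
cd ..
# Rebuild the archive from the patched contents.
rm "../$FN"
zip -rq "../$FN" *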

Comments

Yes, manipulating the file directly is an appropriate approach if you know the needed details. I mostly try to avoid it (except for repairs), and I have never done it with docx.
In this specific case it's exclusively about a font name. If many style attributes or complete styles need to be changed or updated, using a LibO file as a platform may be preferable.

Lupp ( 2018-11-14 23:36:52 +0200 )
1

answered 2018-11-15 07:23:11 +0200

updated 2018-11-15 08:51:40 +0200

It is very easy to define the font name when converting from plain text files.

The LibreOffice help includes an example for conversion from a plain text file:

--infilter="Text (encoded):UTF8,LF,,"

The help didn't specify what the omitted parameters after LF were (I have now added that, and it will appear in the next help version), but here are all four:

  1. UTF8 is the encoding used to decode the file.
  2. LF is the line ending format (CR and CRLF are the other allowed options; if missing, CRLF is used on Windows, and LF on all other platforms).
  3. Font name.
  4. BCP 47 language tag.

So, the command line could be like this:

soffice --infilter="Text (encoded):UTF8,,Courier New,en-US" --convert-to docx path/to/file.txt

to convert a UTF8-encoded plain-text file with default line endings, using Courier New font, and English (USA) language for the imported text.
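The filter string above can also be assembled from its four fields by a small helper, which may be convenient when soffice is called from a script or a Makefile recipe. This is only a minimal sketch, assuming soffice is on the PATH; the function names build_infilter and txt_to_docx are hypothetical, while --headless, --infilter, --convert-to and --outdir are existing soffice options:

```shell
#!/bin/bash

# Assemble the "Text (encoded)" filter string from its four fields:
# encoding, line ending, font name, language tag.
# A field may be passed as an empty string, as in the help example.
build_infilter() {
    local encoding="$1" eol="$2" font="$3" lang="$4"
    printf 'Text (encoded):%s,%s,%s,%s\n' "$encoding" "$eol" "$font" "$lang"
}

# Convert one plain-text file to docx, writing into the given directory
# (defaults to the current directory).
txt_to_docx() {
    local src="$1" outdir="${2:-.}"
    soffice --headless \
        --infilter="$(build_infilter UTF8 "" "Courier New" en-US)" \
        --convert-to docx --outdir "$outdir" "$src"
}

# Usage (hypothetical paths): txt_to_docx path/to/file.txt out/
```

Keeping the font and language as function arguments makes it easy to try another mono-spaced font without retyping the whole filter string.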

A side note

LibreOffice is quite smart about the default fonts it uses, which may not be available on other systems. For instance, for Liberation Mono it defines a substitute font in the generated docx (see word/fontTable.xml), which is Courier New, as well as the font properties (fixed-pitch "modern" font), which allows proper substitutions to be found on any system.
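To see what a generated docx actually declares, word/fontTable.xml can be pulled out of the archive and filtered for font names. A rough sketch, assuming that each font is declared as a w:font element whose first attribute is w:name, and that the substitute is stored in a w:altName child; the function name list_font_table and the file name in the usage line are hypothetical:

```shell
#!/bin/bash

# Print the font declarations and alternate-name entries found in the
# fontTable.xml document read from stdin, one match per line.
list_font_table() {
    grep -oE '<w:(font w:name|altName w:val)="[^"]*"'
}

# Usage (hypothetical file name):
#   unzip -p myfile.docx word/fontTable.xml | list_font_table
```

This is plain text matching, not XML parsing, so it is only meant for a quick look at the file.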


Comments

@Mike Kaganski: Thanks. As far as merely defining the font to use is concerned, I was quite dissatisfied with my own suggestion (which was originally sketched to load a page style as well).
However, the poor documentation of filter parameters and of .uno: commands remains an annoying issue.

Lupp ( 2018-11-15 13:54:39 +0200 )

Thanks! This is just what I was hoping for. Unfortunately, my results are mixed and ultimately still better with xmlstarlet.

The difference seems to be that --infilter only changes the PreformattedText style; it leaves Default set to Liberation. This seems to be interpreted differently in different contexts.

For example, Word on Win10 comes up in "reader mode" by default and it displays the --infilter version in a proportional font (the xmlstarlet version is displayed in mono). Both ...

raybert ( 2018-11-15 23:19:04 +0200 )

... versions do display correctly (ie in mono) when you switch to "print mode".

On Android, I've tested with both WPS Office and Polaris Office. Polaris behaves like Word/Win10 (proportional with --infilter, mono with xmlstarlet). WPS, unfortunately, appears to do the wrong thing always. :(

Adding to the ...

raybert ( 2018-11-15 23:19:36 +0200 )

... confusion in my testing, I discovered that LO by default on my laptop running Arch uses Courier New but my desktop running Manjaro uses Liberation (both are fully updated). I'm not sure why; haven't found any config or package difference so far.

Anyway, I guess I'll be sticking with xmlstarlet for now, but thanks anyway for this information!

raybert ( 2018-11-15 23:19:55 +0200 )
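The difference between the two results described in these comments (only some styles changed versus every font attribute changed) can be checked by counting the font names that word/styles.xml refers to. A minimal sketch, assuming fonts appear there as w:ascii and w:hAnsi attributes, the same ones the xmlstarlet script rewrites; the function name count_style_fonts and the file name in the usage line are hypothetical:

```shell
#!/bin/bash

# Count how often each font name occurs in w:ascii / w:hAnsi attributes
# of the styles.xml document read from stdin.
count_style_fonts() {
    grep -oE 'w:(ascii|hAnsi)="[^"]*"' | sort | uniq -c
}

# Usage (hypothetical file name):
#   unzip -p myfile.docx word/styles.xml | count_style_fonts
```

Running this on both variants of a converted file should show whether any proportional font is still referenced after the conversion.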
0

answered 2018-11-13 23:13:09 +0200

Lupp

updated 2018-11-14 23:30:16 +0200

(Don't think it's OS dependent - except for the syntax of the template's pathname, of course.)
You need to load the character style you want from a template during the process.
See the recent related thread https://forum.openoffice.org/en/forum... in that other very valuable forum. My answer https://forum.openoffice.org/en/forum... there may help you.

===Edit 2018-11-14 22:20 regarding the comments===
I made a demo to better explain what I meant. It is attached here as an archive (the .ods extension is fake; remove it) containing a template, two plain-text files as examples, and the spreadsheet file, in which a parameter range is in B2:C4 and some scripts are in the module scripts of the Standard library.
Extract the files to a common empty folder and open the spreadsheet file. Adapt the pathnames to the actual situation and save the ods anew. A command line for Win is now prepared in cell A20. Adapt its elements to the needs of a Mac.
If you now run the command from a terminal, you will need to act on a prompt. This is unavoidable for security reasons if code from a file is to be run. An alternative is to move the scripts module to the local Standard library and to change location=document to location=application in the query part of the command URI.


Comments

thanks, but I don't see how to do this programmatically (I'm doing it from a Makefile)?

I did try to make an .ott and load it with -n, but it didn't work: -n causes a Writer window to open and doesn't seem to affect --convert-to. Adding --headless (which, I know, shouldn't do anything) causes it to hang (I need to ^C it).

I know I can load the new .docx into LO and edit the font manually (or with a macro) but my question is whether it can be done programmatically.

raybert ( 2018-11-13 23:42:31 +0200 )

What I suggested didn't require loading a .docx and changing anything there manually. It requires running some code based on the LibO API and/or .uno: commands (these, again, are executed with the help of an API service). If it were my job, I would probably use a spreadsheet as a kind of batch. Each plain-text file would be opened, reformatted by loading styles from a template, and then stored in the target format. That's a conversion, isn't it? I don't think it's feasible with a CL option.

Lupp ( 2018-11-14 01:10:30 +0200 )

I guess I didn't read it carefully enough. I'll take another look. In the meantime though I came up with another way to fix the problem; I'll add a new answer.

raybert ( 2018-11-14 22:27:54 +0200 )

Stats

Asked: 2018-11-13 22:10:45 +0200

Seen: 196 times

Last updated: Nov 15 '18