Converting MS Office Word forms to LibreOffice Writer docs

A. background

Some 7 years ago, I catalogued about 240 specimens of a mineral collection. I did this on a windows-7 home box, using microsoft office word home and student 2010. Neither windows-7 nor microsoft office word 2010 is maintained any longer. I do also have microsoft office home and student 2016, but that, too, is no longer maintained. I recently installed LibreOffice on that windows box.

I also have a Linux Fedora-34 workstation. I have LibreOffice on this workstation.

I need to get all those catalogue forms properly converted to LibreOffice documents.

B. problem

When I try to load any of those catalogue forms in LibreOffice, the document is not even close to being correctly formatted. Also, some of the original text does not show up.

C. questions

Using what I currently have, how do I get all those 240ish catalogue forms properly converted as quickly and easily as is practical?

D. example

Since these files are huge, I put them on Google Drive rather than attaching them to this thread.

“specimen label 239”
(this is one of the actual catalogue forms)
specimen label 239.docx - Google Docs

“MSOffice2010_capture.JPG”
(this shows how the catalogue form appears in microsoft office 2010 word)
MSOffice2010_capture.JPG - Google Drive

“MSOffice2016_capture.JPG”
(this shows how the catalogue form appears in microsoft office 2016 word)
MSOffice2016_capture.JPG - Google Drive

“LOwriter_capture.JPG”
(this shows how the catalogue form appears in LibreOffice writer)
LOwriter_capture.JPG - Google Drive

E. version information

(source box)

windows 7 home premium service pack 1

Microsoft Office Home and Student 2010
version 14.0.7268.5000 (32-bit)

Microsoft Office Home and Student 2016
version 2002 (build 12527.22079 click-to-run)
microsoft word 2016 MSO (16.0.12527.22086) 32-bit

LibreOffice version information (on the windows-7 box):
Version: 7.2.5.2 (x64) / LibreOffice Community

(target workstation)

Fedora 34 (last updated February 03, 2022)

LibreOffice Version: 7.1.8.1
Build ID: 10(Build:1)
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.utf8); UI: en-US
Calc: threaded

Thank-you in advance for your help.
Bill.

I opened your docx in Word 2019, selected everything (Ctrl+A), copied and pasted it into LibreOffice 7.1.8.1, and all the data pasted in order as far as I could see. The problematic MS frames/text boxes were lost, which is no bad thing: they have to be converted anyway, either in Word or when opening in Writer, to text boxes, which are drawing objects.

Can't get word 2019. It requires upgrading windows, which requires new hardware, which requires $$$$$$ that I don't have.

I didn't quite understand the last part of the last sentence. Getting the text from word to writer is easy, as you imply. But I need the formatting (frames, text boxes, tables) too.

On another computer with Word 2010 here is the result from simply copying and pasting everything:
specimen label 239PastedFromWord2010.odt (18.9 KB)

The content of every “block” of information seems OK, but some blocks are out of order, possibly due to their anchoring position in Word.

Microsoft's Help for Word 2019 says that frames are converted to text boxes on conversion (there is no Help left for 2010), and Writer likewise converts MS frames to text boxes. It seems that some properties of MS frames lie outside the compatibility overlap, in the same way that Writer frames are converted to text boxes when converting in the other direction. So I conclude that the frames are the problem.
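To check which of the catalogue files actually contain frames, a rough sketch like the one below may help. It relies on the fact that a .docx is a zip archive whose main body is stored in word/document.xml, where a Word frame appears as a `w:framePr` element and text box content as `w:txbxContent`. The function name and the idea of scanning the files this way are my own assumptions, not something Word or LibreOffice provides, and a text box can be counted twice when Word also stores a fallback copy of it.

```python
import re
import zipfile


def count_frame_markers(docx):
    """Count frame and text-box markers in the main body of a .docx.

    `docx` may be a file path or a binary file-like object.
    The counts are a rough diagnostic only: Word can store a text box
    twice (modern shape plus legacy fallback), which inflates the number.
    """
    with zipfile.ZipFile(docx) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    return {
        # Paragraphs carrying frame properties (classic Word frames)
        "frames": len(re.findall(r"<w:framePr\b", xml)),
        # Containers holding the text of text boxes
        "text_boxes": len(re.findall(r"<w:txbxContent\b", xml)),
    }
```

Called as `count_frame_markers("specimen label 239.docx")`, it returns a small dictionary; files with a non-zero "frames" count are the ones most likely to need manual attention.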

BTW see [Tutorial] Differences between Microsoft and AOO/LO files (View topic) • Apache OpenOffice Community Forum

I did further “experimentation” on this. Things are worse than I thought.

I tried using word-2016 to convert the word-2010 document to word-2016 (.docx). It didn't help.

I tried using word-2016 to save the document as a .odt. It didn't help.

In Fedora, using LibreOffice Writer, I imported the word document. It took considerable effort and time to get it to what it should be (except for small font-sizing issues). I could not find a way to select what should be a table. I also cannot select the frame contents (a text box?) as a whole. Frame contents (text boxes?) were very determined to shift right and/or disappear. When I finally had the document appearing in Writer as close as seemed possible to what the original looked like in word, I saved it, and exited LibreOffice. Later, I re-launched LibreOffice Writer, and some of my work was undone (lost). Frame content (text boxes?) were shifted right. Tables? and text boxes? remain un-selectable. Text selection is difficult. Summary: saving is not saving everything I did.

I said Copy (with clipboard) from Word 2010 and paste into Writer.

You have laid out your document using empty paragraphs for spacing, which will cause problems.

As you want half page size cards, why not put a page break after each “card” and then change the paper size to 1/2 Letter in Landscape? Some cards will extend on to 2 pages because of the paragraph breaks. You can do a global Find and Replace with Regular Expressions to remove blank paragraphs, which should give plenty of room for reformatting and adding further information.

Puzzling… Two of your posts never got to my e-mail.

I booted up my windows-7 box late this afternoon, and tried the select and copy. I tried it both from office-2010 and office 2016. I see basically the same results as you. I also “played” around some in other ways. Setting the margins in Writer to match those in word helps. Realistically, I'd say I'm an upper beginner, not intermediate, certainly not advanced, with both Writer and word. I'm not comfortable with anchors, frames, and text boxes. I'll have to play around some more to see how best to set up a “template” Writer document to make the transfers quick, easy, and correct.

If I understand your 1/2 letter on landscape suggestion, I think that would make the form too narrow for the table for many of my specimens. What I have (in word) is about as good a “balance” as I think I can get. Also, some 480 specimens are already catalogued. I'd like to avoid re-designing the form.

Thank-you for your time and effort.

Half Letter (or Memo, Statement, Mini, Invoice, Stationery; available in the USA, I believe) in Landscape is the same width as Letter in portrait: 5½ × 8½ in, or 140 × 216 mm. I notice that you intend each card to be double-sided, so having the page size match the card would make it easier to print double-sided (flip on short edge). Odd page numbers are side 1, even page numbers are side 2.
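As a quick check of those numbers, here is a small sketch that converts the Half Letter size to millimetres and maps a page number to its card and side. The function names are made up for illustration, and the card numbering assumes every card occupies exactly two pages.

```python
MM_PER_INCH = 25.4


def inches_to_mm(inches):
    """Convert inches to whole millimetres."""
    return round(inches * MM_PER_INCH)


def card_and_side(page_number):
    """Map a 1-based page number to (card number, side).

    Assumes exactly two pages per card: odd pages are side 1,
    even pages are side 2.
    """
    card = (page_number + 1) // 2
    side = 1 if page_number % 2 == 1 else 2
    return card, side


half_letter_mm = (inches_to_mm(5.5), inches_to_mm(8.5))
print(half_letter_mm)      # (140, 216)
print(card_and_side(3))    # (2, 1): third page is side 1 of card 2
```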

A rudimentary sample shown
specimen label 239HalfLetter.odt (26.0 KB)

You should be able to see the formatting marks and where paragraph breaks are, so click View > Formatting marks (Ctrl+F10)

To remove empty paragraphs in a selection, choose Edit > Find and Replace. In the dialogue box that opens:

  • In the Find field enter ^$
  • In the Replace field, leave it blank
  • Tick Current selection only (or leave it unticked to find in the whole document)
  • Tick the box Regular expressions
  • Click Replace All
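To illustrate what that search does, here is a minimal Python sketch of the same idea outside Writer. It treats a document as a list of paragraph strings (my simplification, not how Writer stores text) and drops every paragraph that the pattern ^$ matches completely. A paragraph that contains only spaces is not empty and is kept, which is also worth remembering when running the search in Writer.

```python
import re

# Same pattern as entered in the Find field: start of paragraph
# immediately followed by end of paragraph.
EMPTY_PARAGRAPH = re.compile(r"^$")


def drop_empty_paragraphs(paragraphs):
    """Return the paragraphs that are not completely empty."""
    return [p for p in paragraphs if not EMPTY_PARAGRAPH.fullmatch(p)]


# Hypothetical card content, for illustration only
card = ["Tilasite", "", "", "(locality goes here)", ""]
print(drop_empty_paragraphs(card))  # ['Tilasite', '(locality goes here)']
```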

To enter a Page Break at the end of the information on one side of a card and start a new side, press Ctrl+Enter.

This is a nice example of an incompatibility.
I downloaded the file and opened it in Writer.
It showed 2 pages with the first page being blank.
The second page is split horizontally down the middle.
Frames are displayed, but they cover frames underneath.

Tilasite

A simple conversion is IMHO not possible.

But you should definitely report the problem as a bug on Bugzilla.
How to Report Bugs in LibreOffice


Annotation:
Data of a repetitive type, like these catalogue entries, is well suited to a database. From that you can derive wonderful reports.

Sorry for the long delay in responding; I really got swamped shortly after opening this thread.

The horizontal splitting of the 2 pages is as it should be.

As suggested, I submitted a bug: tdf#147360.

I see the same problem trying to load the file into SoftMaker Office, which now uses .docx as its default file format, so it's not just a LibreOffice issue.

I had reasonable luck by clicking the border on each box outline for elements that should be on the first page and then dragging the anchor and dropping it on the first page.

Sorry for the long delay in responding; I really got swamped shortly after opening this thread.

I tried what you said. That gets the first page frames to the first page, but that's all.

I tried other things; they help some, but not enough. See bug 147360, which I submitted a short while ago.

I had the same kind of problem trying to transfer very old M$ MacWord archived files. The format can't even be read by “modern” Word, let alone imported directly into LO Writer. Since I didn't want to lose some inserted “objects” like tables and formulas, I saved as RTF with the antiquated Word (not plain text, because many artefacts would have been lost). The RTF files were transferred to my Linux box and read into Writer. There an ancillary step was necessary because, back then, Classic MacOS used a proprietary 256-character set, MacRoman, which obviously is not Unicode, so non-ASCII characters come out rather strange (but deterministically). A Find & Replace step was necessary to restore the intended character encoding.

When this was done, I had to fix the styles and completely reassign them.

If you can be satisfied with a plain text version of your files (but that seems to be ruled out because of the tables), changing character encoding from Windows CP-1252 to Unicode can be done with a small bash script. After that, you can import the translated file into Writer and tackle the restyling.
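The same re-encoding can also be sketched in Python instead of bash. This is only an illustration under assumptions: the function name and file names are hypothetical, and it handles plain text files only, not .doc or .docx. Python's built-in "cp1252" codec covers the Windows case, and "mac_roman" covers the Classic MacOS case mentioned earlier. Note that a few byte values are undefined in CP-1252, and decoding will raise an error if the file contains one.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Read a plain text file in a legacy encoding and write it as UTF-8.

    Returns the decoded text so it can be inspected.
    Raises UnicodeDecodeError if a byte is not defined in source_encoding.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Hypothetical usage:
# reencode_to_utf8("label239.txt", "label239.utf8.txt")
# reencode_to_utf8("old_mac.txt", "old_mac.utf8.txt", source_encoding="mac_roman")
```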