How to convert a PDF with tables to a Calc spreadsheet?

How can I convert a PDF with tables to a Calc spreadsheet in LibreOffice on Linux? (preferably from the command line)

A good question, but with a sad answer. Read on.

Hello,

Not with LibreOffice at all. PDF is an exchange / output format (a vectorized page description language) and as such is not designed to be imported into editors or other programs for editing. Although LibreOffice Draw can import PDFs for editing, this is a workaround by design and does not provide what you are looking for. There are tools on the internet that claim to convert PDFs to some editable format (e.g. Word), but LibreOffice is not such a tool.

Think of a PDF as a vector-based image or photo of a document. Reading it as if it were a CSV file would be similar to using OCR software to convert a scanned document back into an editable word processing document.

Yep, and my answer just says that LibreOffice, in the sense of your comment, is not an OCR tool. So what's sad about the answer? Just the bad news?

For simple or short tables where you can use the space as a delimiter (that is, tables whose text contains no spaces), you can do a manual conversion, but not using Draw. Paying for a program or a conversion service would be easier if you do this more than just occasionally.

Open the PDF in a PDF reader such as Adobe Reader, copy the text in the tables (I mean text, numbers and dates, as they are all just text), and paste it into Calc. At this point you can add quote delimiters where needed to allow for spaces within some text; this gets tedious for more than just a few. Select all the text, click Data > Text to Columns and select Space as the separator.
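If you would rather script the splitting step than do it in Calc, something like the following might work. It is only a minimal Python sketch, assuming the copied text has one table row per line, columns separated by spaces, and multi-word values wrapped in double quotes as described above. The function names and the sample data are made up for illustration.

```python
import csv
import io
import re

# A field is either a double-quoted string (group 1) or a run of
# non-space characters (group 2). Assumption: quotes are not nested.
FIELD = re.compile(r'"([^"]*)"|(\S+)')


def split_row(line):
    """Split one pasted table row into its fields."""
    return [bare if bare else quoted for quoted, bare in FIELD.findall(line)]


def pasted_table_to_csv(text):
    """Turn space-separated table text into comma-separated CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        if line.strip():  # skip empty lines between rows
            writer.writerow(split_row(line))
    return out.getvalue()


# Hypothetical sample, standing in for text copied out of a PDF reader.
sample = 'Name City Age\nAlice "New York" 30\nBob  Paris   25\n'
print(pasted_table_to_csv(sample))
```

Save the result with a .csv extension and open it in Calc with comma as the separator. Rows whose values contain unquoted spaces will still split into too many columns, so the quoting step cannot be skipped.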

I use the online tool "Convert PDF to Excel. PDF to XLS spreadsheets online" and then run the result through https://virustotal.com to make sure there is no mischief involved (none so far).

This worked well for a PDF with tables. The tables were put into cells accordingly and could be cut out, etc.

Able2Extract has a 7-day free trial. It costs $39 for 1 month or $160 to purchase.
I am using it to extract tables from PDF to Calc. The manual leaves a bit to be desired, but their online help is fairly fast.

You can try Okular, a PDF reader available in the Ubuntu repositories (at least).

Search for it in the Software Center, or type in a terminal:

sudo apt install okular

Okular has a table selection option (top right) that lets you draw a box around the table and then divide it into columns (it first tries to divide it automatically, but you can override that).

You can then copy and paste the selection into Calc. It worked pretty well for me.
