File Conversion: Printed Receipt >> PDF >> LibreOffice Calc

I am trying to find a clean way to convert printed receipts into a Calc sheet so that I can do some data analysis.

Can anyone recommend applications or conversion methods for this?

I can also install phone applications from the app store on an Android device.

No general instructions are possible, because:

  • Printed documents can vary widely.
  • The PDF will not necessarily contain tabular information, and its representation may vary as well.
  • The next step, from PDF to text/Calc, will require OCR. Some programs may give good results for this, but you need to test them on your own case.

Some hints:

  • If you have simple and uniform paper to scan, a direct approach via OpenCV and Python may be possible.
  • Tesseract is a quite capable OCR engine, but to get tabular data you may need to check whether commercial OCR software is more useful.

None of these steps is included in LibreOffice. However, it may be possible to start import scripts from a macro.
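To illustrate the last step only (OCR text to something Calc can open), here is a minimal Python sketch. It assumes you already have the OCR output as plain text and that each item line ends with a price such as "2,49" or "2.49", optionally followed by a one-letter tax code. The line pattern, the sample receipt, and the function names are all hypothetical; real receipts will almost certainly need a different pattern.

```python
import csv
import io
import re

# Hypothetical line shape: item text, whitespace, a price with two decimals,
# and an optional one-letter tax code at the end of the line.
LINE_RE = re.compile(r"^(?P<item>.+?)\s+(?P<price>-?\d+[.,]\d{2})\s*[A-Z]?$")


def receipt_text_to_rows(text):
    """Return (item, price) tuples for every line that looks like an item line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # header, footer, or a line the OCR garbled
        # Accept both a decimal comma and a decimal point.
        price = float(match.group("price").replace(",", "."))
        rows.append((match.group("item").strip(), price))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["item", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up OCR output, for illustration only.
sample = """SUPERMARKET XY
MILK 1L      1,19 A
BREAD        2,49 A
TOTAL        3,68
Thank you"""

rows = receipt_text_to_rows(sample)
print(rows_to_csv(rows))
```

The CSV text can be saved to a file and opened directly in Calc. Note that summary lines such as the total are kept as ordinary rows here, so you would have to filter them out yourself.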
    PS: What is meant by “clean” in your

As a PPS:
I’m using Microsoft Lens on an Android phone to scan to PDF. This can be used offline. In older versions, OCR was possible with Word .docx files as the destination in OneDrive, so I assume this was a cloud service.

It seems OCR is now possible (by default?) for all PDFs. Checking the instructions, I found that there is a “table” mode too, but I didn’t test it much. On my (complicated) test the result needs a lot of manual edits. The docs only mention OCR for the Word option, but obviously it is needed for the table mode as well. (EDIT: I found the hint at the end under “further information”, so extracting text, aka OCR, is possible in 20 languages with PDF as the destination.)

German link to the MS Lens docs (they redirect me to German, so I can’t give a correct English link):

German hint to OCR in MS-Lens:

Note that if, and only if, the PDF content is textual with a tabular structure (for instance, when exported directly by Calc) and not the result of a scan (= image), you may try Tabula, which I used successfully a while ago.