How can I convert PDF tables to a .csv file, the way tabula does?
libreoffice --convert-to csv a1.pdf
That command does not work at all. Is there any other way, for example converting through several intermediate formats until I end up with csv?
First of all: there is no general way to solve this. A PDF may contain no text at all, only a scanned image. In that case you need OCR software to recover the text.
Actually, PDF is more like a programming language for positioning objects on pages. A PDF does not necessarily preserve the sequence of text or tables. There is, however, specialised software available for extracting text from PDFs.
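To illustrate why the sequence gets lost, here is a minimal Python sketch, not taken from any particular extraction tool. It assumes the text objects have already been read out of the PDF as (x, y, text) fragments, with y growing downward as in Draw. The names `Fragment`, `rebuild_rows` and `y_tolerance` are made up for this example.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    x: float   # left edge of the text object
    y: float   # top edge; assumed to grow downward, as in Draw
    text: str


def rebuild_rows(fragments, y_tolerance=2.0):
    """Group fragments into rows by similar y, then order each row by x."""
    rows = []
    for frag in sorted(fragments, key=lambda f: f.y):
        # Compare with the first fragment of the current row; a larger
        # vertical distance than the tolerance starts a new row.
        if rows and abs(frag.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])
    return [[f.text for f in sorted(row, key=lambda f: f.x)] for row in rows]


# Fragments listed in drawing order, which is not the reading order.
drawn = [
    Fragment(120.0, 50.4, "Amount"),
    Fragment(20.0, 70.0, "2024-01-05"),
    Fragment(20.0, 50.0, "Date"),
    Fragment(120.0, 69.8, "12.50"),
]
print(rebuild_rows(drawn))
```

The tolerance is there because text that looks like one line rarely sits at exactly the same y coordinate. A real table also needs column detection, merged cells and multi-line cells, which this sketch ignores.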
A PDF can be opened in Draw, where it is interpreted as a sequence of instructions to draw something, possibly letters. You can make small changes there, but it is not the right tool for extracting tables.
LibreOffice can convert its documents to PDF, just as it can print them. But this is usually not easily reversed. Try to get a page printed on paper back into Writer or Calc: reading a PDF back in requires a similar effort.
I agree with everything said above: look for a better way. I forked https://pdf_statement_reader, which uses tabula.
Saving a simple single-sheet document as a .pdf file and reading it back into Calc is possible. However, a .pdf is basically a graphical representation (a set of drawing commands) and does not store cell formulas. In principle you could only import the data: open the .pdf in an invisible Draw document, convert its text to a .csv string, and then pump the value of each cell back into your Calc spreadsheet using a nested, two-dimensional For ... Next loop. To make everything work you would have to save the formulas, row/column locations and layout in a separate “layer” data file, and finally read that data back and pump the formulas and layout data into the right cells. Building a comprehensive system would mainly strain your glutes and take time (maybe years).
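' CSV logic: build a semicolon-separated string from the text shapes of a Draw page.
' Assumption: the macro runs in the Draw document that holds the imported PDF,
' and the table is on the first page.
Dim csvText As String
Dim prevY As Long
Dim firstShape As Boolean

Sub BuildCsv
    Dim oPage As Object, oShape As Object
    Dim i As Long, curY As Long
    Dim curText As String, sCrLf As String
    ' vbCrLf is only defined in VBA compatibility mode, so build the line break here
    sCrLf = Chr(13) & Chr(10)
    oPage = ThisComponent.getDrawPages().getByIndex(0)
    csvText = ""
    firstShape = True
    For i = 0 To oPage.getCount() - 1
        oShape = oPage.getByIndex(i)
        If oShape.supportsService("com.sun.star.drawing.TextShape") Then
            curY = oShape.Position.Y
            curText = oShape.getText().getString()
            If firstShape Then
                csvText = curText
                firstShape = False
            ElseIf curY = prevY Then
                ' same Y position: next column of the current row
                csvText = csvText & ";" & curText
            Else
                ' different Y position: start a new row
                csvText = csvText & sCrLf & curText
            End If
            prevY = curY
        End If
    Next i
    ' cleanup: remove a ";" left directly before a line break
    csvText = Replace(csvText, ";" & sCrLf, sCrLf)
End Sub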
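The separate “layer” data file mentioned before the macro could be sketched roughly as follows, in Python for brevity. This is only an illustration under assumptions: the JSON format, the cell-address keys and the names `save_layer` and `apply_layer` are invented here, and layout data is left out entirely.

```python
import json
import os
import tempfile


def save_layer(formulas, path):
    """Write cell formulas, keyed by address such as "C2", to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(formulas, fh, indent=2)


def apply_layer(values, path):
    """Return a copy of `values` (address -> text) with the formulas restored."""
    with open(path, encoding="utf-8") as fh:
        formulas = json.load(fh)
    restored = dict(values)
    restored.update(formulas)  # a stored formula replaces the imported value
    return restored


# Values as they would come back from the PDF: plain text, no formulas.
imported = {"A2": "12.50", "B2": "3", "C2": "37.50"}

layer_path = os.path.join(tempfile.mkdtemp(), "layer.json")
save_layer({"C2": "=A2*B2"}, layer_path)
print(apply_layer(imported, layer_path))
```

Even this toy version shows the catch: the layer file has to be written when the spreadsheet is exported, so you need the original spreadsheet anyway.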
So the question is: what “benefit” would such a system have, other than making the import of data into a Calc spreadsheet take up unnecessary time?