===Edit2 2022-02-16===
Starting with V7, a somewhat unclean way of using a variable in the code of the example attached to “Edit1” is no longer supported. Though I don’t use the code myself, I stumbled over the issue by accident and fixed it. Since my answer here was upvoted more than once, I assume there are users. They should use only the code contained in the new attachment. It should also work in older versions of LibO (tested down to V 4.4.7).
drawTexts2WriterV7.odg (17.1 KB)
===Edit2 End===
===Edit1 2018-09-11===
I came back to this old thread by accident. Since I had meanwhile analysed the task again out of interest, and written and arranged some code for it, I now attach a demo containing that code. The focus has moved from pdf files to the ‘Draw’ document, which may or may not have been created by opening a pdf. In fact, the demo shows some graphic shapes created in Draw and containing text, some of them grouped…
The main enhancement is that the DrawingDocument, its DrawPages, the ShapeCollection(s) and the Shape(s) contained therein are now resolved recursively. The contained text is first collected in an array together with information about the original position of the particular shape, and then sorted by that information before it is output to a new TextDocument. Thus the order of the text is no longer defined by the logical order of the shapes, but essentially by their visual position. Doing this even more precisely would require a few corrections to the position coordinates based on the shapes’ properties.
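The collect-then-sort idea described above can be sketched outside LibreOffice. The following is a minimal illustration in Python, not the code from the attachment. The dictionaries with `x`, `y`, `text` and `children` are hypothetical stand-ins for Draw shapes and groups, and the sketch assumes that the coordinates of grouped shapes are already absolute page coordinates.

```python
from __future__ import annotations

from typing import Iterator

# Hypothetical stand-in for Draw shapes: a dict with "x" and "y", plus either
# "text" (a text-carrying shape) or "children" (a group of further shapes).


def collect_texts(shapes: list[dict]) -> Iterator[tuple[int, int, str]]:
    """Yield (y, x, text) for every text shape, descending into groups."""
    for shape in shapes:
        if "children" in shape:
            # A group: resolve its members recursively.
            yield from collect_texts(shape["children"])
        elif shape.get("text"):
            yield (shape["y"], shape["x"], shape["text"])


def texts_in_visual_order(shapes: list[dict]) -> list[str]:
    """Sort top-to-bottom, then left-to-right, and return only the texts."""
    return [text for _, _, text in sorted(collect_texts(shapes))]


page = [
    {"x": 10, "y": 300, "text": "footer"},
    {"x": 0, "y": 0, "children": [
        {"x": 120, "y": 50, "text": "right"},
        {"x": 10, "y": 50, "text": "left"},
    ]},
    {"x": 10, "y": 10, "text": "title"},
]
print(texts_in_visual_order(page))  # ['title', 'left', 'right', 'footer']
```

Sorting on the plain (y, x) pair treats two shapes as being on the same line only if their y values are exactly equal; a real document would probably need a tolerance, which is one of the coordinate corrections mentioned above.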
You may play with the example by rearranging the shapes on the second slide and comparing the results.
The included sorting algorithm was written from scratch. It is not optimised for the sorting as such, but for reducing the number of transpositions needed. (Nor is it optimised for this specific task, where no transpositions at all would be needed.)
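The algorithm from the attachment is not reproduced in this post. As a generic illustration of trading extra comparisons for few transpositions, here is a selection sort in Python, which never needs more than n-1 swaps for n items. This is an assumption about the general idea only, not the attached code, and `selection_sort` is a name made up for the sketch.

```python
def selection_sort(items: list) -> int:
    """Sort items in place and return the number of transpositions performed."""
    swaps = 0
    for i in range(len(items) - 1):
        # Find the index of the smallest remaining item (many comparisons) ...
        smallest = min(range(i, len(items)), key=items.__getitem__)
        # ... but transpose at most once per position, and only if needed.
        if smallest != i:
            items[i], items[smallest] = items[smallest], items[i]
            swaps += 1
    return swaps


positions = [(300, 10), (50, 120), (50, 10), (10, 10)]
swap_count = selection_sort(positions)
print(positions, swap_count)
```

An input that is already in order causes no transpositions at all.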
As always, I am interested in criticism and suggestions.
(I personally never made much use of the procedure.)
The code posted below is from my original answer. It was not changed during editing.
===Edit1 End===
Even if you open a pdf the way @mikekaganski pointed to, it will be “unmanageable”, as he also said. In the very rare cases where you urgently need to import the text content of a pdf without any formatting, you may open it in LibO Draw and then apply a “macro” that collects the texts.
Since similar requests recurred now and then, I once sketched a very rough piece of BASIC code for the purpose.
You may use it and enhance it as needed at your own risk.
REM ***** BASIC *****
REM Wolfgang Jäger (Lupp); 2016-09-05; Copyleft 0
Option Explicit
REM This procedure was sketched because questions about moving the textual
REM content from pdf files opened in 'Draw' into an actual text file come up
REM now and then, and no solution had been offered yet, as far as I know.
REM
REM Of course, this provisional code cannot replace a thorough solution
REM to the problem (if actually needed at all).
REM Specifically, NO ATTEMPT IS MADE TO RESOLVE GROUPS or to process
REM the 'Draw' objects with regard to their position. The texts are sequenced
REM in the logical order of the objects.
REM For a PDF automatically imported by 'Draw' this should work.
Sub experimentalExportTextFromDrawToWriterDoc(Optional pNum As Long)
    REM pNum omitted or 0: export all pages. Otherwise export only page pNum (1-based).
    Dim doc0 As Object, page As Object, shape As Object, shapeText As String
    Dim doc1 As Object, tText As Object, vCur As Object, tCur As Object
    Dim i As Long, j As Long, k As Long, m As Long, n As Long, low As Long, high As Long
    Dim location As String, newLocation As String, alert As String
    Dim unresolvedSignal As String
    unresolvedSignal = "%&@~+!!\µ~*?§" REM Arbitrary string not occurring anywhere else in the universe!
    doc0 = ThisComponent
    If NOT doc0.SupportsService("com.sun.star.drawing.DrawingDocument") Then Exit Sub
    If IsMissing(pNum) Then pNum = 0
    m = doc0.DrawPages().Count()
    If (m < pNum) OR (pNum < 0) Then
        MsgBox "No page " & pNum & " available!"
        Exit Sub
    End If
    REM The output file is stored next to the Draw document, which therefore must have been saved.
    location = doc0.GetLocation
    If location = "" Then
        MsgBox "Please save the Draw document before calling this procedure!"
        Exit Sub
    End If
    newLocation = location & ".odt"
    If FileExists(newLocation) Then
        alert = "Warning! The destination file " & Chr(13) & newLocation & Chr(13) & _
                "already exists. Please delete or rename it before calling this procedure again!"
        MsgBox alert
        Exit Sub
    End If
    REM Create the target Writer document, hide its window, and store it at once.
    doc1 = StarDesktop.LoadComponentFromUrl("private:factory/swriter", "_blank", 0, Array())
    doc1.GetCurrentController().GetFrame().GetContainerWindow().SetVisible(False)
    doc1.StoreAsUrl(newLocation, Array())
    tText = doc1.getText()
    vCur = doc1.CurrentController.getViewCursor()
    tCur = tText.createTextCursorByRange(vCur.GetEnd())
    REM Zero-based range of page indices to process.
    If pNum = 0 Then
        low = 0
        high = m - 1
    Else
        low = pNum - 1
        high = pNum - 1
    End If
    For i = low To high
        tText.insertString(tCur, "------PAGE " & (i + 1) & "------", False)
        tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
        tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
        k = 0 REM Counts the objects on this page that did not yield any text.
        page = doc0.DrawPages(i)
        n = page.Count()
        For j = 0 To n - 1
            shape = page.GetByIndex(j)
            REM If the shape has no Text property, the assignment fails silently
            REM and shapeText keeps the signal value.
            shapeText = unresolvedSignal
            On Error Resume Next
            shapeText = shape.Text.String
            On Error Goto 0
            If shapeText = unresolvedSignal Then
                k = k + 1
            Else
                tText.insertString(tCur, shapeText, False)
                tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
            End If
        Next j
        tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
        tText.insertString(tCur, "There were " & k & " unresolved objects on page " & (i + 1) & ".", False)
        tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
        tText.insertControlCharacter(tCur, com.sun.star.text.ControlCharacter.PARAGRAPH_BREAK, False)
    Next i
    doc1.Store
    doc1.Close(True)
End Sub
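
The page selection used by the macro above (pNum = 0 or omitted means all pages, otherwise a single page counted from 1) is easy to get wrong by one. Restated as a small Python sketch, with `page_indices` being a hypothetical helper name that does not occur in the macro:

```python
def page_indices(p_num: int, page_count: int) -> range:
    """Zero-based page indices selected by a one-based p_num (0 = all pages)."""
    if p_num < 0 or p_num > page_count:
        raise ValueError(f"No page {p_num} available!")
    if p_num == 0:
        return range(page_count)
    # A single page: one-based p_num maps to zero-based index p_num - 1.
    return range(p_num - 1, p_num)


print(list(page_indices(0, 3)))  # [0, 1, 2]
print(list(page_indices(2, 3)))  # [1]
```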