When I copy text from a PDF file, why is there no spacing between the words?

asked 2019-11-04 18:04:00 +0100

Haziq

I am trying to copy text from a PDF file. Each time I copy text from the PDF file, the result has no spacing between the words. However, when I copy text from the internet, it works fine. For clarification, here is the link below, which gives a general idea of the problem: Above is the text that I copied from the PDF file.

2 Answers

answered 2019-11-04 20:26:39 +0100

torreone

updated 2019-11-04 20:36:37 +0100

The user Bob_Niland__Error_7103_, in the discussion at the link, proposes the following plausible explanation:

The problem is that the PDF file may or may not have the spaces encoded as space characters, particularly at ends, but also between words and perhaps even characters. The rendering engine (PS or PDF driver) may have chosen to break "word1 word2" into two strings with two starting coordinates and no U+0020 space character (or alternative space characters) at all.
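To make the quoted explanation concrete, here is a minimal sketch of how a viewer could rebuild spaces from positioned strings. It is an illustration only: the fragment format `(x_start, width, text)`, the function `join_fragments`, and the `gap_threshold` value are all hypothetical, and real PDF viewers use their own (usually more elaborate) heuristics.

```python
from typing import List, Optional, Tuple

# Hypothetical fragment format: (x_start, width, text) for strings on one line.
# Real PDF content streams are more complex; this only models the idea.
Fragment = Tuple[float, float, str]


def join_fragments(fragments: List[Fragment], gap_threshold: float = 1.0) -> str:
    """Join positioned fragments, adding a space where the horizontal gap is large."""
    pieces: List[str] = []
    prev_end: Optional[float] = None
    for x_start, width, text in sorted(fragments, key=lambda f: f[0]):
        # A gap wider than the (assumed) threshold is treated as a word break.
        if prev_end is not None and x_start - prev_end > gap_threshold:
            pieces.append(" ")
        pieces.append(text)
        prev_end = x_start + width
    return "".join(pieces)


# "word1 word2" stored as two strings with two coordinates and no space character.
fragments = [(72.0, 30.0, "word1"), (105.0, 30.0, "word2")]

# Plain concatenation, which is what a naive copy does, loses the word break.
naive = "".join(text for _, _, text in fragments)

# Using the coordinates recovers it.
rebuilt = join_fragments(fragments)
```

A viewer that simply concatenates the stored strings behaves like `naive`; a viewer that looks at the coordinates behaves like `rebuilt`, which may be why different PDF readers give different results for the same file.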

As for copying from the internet, web pages are written in HTML and loaded by the browser as a text file (you can see it from the menu, under the view source item). From this text the browser derives the DOM object structure, which is then used for rendering.

It is a completely different case from copying from a PDF. In a web page, spaces are encoded explicitly, either as breaking spaces or as non-breaking spaces (&nbsp;).
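For contrast with the PDF case, the sketch below extracts text from a small HTML fragment using Python's standard `html.parser` module. The sample markup and the names `TextExtractor` and `extract_text` are made up for illustration; the point is only that the spaces are already present as characters in the source, so nothing has to be guessed.

```python
from html.parser import HTMLParser
from typing import List


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, ignoring the tags."""

    def __init__(self) -> None:
        # convert_charrefs defaults to True, so entities such as a
        # non-breaking space reference arrive in handle_data already decoded.
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


# One ordinary (breaking) space and one non-breaking space entity.
sample = "<p>word1 word2&nbsp;word3</p>"
text = extract_text(sample)
```

Here `text` contains a regular space (U+0020) between the first two words and a non-breaking space (U+00A0) between the last two, both taken directly from the markup.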

I don't think it's a topic for this forum, though.

There may also be an issue with the PDF viewer. Okular on Linux with the KDE desktop behaves differently depending on the selection tool. There are 3 of them: one for "just selection" (usually resulting in an image), one for text and one for tables. With the proper tool I haven't yet encountered this problem.

Nevertheless, your explanation seems right.

ajlittoz ( 2019-11-04 20:45:50 +0100 )

It is possible that Haziq copied directly from a browser that displayed the PDF, or from a PDF reader that is not designed to distinguish the underlying content based on its characteristics.

I agree with you, the PDF viewer makes the difference.

torreone ( 2019-11-04 20:50:55 +0100 )

answered 2019-11-05 17:59:41 +0100

Haziq

I was able to fix this problem by converting the PDF file to Word format and then exporting the file back to PDF format, which does the trick. Below is the link to the text that I copied without any issue: Following are the steps that I took:

1) I uploaded the PDF file to this link, which then converted the file to Word format.

2) Then I opened the converted file in MS Word and exported it to PDF format again, which solved my problem.

