PDF to DOCX conversion

obireddy6 · November 7, 2019, 3:45pm

Hi Team,

We are converting PDF to DOCX and applying our template-based styles. We noticed that the hyperlinks are not attached to the exact text they cover in the PDF, and the links are also being applied to other, ordinary text.
While reading the paragraphs of the DOCX file, we tried to identify the text that contains the hyperlinks and set the URL on it.
After setting the URLs, the output DOCX contains unwanted extra text (such as ‘HYPERLINK’), and the complete URL is appended for each character of the text.

We tried the approaches below but could not get it to work. Could you please help us identify the hyperlinks in the DOCX file?
a. After reading the document, we tried to get all the hyperlinks in it.
b. We tried to match them against the text that contains each hyperlink and set the URL.
c. We are unable to match the text with the text that contains the actual hyperlinks.
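For reference, in the DOCX package itself (Office Open XML) a hyperlink is normally stored in one of two ways: as a `w:hyperlink` element whose `r:id` points to a relationship in `word/_rels/document.xml.rels` (or whose `w:anchor` names an internal bookmark), or as a HYPERLINK field whose code (`HYPERLINK "url"`) is kept separate from the text Word displays. If a field code ends up in a run as ordinary text, Word shows it literally, which may be one explanation for the stray ‘HYPERLINK’ text, though that is an assumption, not something verified here. The sketch below lists each link's display text and target so the two can be matched. It uses only the Python standard library, not any Aspose API; `list_hyperlinks` is a hypothetical helper name, only quoted `HYPERLINK "..."` simple fields are recognised, and complex fields built from `w:fldChar`/`w:instrText` runs are not handled.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespaces defined by ECMA-376 (Office Open XML).
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
PKG_REL = "{http://schemas.openxmlformats.org/package/2006/relationships}"

# Matches the URL in a simple field instruction such as ' HYPERLINK "https://..." '.
HYPERLINK_FIELD = re.compile(r'\s*HYPERLINK\s+"([^"]*)"')


def _text_of(element):
    """Concatenate all w:t text under an element, i.e. the visible link text."""
    return "".join(t.text or "" for t in element.iter(W + "t"))


def list_hyperlinks(docx):
    """Return (display_text, target) pairs in document order.

    Hypothetical helper. Handles w:hyperlink elements and simple HYPERLINK
    fields (w:fldSimple); complex fields (w:fldChar/w:instrText) are skipped.
    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as pkg:
        document = ET.fromstring(pkg.read("word/document.xml"))
        try:
            rels_xml = pkg.read("word/_rels/document.xml.rels")
        except KeyError:  # a package without relationships for the main part
            rels_xml = None

    # Map relationship ids to targets, e.g. "rId1" -> "https://...".
    targets = {}
    if rels_xml is not None:
        for rel in ET.fromstring(rels_xml).iter(PKG_REL + "Relationship"):
            targets[rel.get("Id")] = rel.get("Target")

    links = []
    # iter() walks the tree in document order, so results follow the text flow.
    for el in document.iter():
        if el.tag == W + "hyperlink":
            rid = el.get(R + "id")
            if rid is not None:
                target = targets.get(rid)
            else:
                # Internal links reference a bookmark instead of a relationship.
                anchor = el.get(W + "anchor")
                target = "#" + anchor if anchor else None
            links.append((_text_of(el), target))
        elif el.tag == W + "fldSimple":
            match = HYPERLINK_FIELD.match(el.get(W + "instr", ""))
            if match:
                links.append((_text_of(el), match.group(1)))
    return links
```

Usage: `for text, url in list_hyperlinks("output.docx"): print(repr(text), url)`. Because the display text and target come from the same element, each URL stays tied to the exact run of text it covers instead of being re-matched by string comparison.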

tahir.manzoor · November 7, 2019, 4:43pm

@obireddy6

To ensure a timely and accurate response, please attach the following resources here for testing:

Your input PDF and Word document.
Please attach the output Word file that shows the undesired behavior.
Please attach the expected output Word file that shows the desired behavior.
Please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end, and attach it here for testing.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.