PDF To DOCX - DOCX To PDF Conversion Issues

contactkrsna1 · November 19, 2019, 6:29am

Hi,

I used Aspose.PDF for Java 19.10 to convert a PDF file to a DOCX file and found the issue below.

Tables are not converted to an editable table format. Instead, each table is converted to an image and the text is added on top of the table cells.
Demo.pdf (102.1 KB)

Because the tables are converted to images, the Aspose.Words API is unable to read the table data.

Below is the PDF to Word conversion code:

// Load the source PDF file
com.aspose.pdf.Document doc = new com.aspose.pdf.Document(DATADIR + pdfFileName);
// Instantiate a DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set the output file format to DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Use Flow recognition mode
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Set the relative horizontal proximity to 2.5
saveOptions.setRelativeHorizontalProximity(2.5f);
// Recognize bullets during the conversion process
saveOptions.setRecognizeBullets(true);
// Save the resultant DOCX file
doc.save(DATADIR + pdfFileName.replace(".pdf", ".docx"), saveOptions);

Regards,
Krsna
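
A quick way to confirm the symptom described above is to inspect the converted DOCX directly. The following is a minimal sketch, not part of any Aspose API: it uses only the Java standard library, and the class and method names (DocxTableCheck, countTablesAndDrawings) are hypothetical. It assumes the file uses the transitional WordprocessingML namespace and counts table elements (w:tbl) versus drawing elements (w:drawing, w:pict) in word/document.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class DocxTableCheck {
    // Transitional WordprocessingML namespace (assumption: the DOCX is not saved as Strict OOXML).
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns {tableCount, drawingCount} for the main document part of a DOCX stream.
    static int[] countTablesAndDrawings(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/document.xml".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document xml = factory.newDocumentBuilder().parse(zip);
                int tables = xml.getElementsByTagNameNS(W_NS, "tbl").getLength();
                int drawings = xml.getElementsByTagNameNS(W_NS, "drawing").getLength()
                        + xml.getElementsByTagNameNS(W_NS, "pict").getLength();
                return new int[] {tables, drawings};
            }
        }
        throw new IOException("word/document.xml not found; is this a DOCX file?");
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java DocxTableCheck <file.docx>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int[] counts = countTablesAndDrawings(in);
            System.out.println("tables=" + counts[0] + ", drawings=" + counts[1]);
        }
    }
}
```

If the output reports zero tables but one or more drawings for a document that visibly contains tables, the tables were most likely rendered as images or shapes rather than as editable Word tables.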

Farhan.Raza · November 19, 2019, 4:44pm

@contactkrsna1

Thank you for contacting support.

A ticket with ID PDFJAVA-38425 is already logged in our issue management system for the same problem. We have recorded your concerns and attached the ticket ID to this thread, so you will be notified automatically once it is resolved.

contactkrsna1 · November 20, 2019, 7:34am

Hi,
I have attached the document. Can you please tell me how to read the content from the attached document and write it to a new document as it is?

word file.zip (68.8 KB)

Regards,
Krsna

Farhan.Raza · November 20, 2019, 7:32pm

@contactkrsna1

Could you please specify the format of your expected output? Do you want to create an identical PDF document or a DOCX document?

contactkrsna1 · November 21, 2019, 12:11am

Hi Farhan,

I want an identical DOCX document.

Regards,
Krsna

contactkrsna1 · November 21, 2019, 12:50am

Hi Farhan,
I have attached another sample, word1.zip, which contains a few hyperlinks. How can I read the text and hyperlinks in it as they are and write them to another identical DOCX file?

word.zip has a group shape and a table. [Note: the table should be read and written to another identical DOCX file as a Microsoft Word table.]
word1.zip has hyperlinks.
word1.zip (10.3 KB)

Regards,
Krsna
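
One way to check whether a copied DOCX kept the hyperlinks of the original is to compare the hyperlink targets stored in each package. The sketch below is only an illustration using the Java standard library, not an Aspose API; the class and method names (DocxHyperlinkCheck, hyperlinkTargets) are hypothetical. It assumes transitional OOXML namespaces and covers only relationship-based hyperlinks listed in word/_rels/document.xml.rels, so hyperlinks written purely as HYPERLINK field codes would not be reported.

```java
import java.io.InputStream;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DocxHyperlinkCheck {
    // Namespaces and relationship type for transitional OOXML packages (assumed, not Strict).
    static final String RELS_NS =
            "http://schemas.openxmlformats.org/package/2006/relationships";
    static final String HYPERLINK_TYPE =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink";

    // Returns the sorted set of hyperlink targets referenced by the main document part.
    static Set<String> hyperlinkTargets(InputStream docx) throws Exception {
        Set<String> targets = new TreeSet<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/_rels/document.xml.rels".equals(entry.getName())) {
                    continue;
                }
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                NodeList rels = factory.newDocumentBuilder().parse(zip)
                        .getElementsByTagNameNS(RELS_NS, "Relationship");
                for (int i = 0; i < rels.getLength(); i++) {
                    Element rel = (Element) rels.item(i);
                    if (HYPERLINK_TYPE.equals(rel.getAttribute("Type"))) {
                        targets.add(rel.getAttribute("Target"));
                    }
                }
                break; // the parser may have closed the stream, so stop iterating entries
            }
        }
        return targets;
    }
}
```

Calling hyperlinkTargets on both the input and the output file and comparing the two sets with equals gives a rough indication of whether any link targets were lost during the copy.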

awais.hafeez · November 21, 2019, 9:10am

@contactkrsna1,

Please check whether the following solution is acceptable for you:

Document doc = new Document("E:\\Temp\\word1\\in.docx");
Document docTarget = new Document("E:\\Temp\\word1\\word1.docx");

foreach (Node node in docTarget.FirstSection.Body.ChildNodes)
{
    // Copy all body content of 'word1.docx' (docTarget) to the end of the other document.
    Node importedNode = doc.ImportNode(node, true, ImportFormatMode.KeepSourceFormatting);
    // AppendChild keeps the source order even when the last imported node is a table.
    doc.LastSection.Body.AppendChild(importedNode);
}

doc.Save("E:\\Temp\\word1\\19.11.docx");

If the problem persists, please also attach your word.zip and your expected document showing the desired output here for further testing. You can create the expected document by using MS Word.

contactkrsna1 · November 21, 2019, 10:56am

Hi Awais,
Thanks for your response.
I tried the above program and, for some reason, it hangs.

Basically, I want to read the attached Word document [word1.docx] and write its whole content to another Word document [DOCX] using Aspose.Words for Java.

The attached input document has tables and images which are made of group shapes (Word drawings).
E.g., the image attached below is made of group shapes, and the expected output is a single image without group shapes.
image.jpg (44.1 KB)

The image attached below is a table whose content is made of group shapes, but the expected output is a Word table.
image.jpg (61.6 KB)
word1.zip (68.5 KB)

Regards,
venkata Rama Krishna. Vommi

awais.hafeez · November 21, 2019, 12:53pm

@contactkrsna1,

Please make sure that you are using the latest version of Aspose.Words for .NET, i.e. 19.11, on your end. We are unable to reproduce this performance issue on our end. We used the following documents for testing:

test documents.zip (109.6 KB)

Please try the following code on the above documents:

Document doc = new Document("E:\\Temp\\word1 (1)\\in.docx");
Document docTarget = new Document("E:\\Temp\\word1 (1)\\word1.docx");

foreach (Node node in docTarget.FirstSection.Body.ChildNodes)
{
    // Copy all body content of 'word1.docx' (docTarget) to the end of the other document.
    Node importedNode = doc.ImportNode(node, true, ImportFormatMode.KeepSourceFormatting);
    // AppendChild keeps the source order even when the last imported node is a table.
    doc.LastSection.Body.AppendChild(importedNode);
}

doc.Save("E:\\Temp\\word1 (1)\\19.11.docx");