PDF containing tables is not converted correctly

memoq · September 23, 2017, 3:15pm

Dear Aspose team,

I have a PDF file that contains a table. The converted DOCX file has some issues:

The table is broken into multiple tables.
The content of some rows is placed into separate tables that are laid over the main table. This is a problem because if a cell of the main table is changed, these overlaid tables do not move with the resized background.
Some empty pages are also inserted into the document.

Could you take a look and see whether you can fix these issues?

Sample pdf and docx in the pack: sample.zip (161.6 KB)

Best regards,

Gergely Vándor

imran.rafique · September 23, 2017, 7:58pm

@gergelyv,

We converted your source PDF to DOCX with the latest version (17.9) of the Aspose.Pdf for .NET API and could not reproduce the reported issues. This is the output DOCX: PDFToDOCX_out.zip (65.5 KB)

[C#]

using Aspose.Pdf;

string dataDir = @"C:\Pdf\test339\";
// Open the source PDF document
Document pdfDocument = new Document(dataDir + "temp.pdf");
// Save the document in DOCX format
pdfDocument.Save(dataDir + "PDFToDOCX_out.docx", Aspose.Pdf.SaveFormat.DocX);

memoq · October 14, 2017, 11:24am

@imran.rafique

Thanks, we will check the new version.

memoq · October 16, 2017, 11:21am

Dear Aspose team,

We still have an issue with our conversion.
We use the following save options:

DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;

pdfDocument.Save(dataDir + "PDFToDOCX_out.docx", saveOptions);

The converted file contains some invalid markup. Word can open the file, but it does not validate against the Open XML schema.
For example:
1.A)
<w:tblLook>
<w:val>"04a0"</w:val>
</w:tblLook>
1.B)
The valid XML would be: <w:tblLook w:val="04a0"/>

2.A)
<w:cr>""</w:cr>
2.B)
The valid XML would be: <w:cr/>
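
For anyone who wants to spot these two patterns without running a full schema validation, here is a minimal sketch. It is not an Aspose API and not a replacement for a real validator: it only uses the .NET base class library to scan word/document.xml for w:tblLook elements that have child elements and w:cr elements that have content. The class and method names (DocxMarkupCheck, FindSuspectElements, CheckFile) are made up for illustration, and it assumes the markup lives in the main document part. On .NET Framework it needs references to System.IO.Compression and System.IO.Compression.FileSystem.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class DocxMarkupCheck
{
    // WordprocessingML main namespace (the "w:" prefix).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one message per element that matches either reported pattern.
    public static List<string> FindSuspectElements(XDocument doc)
    {
        var problems = new List<string>();

        // Pattern 1: w:tblLook with child elements instead of attributes.
        foreach (XElement e in doc.Descendants(W + "tblLook"))
        {
            if (e.HasElements)
                problems.Add("w:tblLook has child elements instead of attributes");
        }

        // Pattern 2: w:cr should be empty, so any text or child is suspect.
        foreach (XElement e in doc.Descendants(W + "cr"))
        {
            if (e.HasElements || e.Value.Length > 0)
                problems.Add("w:cr has content: " + e.Value);
        }

        return problems;
    }

    // Opens the DOCX as a ZIP package and checks the main document part.
    public static List<string> CheckFile(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found");

            using (Stream stream = entry.Open())
                return FindSuspectElements(XDocument.Load(stream));
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: DocxMarkupCheck <file.docx>");
            return;
        }

        foreach (string problem in CheckFile(args[0]))
            Console.WriteLine(problem);
    }
}
```

Running it against the converted file prints one line per suspect element. An empty result only means these two patterns were not found in the main document part, not that the file is schema-valid.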

Could you please also fix these issues?

Best Regards,
Daniel Lengyel

imran.rafique · October 16, 2017, 9:13pm

@gergelyv,

We have logged an investigation under the ticket ID PDFNET-43519 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.