Compared with conversion through the Microsoft.Office.Interop.Word app, the Aspose.Pdf Save method does not appear to render tables in their original format. The tables come out as picture frames with embedded text, so they are not editable as tables in the converted docx file. Numbered outlines also do not have the same indentation and spacing as in the Word app conversion. Is this a conversion limitation, or are there other options available?
In the code below, I am using the RecognitionMode.Flow option:
string strSrcFilePath = @"C:\temp\InputDocument.pdf";
using (Aspose.Pdf.Document theDoc = new Aspose.Pdf.Document(strSrcFilePath))
{
    Aspose.Pdf.DocSaveOptions docSaveOptions = new Aspose.Pdf.DocSaveOptions();
    docSaveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;
    docSaveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
    docSaveOptions.RecognizeBullets = true;
    theDoc.Save(@"C:\temp\AsposePDF_ConvertedDocument.docx", docSaveOptions);
} // using Aspose.Pdf.Document
We have converted your source PDF with the latest version 17.11 of Aspose.Pdf for .NET API. All text items are editable and numbered outlines are aligned. This is the output DOCX: AsposePDF_ConvertedDocument.zip (102.2 KB). Kindly review and let us know if you find any problematic behavior.
All text items are editable, but there are two observations to note.
The converted tables are not true Word tables. The Aspose version has a picture of the grid lines overlaid on the text cells, whereas the WordApp version has rendered a true table, which can be selected by hovering over the upper left corner of the table.
The numbered outline text has some artifacts where some text lines are not rendered on separate lines as they should be. This can be seen by selecting a block of text and copy/pasting it into WordPad. The WordApp version shows the expected rendering.
Attached is the WordApp converted document for comparison. I am mostly concerned about the table formatting.
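For anyone who wants to compare the two outputs without opening Word, the "true table" observation can be checked roughly in code. The sketch below is only an illustration and is not part of the Aspose API: it assumes the docx is a standard Open XML package, opens it as a zip archive, and counts the w:tbl elements in word/document.xml. The class name DocxTableCheck and the method CountTables are made-up names for this example. On .NET Framework it needs references to System.IO.Compression and System.IO.Compression.FileSystem.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not an Aspose API.
static class DocxTableCheck
{
    // Main WordprocessingML namespace used by word/document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Counts every <w:tbl> element in the main document part,
    // including nested tables and tables inside text boxes.
    public static int CountTables(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found in " + docxPath);
            }

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                return doc.Descendants(W + "tbl").Count();
            }
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("Usage: DocxTableCheck <path-to-docx>");
            return;
        }

        Console.WriteLine("w:tbl elements found: " + CountTables(args[0]));
    }
}
```

Running this against both the Aspose output and the WordApp output should show whether either file contains real table elements. A count of zero for a document that visibly contains tables would be consistent with the grid being drawn as a picture over the text.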
We are having the same issue. The overlaid table border image doesn't always cover the content properly; it would be much better to get an actual Word table with borders. I can't see the attached issues on this ticket. Do they have a status? Is this something being actively worked on? Is there an expected timeline for resolution? Many thanks.
Both of these tickets (PDFNET-43817 and PDFNET-43818) were logged recently and are pending analysis. Our product team will investigate them as per their development schedule. We recommend that you create a separate thread and share your problematic documents with complete details, including code. We will investigate and share our findings with you. Even if the problems turn out to be the same, your scenario will be verified once the root cause is fixed.