Convert PDF to DOCX using C# | issue with text in tables

cudrea · April 12, 2021, 5:36pm

Hello,

As part of an evaluation of your tool, I’ve noticed a problem when converting PDF files that contain tables to DOCX, using either the code examples from GitHub or the online tool: the general layout is kept, but the text from one section is not separated from the next one.

Can the library be configured (or is some additional coding required) to ensure proper separation of columns when one column is text and the other is a table?

I’ve attached the source file used (PDF), the resulting DOCX, and some screenshots that illustrate the text delimitation within a document section.

Test files.zip (2.7 MB)

Thank you.

tahir.manzoor · April 12, 2021, 8:17pm

@cudrea

We have tested the scenario and reproduced the issue on our side. We have logged it in our issue tracking system as WORDSNET-22108. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

asad.ali · April 12, 2021, 9:55pm

@cudrea

We have tested the scenario using Aspose.PDF for .NET 21.4 and noticed a similar issue in the generated .docx output. We have logged it in our issue management system as PDFNET-49758 and will let you know as soon as it is fixed. Please allow us some time.

We are sorry for the inconvenience.
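
For readers asking whether the conversion can be parameterized: Aspose.PDF for .NET exposes recognition settings through DocSaveOptions. The following is only a minimal sketch, assuming the Aspose.PDF NuGet package is referenced and using placeholder file names (input.pdf, output.docx). Nothing in this thread confirms that switching the recognition mode works around PDFNET-49758, so treat it as something to try rather than a fix.

```csharp
using Aspose.Pdf;

class PdfToDocxFlowSketch
{
    static void Main()
    {
        // Placeholder path: substitute the PDF that shows the table problem.
        Document pdf = new Document("input.pdf");

        DocSaveOptions options = new DocSaveOptions
        {
            // Write DOCX rather than the legacy DOC format.
            Format = DocSaveOptions.DocFormat.DocX,
            // Flow mode asks the converter to rebuild editable paragraphs and
            // tables instead of placing text in fixed-position text boxes.
            Mode = DocSaveOptions.RecognitionMode.Flow
        };

        // Placeholder output path.
        pdf.Save("output.docx", options);
    }
}
```

Comparing the output of RecognitionMode.Flow with RecognitionMode.Textbox on the same file is a quick way to see which mode keeps the text column and the table column apart.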

aspose.notifier · October 7, 2021, 11:26am

The issue you found earlier (filed as WORDSNET-22108) has been fixed in the Aspose.Words for .NET 21.10 update, which is also available on NuGet.
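
To check the fix against the original document, the conversion can be re-run after upgrading. This is a minimal sketch, assuming the Aspose.Words NuGet package at version 21.10 or later is referenced; the file names are placeholders, and some target frameworks may need additional PDF loading components that are not covered in this thread.

```csharp
using Aspose.Words;

class PdfToDocxSketch
{
    static void Main()
    {
        // Placeholder path: the file format is detected when the document is loaded.
        Document doc = new Document("input.pdf");

        // Save the loaded content as DOCX (placeholder output path).
        doc.Save("output.docx", SaveFormat.Docx);
    }
}
```

Opening output.docx and comparing the text and table columns with the screenshots from the first post shows whether the separation problem is gone for that file.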