Conversion of PDF to Word, tables and bulleted points support

Is there any update on this now?
It’s been a long time now. Users of our product have been asking for this fix for a while, and it is hurting our product.
Can you please resolve this as soon as possible?

@Hiral201092

We would like to share with you that the issue is complex in nature and requires a certain amount of time to resolve. It is related to our PDF to DOCX conversion engine, and we are in the process of implementing new changes in it. Please note that every table in a PDF is just a drawn grid of lines overlaid on the text fragments. We are trying to recognize these grids in our new engine, EnhancedFlow (currently in an early beta state).

docSaveOptions.Mode = DocSaveOptions.RecognitionMode.EnhancedFlow;

It will be available in the upcoming Aspose.PDF for Java 20.8 release. Investigation and fixes for this ticket will continue, and as soon as all sub-tasks are completed, we will be able to share a reliable ETA for a complete fix of the issue. We highly appreciate your patience in this matter. Please give us some time.

We are sorry for the delay and inconvenience.

PS: Please check the attached output document (this is how output will look in 20.8).
PDFNET_48222BulletedList_20_8.zip (27.2 KB)

I’m using Aspose.PDF for .NET. When will the new changes be published for that?

@Hiral201092

Aspose.PDF for .NET 20.8 has been published and is available now. You can use RecognitionMode.EnhancedFlow with this version of the API. However, the ticket is not completely resolved, as other issues in the document are still to be fixed. We will inform you as soon as the ticket is resolved.
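
For reference, a minimal sketch of a conversion using this mode might look like the following. It assumes Aspose.PDF for .NET 20.8 or later is referenced in the project; the file names "input.pdf" and "output.docx" are placeholders and not files from this thread.

```csharp
using Aspose.Pdf;

class PdfToDocxExample
{
    static void Main()
    {
        // Placeholder paths (assumptions): replace with your own files.
        using (Document pdfDocument = new Document("input.pdf"))
        {
            DocSaveOptions saveOptions = new DocSaveOptions
            {
                // Write DOCX rather than the older DOC format.
                Format = DocSaveOptions.DocFormat.DocX,
                // Beta recognition mode that attempts to detect tables.
                Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
            };

            pdfDocument.Save("output.docx", saveOptions);
        }
    }
}
```

Since EnhancedFlow is still in beta, it may be worth comparing its output against the other recognition modes for your own documents.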

Thank you!!

Is this now completely resolved?

@Hiral201092

It is good to know that your issue has been resolved now. Please keep using our API and in case you need further assistance, please feel free to create a new topic.

I tried Aspose.PDF 20.8, but there is one more issue with it:
I’m using the mode “saveOptions.Mode = DocSaveOptions.RecognitionMode.EnhancedFlow;” to convert PDF to DOCX. It converts tables correctly, but it removes other formatting.

Please go through the ZIP file I have attached. When we convert the same PDF to Word using Acrobat Pro, it is converted properly (see AcrobatPro converted.docx), but when I use Aspose, it removes the background colors of the table.
PDFToWordIssue.zip (208.9 KB)

@Hiral201092

The earlier logged ticket (PDFNET-48222) is not yet completely resolved, as not all formatting issues have been sorted out. We have only introduced a new option in the latest release. We are continuously working on the ticket and its sub-tasks and will inform you as soon as it is fixed.

Furthermore, we have logged an issue as PDFNET-48727 in our issue tracking system for your recently shared file and information and will look into it as well. We will inform you as soon as we have additional updates in this regard. Please give us some time.

We apologize for the inconvenience.

Is there any update on this?

@Hiral201092

We are afraid that we do not have any updates regarding resolution of the tickets at the moment. We will post in this forum thread as soon as we have some definite updates in this matter. Please give us some time.

Is there any update on this now?

@Hiral201092

We are working on both tickets and plan to implement basic text formatting in version 20.10. We also intend to improve recognition of table borders in version 20.11. We will inform you of any further updates.

Thank You!
When can I expect these new versions to be released?

@Hiral201092

Version 20.10 will be released in the first week of October 2020, and the 20.11 update will be published in November 2020.

I tested the new version 20.10.0, but the issue is not resolved yet.
I checked the mode “saveOptions.Mode = DocSaveOptions.RecognitionMode.EnhancedFlow;” for converting PDF to DOCX. It converts tables correctly, but it still removes other formatting.

Can you please test the sample file I provided earlier with version 20.10.0? And can you please give an estimate of when I can expect this fix to be completed?
This is hurting our product, and the fix is fairly urgent for us.

@Hiral201092

We sincerely apologize for the inconvenience caused by this issue. Please note that the issue depends on various internal components of the API and requires a significant amount of time to resolve completely. Although it was logged under the free support model, we have already been investigating it and working on a fix in parallel with other tasks.

We regret that we cannot share a reliable ETA at the moment, as the ticket is still in the investigation phase. However, you may consider our priority support option if the issue is a blocker for you and needs to be resolved urgently.

We have recorded your concerns and will consider them during the investigation. We will inform you as soon as we have news about the ticket resolution or an ETA. Please give us some time.

We are sorry for the delay and inconvenience.

Is there any update on this issue now?

@Hiral201092

Regretfully, the issues are not yet resolved. However, please note that we have improved recognition of table borders in version 20.11 of the API, which will be published this week. We further plan to improve formatting in version 20.12. We will inform you as soon as both tickets are completely resolved. Please give us some time.

We are sorry for the inconvenience.

Is it resolved now?

@Hiral201092

Complete resolution of the tickets depends on various internal components of the API, so it requires a certain amount of time to implement. However, as shared earlier, we have improved formatting in the output files in 20.12 and will keep working on resolving the remaining issues. We will let you know as soon as we have additional updates in this regard. Please give us some time.

We apologize for the inconvenience.