Convert PDF to DOCX using C# - Annotation is not present in the output Word file

Hi,

I am using AsposeDocx to convert an annotated PDF document to DOCX. The issue is that the annotation hides the actual text that is annotated in the PDF file.

I have attached both the input.pdf file and the output.docx file.

The following is my code snippet to convert the annotated PDF to DOCX:

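using System.IO;
using Aspose.Pdf;

var options = new DocSaveOptions();
options.WarningHandler = callback;              // callback: an IWarningCallback implementation (not shown)
options.Format = DocSaveOptions.DocFormat.DocX; // write DOCX rather than DOC
options.RecognizeBullets = true;                // detect bullet lists during conversion
var streamResult = new MemoryStream();
// streamSource: a Stream holding the input PDF (not shown)
using (var document = new Document(streamSource))
{
    document.Save(streamResult, options);
}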

Can you please guide me on how to produce a correctly annotated document?

input.pdf (244.5 KB)
output (2).docx (96.1 KB)

Thanks

@HassanNorthbay

You are using DocSaveOptions and saving document to DOCX file format. Please use OoxmlSaveOptions as shown below to save the document to DOCX file format. The DocSaveOptions is used for DOC file format.

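// Aspose.Words: save to DOCX with OoxmlSaveOptions.
var options = new OoxmlSaveOptions();
options.Compliance = OoxmlCompliance.Iso29500_2008_Strict;

// MyDir is the folder that contains input.pdf.
var doc = new Document(MyDir + "input.pdf");
doc.Save(MyDir + "21.6.docx", options);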

We have tested the scenario using the above code example and noticed that the last page contents are lost after PDF to DOCX conversion. For the sake of correction, we have logged this problem in our issue tracking system as WORDSNET-22387. You will be notified via this forum thread once this issue is resolved. We apologize for your inconvenience.

The other contents of PDF are exported correctly in output DOCX. Please check the attached output DOCX. 21.6.docx (15.3 KB)

Hi,

But the annotation is still missing in 21.6.docx. The PDF file contains annotations that are lost in the converted DOCX file.

Please see the PDF annotation below, which is not present in the DOCX file:

image.png (54.0 KB)
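
To pin down exactly which annotations are affected, it can help to list what the input PDF contains before converting. The following is a minimal sketch, assuming Aspose.PDF for .NET and a local file named input.pdf (the file path and the class name are illustrative). It only reads the annotations and reports them; it does not change the conversion result.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class ListAnnotations
{
    static void Main()
    {
        // Assumption: input.pdf is in the working directory.
        using (var document = new Document("input.pdf"))
        {
            // Walk every page and print each annotation found on it.
            foreach (Page page in document.Pages)
            {
                foreach (Annotation annotation in page.Annotations)
                {
                    Console.WriteLine(
                        $"Page {page.Number}: {annotation.AnnotationType}, " +
                        $"rect={annotation.Rect}, contents=\"{annotation.Contents}\"");
                }
            }
        }
    }
}
```

Comparing this list against the converted DOCX shows which annotation types and positions go missing, which may be useful detail to attach to a support ticket.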

@HassanNorthbay

The shared annotation does not export when you convert PDF to DOCX using Adobe Writer. Could you please share your expected output Word document?

Hi,

This is what I think the expected output of the generated DOCX might be.

Just for clarification, I am using the Aspose.PDF library, not Aspose.Words. Sorry, my bad :frowning:

Annotated.docx (20.3 KB)

I have tried with Aspose.PDF version 21.6.0 and got the following DOCX file, which is incorrect: the annotation box appears, but the actual text is still missing.

21.6.docx (46.9 KB)

@HassanNorthbay

We have moved this forum thread to Aspose.PDF forum where you will be guided appropriately.

@HassanNorthbay

We were able to notice the missing annotation in the output .docx file generated using Aspose.PDF for .NET 21.6 at our end. Hence, an issue as PDFNET-50102 has been logged in our issue tracking system for the sake of rectification. We will further look into its details and keep you posted with the status of its correction. Please be patient and spare us some time.

We are sorry for the inconvenience.

@asad.ali,

Can you please tell me the date by which you expect to have this fix ready? It is very important for my application.

Thanks

@HassanNorthbay

The issue has recently been logged in our issue tracking system and will be investigated and resolved on a first come first serve basis. We will surely inform you as soon as we make some definite progress towards its fix. Please be patient and spare us some time.

We are sorry for the inconvenience.

Hi Asad,

Can you give me an estimated timeframe (2 weeks, 3 weeks, or 1 month) so that I can inform my client accordingly?

Right now, I have no estimated time to give him.

Thanks

@HassanNorthbay

We really regret that we cannot share any ETA at the moment as ticket is pending for analysis. Furthermore, please note that resolution time of the issue in free support model depends upon the number of issues logged prior to it. However, we have recorded your concerns and will surely share some news about ETA as soon as the ticket is fully investigated. We appreciate your patience in this regard.

We apologize for the inconvenience.

The issues you found earlier (filed as WORDSNET-22387) have been fixed in the Aspose.Words for .NET 21.7 update and the Aspose.Words for Java 21.7 update.

Hi Asad,

When do you plan to fix PDFNET-50102 in the Aspose.PDF library?

I am using Aspose.PDF, so PDFNET-50102 is the fix I am really interested in.

Moreover, I have a license for Aspose.PDF until “2018-06-14”. I am eagerly waiting for this fix and will upgrade my license once the fix is available in a newer version.

Thanks

@HassanNorthbay

We are afraid that we cannot share any reliable ETA at the moment as the ticket is not fully investigated yet. As shared earlier, it would be analyzed and resolved on a first come first serve basis and as soon as we make some certain progress towards its fix, we will inform you. We have also recorded your concerns and will surely consider them during issue investigation. Please spare us some time.

We are sorry for the inconvenience.

@asad.ali

Any updates on the PDFNET-50102 fix?

It has been a while since it was reported, and as I told you before, it is a very critical issue for me.

Thanks

@HassanNorthbay

We had already recorded your concerns and raised the issue to next level. However, we really regret that its investigation could not get completed due to previously logged issues in the queue. We will surely inform you as soon as we have some definite updates about ticket resolution. We highly appreciate your patience and comprehension in this matter. Please spare us some time.

We apologize for the inconvenience and delay.

Any updates on the PDFNET-50102 fix?

Do you have any plan for this fix? It has been a very long time since this ticket was logged :frowning:

FYI, this worked in earlier versions but is broken in the new version.

Thanks

@HassanNorthbay

We would like to share with you that your issue is under the investigation phase at the moment. We will surely let you know as soon as the investigation is done and we have some news about fix ETA. We apologize for the delay and the inconvenience caused due to this issue.

Hi,
Is there any update on this issue?

@kainat123

We would like to share that the ticket PDFNET-50102 has been resolved in version 22.1 of the API, which is going to be released soon. We will let you know once it is available.