I first convert a document from PDF to Word (any version) and then, in a separate later process, convert the Word document to HTML. The Word document displays exactly as expected, however,
The two documents that you have produced which you claim to be similar are completely different. There is absolutely no formatting in the html document. Please don’t bother replying unless you are going to help!!
Please accept my apologies for the inconvenience.
In your case, we suggest setting DocSaveOptions.Mode to RecognitionMode.Flow to get the desired output. Please see the following C# code example.
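// Load the source PDF.
Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(MyDir + "975.pdf");

// Use Flow recognition mode for the PDF-to-Word conversion.
Aspose.Pdf.DocSaveOptions options = new Aspose.Pdf.DocSaveOptions();
options.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;

// Save the Word output to disk for inspection and to a stream for the next step.
// MemoryStream requires "using System.IO;".
MemoryStream stream = new MemoryStream();
pdf.Save(MyDir + "Word.doc", options);
pdf.Save(stream, options);
stream.Position = 0;

// Load the Word document with Aspose.Words and save it as HTML.
// SaveFormat is fully qualified to avoid ambiguity with Aspose.Pdf.SaveFormat.
Aspose.Words.Document doc = new Aspose.Words.Document(stream);
doc.Save(MyDir + "Output.html", Aspose.Words.SaveFormat.Html);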
There are two issues in the Word output generated by Aspose.PDF. We have logged both in our issue tracking system. The details are as follows.
- The position of the cell text in the output Word document differs from the input PDF. Please check the attached image (position of text.png) for details. This problem is logged as PDFNET-42186.
- The font name of the text in the output Word document differs from the input PDF. This issue is logged as PDFNET-42187.
In the final HTML output generated by Aspose.Words, the font name is also incorrect. We have logged this issue as WORDSNET-14794 in our issue tracking system.
You will be notified via this forum thread once these issues are resolved. We are very sorry for the inconvenience.