Free Support Forum - aspose.com

Convert PDF to HTML - pictures in PDF cause some file content to go missing

When I convert a PDF to HTML and the PDF contains pictures, some of the file content goes missing. The file I used is attached.

Hi Dalonggeng,


Thanks for contacting support.

In order to test the scenario, I tried viewing the attached PDF file, but I am afraid it appears to be damaged. Can you please double-check at your end and share the input PDF file again? We are sorry for the inconvenience.

Hi,
Thanks for your reply.

I downloaded the zip file, extracted it, and was able to open the PDF file. The zip file in my other post is the same one, so you can try downloading it from there. Here is the URL: http://www.aspose.com/community/forums/627375/convert-pdf-to-html/showthread.aspx#627375

Hi Dalonggeng,


Thanks for sharing the details.

I have tested the scenario using Aspose.Pdf for .NET with the following code snippet (as shared in your other forum post), and as per my observations, the images do appear in the resultant HTML. For your reference, I have also attached the resultant file generated at my end.

BTW, I am still getting an error message when trying to view the file in Adobe Reader 11.0.3.

[C#]

using Aspose.Pdf;

// Load the source PDF
Document exportDoc = new Document(@"C:\pdftest\104kb.pdf");

HtmlSaveOptions newOptions = new HtmlSaveOptions();
// Embed all parts into the single HTML output file
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
newOptions.RemoveEmptyAreasOnTopAndBottom = true;
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

// Convert the PDF and write the HTML result
exportDoc.Save(@"C:\pdftest\104kb_Resultant.html", newOptions);

Hi Nayyer,

Thanks for your reply.
Could you look at the original PDF file? You will see that some of the file content has disappeared. Sorry, I did not make myself clear. The missing content is text, not images.

Hi Dalonggeng,

Thanks for sharing the details.

I have been trying to view the input 104kb.pdf file in Adobe Reader 11.0.3, but I am afraid I am still getting an error. However, I will try viewing the document in another PDF viewer application so that I can identify the issues appearing in the resultant file. I will keep you posted with my findings.

Hi Dalonggeng,


I have tried viewing the PDF file in Foxit Reader 7.1, and it displays properly. I have observed that when converting PDF to HTML, the text is truncated in the resultant HTML (or the image appears on top of the text and hides the file contents). Furthermore, I have also noticed that images in the resultant HTML appear four times, whereas the input PDF has 2 occurrences of the image. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-38694. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for the inconvenience.

Hi Dalonggeng,


Thanks for your patience. Our product team has investigated and concluded that we cannot fix the document. It seems the document does not follow the PDF specification, as Adobe Acrobat/Reader itself identifies it as corrupt. Please find the attached screenshot.

We are sorry for the inconvenience caused.

Best Regards,