Issues with PDF to HTML Conversion

Hi


We have a requirement to convert PDF documents to HTML, and we have used the Aspose.Pdf DLL (version 8.7) to evaluate this. While converting a few files, we found some issues with the conversion.

Below is the list of issues that we have encountered:
1. Italic font formatting is lost in the generated HTML. Bold formatting is not applied either.
2. Even when multiple fonts are used in the PDF document, the generated HTML does not embed them. Instead, it defaults to Times New Roman.
3. When a PDF is converted to HTML, SVG files are generated. For a few letters in the SVG files, we see a black background.

Can you let us know how we can rectify the above issues?

Hi Gopi,


Can you please share the source PDF file so that we can test the scenario at our end? We are sorry for the inconvenience.

Hi Nayyer


I have uploaded a sample document. Please convert it to HTML. In the converted document you can see that the fonts are not preserved. There are 4 fonts on the page, but in the converted HTML all of them fall back to Times New Roman. Can you please check?

Thanks
Gopi

Hi Gopi,


Thanks for sharing the resource file.

I have tested the scenario with Aspose.Pdf for .NET 8.8.0 using the following code snippet and, as per my observations, the contents of the resultant HTML appear properly (with the proper fonts). For your reference, I have also attached the resultant HTML generated at my end.

However, I have observed that hyperlinks are removed in the resultant HTML. For the sake of correction, I have logged this problem as PDFNEWNET-36352 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for this inconvenience.

Hi Nayyer,


We have tried Aspose.Pdf for .NET 8.8.0, but in some cases we are still unable to get font-family, font-weight, and font-style rendered properly. It seems the Aspose DLL is generating invalid styles in some scenarios.
Ex: font-family: "LUSUTP+TT250t00"; font-family: "IPKUTR+Symbol"; font-family: "SCFSPE+Times-Bold";

I have one more question: the Aspose DLL is generating .svg files to preserve underlines for words. Is there any way to have the underlines saved in the styles instead?
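
A note on the font-family values quoted above: names such as "SCFSPE+Times-Bold" match the naming convention for subset fonts in PDF, where a tag of six uppercase letters and a plus sign precedes the base font name. The following is a minimal Python sketch, not part of Aspose.Pdf, assuming one wants to post-process the generated CSS and strip that tag. The helper names strip_subset_tag and clean_css and the sample CSS string are hypothetical.

```python
import re

# A PDF subset font name is a six-uppercase-letter tag, a plus sign,
# and then the base font name, e.g. "SCFSPE+Times-Bold".
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")

# Matches quoted font-family declarations such as: font-family: "IPKUTR+Symbol"
FONT_FAMILY = re.compile(r'(font-family:\s*")([^"]+)(")')


def strip_subset_tag(font_name):
    """Return the font name without a leading subset tag, if one is present."""
    return SUBSET_TAG.sub("", font_name)


def clean_css(css):
    """Hypothetical helper: strip subset tags from every quoted font-family value."""
    return FONT_FAMILY.sub(
        lambda m: m.group(1) + strip_subset_tag(m.group(2)) + m.group(3),
        css,
    )


if __name__ == "__main__":
    sample = 'p { font-family: "SCFSPE+Times-Bold"; }'  # assumed sample input
    print(clean_css(sample))  # prints: p { font-family: "Times-Bold"; }
```

Stripping the tag only normalizes the name. The browser still needs a font with that name, either installed locally or supplied through @font-face, to render the text in it.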

Hi Surendra,


We are sorry for the inconvenience caused. Please share some more details about your requirement. We would appreciate it if you could share your code sample along with the input and output documents. It will help us address your issue accurately.

Please feel free to contact us for any further assistance.

Best Regards,
surendra890:
I have one more question: the Aspose DLL is generating .svg files to preserve underlines for words. Is there any way to have the underlines saved in the styles instead?
Hi Surendra

I have tested the scenario and have observed that the underline formatting information is saved as an .SVG file. As per your requirement of saving this information in the Style.css file instead, we have logged it as PDFNEWNET-36388 in our issue tracking system. We will look further into the details of this requirement and will keep you posted on the status of the correction. Please be patient and spare us a little time.

The issues you have found earlier (filed as PDFNEWNET-36388) have been fixed in Aspose.Pdf for .NET 10.9.0.



The issues you have found earlier (filed as PDFNET-36352) have been fixed in Aspose.PDF for .NET 19.12.