PDF to HTML conversion quality is very poor!

Hello,

I’m trying to convert a PDF document to HTML, and each time the HTML comes out a garbled mess. The conversion quality is very poor. For example:

1. The text 'welcome' in the input PDF gets converted to 'we l c o me'.

2. The original PDF has hyperlinked text, which gets converted to blue text without any destination address.

3. Image conversion quality is very poor.

I am attaching the original PDF file and the converted HTML. Kindly advise whether anything needs to be modified in the conversion code. Here is the code snippet I’m using for the conversion. Thank you very much for your support.

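using Aspose.Pdf.Facades;

string strSourcePath = @"D:\Test\HyperLinkTest.pdf";
// Bind the source PDF to a PdfContentEditor facade
PdfContentEditor editor = new PdfContentEditor();
editor.BindPdf(strSourcePath);
// Save the underlying Document in HTML format
editor.Document.Save(@"D:\Test\HyperLinkTest.html", Aspose.Pdf.SaveFormat.Html);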

Thanks,
Hirak

Hirak Dutta:
1. The text 'welcome' in the input PDF gets converted to 'we l c o me'.

Hi Hirak,

Thanks for contacting support.

I have tested the scenario using Aspose.Pdf for .NET 8.7.0 with the following code snippet, and I am unable to reproduce this issue.

[C#]

using Aspose.Pdf;

// Load the source PDF directly into an Aspose.Pdf Document
Document doc = new Document(@"C:\pdftest\HyperLinkTest\HyperLinkTest.pdf");
// Save it in HTML format
doc.Save(@"C:\pdftest\HyperLinkTest\HyperLinkTest_new.html", SaveFormat.Html);


Hirak Dutta:
2. The original PDF has hyperlinked text, which gets converted to blue text without any destination address.

I have tested the scenario and I am able to reproduce the same problem. For the sake of correction, I have logged this problem as PDFNEWNET-36181 in our issue tracking system.


Hirak Dutta:
3. Image conversion quality is very poor.
This problem does not seem to occur when tested with the above code snippet.

Furthermore, I have observed that the quality of text in the resultant HTML file is not good. For the sake of correction, I have logged this problem as PDFNEWNET-36180 in our issue tracking system. We will look further into the details of these problems and keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for the inconvenience.

Hi Nayyer,
When will these issues be fixed?

Hi Hirak,

Thanks for your feedback. I am afraid these issues were noticed only recently and are still pending investigation in the queue alongside other priority tasks. As soon as the investigation is complete, we will be in a better position to share an ETA. We will keep you updated on the resolution progress via this forum thread.

We are sorry for the inconvenience caused.

Best Regards,

The issues you found earlier (filed as PDFNEWNET-36181) have been fixed in Aspose.Pdf for .NET 9.9.0.


This message was posted using Notification2Forum from Downloads module by Aspose Notifier.