Unable to extract PDF content as HTML

DamodarMahadevan · October 24, 2011, 6:59am

Hi,

I am using Aspose.Pdf.Kit version 3.5.0.0.

I have a requirement where, on clicking a PDF document, I have to extract its contents as HTML and display them in a div.

// Get the uploaded file content as a stream
Stream stream = uploadedFile.InputStream;
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(stream);
extractor.ExtractTextMode = 1;
extractor.ExtractText();
string fileText;

using (var memoryStream = new MemoryStream())
{
    using (var sr = new StreamReader(memoryStream))
    {
        // Write the extracted text into the memory stream, then rewind and read it back
        extractor.GetText(memoryStream);
        memoryStream.Position = 0;
        fileText = sr.ReadToEnd();
    }
}
return fileText;

All I get in fileText is the plain text content of the PDF, but I would like to get the text with its formatting exactly as it appears in the PDF. I searched the forums for this but could not find anything relevant.

Please let me know how to achieve this.

Thanks and regards,
Damodar

shahzadlatif · October 25, 2011, 4:34am

Hi Damodar,

I’m sorry to share with you that Aspose.Pdf for .NET currently doesn’t allow you to convert or extract PDF contents to HTML format. We have already logged a new feature request as PDFNEWNET-13729 in our issue tracking system to provide support for this feature. You’ll be updated via this forum thread once it is supported.
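
Until HTML export is supported, one possible stopgap is to display the plain text you already extract while keeping its line breaks and spacing. The sketch below is only an illustration under assumptions: `PlainTextHtml` and `ToPreformattedHtml` are hypothetical names and not part of any Aspose API, and the approach preserves whitespace only, not fonts, colors, or page layout.

```csharp
using System.Net;

// Hypothetical helper, not part of Aspose: prepares extracted plain text
// for display inside an HTML element such as a div.
public static class PlainTextHtml
{
    public static string ToPreformattedHtml(string extractedText)
    {
        // Encode characters such as <, > and & so the text is shown literally
        // instead of being interpreted as markup.
        string encoded = WebUtility.HtmlEncode(extractedText ?? string.Empty);

        // <pre> makes the browser keep the line breaks and spacing of the text.
        return "<pre>" + encoded + "</pre>";
    }
}
```

In your case the result of `PlainTextHtml.ToPreformattedHtml(fileText)` could be assigned to the div's content, for example through the `InnerHtml` property of a server-side div (assuming the div is declared with `runat="server"`).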

We’re sorry for the inconvenience.
Regards,

aspose.notifier · November 5, 2011, 8:36am

The issues you have found earlier (filed as 13729) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.