
Free Support Forum - aspose.com

PDF stream: save to HTML stream for further processing

I am trying to follow the documentation example on your site about saving a PDF to an HTML stream. I read in a PDF file, convert it to a Stream object, and then load an Aspose.Pdf for .NET Document object from that stream. Once I save the document, how can I access the resulting stream so I can load it into HtmlAgilityPack? I have tried saving to an output stream and then using that, but the output stream does not work, and I do not know how to access the stream for further processing. The example on the site shows a delegate function, but I do not see where the output stream can be accessed after saving.

Any help on this issue would be greatly appreciated.

Thanks

Code below

public bool ExportPdfToHtml(Stream stream)
{
    try
    {
        Document doc = new Document(stream);
        string outHtmlFile = @"T:\test\test.html";
        HtmlSaveOptions options = new HtmlSaveOptions();
        options.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
        options.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
        options.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        options.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
        doc.Save(outHtmlFile, options);
        return true;
    }
    catch (PdfException)
    {
        // ignore
    }
    return false;
}

private static void StrategyOfSavingHtml(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
{
    // Read the page's HTML markup from the content stream
    System.IO.BinaryReader reader = new BinaryReader(htmlSavingInfo.ContentStream);
    byte[] htmlAsByte = reader.ReadBytes((int)htmlSavingInfo.ContentStream.Length);

    // Here you can put code that saves the page's HTML to some storage, e.g. a database
    MemoryStream targetStream = new MemoryStream();
    targetStream.Write(htmlAsByte, 0, htmlAsByte.Length);
}
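In the StrategyOfSavingHtml callback above, `targetStream` is a local variable, so the copied bytes become unreachable as soon as the method returns; that is a likely reason the output stream seems not to work. One way around this is to copy the markup into an object that outlives the callback. The sketch below is a minimal illustration under that assumption: `HtmlMarkupCollector`, `Append` and `OpenPage` are hypothetical names, not part of Aspose.Pdf, and the class only uses standard .NET stream APIs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of Aspose.Pdf: keeps a copy of every HTML
// markup stream handed to a custom saving callback, so the markup is still
// available after Document.Save has returned.
public sealed class HtmlMarkupCollector
{
    // One entry per callback invocation (e.g. one per page if the output is split).
    private readonly List<byte[]> _pages = new List<byte[]>();

    public int Count => _pages.Count;

    // Call this from inside the saving callback with the stream it receives.
    public void Append(Stream content)
    {
        if (content == null) throw new ArgumentNullException(nameof(content));

        // Start from the beginning when the stream supports seeking.
        if (content.CanSeek) content.Position = 0;

        // CopyTo does not rely on Length, so it also works for non-seekable streams.
        using (var copy = new MemoryStream())
        {
            content.CopyTo(copy);
            _pages.Add(copy.ToArray());
        }
    }

    // Returns a fresh, read-only stream positioned at 0 for the requested entry.
    public Stream OpenPage(int index)
    {
        return new MemoryStream(_pages[index], writable: false);
    }
}
```

As we understand the Aspose.Pdf for .NET API (please verify against the HtmlSaveOptions reference for your version), the callback is registered through `options.CustomHtmlSavingStrategy`, so the wiring would look something like `options.CustomHtmlSavingStrategy = info => collector.Append(info.ContentStream);` before calling `doc.Save(...)`. After Save returns, `collector.OpenPage(0)` gives a readable stream that can be passed on to HtmlAgilityPack. Storing each invocation separately avoids gluing several complete HTML documents into one invalid stream if the output is split into pages.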

Hi Don,


Thanks for your interest in Aspose.Pdf. Please check the second section of the following documentation link, which shows how to save PDF to HTML output as a single stream with embedded resources. You can use the resulting stream for post-processing. Hopefully this will help you accomplish the task.
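For the post-processing step itself, HtmlAgilityPack can parse HTML directly from a stream with `HtmlDocument.Load(Stream)`, whether that stream is a FileStream over the file written by `doc.Save` or a MemoryStream captured during saving. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; `HtmlPostProcessing`, `LoadHtml` and `GetBodyText` are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

// Hypothetical helper: loads HTML from a stream and performs a simple
// post-processing step with HtmlAgilityPack.
public static class HtmlPostProcessing
{
    // Parses HTML from any readable stream.
    public static HtmlDocument LoadHtml(Stream htmlStream)
    {
        if (htmlStream == null) throw new ArgumentNullException(nameof(htmlStream));

        // Rewind in case the stream was just written to.
        if (htmlStream.CanSeek) htmlStream.Position = 0;

        var doc = new HtmlDocument();
        doc.Load(htmlStream);
        return doc;
    }

    // Example step: the text inside <body>, so embedded CSS in <head> is skipped.
    public static string GetBodyText(HtmlDocument doc)
    {
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body");
        return body == null ? string.Empty : body.InnerText;
    }
}
```

With the file-based approach from the question, usage could be as simple as `using (var fs = File.OpenRead(@"T:\test\test.html")) { var html = HtmlPostProcessing.LoadHtml(fs); }`; the same call works unchanged on an in-memory stream.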


Please feel free to contact us for any further assistance.

Best Regards,