Hi,
I am using the following code to convert PDFs to HTML. The process works extremely well most of the time; however, for large PDFs (around 30 MB) it never completes. I left the code running for 16 hours before aborting, by which point the process was using over 4.5 GB of memory.
private byte[] byteArray; // populated by the HTML saving strategy callback below

/// <summary>
/// Converts the supplied PDF byte array to HTML.
/// </summary>
private byte[] ConvertPDFToHTML(byte[] fileBytes)
{
    byteArray = null;
    try
    {
        Document doc = new Document(new MemoryStream(fileBytes));
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.FixedLayout = true;
        saveOptions.SplitIntoPages = false;
        saveOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UsePixelUnitsInCssLetterSpacingForIE;
        saveOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
        saveOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        saveOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(StrategyOfSavingHtml);
        doc.Save("dummy", saveOptions);
        return byteArray;
    }
    catch (Exception e)
    {
        Common.WriteErrorLog("PDFConverter", "ProcessControl.ConvertPDFToHTML(byte[]) failed with error " + e.ToString());
        return null;
    }
}
/// <summary>
/// Custom HTML page markup saving strategy used by the Aspose.PDF save options.
/// </summary>
private void StrategyOfSavingHtml(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
{
    // Extract the byte array of the generated HTML document.
    BinaryReader reader = new BinaryReader(htmlSavingInfo.ContentStream);
    byteArray = reader.ReadBytes((int)htmlSavingInfo.ContentStream.Length);
}
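
One thing I am unsure about is whether the lack of deterministic disposal matters here. Below is a variant I am considering but have not yet tested on the large files; it assumes `Aspose.Pdf.Document` and the streams implement `IDisposable` and otherwise uses the same save options as above:

```csharp
// Untested sketch: same conversion, but with using blocks so the input
// stream, the Document, and the BinaryReader are disposed deterministically.
private byte[] ConvertPDFToHTMLWithDisposal(byte[] fileBytes)
{
    byteArray = null;
    try
    {
        using (MemoryStream input = new MemoryStream(fileBytes))
        using (Document doc = new Document(input)) // assumes Document : IDisposable
        {
            HtmlSaveOptions saveOptions = new HtmlSaveOptions();
            saveOptions.FixedLayout = true;
            saveOptions.SplitIntoPages = false;
            saveOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UsePixelUnitsInCssLetterSpacingForIE;
            saveOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
            saveOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
            saveOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(
                info =>
                {
                    // Dispose the reader once the HTML bytes have been copied out.
                    using (BinaryReader reader = new BinaryReader(info.ContentStream))
                    {
                        byteArray = reader.ReadBytes((int)info.ContentStream.Length);
                    }
                });
            doc.Save("dummy", saveOptions);
        }
        return byteArray;
    }
    catch (Exception e)
    {
        Common.WriteErrorLog("PDFConverter", "ProcessControl.ConvertPDFToHTML(byte[]) failed with error " + e.ToString());
        return null;
    }
}
```

I do not know whether this would help with the memory growth, since the bulk of the allocation presumably happens inside `doc.Save`, but I mention it in case undisposed resources are contributing.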
Can you please advise what may be going wrong?