Convert HTML to PDF in C# using Aspose.PDF: slow processing and high RAM usage on large files

AntonIonov · May 27, 2021, 11:35am

Hi, we are using the latest Aspose.PDF library for C# on .NET Core. We need to convert both small and large HTML files to PDF and add stamps to them. Unfortunately, converting large files uses a lot of RAM and is quite slow. I suspect the whole file is held in memory as an Aspose object structure. Is streaming conversion possible, i.e., converting without holding the entire document structure in memory?

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static byte[] ConvertHtmlToPdf(string html, Stream header = null, string htmlStamp = null)
{
    // Note: header and htmlStamp are not used in this excerpt.
    try
    {
        var htmlOptions = new HtmlLoadOptions();
        htmlOptions.PageInfo.IsLandscape = true;
        // Landscape page, 1.2x the size of A4.
        htmlOptions.PageInfo.Width = PageSize.A4.Height * 1.2;
        htmlOptions.PageInfo.Height = PageSize.A4.Width * 1.2;
        // MarginInfo(left, bottom, right, top)
        htmlOptions.PageInfo.Margin = new MarginInfo(35, 35, 35, 80);

        // The whole HTML string is encoded to UTF-8 and parsed into an in-memory Document.
        using var document = new Document(new MemoryStream(Encoding.UTF8.GetBytes(html)), htmlOptions);

        var pageNumberStamp = new PageNumberStamp
        {
            Background = false,
            // "Страница №# из N" means "Page No. # of N"; '#' is replaced with the page number.
            Format = "Страница №# из " + document.Pages.Count,
            TopMargin = 40.0,
            RightMargin = 60.0,
            HorizontalAlignment = HorizontalAlignment.Right,
            VerticalAlignment = VerticalAlignment.Top,
            StartingNumber = 1
        };
        pageNumberStamp.TextState.Font = FontRepository.FindFont("Arial");
        pageNumberStamp.TextState.FontSize = 9.0F;
        pageNumberStamp.TextState.ForegroundColor = Color.Black;

        foreach (var page in document.Pages)
        {
            page.AddStamp(pageNumberStamp);
        }

        using var ms = new MemoryStream();
        document.Save(ms);

        // ToArray() copies the whole PDF buffer once more before returning.
        return ms.ToArray();
    }
    catch (Exception e)
    {
        BackgroundWork.LogException(e, "ConvertToPdf ConvertHtmlToPdf");
    }

    return Array.Empty<byte>();
}
```
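Independently of any API-side fix, the wrapper itself can avoid adding to the memory footprint. Below is a minimal sketch, assuming Aspose.PDF for .NET and only the types already used in the method above; the class name `HtmlToPdfSketch` and the `output` parameter are hypothetical. It writes the PDF directly into a caller-supplied `Stream` instead of copying it into a `byte[]` via `MemoryStream.ToArray()`, and disposes the `Document` (which implements `IDisposable`) as soon as saving finishes. It does not make the HTML conversion itself streaming: Aspose still builds the whole document in memory, so this only trims the extra copies made around it. Exceptions are left to the caller rather than swallowed.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

public static class HtmlToPdfSketch
{
    // Hypothetical variant of ConvertHtmlToPdf: the PDF is written straight into
    // a caller-supplied stream (e.g. a FileStream or an HTTP response body).
    // Aspose still builds the full Document in memory; only wrapper copies are avoided.
    public static void ConvertHtmlToPdf(string html, Stream output)
    {
        var htmlOptions = new HtmlLoadOptions();
        htmlOptions.PageInfo.IsLandscape = true;
        htmlOptions.PageInfo.Width = PageSize.A4.Height * 1.2;
        htmlOptions.PageInfo.Height = PageSize.A4.Width * 1.2;
        htmlOptions.PageInfo.Margin = new MarginInfo(35, 35, 35, 80);

        // Both the input stream and the Document are disposed when the method returns
        // (document first, then input, in reverse declaration order).
        using var input = new MemoryStream(Encoding.UTF8.GetBytes(html));
        using var document = new Document(input, htmlOptions);

        var pageNumberStamp = new PageNumberStamp
        {
            Background = false,
            Format = "Страница №# из " + document.Pages.Count,
            TopMargin = 40.0,
            RightMargin = 60.0,
            HorizontalAlignment = HorizontalAlignment.Right,
            VerticalAlignment = VerticalAlignment.Top,
            StartingNumber = 1
        };
        pageNumberStamp.TextState.Font = FontRepository.FindFont("Arial");
        pageNumberStamp.TextState.FontSize = 9.0F;
        pageNumberStamp.TextState.ForegroundColor = Color.Black;

        foreach (var page in document.Pages)
        {
            page.AddStamp(pageNumberStamp);
        }

        // No intermediate MemoryStream and no ToArray() copy: save directly to output.
        document.Save(output);
    }
}
```

A caller that ultimately writes the PDF to disk could pass a `FileStream` from `File.Create(...)`, so the finished PDF never needs to exist as a separate `byte[]`.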

asad.ali · May 27, 2021, 5:01pm

@AntonIonov

We will investigate the issue and try to improve the API's performance. Could you please share a sample large HTML file so that we can test it and observe the memory consumption at our end?

AntonIonov · June 2, 2021, 8:21pm

@asad.ali
390200-01_210521_3547171_auto.zip (278.2 KB)

asad.ali · June 3, 2021, 6:04pm

@AntonIonov

We were able to observe the high memory consumption in our environment while testing with version 21.5 of the API. Therefore, we have logged an issue as PDFNET-50012 in our issue tracking system. We will look into its details further and keep you posted on the status of its correction. Please be patient and spare us some time.

We are sorry for the inconvenience.

AntonIonov · July 14, 2021, 9:32am

Is there any news?

asad.ali · July 14, 2021, 4:08pm

@AntonIonov

We are afraid that the ticket has not been resolved yet. It will be analyzed and fixed on a first-come, first-served basis, as per the policy of the free support model. We will inform you as soon as we make significant progress towards its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

AntonIonov · October 19, 2021, 9:41am

Any news now?

asad.ali · October 19, 2021, 8:14pm

@AntonIonov

We are afraid that the previously logged ticket could not be fully investigated yet because of other issues ahead of it in the queue. We will let you know as soon as we have additional updates regarding its analysis or resolution. Please spare us some time.

We apologize for your inconvenience.