Hi there,
We are using Aspose.PDF for .NET version 20.3.0 and the following code:
var textFragmentAbsorber = new TextFragmentAbsorber("Page ##c# of ##t#")
{
    TextSearchOptions = {LimitToPageBounds = true}
};
document.Pages.Accept(textFragmentAbsorber);
var textFragmentCollection = textFragmentAbsorber.TextFragments;
foreach (var textFragment in textFragmentCollection)
{
    if (textFragment.Page == null)
        continue;
    textFragment.Text =
        textFragment.Text
            .Replace("##c#", $"{textFragment.Page.Number}")
            .Replace("##t#", $"{document.Pages.Count}")
            .PadLeft("Page ##c# of ##t#".Length, ' ');
    textFragment.TextState.HorizontalAlignment = HorizontalAlignment.Right;
}
to replace the custom page-count marker (current page number and total number of pages) in the header of a PDF.
This is a simplified version of a more generic approach in which PDF parts (including this one) are merged into a bigger PDF. The greater goal is to prepare the custom page-counter marker (Page ##c# of ##t#) in all the parts and then use the TextFragmentAbsorber to replace it accordingly.
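To make the intended result concrete, here is a minimal standalone sketch of just the string transformation applied to each fragment, with no Aspose calls involved. The class and method names (PageCounter.Format) are hypothetical and exist only for this illustration.

```csharp
using System;

public static class PageCounter
{
    // Hypothetical helper: fills in the two placeholders and left-pads the
    // result to the width of the original marker, mirroring the loop body
    // in the snippet above.
    public static string Format(string marker, int currentPage, int totalPages)
    {
        return marker
            .Replace("##c#", currentPage.ToString())
            .Replace("##t#", totalPages.ToString())
            .PadLeft(marker.Length, ' ');
    }
}
```

For example, PageCounter.Format("Page ##c# of ##t#", 3, 120) yields "Page 3 of 120" preceded by four spaces, so the replacement keeps the 17-character width of the marker. If the filled-in text is longer than the marker, PadLeft leaves it unchanged.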
The problems we have with this approach are:
- it takes roughly 20 seconds to run on the attached input.xls.zip (2.8 MB) file.
- the memory usage increases to 3 GB while this process runs.
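For anyone who wants to reproduce figures like these, a minimal measurement sketch could look like the following. This is only an illustration under stated assumptions, not necessarily how the numbers above were obtained: the Measure.Run helper is hypothetical, and the peak working set reported by the operating system is only a rough proxy for memory usage (on some platforms it may not be reported at all).

```csharp
using System;
using System.Diagnostics;

public static class Measure
{
    // Hypothetical helper: runs the given action once and returns the
    // elapsed wall-clock time together with the peak working set of the
    // current process, as reported by the operating system.
    public static (TimeSpan Elapsed, long PeakWorkingSetBytes) Run(Action action)
    {
        var stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();

        using (var process = Process.GetCurrentProcess())
        {
            return (stopwatch.Elapsed, process.PeakWorkingSet64);
        }
    }
}
```

The replacement code would be passed in as the action, for example Measure.Run(() => document.Pages.Accept(textFragmentAbsorber)), so that only the search step is timed.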
We then tried applying the TextFragmentAbsorber at the page level, using this code:
// Same marker and placeholders as in the first snippet.
const string pageCountsPhrase = "Page ##c# of ##t#";
const string currentPagePlaceholder = "##c#";
const string countPagesPlaceholder = "##t#";
var textFragmentAbsorber = new TextFragmentAbsorber(pageCountsPhrase)
{
    TextSearchOptions = {LimitToPageBounds = true}
};
foreach (var page in document.Pages)
{
    // Search one page at a time instead of the whole document.
    page.Accept(textFragmentAbsorber);
    var textFragmentCollection = textFragmentAbsorber.TextFragments;
    foreach (var textFragment in textFragmentCollection)
    {
        if (textFragment.Page == null)
            continue;
        textFragment.Text =
            textFragment.Text
                .Replace(currentPagePlaceholder, $"{textFragment.Page.Number}")
                .Replace(countPagesPlaceholder, $"{document.Pages.Count}")
                .PadLeft(pageCountsPhrase.Length, ' ');
        textFragment.TextState.HorizontalAlignment = HorizontalAlignment.Right;
    }
}
and that alleviates the memory consumption problem, but it doubles the execution time…
We also tried the approach of using $p and $P, only to find out that:
- it is equally time-consuming
- preparing a PDF for applying the header (the input we’ve sent you is the output of that process) requires saving the document, and that is when the $p and $P are evaluated. Maybe we could delay that until the final PDF is built?
We would really appreciate any leads on performing the replacement faster and with less memory consumption.
Best regards.