We are using the code below to replace strings with blanks in a PDF file, and it consumes a lot of memory, mainly in the call doc.Pages.Accept(absorber). Is there another approach that uses less memory? We are using version 21.12.
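using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// doc is an Aspose.Pdf.Document that has already been loaded.
string pattern = "HORIZON/WINDOW|Revolutions start|ULAGE";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
var textSearchOptions = new TextSearchOptions(true); // enable regular-expression search
TextFragmentAbsorber absorber = new TextFragmentAbsorber(regex);
absorber.TextSearchOptions = textSearchOptions;
absorber.TextReplaceOptions = new TextReplaceOptions(TextReplaceOptions.ReplaceAdjustment.None);
// Searches all pages at once; this is where we see the high memory usage.
doc.Pages.Accept(absorber);
TextFragmentCollection textFragmentCollection = absorber.TextFragments;
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Replace each match with an empty string.
    textFragment.Text = string.Empty;
}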
@cyginfo
Could you please share the sample PDF document as well for our reference? We will test the scenario in our environment and address it accordingly.
I will arrange for the document to be sent to you by tomorrow, as the dev team has left for the day. Just to add: we believe this issue is not specific to a particular document but occurs with any document.
@cyginfo
To avoid high memory consumption, you can search for and extract the text at the page level, as below:
foreach (Page page in doc.Pages)
{
    // Search one page at a time instead of the whole document at once.
    page.Accept(absorber);
}
However, if it still does not help, please share a sample file for our reference so that we can further test the scenario in our environment and address it accordingly.
Please download the document (20 MB.pdf) from the Google Drive link below:
https://drive.google.com/file/d/15VU36fVI2SQcbkGnDp3aftst8wbIdvsd/view?usp=sharing
Let me know if you have any difficulty downloading the file.
@cyginfo
Are you sure that the regular expression you shared with us is able to extract the text from this PDF? We tested it in our environment and it did not find any text. Furthermore, we could not reproduce the memory consumption issue when testing with the code below and version 22.3 of the API:
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(dataDir + @"20 MB.pdf");
string pattern = "HORIZON/WINDOW|Revolutions start|ULAGE";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
foreach (var page in pdfDocument.Pages)
{
    // A new absorber per page, so the match collection only ever holds one page's fragments.
    var textSearchOptions = new TextSearchOptions(true);
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(regex);
    absorber.TextSearchOptions = textSearchOptions;
    absorber.TextReplaceOptions = new TextReplaceOptions(TextReplaceOptions.ReplaceAdjustment.None);
    page.Accept(absorber);
    TextFragmentCollection textFragmentCollection = absorber.TextFragments;
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        // Replace each match on this page with an empty string.
        textFragment.Text = string.Empty;
    }
}
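On the question of whether the pattern finds anything in this PDF: one possible cause (an assumption, not confirmed in this thread) is that the text on the page is spaced differently from the pattern, for example two spaces in "Revolutions  start" or spaces around the slash in "HORIZON / WINDOW". The pattern itself can be checked offline with plain .NET regex, without involving the PDF at all. In this sketch, the sample strings and the whitespace-tolerant `tolerant` pattern are hypothetical illustrations; only the original pattern comes from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern copied from the thread.
var original = new Regex("HORIZON/WINDOW|Revolutions start|ULAGE", RegexOptions.IgnoreCase);

// Hypothetical variant: allow optional whitespace around the slash and
// any run of whitespace (including line breaks) between "Revolutions" and "start".
var tolerant = new Regex(@"HORIZON\s*/\s*WINDOW|Revolutions\s+start|ULAGE", RegexOptions.IgnoreCase);

// Hypothetical sample strings standing in for text extracted from a page.
string[] samples = { "Horizon/Window", "Revolutions  start", "ULAGE", "HORIZON / WINDOW" };

foreach (var s in samples)
{
    // Report which of the two patterns matches each sample.
    Console.WriteLine($"{s,-20} original={original.IsMatch(s)} tolerant={tolerant.IsMatch(s)}");
}
```

With these samples, the original pattern matches "Horizon/Window" and "ULAGE" but not the two variants with extra whitespace, while the tolerant pattern matches all four. If a whitespace-tolerant pattern finds text in the PDF where the original does not, spacing is a likely culprit.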