Performance issues replacing text

Ethofy · March 12, 2015, 10:16am

Hello.
We are trying to replace some text in a couple of PDF files but operations over TextFragment objects take a considerable amount of time - almost 10 times more than with similar documents.
These files have been saved as PDF from PowerPoint.
There’s another file similar to the ones described above, which has been saved in the same manner, but doesn’t show any delay.

We’ve also tested with the latest version, 10.2.0, but there was no difference.

Are there any issues with how the files are generated?
Attached you will find the PDF files and some code sample to reproduce the issue.

Thanks.

foreach (Page page in ((Document)pdfDoc).Pages)
{
    // Search the current page for the token
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("{User.FirstName}", new TextSearchOptions(true));
    page.Accept(textFragmentAbsorber);
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

    // Replace each match; this assignment is the slow step
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        textFragment.Text = "sample text";
    }
}

tilal.ahmad · March 13, 2015, 2:01am

Hi Dan,

Thanks for your inquiry. We have reproduced the reported performance issue with your shared PDF documents and have logged ticket PDFNEWNET-38361 in our issue tracking system for further investigation and resolution. We will keep you updated on the progress.

We are sorry for the inconvenience caused.

Best Regards,

miketong · July 12, 2017, 11:17am

Was this ever resolved? I have the same problem. I have worked around it, but the workaround is a bit of a hack.

My requirement is to match tokens within a PDF and replace them with values from a dictionary of data. The tokens include a key, an index and some other data. I use the TextFragmentAbsorber with a regex to find all the tokens I need to replace.

If I replace them by modifying the TextFragment.Text property of each match, the process is unusable because the performance is so bad (around 1 minute to replace around 50 tokens). The searching part is fine speed-wise. If I don’t modify the document at all, it’s very fast. As soon as I modify the document, the performance tanks.

So my workaround is to use the PdfContentEditor to do the actual replacement of each token found by the regex. This performs over 10x better than modifying the objects returned by the search. So basically I have to search the document twice just to get usable performance. If the PdfContentEditor supported a regex replace mode I could just use that, but as it is I’m forced to use both.
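
A minimal sketch of what this two-pass workaround might look like, not the poster's actual code. The TokenReplacer class, the ReplaceTokens helper, the file-path parameters, the values dictionary and the regex pattern are all assumptions made for illustration; the pattern only matches simple brace-delimited tokens such as {User.FirstName}. It also assumes the values dictionary is keyed by the literal token text.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

static class TokenReplacer
{
    // Hypothetical helper: 'values' maps each literal token
    // (e.g. "{User.FirstName}") to its replacement text.
    public static void ReplaceTokens(string inputPath, string outputPath,
                                     IDictionary<string, string> values)
    {
        // Pass 1: regex search only. The document is not modified here,
        // so this pass stays fast.
        var found = new HashSet<string>();
        using (var document = new Document(inputPath))
        {
            // Assumed token pattern: anything between a pair of braces.
            var absorber = new TextFragmentAbsorber(@"\{[^{}]+\}", new TextSearchOptions(true));
            document.Pages.Accept(absorber);
            foreach (TextFragment fragment in absorber.TextFragments)
            {
                found.Add(fragment.Text);
            }
        }

        // Pass 2: literal replacement through PdfContentEditor instead of
        // assigning to TextFragment.Text.
        var editor = new PdfContentEditor();
        editor.BindPdf(inputPath);
        foreach (string token in found)
        {
            string value;
            if (values.TryGetValue(token, out value))
            {
                editor.ReplaceText(token, value);
            }
        }
        editor.Save(outputPath);
    }
}
```

Collecting the distinct tokens first means each one is passed to ReplaceText only once, however often it appears. Whether a single call replaces every occurrence depends on the editor's replace settings, so that is worth checking against the version in use.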

There is clearly something going wrong when the TextFragmentAbsorber results are used to modify documents, as the search performance itself is fine. I got a trial licence of the latest Aspose.PDF to see if the problem was still around, and it was exactly the same.

codewarior · July 13, 2017, 1:02pm

@miketong,

Thanks for contacting support.

I am afraid the earlier reported issue is not yet resolved. I have asked the product team to evaluate the problem and share a possible resolution. Furthermore, please note that performance issues depend on the structure and complexity of the input document, so we request that you share the input PDF file and code snippet so that we can test the scenario in our environment. We are sorry for the inconvenience.