Highlight Performance Issues

We are running into a major performance issue when highlighting text in PDF documents. When the number of highlight strings in a multi-page (3-4 page) document gets high (more than 200 text strings), highlighting takes close to a minute. I’m wondering if there’s a more efficient way to do the highlighting. I’ve looked at the sample (https://github.com/aspose-pdf/Aspose.Pdf-for-.NET/blob/6df7b36db669d0a80ae1a055552dfb69e17226d5/Examples/CSharp/AsposePDF/Text/HighlightCharacterInPDF.cs); however, it converts the document to an image, and we need PDF output.

I’ve done some performance analysis and found that 90% of the time is spent setting the TextFragment.TextState.BackgroundColor property to the highlight color.

Here is some pseudocode showing how we highlight:

    // Assumes: using System.Text.RegularExpressions; using Aspose.Pdf.Text; using pdfAlias = Aspose.Pdf;
    foreach (DocumentConversionRequest.HighlightValues hOption in highlightObject)
    {
        // Case-insensitive regex; Regex.Escape keeps characters such as '.' or '(' literal
        string hText = "(?i)" + Regex.Escape(hOption.Text);
        System.Drawing.Color hColor = System.Drawing.ColorTranslator.FromHtml(hOption.Color);

        // One full search pass over every page for each highlight term
        var textAbsorber = new TextFragmentAbsorber(hText, new TextSearchOptions(true));
        pdfDocument.Pages.Accept(textAbsorber);
        TextFragmentCollection txtFragmentCollection = textAbsorber.TextFragments;

        foreach (TextFragment tf in txtFragmentCollection)
        {
            // This appears to be really expensive
            tf.TextState.BackgroundColor = pdfAlias.Color.FromRgb(hColor);
        }
    }
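One possible way to reduce the number of search passes, sketched here under assumptions: group the highlight terms by color and build one escaped, case-insensitive alternation pattern per color, so `Pages.Accept` runs once per color instead of once per term. `HighlightTerm` and `BuildPatternsByColor` are hypothetical names (the class stands in for `DocumentConversionRequest.HighlightValues`, which has `Text` and `Color`), and it is assumed that `TextFragmentAbsorber` accepts a standard .NET alternation pattern. This only cuts the search cost; it does not change the per-fragment cost of setting `TextState.BackgroundColor`, which the profiling points to as the main bottleneck.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical stand-in for DocumentConversionRequest.HighlightValues
public class HighlightTerm
{
    public string Text;
    public string Color; // HTML color string, e.g. "#FFFF00"
}

public static class HighlightPatterns
{
    // Hypothetical helper: groups terms by color and builds one case-insensitive
    // regex per color, so the document is searched once per color, not once per term.
    public static Dictionary<string, string> BuildPatternsByColor(IEnumerable<HighlightTerm> highlights)
    {
        return highlights
            .Where(h => !string.IsNullOrEmpty(h.Text))
            .GroupBy(h => h.Color, StringComparer.OrdinalIgnoreCase)
            .ToDictionary(
                g => g.Key,
                g => "(?i)(?:" + string.Join("|",
                    g.Select(h => h.Text)
                     .Distinct(StringComparer.OrdinalIgnoreCase)
                     // Longer terms first, so "foobar" is preferred over "foo" at the same position
                     .OrderByDescending(t => t.Length)
                     // Escape so '.', '(' and similar characters match literally
                     .Select(Regex.Escape)) + ")",
                StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        var highlights = new List<HighlightTerm>
        {
            new HighlightTerm { Text = "foo", Color = "#FFFF00" },
            new HighlightTerm { Text = "foobar", Color = "#ffff00" },
            new HighlightTerm { Text = "a.b", Color = "#00FF00" },
        };

        foreach (var pair in BuildPatternsByColor(highlights))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

Each resulting pattern could then be passed to `new TextFragmentAbsorber(pattern, new TextSearchOptions(true))` and applied with `pdfDocument.Pages.Accept(...)` as in the loop above, converting the dictionary key with `ColorTranslator.FromHtml`. Sorting longer terms first matters because regex alternation takes the first alternative that matches at a position.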

@ontargetjobs,

Kindly send the complete details of the scenario, including the source PDF and code. We will investigate and share our findings with you.

@imran.rafique,
I have created a sample C# .NET 4.6.1 project using Aspose.Pdf 17.4.0 and uploaded it to GitHub:

https://github.com/cfbarbero/aspose-highlight

@ontargetjobs,

We managed to replicate the slow performance issue in our environment. We have logged an investigation under the ticket ID PDFNET-44051 in our bug tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.

Any update on this issue?

@ontargetjobs,

The linked ticket ID PDFNET-44051 is pending analysis and has not been resolved yet. Our product team will investigate it according to their development schedule. We will let you know once significant progress has been made.

Besides this, we recommend that clients post their critical issues (or already logged ticket IDs) in the paid support forum. Please refer to this link: Aspose support options

We have tested this using the latest version (18.8.0), and the issue still appears to be present. We will have to look for an alternative solution.

@dbeckman

Please note that the ticket was logged under the free support model, where issues are resolved on a first-come, first-served basis. It could not be resolved in the latest version because of previously logged issues still pending in the queue. We will let you know once we make significant progress toward resolving it.

You may also consider the paid support model, where issues are resolved on a priority basis. If this issue is a blocker, you can subscribe to paid support to have it resolved urgently.