Free Support Forum - aspose.com

Replace function in aspose.pdf takes too much time

We are using Aspose.Pdf to replace some information in a PDF file, but the call to the replace function takes too much time.

We have implemented this function in an ASP.NET web service and call it from an .aspx page, but it takes so long that our web application times out. Note that the PDF file is only about 166 KB.

Here is the code we are using.

public byte[] ExtractConfidentialInfoFromPdfFileV3(byte[] fileBytes)
{
    byte[] result = null;
    try
    {
        // create PdfContentEditor object
        Aspose.Pdf.Facades.PdfContentEditor contentEditor = new Aspose.Pdf.Facades.PdfContentEditor();
        using (MemoryStream memoryStream = new MemoryStream(fileBytes))
        {
            // bind input PDF file
            contentEditor.BindPdf(memoryStream);

            // make sure the regular expression strategy is being used
            contentEditor.ReplaceTextStrategy.IsRegularExpressionUsed = true;

            // specify that you want to replace all the matching strings
            // by default only the first string will be replaced
            contentEditor.ReplaceTextStrategy.ReplaceScope = Aspose.Pdf.Facades.ReplaceTextStrategy.Scope.REPLACE_ALL;
            string[] termsToExtract = new string[] { "Address", "Domicile", "NIC", "National ID", "Father", "Street", "Colony", "House", "Apartment", "Flat", "Floor" };
            if (termsToExtract != null && termsToExtract.Length > 0)
            {
                foreach (string term in termsToExtract)
                {
                    contentEditor.ReplaceText(@"[\w\d\t0-9 :-]*(?i)" + term + @"[\w\d\t0-9 :-]*", "");
                }
            }
            using (MemoryStream outStream = new MemoryStream())
            {
                contentEditor.Save(outStream);
                result = outStream.ToArray();
            }
        }
    }
    catch (Exception ex)
    {
        result = null;
    }
    return result;
}
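
The loop above calls ReplaceText once per term, so the document is searched eleven times. One possible way to reduce that, offered only as a sketch, is to fold the terms into a single alternation so that one call covers all of them. The helper name RedactionPattern.Build is hypothetical and not part of Aspose.Pdf. It is also an assumption that ReplaceText accepts .NET alternation syntax and that a single pass is faster; neither was verified in this thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Pdf): combines the per-term
// patterns from the loop above into one pattern.
public static class RedactionPattern
{
    public static string Build(string[] terms)
    {
        if (terms == null || terms.Length == 0)
        {
            // An empty alternation would match every run of text.
            throw new ArgumentException("At least one term is required.", "terms");
        }

        // Escape each term so regex metacharacters in a term are matched literally.
        string alternation = string.Join("|", terms.Select(t => Regex.Escape(t)));

        // Same surrounding character class as the original pattern;
        // (?i:...) makes only the term alternation case-insensitive.
        return @"[\w\d\t0-9 :-]*(?i:" + alternation + @")[\w\d\t0-9 :-]*";
    }
}
```

Under those assumptions, the foreach loop would reduce to a single call: contentEditor.ReplaceText(RedactionPattern.Build(termsToExtract), ""). The pattern can be checked against plain strings with Regex.Replace before trying it on a PDF.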

Hi Abdul,

Thanks for your interest in our products.

Can you please share some details about the template documents you are using, or create a sample application that shows the issue? This will help us investigate and get back to you sooner.

We apologize for the inconvenience.

Thanks & Regards,

Thanks for the reply.


We have a PDF file from which we want to extract some information.

A sample website is attached with this comment.

Note:
  1. I did not include aspose.pdf.dll with the sample
  2. I did not include license files with the sample
  3. There is a PDF file (SampleFile.pdf) at the root of the attached demo; this is the sample file we want to extract/replace information from
  4. Please run the default page
  5. Browse to the SampleFile.pdf file mentioned above
  6. Click on the "Upload and Extract from Pdf" button
  7. It will start the conversion; it completes but takes too much time
  8. Currently I have set debug=true in web.config; with that setting it takes a long time but does extract the info. If we set debug=false, the request times out after some time.

Our major concern is how the replace can be made more efficient.

Hi Abdul,

I tested your sample application with the latest version, Aspose.Pdf for .NET v6.8, and did not get a timeout exception with debug="false"; the whole process took around 8 to 9 seconds. Please download and try the latest version and share your results with us. This will help us optimize the process further.
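
To compare timings between library versions, a small stopwatch wrapper is one way to get a repeatable number. This is only a sketch: the Timing class and its Measure method are hypothetical names, not part of Aspose.Pdf or the attached sample.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: runs a function once and reports the wall-clock time it took.
public static class Timing
{
    public static TResult Measure<TResult>(Func<TResult> work, out TimeSpan elapsed)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        TResult result = work();
        stopwatch.Stop();
        elapsed = stopwatch.Elapsed;
        return result;
    }
}
```

For example, declaring a TimeSpan elapsed and calling Timing.Measure(() => ExtractConfidentialInfoFromPdfFileV3(fileBytes), out elapsed) inside the web service would give a figure to log and share for each version tested.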

Sorry for the inconvenience,

Thanks
Nausherwan,


I downloaded the latest version and performance is now very good. With the old version of aspose.pdf.dll, the same operation took a lot of time.

Hi Abdul,

Thank you for your feedback.

We are glad to know that your issue was resolved with the latest version of Aspose.Pdf for .NET. Please feel free to contact support if you have any other queries.

Thank You & Best Regards,