Emptying TextFragment.Text throws an exception

KDSDEV · June 1, 2018, 7:06am

The following code throws an exception when I run it on this file, 201752919911.pdf (526.8 KB), using Aspose.Pdf for .NET 18.5.0.
Please take a look at the log.txt that the code below writes: 201752919911_log.zip (494 Bytes)

I presume this error is related to this topic.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

using Aspose;
using Aspose.Pdf;
using Aspose.Pdf.Text;

namespace pdf
{
    class Program
    {
        static void Main(string[] args)
        {
            License license = new License();
            license.SetLicense("Aspose.Pdf.lic");
            string pdffile = "201752919911.pdf";
            Document doc = new Document(pdffile);
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(".+");
            textFragmentAbsorber.TextSearchOptions = new TextSearchOptions(true);
            doc.Pages.Accept(textFragmentAbsorber);
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
            foreach (TextFragment textFragment in textFragmentCollection)
            {
                try
                {
                    textFragment.Text = string.Empty;
                }
                catch (Exception ex)
                {
                    // Append to the log so that earlier exceptions are not overwritten,
                    // and dispose the writer even if a write fails.
                    using (StreamWriter stream = new StreamWriter(pdffile.Replace(".pdf", "_log.txt"), true))
                    {
                        stream.WriteLine("[" + DateTime.Now.ToString() + "]");
                        stream.WriteLine("[message]\r\n " + ex.Message);
                        stream.WriteLine("[source]\r\n " + ex.Source);
                        stream.WriteLine("[stacktrace]\r\n" + ex.StackTrace);
                    }
                }
            }
        }
    }
}

Thank you in advance.

Farhan.Raza · June 1, 2018, 12:01pm

@KDSSHO

Thank you for contacting support.

We have worked with the data shared by you and have been able to reproduce the issue in our environment. A ticket with ID PDFNET-44813 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

Farhan.Raza · June 4, 2018, 9:04am

@KDSSHO

We have investigated ticket PDFNET-44813 further in our environment and would like to share an alternative approach that does not trigger the exception and runs faster. It works with page operators instead of TextFragmentAbsorber and removes all text from each PDF page, as per your requirements. Please try the code below in your environment and share your feedback with us:

Document pdfDocument = new Document(myDir + "201752919911.pdf");

for (int i = 1; i <= pdfDocument.Pages.Count; i++)
{
    // Remove text showing operators
    Page page = pdfDocument.Pages[i];
    OperatorSelector operatorSelector = new OperatorSelector(new Operator.TextShowOperator());

    System.Collections.ArrayList list = new System.Collections.ArrayList();

    page.Contents.Accept(operatorSelector);
    list.AddRange(operatorSelector.Selected);

    page.Contents.Delete(list);

    // Remove FreeText annotations if present.
    // Iterate backwards: the collection is 1-based, and deleting an entry
    // shifts the indices of the entries after it.
    for (int j = page.Annotations.Count; j >= 1; j--)
    {
        if (page.Annotations[j] is Aspose.Pdf.Annotations.FreeTextAnnotation)
            page.Annotations.Delete(j);
    }
}

pdfDocument.Save(myDir + "201752919911_op_removed.pdf", Aspose.Pdf.SaveFormat.Pdf);

We hope this will be helpful. Please feel free to contact us if you need any further assistance.

KDSDEV · June 4, 2018, 9:46am

Hi Farhan,

I will. However,

removing text is only part of my requirement. My ultimate goal is to replace the text, as follows.

As part of the sequence for my translation app, I found that I need to remove the existing text first.
I should have shared the background of my inquiry earlier.
Anyway, I'll be back ASAP!
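
For readers with the same goal, the remove-then-replace sequence described here could be sketched roughly as below. This is an untested outline, not a confirmed fix from Aspose: it reuses the operator-removal lines from the workaround earlier in this thread as they were posted for 18.5, and assumes that reading TextFragment.Text and Position (as opposed to assigning Text) does not raise the exception for this file. Translate is a hypothetical placeholder for the translation step, and the output file name is made up. Fonts and sizes of the original text are not carried over.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ReplaceTextSketch
{
    // Hypothetical placeholder for the translation step; not part of Aspose.PDF.
    static string Translate(string source)
    {
        return source;
    }

    static void Main()
    {
        Document doc = new Document("201752919911.pdf");

        foreach (Page page in doc.Pages)
        {
            // 1. Record the text and position of every fragment before removing anything.
            TextFragmentAbsorber absorber = new TextFragmentAbsorber();
            page.Accept(absorber);

            var captured = new List<KeyValuePair<Position, string>>();
            foreach (TextFragment fragment in absorber.TextFragments)
            {
                Position position = new Position(fragment.Position.XIndent, fragment.Position.YIndent);
                captured.Add(new KeyValuePair<Position, string>(position, fragment.Text));
            }

            // 2. Remove the text-showing operators (same calls as the workaround in this thread).
            OperatorSelector selector = new OperatorSelector(new Operator.TextShowOperator());
            page.Contents.Accept(selector);
            System.Collections.ArrayList list = new System.Collections.ArrayList();
            list.AddRange(selector.Selected);
            page.Contents.Delete(list);

            // 3. Write the replacement text at the recorded positions.
            TextBuilder builder = new TextBuilder(page);
            foreach (KeyValuePair<Position, string> item in captured)
            {
                TextFragment replacement = new TextFragment(Translate(item.Value));
                replacement.Position = item.Key;
                builder.AppendText(replacement);
            }
        }

        // Assumed output name, chosen for illustration only.
        doc.Save("201752919911_replaced.pdf");
    }
}
```

The positions are copied into new Position objects before step 2 because the absorbed fragments refer to the page content that step 2 deletes.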

Farhan.Raza · June 4, 2018, 5:58pm

@KDSSHO

If the suggested workaround is not suitable for your scenario, we are afraid you will have to wait until ticket PDFNET-44813 is resolved. We will notify you as soon as it is fixed.

aspose.notifier · October 5, 2018, 8:29pm

The issues you have found earlier (filed as PDFNET-44813) have been fixed in Aspose.PDF for .NET 18.10.