Detailed description of bug fix PDFNET-59790

Hi,

The bug fix in question is summarised thus “The TextFragment.IsolateTextSegments method shifts the isolated segments of text”, but I could not find a detailed description of the issue.

Could you give me additional information?

@Olivier.G

Could you please specify what additional information you are looking for regarding the bug fix PDFNET-59790?

I would like to know what exactly was fixed in that method and how the fix affects its output.

@Olivier.G

One of our customers was using the TextFragment.IsolateTextSegments(int startIndex, int length) method to extract and format specific segments (e.g., applying color or underline). However, the method returned incorrect text in some cases.

For example, for the text "APN/Parcel ID(s): 427-172-02, 427-172-05, 427-172-06 and 427-172-03", the call IsolateTextSegments(startIndex: 18, length: 10) returned:

27-172-02,

Instead of the expected:

427-172-02

The issue was not with the number of segments returned, but with the extracted content starting one character late.
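
To make the off-by-one shift concrete, the same indices can be applied to a plain string. This is only an illustration using string.Substring, not the library's implementation, and it assumes that IsolateTextSegments counts characters from zero, as the example above suggests.

```csharp
using System;

// The sample text quoted in the example above.
const string text = "APN/Parcel ID(s): 427-172-02, 427-172-05, 427-172-06 and 427-172-03";

// Expected result: 10 characters starting at zero-based index 18.
string expected = text.Substring(18, 10);

// Result reported before the fix: the same length, starting one character late.
string shifted = text.Substring(19, 10);

Console.WriteLine(expected); // 427-172-02
Console.WriteLine(shifted);  // 27-172-02,
```

The shifted result drops the leading "4" and picks up the trailing comma, which matches the incorrect output shown above.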

The issue has now been fixed in the latest release of the API, which returns the expected output.

The new implementation of IsolateTextSegments appears to be faulty, as the following .NET 8 unit test using TEST-A.pdf (45.1 KB) illustrates.

[TestMethod]
[DataRow(true)]
[DataRow(false)]
public void ExtractTextFragment(bool extractFirstFragment)
{
    using Document doc = new("TEST-A.pdf");

    TextFragmentAbsorber tfa = new();
    doc.Pages[1].Accept(tfa);

    TextFragment fragment = tfa.TextFragments[1];

    if (extractFirstFragment)
    {
        Assert.AreEqual("DIRECTION", fragment.IsolateTextSegments(0, 9)[1].Text);
        Assert.AreEqual(" GÉNÉRAL", fragment.IsolateTextSegments(10, 8)[1].Text);
        // How come we get the same result with a different start index?
        Assert.AreEqual(" GÉNÉRAL", fragment.IsolateTextSegments(9, 8)[1].Text);
    }
    else
    {
        // How come we get the same result as with Aspose 25.4?
        Assert.AreEqual("GÉNÉRALE", fragment.IsolateTextSegments(10, 8)[1].Text);
    }
}

@Olivier.G

The issue may be document-specific or related to a specific kind of PDF file. We have logged a new ticket, PDFNET-60028, in our issue management system for your file. We will look into the details and keep you posted on the status of the fix. Please bear with us in the meantime.

We are sorry for the inconvenience.

The problem could be document-specific, but in any case, I think you’ll agree that IsolateTextSegments should be idempotent.

Am I the only one so far to have reported this issue?

@Olivier.G

Thanks for sharing your feedback. We have updated the ticket information and will investigate from this perspective as well. We will inform you as soon as we make progress towards resolving the ticket. We are sorry for the inconvenience.

The ticket PDFNET-60028 you created has the status “Resolved”, but I couldn’t find it referenced in the June release notes. I suppose the fix will come with the July release?

@Olivier.G

Yes, the ticket has been resolved and its fix will be included in the July release, i.e. 25.7. You will receive a notification as soon as the release is published.

The issues you found earlier (filed as PDFNET-60028) have been fixed in Aspose.PDF for .NET 25.7.