TextFragmentAbsorber.TextFragments returns no fragments for some text

Hi,

We are evaluating Aspose.PDF for .NET (evaluation version) and are facing a few issues. We are trying to change the foreground color of text in a PDF by position, but for some text the color does not change. For that text, TextFragments.Count returns 0. Could you please look at the code below and help us solve the issue?

using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir: folder that contains the input PDF
Document pdfDocument = new Document(dataDir + "Festive Schedule.pdf");
List<LocationRepository> repository = new List<LocationRepository>();

// Pass 1: the regular expression "." matches every single character; record each one with its rectangle
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@".", new TextSearchOptions(true));
for (int i = 1; i <= pdfDocument.Pages.Count; i++)
{
    pdfDocument.Pages[i].Accept(textFragmentAbsorber);
    // get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        foreach (TextSegment textSegment in textFragment.Segments)
        {
            LocationRepository locationRepository = new LocationRepository();
            locationRepository.letter = textSegment.Text;
            // the properties are double, so no float cast is needed (a float cast would round the coordinates)
            locationRepository.LLX = textSegment.Rectangle.LLX;
            locationRepository.LLY = textSegment.Rectangle.LLY;
            locationRepository.URX = textSegment.Rectangle.URX;
            locationRepository.URY = textSegment.Rectangle.URY;
            repository.Add(locationRepository);
        }
    }
}

// Pass 2: search each recorded character again inside its own rectangle and color it green
for (int i = 0; i < repository.Count; i++)
{
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(repository[i].letter);
    absorber.TextSearchOptions.LimitToPageBounds = true;
    absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle(repository[i].LLX, repository[i].LLY, repository[i].URX, repository[i].URY);
    pdfDocument.Pages.Accept(absorber);
    // For some characters this collection is empty (Count == 0): this is the reported problem
    TextFragmentCollection textFragmentCollection = absorber.TextFragments;
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        foreach (TextSegment textSegment in textFragment.Segments)
        {
            textSegment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);
        }
    }
}
// the modified document is then saved with pdfDocument.Save(...)

Regards,
Sreevidya

Hi Sreevidya,


Thanks for contacting support.

The problem might be related to the source PDF file, so please share it so that we can test the scenario at our end. We apologize for the inconvenience.

Hi


Thanks for responding. I have attached the original and the color-changed PDFs. In the color-changed PDF, the changed text is colored green.

Regards,
Sreevidya

Hi Sreevidya,


Thanks for sharing the resource file.

I have tried to replicate the issue using the code snippet above, but I am afraid some classes/variables are not defined, e.g. LocationRepository, which is used to keep the location coordinates of each TextSegment. Can you please share a sample project so that we can test the scenario at our end? We are sorry for the inconvenience.

Hi,


Please find the code for the required class below:

class LocationRepository
{
public double LLX { get; set; }
public double LLY { get; set; }
public double URX { get; set; }
public double URY { get; set; }
public int pageNum { get; set; }
public string letter { get; set; }
public int index { get; set; }
public string text { get; set; }
}

Regards,
Sreevidya

Hi Sreevidya,


Thanks for sharing the details.

I have tested the scenario and I am able to reproduce the same problem. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-37686. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for the inconvenience.

Hi,


Is there any way to extract text by position? When can I expect a solution to the problem stated above?

Regards,
Sreevidya

Sreevidya.Sukumaran:
Is there any way to extract the text using the position.
Hi Sreevidya,

Please visit the following link for the required information: Extract Text from a particular page region
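Alternatively, because the first loop in the original code already records a rectangle for every character, text at a given position can be recovered from that list without a second search. The following is a minimal sketch under stated assumptions, not part of the Aspose.PDF API: `RegionText` and `TextInRegion` are hypothetical names, the class is a trimmed copy of the LocationRepository posted in this thread (use the existing class instead if it is already defined), and it assumes `pageNum` is filled with the page index in the first loop (for example `locationRepository.pageNum = i;`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Trimmed copy of the LocationRepository class posted in this thread.
class LocationRepository
{
    public double LLX { get; set; }
    public double LLY { get; set; }
    public double URX { get; set; }
    public double URY { get; set; }
    public int pageNum { get; set; }
    public string letter { get; set; }
}

static class RegionText
{
    // Hypothetical helper: returns the recorded characters whose rectangles lie fully
    // inside the region (llx, lly, urx, ury) on the given page.
    // 'tolerance' widens the region slightly so characters touching the border
    // are not dropped because of rounding.
    public static string TextInRegion(IEnumerable<LocationRepository> repository,
        int page, double llx, double lly, double urx, double ury, double tolerance = 0.5)
    {
        var hits = repository
            .Where(r => r.pageNum == page
                && r.LLX >= llx - tolerance && r.LLY >= lly - tolerance
                && r.URX <= urx + tolerance && r.URY <= ury + tolerance)
            // PDF y coordinates grow upward, so higher lines (larger LLY) come first,
            // then characters on the same LLY are ordered left to right.
            .OrderByDescending(r => r.LLY)
            .ThenBy(r => r.LLX);

        var sb = new StringBuilder();
        foreach (var r in hits)
        {
            sb.Append(r.letter);
        }
        return sb.ToString();
    }
}
```

For example, `RegionText.TextInRegion(repository, 1, 100, 200, 250, 350)` returns the characters recorded inside that rectangle on page 1. Sorting by the exact LLY value is a simplification: characters on the same line whose boxes differ slightly in height would need a tolerance when grouping lines.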
Sreevidya.Sukumaran:
When can i expect solution for the above stated problem?

As we have only recently noticed this issue, we cannot share a timeline for its resolution until we have investigated it and identified the actual cause.

However, as soon as we have made significant progress toward resolving this issue, we will update you on the status of the correction. Please spare us a little time. Your patience and understanding are greatly appreciated.