TextFragmentAbsorber.TextFragments not functioning properly

Sreevidya.Sukumaran · October 16, 2014, 5:33am

Hi,

We are using the evaluation version of Aspose.PDF for .NET and are facing a few issues. We are trying to change the foreground color of text in a PDF based on its position, but for some text the color does not change. For that text, the TextFragments count returns 0. Could you please look at the code below and help us solve the issue?

using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the input PDF
Document pdfDocument = new Document(dataDir + "Festive Schedule.pdf");
List<LocationRepository> repository = new List<LocationRepository>();

// Pass 1: the regular expression "." matches a single character, so each
// fragment found here is one character; record its text and rectangle
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@".", new TextSearchOptions(true));
for (int i = 1; i <= pdfDocument.Pages.Count; i++)
{
    pdfDocument.Pages[i].Accept(textFragmentAbsorber);
    // get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        foreach (TextSegment textSegment in textFragment.Segments)
        {
            LocationRepository locationRepository = new LocationRepository();
            locationRepository.letter = textSegment.Text;
            locationRepository.LLX = (float)textSegment.Rectangle.LLX;
            locationRepository.LLY = (float)textSegment.Rectangle.LLY;
            locationRepository.URX = (float)textSegment.Rectangle.URX;
            locationRepository.URY = (float)textSegment.Rectangle.URY;
            repository.Add(locationRepository);
        }
    }
}

// Pass 2: search for each recorded character inside its own rectangle and color it green
for (int j = 0; j < repository.Count; j++)
{
    TextFragmentAbsorber absorber = new TextFragmentAbsorber(repository[j].letter);
    absorber.TextSearchOptions.LimitToPageBounds = true;
    absorber.TextSearchOptions.Rectangle = new Aspose.Pdf.Rectangle((float)repository[j].LLX, (float)repository[j].LLY, (float)repository[j].URX, (float)repository[j].URY);
    pdfDocument.Pages.Accept(absorber);
    // for some characters this collection is empty (Count == 0), so they are never colored
    TextFragmentCollection foundFragments = absorber.TextFragments;
    foreach (TextFragment foundFragment in foundFragments)
    {
        foreach (TextSegment foundSegment in foundFragment.Segments)
        {
            foundSegment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);
        }
    }
}

// save the result (the output file name is illustrative)
pdfDocument.Save(dataDir + "Festive Schedule_colored.pdf");

Regards,

Sreevidya

codewarior · October 16, 2014, 8:33am

Hi Sreevidya,

Thanks for contacting support.

The problem might be related to the source PDF file, so please share it so that we can test the scenario on our end. We apologize for the inconvenience.

Sreevidya.Sukumaran · October 17, 2014, 12:07am

Hi

Thanks for responding. I have attached the original and the color-changed PDFs. In the color-changed PDF, the changed text is colored green.

Regards,

Sreevidya

codewarior · October 17, 2014, 5:15am

Hi Sreevidya,

Thanks for sharing the resource file.

I have tried to replicate the issue using the code snippet above, but I am afraid some classes/variables are not defined, e.g. LocationRepository, which is used to store the coordinates of each TextSegment. Could you please share a sample project so that we can test the scenario on our end? We are sorry for the inconvenience.

Sreevidya.Sukumaran · October 27, 2014, 12:47am

Hi,

Please find the code for the required class below:

class LocationRepository
{
    public double LLX { get; set; }
    public double LLY { get; set; }
    public double URX { get; set; }
    public double URY { get; set; }
    public int pageNum { get; set; }
    public string letter { get; set; }
    public int index { get; set; }
    public string text { get; set; }
}

Regards,

Sreevidya

codewarior · October 27, 2014, 5:48am

Hi Sreevidya,

Thanks for sharing the details.

I have tested the scenario and am able to reproduce the same problem. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-37686. We will investigate this issue in detail and keep you updated on the status of the correction.

We apologize for your inconvenience.

Sreevidya.Sukumaran · October 29, 2014, 6:37am

Hi,

Is there any way to extract the text using the position? When can I expect a solution for the above-stated problem?

Regards,

Sreevidya

codewarior · October 29, 2014, 6:56am

Sreevidya.Sukumaran:
Is there any way to extract the text using the position?

Hi Sreevidya,

Please visit the following link for the required information on extracting text from a particular page region.
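If the coordinates have already been collected into LocationRepository objects, as in the first post, the text inside a region can also be picked out without a second search, simply by comparing rectangles. The sketch below is library-independent and assumes PDF coordinates with the origin at the lower-left corner of the page. The helper name TextInRegion is hypothetical, and the rule that a segment counts only when its rectangle lies entirely inside the region is an assumption; Aspose's own search via TextSearchOptions.Rectangle may treat partially overlapping text differently.

```csharp
using System.Collections.Generic;
using System.Text;

// Minimal copy of the LocationRepository class posted in this thread (only the fields used here).
class LocationRepository
{
    public double LLX { get; set; }
    public double LLY { get; set; }
    public double URX { get; set; }
    public double URY { get; set; }
    public string letter { get; set; }
}

static class RegionText
{
    // Hypothetical helper: concatenates the text of every stored segment whose
    // rectangle lies entirely inside the region (llx, lly)-(urx, ury).
    // Segments are appended in the order they appear in the list.
    public static string TextInRegion(IEnumerable<LocationRepository> segments,
                                      double llx, double lly, double urx, double ury)
    {
        var sb = new StringBuilder();
        foreach (var s in segments)
        {
            // Assumed containment rule: all four edges must be within the region.
            bool inside = s.LLX >= llx && s.URX <= urx &&
                          s.LLY >= lly && s.URY <= ury;
            if (inside)
                sb.Append(s.letter);
        }
        return sb.ToString();
    }
}
```

The sketch ignores pageNum, so for multi-page documents the list would need to be filtered by page first (the loop in the first post does not set pageNum).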

Sreevidya.Sukumaran:
When can I expect a solution for the above-stated problem?

As we have only recently identified this issue, we cannot share a timeline for its resolution until we have investigated it and determined the actual cause.

However, as soon as we have made significant progress towards resolving this issue, we will be happy to update you on the status of the correction. Please be patient and give us a little time. Your patience and understanding are greatly appreciated.