TextFragmentAbsorber extracts incorrect coordinates

Hi,
I have attached a file. On the last page, incorrect coordinates are extracted for the text in the bottom area of the page. Coordinates on all other pages are extracted very accurately; only the bottom area of the last page is a problem. Thanks.

I use the following code snippet.
var wordsList = new List<TextItem>();

// search all separate words using regular expression
var textFragmentAbsorber = new TextFragmentAbsorber(@"[^\s]+", new TextSearchOptions(true));
_pdfDocument.Pages[_pageNumber].Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
for (int j = 1; j <= textFragmentCollection.Count; j++)
{
    wordsList.Add(new TextItem
    {
        Height = (float)textFragmentCollection[j].Rectangle.Height * PageZoom,
        Width = (float)textFragmentCollection[j].Rectangle.Width * PageZoom,
        Top = _pageHeight * PageZoom - (float)textFragmentCollection[j].Position.YIndent * PageZoom - (float)textFragmentCollection[j].Rectangle.Height * PageZoom,
        Left = (float)textFragmentCollection[j].Position.XIndent * PageZoom,
        Text = textFragmentCollection[j].Text.Replace("–", "").Replace("_", "")
    });
}
return wordsList;
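
For reference, the conversion this snippet performs can be sketched without any Aspose types. PDF user space normally has its origin at the bottom-left corner with Y growing upward, while most UI surfaces put the origin at the top-left. The sketch below assumes the box is given by its lower-left and upper-right corners in points and that the page's own lower-left corner sits at (0, 0). `ScreenBox`, `PdfToScreen` and `ToTopLeft` are hypothetical names for illustration, not part of Aspose.Pdf.

```csharp
using System;

// Hypothetical helper type for illustration; not part of Aspose.Pdf.
public readonly struct ScreenBox
{
    public ScreenBox(double left, double top, double width, double height)
    {
        Left = left;
        Top = top;
        Width = width;
        Height = height;
    }

    public double Left { get; }
    public double Top { get; }
    public double Width { get; }
    public double Height { get; }
}

public static class PdfToScreen
{
    // Converts a box in PDF user space (origin bottom-left, Y up, points)
    // to a top-left-origin box scaled by zoom.
    // Assumption: the page's lower-left corner is at (0, 0) and the page is not rotated.
    public static ScreenBox ToTopLeft(double llx, double lly, double urx, double ury,
                                      double pageHeight, double zoom)
    {
        double width = urx - llx;
        double height = ury - lly;
        double top = pageHeight - ury; // distance from the page top to the box top
        return new ScreenBox(llx * zoom, top * zoom, width * zoom, height * zoom);
    }
}
```

The snippet above derives the top from `Position.YIndent` plus `Rectangle.Height`; this sketch uses the rectangle's corners instead, which is a different choice rather than a diagnosis of the reported problem. If a page's MediaBox or CropBox does not start at (0, 0), or the page is rotated, those offsets would have to be accounted for first; whether that applies to the attached file is not established in this thread.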

Hi Samsen,


Thanks for contacting support.

I have tested the scenario using the following code snippet and I am getting the output shown below. Can you please share some further details that would help us identify the problem you are facing?

[C#]

Document _pdfDocument = new Document("c:/pdftest/N3A_R13.pdf");

// search all separate words using regular expression
var textFragmentAbsorber = new TextFragmentAbsorber(@"[^\s]+", new TextSearchOptions(true));
_pdfDocument.Pages[6].Accept(textFragmentAbsorber);
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
for (int j = 1; j <= textFragmentCollection.Count; j++)
{
    Console.WriteLine("=====================================");
    Console.WriteLine(textFragmentCollection[j].Text);
    // wordsList.Add(new TextItem
    // Height
    Console.WriteLine((float)textFragmentCollection[j].Rectangle.Height * 100);
    // Width
    Console.WriteLine((float)textFragmentCollection[j].Rectangle.Width * 100);
    // Top
    Console.WriteLine(_pdfDocument.Pages[6].Rect.Height * 100 - (float)textFragmentCollection[j].Position.YIndent * 100 - (float)textFragmentCollection[j].Rectangle.Height * 100);
    // Left
    Console.WriteLine((float)textFragmentCollection[j].Position.XIndent * 100);
    // Left over from the original object initializer; not valid as a standalone statement:
    // Text = textFragmentCollection[j].Text.Replace("–", "").Replace("_", "");
}



Console output

=====================================
branch
660
1839.3
69624.6000766754
24166.92
=====================================
connections.
660
3374.94
69624.6000766754
26170.68
=====================================
For
660
903.48
69624.6000766754
29715.48
=====================================
other
660
1368.36
69624.6000766754
30785.22
=====================================
angles
660
1768.5
69624.6000766754
32313.84
=====================================
(e.g.
660
1193.16
69624.6000766754
34252.2
=====================================
laterals)
660
2137.32
69624.6000766754
35615.22
=====================================
consult
660
1900.44
69624.6000766754
37918.8
=====================================
Piping
660
1664.7
69624.6000766754
39989.1
=====================================
Engineering.
660
3371.82
69624.6000766754
41818.26

Hi Nayyer,

Please look at this attachment. You will see that the Y coordinate is incorrect for the text boxes at the bottom of the page.

Hi Samsen,


Thanks for sharing the details.

I have tested the scenario again using Aspose.Pdf for .NET 9.2.1, with the following code to highlight each character on the sixth page of the PDF file, and I am unable to notice any issue. For your reference, I have also attached the resultant image generated on my end.

[C#]

string inFile = "c:/pdftest/N3A_R13.pdf";
string outFileImg = "c:/pdftest/N3A_R13_resultant.png";
int resolution = 150;

Aspose.Pdf.Document temppdfDocument = new Aspose.Pdf.Document(inFile);
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document();
pdfDocument.Pages.Add(temppdfDocument.Pages[6]);

using (MemoryStream ms = new MemoryStream())
{
    PdfConverter conv = new PdfConverter(pdfDocument);
    conv.Resolution = new Resolution(resolution, resolution);
    conv.GetNextImage(ms, System.Drawing.Imaging.ImageFormat.Png);
    Bitmap bmp = (Bitmap)Bitmap.FromStream(ms);

    using (System.Drawing.Graphics gr = System.Drawing.Graphics.FromImage(bmp))
    {
        float scale = resolution / 72f;
        gr.Transform = new System.Drawing.Drawing2D.Matrix(scale, 0, 0, -scale, 0, bmp.Height);

        // for (int i = 0; i < pdfDocument.Pages.Count; i++)
        {
            Page page = pdfDocument.Pages[1];

            // create TextAbsorber object to find all words
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");
            textFragmentAbsorber.TextSearchOptions.IsRegularExpressionUsed = true;
            page.Accept(textFragmentAbsorber);

            // get the extracted text fragments
            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

            // loop through the fragments
            foreach (TextFragment textFragment in textFragmentCollection)
            {
                // if (i == 0)
                {
                    gr.DrawRectangle(
                        Pens.Yellow,
                        (float)textFragment.Position.XIndent,
                        (float)textFragment.Position.YIndent,
                        (float)textFragment.Rectangle.Width,
                        (float)textFragment.Rectangle.Height);

                    for (int segNum = 1; segNum <= textFragment.Segments.Count; segNum++)
                    {
                        TextSegment segment = textFragment.Segments[segNum];

                        for (int charNum = 1; charNum <= segment.Characters.Count; charNum++)
                        {
                            CharInfo characterInfo = segment.Characters[charNum];
                            Aspose.Pdf.Rectangle rect = page.GetPageRect(true);

                            Console.WriteLine("TextFragment = " + textFragment.Text + " Page URY = " + rect.URY +
                                " TextFragment URY = " + textFragment.Rectangle.URY);

                            gr.DrawRectangle(
                                Pens.Black,
                                (float)characterInfo.Rectangle.LLX,
                                (float)characterInfo.Rectangle.LLY,
                                (float)characterInfo.Rectangle.Width,
                                (float)characterInfo.Rectangle.Height);
                        }

                        gr.DrawRectangle(
                            Pens.Green,
                            (float)segment.Rectangle.LLX,
                            (float)segment.Rectangle.LLY,
                            (float)segment.Rectangle.Width,
                            (float)segment.Rectangle.Height);
                    }
                }
            }
        }
    }

    bmp.Save(outFileImg, System.Drawing.Imaging.ImageFormat.Png);
}
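
The `scale` and `Matrix` lines in the code above map PDF points to bitmap pixels. One point is 1/72 inch, so at `resolution` DPI the scale is `resolution / 72`; the negative Y scale combined with the `bmp.Height` offset flips the Y axis so that rectangles given in PDF coordinates land in the right place on the image. A minimal sketch of the same arithmetic as a plain function, where `PointToPixel` and `Map` are hypothetical names and not part of Aspose.Pdf or System.Drawing:

```csharp
using System;

// Hypothetical helper for illustration only.
public static class PointToPixel
{
    // Applies the affine transform (scale, 0, 0, -scale, 0, bitmapHeightPx)
    // to a point given in PDF user space (origin bottom-left, Y up).
    public static (double X, double Y) Map(double xPt, double yPt, double dpi, double bitmapHeightPx)
    {
        double scale = dpi / 72.0;                  // pixels per PDF point
        double xPx = xPt * scale;                   // X is only scaled
        double yPx = bitmapHeightPx - yPt * scale;  // Y is scaled and flipped
        return (xPx, yPx);
    }
}
```

With this mapping, a point at the bottom-left of the page (0, 0) ends up on the last pixel row of the bitmap, which is why the drawing calls can pass `LLX` and `LLY` directly without subtracting from the page height.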