TextFragment Position

Hi,

This post is related to post 638092. I want to draw a rectangle on an image created by converting a PDF page. The code given in the linked issue works in many cases. However, in some cases the highlight rectangles are drawn in the wrong place on the image.


I’ve attached the sample files and the C# code needed to reproduce the problem. When trying the code, please make sure to update the paths of the sample files and the licence (which is not included in the attachments) accordingly. I would be grateful if you could investigate the situation. Thank you in advance.

Best Regards

Hi Huseyin,


Thanks for contacting support.

I have tested the scenario and was able to reproduce the issue: for the attached PDF files, the images are highlighted incorrectly. I have logged it as ticket PDFNET-42577 in our issue tracking system. We will look further into the details of this issue and will keep you updated on the status of its resolution within this forum thread. Please be patient and spare us a little time.

We are sorry for this inconvenience.

Best Regards,

@huseyincandan

We have completed the investigation of the ticket and found no bugs. TextFragmentAbsorber finds TextFragments with correct rectangles. (See: 42577_fragments.pdf)

The problem lies in the coordinate transformations in your code:

Setting resolution = 150 has no effect here, because GetNextImage(ms, System.Drawing.Imaging.ImageFormat.Png) returns the image at its original size from the PDF resources; scaling the coordinates by the resolution is therefore wrong as well. The image in the PDF is, however, scaled to the MediaBox (which is the same as page.CropBox for this document). So the text position (inside the CropBox) has to be scaled to the actual size of the image. Please consider the following code:

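using System;
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;  // Resolution
using Aspose.Pdf.Facades;  // PdfConverter
using Aspose.Pdf.Text;     // TextFragmentAbsorber, TextFragment

Document doc = new Document(dataDir + "20170315_175013.pdf");

// Kept from the original code; per the note above, it does not change
// the size of the image returned for this document.
int resolution = 150;
using (MemoryStream ms = new MemoryStream())
{
    PdfConverter conv = new PdfConverter(doc);
    conv.Resolution = new Resolution(resolution, resolution);
    conv.StartPage = 1;
    conv.EndPage = 1;
    conv.GetNextImage(ms, System.Drawing.Imaging.ImageFormat.Png);
    ms.Position = 0; // rewind so the image is decoded from the start of the stream
    System.Drawing.Image bmp = System.Drawing.Image.FromStream(ms);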

    using (System.Drawing.Graphics gr = System.Drawing.Graphics.FromImage(bmp))
    {
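        Page page = doc.Pages[1];
        // Pixels per PDF unit along each axis.
        float scaleY = (float)(bmp.Height / page.CropBox.Height);
        float scaleX = (float)(bmp.Width / page.CropBox.Width);
        // Scale PDF units to pixels and flip the Y axis: the PDF origin is
        // bottom-left, while the bitmap origin is top-left.
        gr.Transform = new System.Drawing.Drawing2D.Matrix(scaleX,
            0, 0, -scaleY, 0, bmp.Height);
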
        var textFragmentAbsorber = new TextFragmentAbsorber(
            "(?i)" + "HGS", new TextSearchOptions(true));
        page.Accept(textFragmentAbsorber);
        TextFragmentCollection textFragments =
            textFragmentAbsorber.TextFragments;
        foreach (TextFragment textFragment in textFragments)
        {
            System.Drawing.Pen pen = System.Drawing.Pens.Red;

            Position textFragmentPosition = textFragment.Position;
            Aspose.Pdf.Rectangle pageCropBox = page.CropBox;
            float x = Convert.ToSingle(
                textFragmentPosition.XIndent - pageCropBox.LLX);
            float y = Convert.ToSingle(
                textFragmentPosition.YIndent - pageCropBox.LLY);

            Aspose.Pdf.Rectangle textFragmentRectangle =
                textFragment.Rectangle;
            float width = Convert.ToSingle(
                textFragmentRectangle.Width);
            float height = Convert.ToSingle(
                textFragmentRectangle.Height);

            gr.DrawRectangle(pen, x, y, width, height);
        }
    }
    bmp.Save(Path.Combine(dataDir, "42577_out.png"));
} 
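
For reference, the mapping that the transform above applies can also be written out by hand. The following is only a minimal sketch with made-up numbers, not code from the ticket: the helper name ToPixels and all of its parameters are hypothetical, and it uses no Aspose.PDF or System.Drawing calls. It assumes the rectangle's lower-left corner is given in PDF units and returns the top-left corner in pixels, which is what you would need if you drew without setting gr.Transform.

```csharp
using System;

// Hypothetical helper for illustration only; not part of Aspose.PDF or System.Drawing.
public static class PdfToImageMapping
{
    // Maps a rectangle from PDF user space (origin bottom-left, Y pointing up)
    // to pixel space (origin top-left, Y pointing down).
    public static (double Left, double Top, double Width, double Height) ToPixels(
        double llx, double lly, double width, double height,                 // rectangle in PDF units
        double cropLlx, double cropLly, double cropWidth, double cropHeight, // page CropBox
        int imageWidth, int imageHeight)                                     // bitmap size in pixels
    {
        // Pixels per PDF unit along each axis.
        double scaleX = imageWidth / cropWidth;
        double scaleY = imageHeight / cropHeight;

        // Shift by the CropBox origin, scale, then flip Y. The top edge of the
        // rectangle in PDF space is (lly + height) above the CropBox bottom.
        double left = (llx - cropLlx) * scaleX;
        double top = imageHeight - (lly - cropLly + height) * scaleY;

        return (left, top, width * scaleX, height * scaleY);
    }

    public static void Main()
    {
        // Made-up example: a 600 x 800 unit CropBox rendered as a 1200 x 1600 pixel image.
        var r = ToPixels(100, 700, 50, 12, 0, 0, 600, 800, 1200, 1600);
        Console.WriteLine($"{r.Left}, {r.Top}, {r.Width}, {r.Height}"); // prints "200, 176, 100, 24"
    }
}
```

Setting gr.Transform, as in the code above, keeps the drawing calls in PDF units; the explicit version may be easier to debug when a rectangle lands in the wrong place.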

42577_fragments.pdf (160.8 KB)
42577_out.jpg (1.2 MB)