Adding a new HtmlFragment at the same coordinates as an existing MarkupParagraph

Hello,

I’m evaluating the Aspose.PDF .NET library. I’m trying to update an existing PDF by:

  1. Locating the first Paragraph in the PDF that contains a particular text string (e.g., “title-text-to-replace”)
  2. Replacing that Paragraph with a small bit of HTML (e.g., <h2>Here is my title</h2>)

The problem I’m running into is that the Aspose PDF API treats HtmlFragments differently than TextFragments, especially when it comes to positioning and coordinates.

I was able to create a method which locates the Paragraph using a ParagraphAbsorber:

private MarkupParagraph GetFirstParagraphWithText(string textToFind)
{
    ParagraphAbsorber absorber = new ParagraphAbsorber();
    absorber.Visit(_doc);

    foreach (PageMarkup markup in absorber.PageMarkups)
    {
        foreach (MarkupSection section in markup.Sections)
        {
            foreach (MarkupParagraph paragraph in section.Paragraphs)
            {
                foreach (List<TextFragment> line in paragraph.Lines)
                {
                    foreach (TextFragment fragment in line)
                    {
                        if (fragment.Text.Contains(textToFind))
                        {
                            return paragraph;
                        }
                    }
                }
            }
        }
    }
    return null;
}

I then “replaced” that paragraph with my Html:

public void ReplaceTextWithHtml(string textToFind, string html)
{
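    MarkupParagraph markupParagraph = GetFirstParagraphWithText(textToFind);
    if (markupParagraph == null)
    {
        // Nothing to replace: no paragraph contains the search text
        return;
    }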

    HtmlFragment htmlFragment = new HtmlFragment(html);

    // get reference of first fragment in the paragraph
    TextFragment textFragment = markupParagraph.Fragments[0];
    Page page = textFragment.Page;

    Rectangle fragmentRect = (Rectangle)textFragment.Rectangle.Clone();

    // Specify margins in order to position the new htmlFragment
    // Determine X,Y coords based on the first fragment in the paragraph that we're replacing
    MarginInfo info = new MarginInfo(
        fragmentRect.LLX - page.Rect.LLX,
        fragmentRect.LLY - page.Rect.LLY,
        page.Rect.URX - fragmentRect.URX,
        page.Rect.URY - fragmentRect.URY);

    htmlFragment.Margin = info;

    // blank out the entire paragraph
    foreach (TextFragment fragment in markupParagraph.Fragments)
    {
        fragment.Text = string.Empty;
    }
    page.Paragraphs.Add(htmlFragment);

    return;
}

Unfortunately, the positioning of my new HtmlFragment is not working properly.
I don’t understand why, but the HtmlFragment positioning is based on Margins, whereas the MarkupParagraph is based on absolute position.

I also tried incorporating page.PageInfo.Margin into the margin calculations, but that was incorrect:

MarginInfo info = new MarginInfo(
    fragmentRect.LLX - page.Rect.LLX - page.PageInfo.Margin.Left,
    fragmentRect.LLY - page.Rect.LLY - page.PageInfo.Margin.Top,
    page.Rect.URX - fragmentRect.URX - page.PageInfo.Margin.Right,
    page.Rect.URY - fragmentRect.URY - page.PageInfo.Margin.Bottom);

How can I add a new HtmlFragment at the same coordinates as an existing MarkupParagraph?

Thanks,
Mike

@mlinnetz

Would you kindly share a sample source PDF document for our reference as well? We will test the scenario in our environment using your code snippet and address it accordingly.

Thank you for your reply, Asad.

Attached are sample input and output files for the following call:

ReplaceTextWithHtml("denied access to investigations", "<h2>hello world</h2>");

input.pdf (46.8 KB)
output.pdf (72.5 KB)

You can see the incorrect positioning of my new HtmlFragment in the output. As noted above, this is what I’m using to create the margins for the new HtmlFragment:

    MarginInfo info = new MarginInfo(
        fragmentRect.LLX - page.Rect.LLX,
        fragmentRect.LLY - page.Rect.LLY,
        page.Rect.URX - fragmentRect.URX,
        page.Rect.URY - fragmentRect.URY);

I took the time to debug and write out the different object coordinates to a log file:

fragmentRect: 72.024,481.78000000953676,121.074719830513,493.92399996757507
page.Rect: 0,0,792,612
info (MarginInfo): Left: 72.024, Bottom: 481.78000000953676, Right: 670.925280169487, Top: 118.07600003242493

Thanks,
Mike
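
For reference, the logged MarginInfo values follow directly from the two rectangles, so the arithmetic itself matches what the snippet intends. The sketch below reproduces it without any Aspose dependency. `Rect` and `EdgeDistances` are hypothetical stand-ins introduced for illustration, not Aspose types, and the sketch says nothing about how HtmlFragment interprets its Margin.

```csharp
using System;

// Hypothetical stand-in for a PDF rectangle (bottom-left origin); not an Aspose type.
public readonly struct Rect
{
    public Rect(double llx, double lly, double urx, double ury)
    {
        LLX = llx; LLY = lly; URX = urx; URY = ury;
    }

    public double LLX { get; }
    public double LLY { get; }
    public double URX { get; }
    public double URY { get; }
}

public static class MarginCheck
{
    // Distance from each page edge to the fragment, in the same
    // (left, bottom, right, top) order that the log line reports.
    public static (double Left, double Bottom, double Right, double Top) EdgeDistances(Rect page, Rect frag)
    {
        return (frag.LLX - page.LLX,
                frag.LLY - page.LLY,
                page.URX - frag.URX,
                page.URY - frag.URY);
    }

    public static void Main()
    {
        // Values copied from the log above.
        var page = new Rect(0, 0, 792, 612);
        var frag = new Rect(72.024, 481.78000000953676, 121.074719830513, 493.92399996757507);

        var m = EdgeDistances(page, frag);
        Console.WriteLine($"Left: {m.Left}, Bottom: {m.Bottom}, Right: {m.Right}, Top: {m.Top}");
    }
}
```

The four results agree with the logged Left, Bottom, Right, and Top (about 72.024, 481.78, 670.925, and 118.076), which suggests the misplacement comes from how the margins are applied rather than from the subtraction.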

@mlinnetz

We have modified your ReplaceTextWithHtml() method slightly and used a FloatingBox to place the HTML at the position of the text you want to replace. Please check the code below and let us know whether it satisfies your needs:

Document doc = new Document(dataDir + "input.pdf");
MarkupParagraph markupParagraph = GetFirstParagraphWithText("denied access to investigations", doc);

HtmlFragment htmlFragment = new HtmlFragment(@"<h2>Hello World!</h2>");

// get reference of first fragment in the paragraph
TextFragment textFragment = markupParagraph.Fragments[0];
Page page = textFragment.Page;

Rectangle pageRect = textFragment.Page.Rect;
Rectangle fragmentRect = textFragment.Rectangle;
MarginInfo marginInfo = textFragment.Page.PageInfo.Margin;

FloatingBox box = new FloatingBox();
box.Left = fragmentRect.LLX;
box.Top = pageRect.Height - fragmentRect.URY - marginInfo.Top;
box.Width = textFragment.Rectangle.Width; // You can increase it in order to fit the HTML content
box.Height = textFragment.Rectangle.Height;

// blank out the entire paragraph
foreach (TextFragment fragment in markupParagraph.Fragments)
{
    fragment.Text = string.Empty;
}
box.Paragraphs.Add(htmlFragment);
page.Paragraphs.Add(box);
doc.Save(dataDir + "output.pdf");

Hi Asad,

That didn’t quite seem to get the right coordinates. Here is the new output.pdf for the same input file attached above.

output.pdf (72.5 KB)

Is that the same as the output file you were getting?

What I’m looking for is for the new text to be aligned to the same upper-left corner as the paragraph I’m replacing. So, if this is my input:
image.png (5.0 KB)
Then this should be the output:
image.png (2.3 KB)

In case it helps, I again logged the new coordinates, including the new FloatingBox coords.
fragmentRect: 72.024,481.78000000953676,121.074719830513,493.92399996757507
page.Rect: 0,0,792,612
marginInfo: Left: 90, Bottom: 72, Right: 90, Top: 72
box: Left: 72.024, Width: 49.050719830513, Top: 46.07600003242493, Height: 12.143999958038307

What’s strange is that box.Left is the same as fragmentRect.LLX, yet clearly they are not in alignment. Is there some sort of centering going on? Does the htmlFragment need to be left-aligned inside the box?

Thanks,
Mike
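
One thing the logged box values do show is that the two axes were computed in different reference frames: box.Left (72.024) is measured from the page edge, while box.Top (46.076) has the top margin subtracted, so it is measured from the content area. The sketch below prints both offsets in each frame for comparison. It is plain arithmetic with hypothetical helper names (`FromPageEdge`, `FromContentArea`). Which frame FloatingBox actually uses is an assumption this thread does not settle, so treat both results as candidates to test.

```csharp
using System;

public static class BoxOffsets
{
    // PDF rectangles use a bottom-left origin. A top-left layout offset is
    // measured down from the top edge, so the Y value has to be flipped.
    public static (double Left, double Top) FromPageEdge(
        double pageHeight, double fragLLX, double fragURY)
    {
        return (fragLLX, pageHeight - fragURY);
    }

    // The same offsets measured from the content area inside the page margins.
    public static (double Left, double Top) FromContentArea(
        double pageHeight, double fragLLX, double fragURY,
        double marginLeft, double marginTop)
    {
        return (fragLLX - marginLeft, pageHeight - fragURY - marginTop);
    }

    public static void Main()
    {
        // Values copied from the log above (page.Rect is 0,0,792,612).
        const double pageHeight = 612;
        const double fragLLX = 72.024;
        const double fragURY = 493.92399996757507;
        const double marginLeft = 90;
        const double marginTop = 72;

        var edge = FromPageEdge(pageHeight, fragLLX, fragURY);
        var content = FromContentArea(pageHeight, fragLLX, fragURY, marginLeft, marginTop);

        Console.WriteLine($"page-edge frame:    Left={edge.Left:F3}, Top={edge.Top:F3}");
        Console.WriteLine($"content-area frame: Left={content.Left:F3}, Top={content.Top:F3}");
    }
}
```

With these inputs the page-edge frame gives roughly (72.024, 118.076) and the content-area frame roughly (-17.976, 46.076). The negative Left reflects that the original fragment starts at x = 72.024, to the left of the 90-point left margin. Trying each pair in turn, or temporarily setting the page margins to zero so that both frames coincide, may help narrow down where the horizontal shift comes from.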

@mlinnetz

We need to investigate this scenario further. For that purpose, an investigation ticket, PDFNET-50684, has been logged in our issue management system. We will look into the details and keep you posted on the status of its resolution. Please be patient and spare us some time.

We are sorry for the inconvenience.

Hello,
Do you have any update on this?
It seems that regardless of whether I use a FloatingBox, the Left and Top alignment don’t match the original textFragment position.

Thanks,
Mike

@mikeatacsys

We are afraid that the earlier logged ticket is not yet resolved. Tickets are investigated and fixed on a first-come, first-served basis, and we will let you know as soon as this one is resolved. Please spare us some time.

We apologize for the inconvenience.