Replace text with content from another PDF

I have a business requirement to replace a piece of text with content from another PDF. Basically, there are 2 PDFs,
and I want to replace a text in the first one with content (for instance a paragraph or a table) from the second PDF. Can you please let me know how to achieve this? Thanks!
Note: I already know about the help topic Replace Text in PDF | Aspose.PDF for .NET. It replaces a text with a new text, which is not what I am asking about.

@legend23,

Kindly share the complete details of the scenario, including the two source PDF documents and a final expected output PDF. We will take a closer look, and then let you know about our findings.

Commentary.pdf (45.7 KB)
test.pdf (82.1 KB)
I have uploaded 2 PDFs. I want to replace the text “replace” in test.pdf with the second paragraph (B comment) from Commentary.pdf. Thanks.

@legend23,

You can get the rectangular coordinates of the phrase B comment from the Rectangle member of the text fragment instance, and then extend these coordinates to cover the whole paragraph (with the help of the textfragment.Page.Rect.Width and textfragment.Page.Rect.Height members) and extract its text. You can retrieve text in various ways; please refer to this help topic: Extract Text from PDF

Once the whole paragraph text is retrieved, you can get the rectangular coordinates of the word replace and extend them to place the paragraph text in a particular page area. Please refer to this help topic: Replace Text in the particular page region
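The first step above, extracting the paragraph around the phrase, could be sketched roughly as follows. This is only an illustration: the value of dataDir, the paragraphHeight of 100 points, and the assumption that the paragraph spans the full page width and starts at the top of the phrase are all made up for the example and need adjusting to the real layout of Commentary.pdf.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Assumption: dataDir is the folder that holds Commentary.pdf.
string dataDir = "./";
Document source = new Document(dataDir + "Commentary.pdf");

// Locate the phrase that marks the paragraph to copy.
TextFragmentAbsorber phraseAbsorber = new TextFragmentAbsorber("B Comment");
source.Pages.Accept(phraseAbsorber);
if (phraseAbsorber.TextFragments.Count == 0)
{
    Console.WriteLine("Phrase not found.");
    return;
}

// Aspose.PDF collections are 1-based, so index 1 is the first match.
TextFragment phrase = phraseAbsorber.TextFragments[1];
Page page = phrase.Page;

// Assumption: the paragraph spans the full page width and is about
// paragraphHeight points tall, measured downwards from the top of the phrase.
double paragraphHeight = 100;
Aspose.Pdf.Rectangle region = new Aspose.Pdf.Rectangle(
    0,
    Math.Max(0, phrase.Rectangle.URY - paragraphHeight),
    page.Rect.Width,
    phrase.Rectangle.URY);

// Extract only the text that lies inside the extended rectangle.
TextAbsorber regionAbsorber = new TextAbsorber();
regionAbsorber.TextSearchOptions.LimitToPageBounds = true;
regionAbsorber.TextSearchOptions.Rectangle = region;
page.Accept(regionAbsorber);

string paragraphText = regionAbsorber.Text;
Console.WriteLine(paragraphText);
```

The extracted string is plain text, so any formatting has to be copied separately from the source text fragments.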

Thanks, I am going to try your suggestion. I have another question though: once I replace the text with B comment, will B comment (the whole paragraph) keep all its formatting in the new PDF?

@legend23,

You can manage text formatting during the text replacement as follows:

[C#]

// Update text and other properties
textFragment.Text = "TEXT";
textFragment.TextState.Font = FontRepository.FindFont("Verdana");
textFragment.TextState.FontSize = 22;
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Blue);
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Green);

I don’t want to change the text formatting, I just want to keep the original formatting of B comment. It seems that this is not possible, is it? Thanks.

@legend23,

In order to apply the same formatting, you can assign the properties of the source text fragment to the target text fragment. Please try the following code example:

[C#]

using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is assumed to be the folder that holds Commentary.pdf and test.pdf
// Open the source document and find all instances of the phrase to copy
Document pdfDocument = new Document(dataDir + "Commentary.pdf");
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("B Comment");
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// Open the target document once and find all instances of the placeholder text
Document document = new Document(dataDir + "test.pdf");
TextFragmentAbsorber textreplaceFragmentAbsorber = new TextFragmentAbsorber("replace");
// Accept the absorber for all the pages
document.Pages.Accept(textreplaceFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textreplaceFragmentCollection = textreplaceFragmentAbsorber.TextFragments;

// Loop through the source fragments (if several match, the last one wins)
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Nothing to replace if the placeholder was not found
    if (textreplaceFragmentCollection.Count == 0) break;

    // Collections are 1-based, so index 1 is the first "replace" match
    TextFragment target = textreplaceFragmentCollection[1];
    target.Text = textFragment.Text;

    // Copy the formatting of the source fragment to the target fragment
    target.TextState.Font = textFragment.TextState.Font;
    target.TextState.FontSize = textFragment.TextState.FontSize;
    target.TextState.Underline = textFragment.TextState.Underline;
}

// Save the modified target document once, after all changes
document.Save(dataDir + "Output.pdf", SaveFormat.Pdf);

This is the output PDF: Output.pdf (100.1 KB)

Alternatively, you can convert the page region to an image and then place that image in the target PDF document. Please refer to these help topics: Convert a particular page region to Image and Add Image to Existing PDF File
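A rough sketch of the image-based alternative is shown below. The rectangle coordinates, the page numbers, the output file name and the value of dataDir are hypothetical and must be replaced with the real positions of the paragraph and of the word replace. It also assumes an Aspose.PDF version that provides Page.AddImage(Stream, Rectangle).

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

// Assumption: dataDir is the folder that holds both PDF files.
string dataDir = "./";

// Hypothetical coordinates (llx, lly, urx, ury) of the source paragraph
// and of the area in the target page where the image should appear.
Aspose.Pdf.Rectangle sourceRegion = new Aspose.Pdf.Rectangle(50, 600, 550, 700);
Aspose.Pdf.Rectangle targetRegion = new Aspose.Pdf.Rectangle(50, 400, 550, 500);

// Crop the source page to the paragraph region.
Document source = new Document(dataDir + "Commentary.pdf");
source.Pages[1].CropBox = sourceRegion;

// Save and reload the document so the crop box is applied before rendering.
using MemoryStream croppedStream = new MemoryStream();
source.Save(croppedStream);
Document cropped = new Document(croppedStream);

// Render the cropped page to a PNG image held in memory.
using MemoryStream imageStream = new MemoryStream();
PngDevice pngDevice = new PngDevice(new Resolution(300));
pngDevice.Process(cropped.Pages[1], imageStream);
imageStream.Position = 0;

// Place the image at fixed coordinates on the first page of the target.
Document target = new Document(dataDir + "test.pdf");
target.Pages[1].AddImage(imageStream, targetRegion);
target.Save(dataDir + "OutputWithImage.pdf");
```

In this sketch the image is placed at fixed coordinates, so the existing text (including the word replace) is not removed or moved by the code itself.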

If the B comment paragraph has more than one formatting, for instance 2 different fonts for B and Comment, how do I keep these 2 fonts in the output PDF? Also, if I use an image extracted from the page region, will the content rearrange automatically when the image is inserted into the output PDF, as it does with text replacement?

@legend23,

The paragraph element has a sub element, TextFragment, that stores the text string. If the formatting of a sub text string is different, you can iterate through the TextSegments (sub elements of TextFragment) and apply the font settings per segment. Please refer to the internal hierarchy of the PDF document: PDF Document Structure.
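Iterating through the segments could look roughly like the sketch below. It rebuilds the matched source fragment segment by segment at the position of the word replace, so that each segment keeps its own font, size and color. The value of dataDir and the output file name are assumptions, and the sketch assumes the fonts of the source document can be reused in the target document. It is an illustration, not a confirmed way to copy a complete paragraph.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Assumption: dataDir is the folder that holds both PDF files.
string dataDir = "./";

Document source = new Document(dataDir + "Commentary.pdf");
TextFragmentAbsorber sourceAbsorber = new TextFragmentAbsorber("B Comment");
source.Pages.Accept(sourceAbsorber);

Document target = new Document(dataDir + "test.pdf");
TextFragmentAbsorber targetAbsorber = new TextFragmentAbsorber("replace");
target.Pages.Accept(targetAbsorber);

if (sourceAbsorber.TextFragments.Count > 0 && targetAbsorber.TextFragments.Count > 0)
{
    // Collections are 1-based, so index 1 is the first match.
    TextFragment sourceFragment = sourceAbsorber.TextFragments[1];
    TextFragment placeholder = targetAbsorber.TextFragments[1];
    Page targetPage = placeholder.Page;

    // Build a new fragment at the placeholder position,
    // with one segment per differently formatted source segment.
    TextFragment rebuilt = new TextFragment();
    rebuilt.Position = placeholder.Position;
    foreach (TextSegment sourceSegment in sourceFragment.Segments)
    {
        TextSegment segment = new TextSegment(sourceSegment.Text);
        segment.TextState.Font = sourceSegment.TextState.Font;
        segment.TextState.FontSize = sourceSegment.TextState.FontSize;
        segment.TextState.ForegroundColor = sourceSegment.TextState.ForegroundColor;
        rebuilt.Segments.Add(segment);
    }

    // Remove the placeholder text and add the rebuilt fragment to the page.
    placeholder.Text = string.Empty;
    new TextBuilder(targetPage).AppendText(rebuilt);
}

target.Save(dataDir + "OutputSegments.pdf");
```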

You can retrieve the formatted text from an image, and then add this formatted text to the target PDF document. Aspose.OCR for .NET API can retrieve the text with style, font, text size and language, etc. Please refer to this help topic: Read the Part Information of Recognized Text.

Furthermore, we understand the complexity of the scenario and have logged a feature request under the ticket ID PDFNET-43901 in our issue tracking system to copy a complete paragraph from one PDF to another with its original formatting. We have linked your post to this ticket and will keep you informed regarding any available updates.

Is there any way to replace text with a table, and make the content of the PDF move down with the table rows so that there is no overlap?

@ouri101

A ticket, PDFNET-41102, has been logged in our issue tracking system for the implementation of the required feature. We have linked the ticket with this forum thread so that you can receive a notification as soon as the feature is available. Please be patient and spare us some time.

We are sorry for the inconvenience.