Convert Hyperlinks to images in output PDF using .NET

Hello,

I have a few image conversion and formatting questions about Word documents. When I generate an image of a hyperlink's text at run time, only a boolean value ends up in the image. My logic correctly identifies the hyperlink texts that should be converted, but the resulting image does not contain the hyperlink text.

Once the image is successfully uploaded, I would like to have it positioned where the old hyperlink text was located. The preferred approach is to remove the hyperlink field and place the image at its coordinates.

Thanks,
Woon Gi

@whong4

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document.
  • The output file that shows the undesired behavior.
  • The expected output file that shows the desired behavior.
  • A standalone console application (source code without compilation errors) that reproduces the problem on our end.

As soon as you have these ready, we will start investigating your issue and provide more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

@whong4

Your code contains compilation errors. Please create a standalone console application (source code without compilation errors) that reproduces the problem on our end and attach it here for testing.

Could you please share some more detail about your requirement? Perhaps there is another way to achieve it.

@whong4

Please note that the Field.Unlink method unlinks the field and returns True if the field was unlinked. This is why you are getting “True” in the output PDF.

Your code also collects fields from the document's headers and footers. We suggest getting only the fields of the document’s body, as shown below.

var wordDoc = new Aspose.Words.Document(MyDir + "sample_input.docx");
foreach (Section section in wordDoc.Sections)
{
    // section.Body excludes the section's headers and footers.
    foreach (Field field in section.Body.Range.Fields)
    {
        // Process each body field here.
    }
}

Moreover, you need to move the cursor to the field and insert the inline image. The following items in your document are not hyperlinks. Please open the Word document in MS Word and press ALT + F9 to check the field codes.

www.google.com
www.example.com
www.example.edu

Thank you for your response. I apologize for the confusion; the attached document should have the following texts as hyperlinks. Is there a way to identify the coordinates of the hyperlink text in the Word document and place the inline images at those locations?

I need to be able to do this at run time. If a user uploads a Word document and requests a PDF file, the hyperlink text should be replaced with the image files.

Sample_Input.zip (17.4 KB)

@whong4

Yes, you can find the location of a hyperlink using Aspose.Words. In your case, we suggest the following solution.

  1. Find the hyperlink and bookmark it.
  2. Use Layout API to find the location of BookmarkStart node.
  3. Move the cursor to the BookmarkStart node and insert the image.

The following code example shows how to get the position of a bookmark.

Document document = new Document(MyDir + "input.docx");

Bookmark bm = document.Range.Bookmarks["bookmark"];
document.UpdatePageLayout();

//Get the position of Bookmark
LayoutCollector layoutCollector = new LayoutCollector(document);
LayoutEnumerator layoutEnumerator = new LayoutEnumerator(document);

layoutEnumerator.Current = layoutCollector.GetEntity(bm.BookmarkStart);
Console.WriteLine(" --> Left : " + layoutEnumerator.Rectangle.Left + " Top : " + layoutEnumerator.Rectangle.Top);

Thank you for providing a workaround for this issue. I have a few questions about the ‘Current’ property of the Layout API. My image conversion method requires the x and y coordinates of the image file.

Is there a way to convert the Current property to a type that returns integers instead of objects? I’m sorry if this is worded poorly. Please let me know if you have any questions or concerns about this request.

Thanks,
Woon Gi

@whong4

The LayoutEnumerator.Current property gets or sets the current position in the page layout model. To get the top-left position of a piece of text, bookmark it, set the Current property to the entity of the BookmarkStart node, and then read the LayoutEnumerator.Rectangle property.
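Since LayoutEnumerator.Rectangle is a System.Drawing.RectangleF, its Left and Top values are floats (in points), and they can be rounded if a drawing method expects integers. The following is only a minimal sketch of that conversion; the LayoutCoordinates class and ToIntegerOrigin method are hypothetical names introduced for illustration and are not part of Aspose.Words.

```csharp
using System;
using System.Drawing;

static class LayoutCoordinates
{
    // Hypothetical helper: rounds the top-left corner of a layout rectangle
    // to the nearest whole numbers and returns them as an (X, Y) pair.
    public static (int X, int Y) ToIntegerOrigin(RectangleF rect) =>
        ((int)Math.Round(rect.Left), (int)Math.Round(rect.Top));
}
```

After setting layoutEnumerator.Current, it could be called as `var (x, y) = LayoutCoordinates.ToIntegerOrigin(layoutEnumerator.Rectangle);`.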

Could you please share the issue that you are facing while using Layout APIs?

I tried the following logic to place the images of the hyperlink text where the hyperlinks are located. The if statement checks whether the field result matches the hyperlink Regex. If it matches, the body of the if statement should be executed.

Generating the image file from the hyperlink text appears to work. However, my bookmark logic still needs work. After adding it, execution no longer stops at my breakpoint in the image method, DrawText().

if (Literals.RegEx.IsHyperLink.IsMatch(field.Result))
{
    Bookmark bm = doc.Range.Bookmarks["bookmarks"];
    doc.UpdatePageLayout();

    LayoutCollector layoutCollector = new LayoutCollector(doc);
    LayoutEnumerator layoutEnumerator = new LayoutEnumerator(doc);

    layoutEnumerator.Current = layoutCollector.GetEntity(bm.BookmarkStart);
    Image image = DrawText(field.Result, "Arial", 10, Color.Black, Color.Transparent, layoutEnumerator.Rectangle.Left, layoutEnumerator.Rectangle.Top);

    field.Remove();
    builder.InsertImage(image);
}

@whong4

Please use the following code example to get the desired output. Hope this helps you.

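Document doc = new Document(MyDir + "sample_input.docx");
int i = 1;
DocumentBuilder builder = new DocumentBuilder(doc);

// Pass 1: wrap every hyperlink field in a uniquely named bookmark, then unlink it
// so that only the field's result text remains inside the bookmark.
foreach (Field field in doc.Range.Fields)
{
    if (field.Type == FieldType.FieldHyperlink)
    {
        builder.MoveToField(field, false);   // cursor before the field start
        builder.StartBookmark("bookmark" + i);
        builder.MoveToField(field, true);    // cursor after the field end
        builder.EndBookmark("bookmark" + i);
        field.Unlink();
        i++;
    }
}

// Pass 2: replace the text of each bookmark with an image of that text.
// DrawText is your own helper method; it is not part of Aspose.Words.
// Note that this loop visits every bookmark in the document.
foreach (Bookmark bookmark in doc.Range.Bookmarks)
{
    Image image = DrawText(bookmark.Text, "Arial", 10, Color.Black, Color.Transparent, 0, 0);
    bookmark.Text = "";
    builder.MoveToBookmark(bookmark.Name);
    builder.InsertImage(image);
}

doc.Save(MyDir + "output.docx");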

Hello Mr. Manzoor,

Thank you for the assistance! I was able to merge this logic with the changes that I made. Overall, the images are positioned where the non-email hyperlinks are located.

There is one slight change that I would like to look into. The styling of the inline images is off from the rest of the text in the document. Are there any styling properties within the Bookmark API that can resolve this issue?

Thanks,
Woon Gi

@whong4

Could you please share your expected output document and the problematic document in which the image is off from the rest of the text? Please also share a screenshot of the problematic section of the document. We will check this issue and provide more details.

@whong4

The images are created by the .NET API; Aspose.Words only inserts them into the document. This issue is therefore related to .NET rather than to Aspose.Words. Could you please save the images to disk and share them here for further testing?

@whong4

Please note that Aspose.Words mimics the behavior of MS Word. If you insert the images into the document and save it to PDF using MS Word, you will get the same output.

The quality of the images themselves is poor, so this is the expected output.

Thank you for your response.

Is there a .NET API property or an Aspose.Words API property that I could use to replicate the styling of text in an MS Word document?

@whong4

Could you please share why you want to convert email to images? Perhaps, there is some other way to get the desired output.

@tahir.manzoor

Adobe Acrobat Reader has a default setting that will embed all hyperlink text with a hyperlink address. The scenario that I am trying to complete is to generate a PDF document that has embedded hyperlinks for email addresses, but any non-email hyperlinks should have the embedded hyperlink removed.

The Unlink() method of the Field API seems to work for Word documents, but the hyperlinks still appear in the generated PDF. I thought that converting the non-email hyperlinks to image files would be a workaround for the embedded hyperlink issue.

However, if you have any other possible solutions that could fix this issue, I would be open for suggestions.

Thanks,
Woon Gi

@whong4

In your case, we suggest calling the Field.Unlink method for non-email hyperlinks and then exporting the document to PDF.

In this case, the hyperlinks are not exported to the output PDF. Acrobat Reader creates the hyperlinks by itself from the text. This behavior is controlled by the “Edit->Preferences->General->Create links from URLs” checkbox. Please clear this checkbox to get the desired output.
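To decide which hyperlinks to unlink, the field's target address can be inspected. The following is only a rough sketch of such a check, assuming that email links use the mailto: scheme; the HyperlinkFilter class and IsEmailLink method are hypothetical names introduced for illustration and are not part of Aspose.Words.

```csharp
using System;

static class HyperlinkFilter
{
    // Hypothetical helper: treats a target as an email link when it uses the
    // mailto: scheme. Anything else (http, https, bare domains) is non-email.
    public static bool IsEmailLink(string address) =>
        !string.IsNullOrWhiteSpace(address) &&
        address.TrimStart().StartsWith("mailto:", StringComparison.OrdinalIgnoreCase);
}
```

Inside a loop over the document's fields, a field whose Type is FieldType.FieldHyperlink could be cast to FieldHyperlink and unlinked only when `HyperlinkFilter.IsEmailLink(hyperlink.Address)` returns false.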

@tahir.manzoor

Thank you for your insight. Although this idea works, the preferred approach would be to provide a workaround for the cropped image.

Is there an Aspose API that you can refer me to that could adjust the styling of the cropped image? I would like the cropped text to look identical to the font styling of the rest of the document.

If this approach seems infeasible, please let me know.

Thanks,
Woon Gi

@whong4

In your case, we suggest the following solution.

  1. Get the text of the hyperlink.
  2. Insert a textbox into the document.
  3. Insert the hyperlink text into the textbox.
  4. Convert the shape (textbox) into an image using the Shape.GetShapeRenderer().Save method.
  5. Move the cursor to the hyperlink and insert the shape.
  6. Save the document to PDF.

Hope this helps you.

Moreover, you can protect your output PDF using Aspose.Words. The following code example demonstrates how to set permissions on a PDF document generated by Aspose.Words.

Document doc = new Document(MyDir + "Rendering.docx");

PdfSaveOptions saveOptions = new PdfSaveOptions();

// Create encryption details: the first argument is the user password,
// the second (left empty here) is the owner password
PdfEncryptionDetails encryptionDetails =
    new PdfEncryptionDetails("password", string.Empty, PdfEncryptionAlgorithm.RC4_128);

// Start by disallowing all permissions
encryptionDetails.Permissions = PdfPermissions.DisallowAll;

// Then allow only modifying annotations and assembling the document
// (this assignment replaces the DisallowAll value set above)
encryptionDetails.Permissions = PdfPermissions.ModifyAnnotations | PdfPermissions.DocumentAssembly;
saveOptions.EncryptionDetails = encryptionDetails;

// Render the document to PDF format with the specified permissions
doc.Save(ArtifactsDir + "Rendering.EncryptionPermissions.pdf", saveOptions);