HTML to PDF: Treatment of links

I would like your opinion on how we can handle the following issues, whether they are bugs, and if so, when they can be fixed. I am currently evaluating Aspose.PDF for our company, and the items marked as important below are critical for our application.

  1. Named anchors - important

If the html has content like the following:


This is a test for named page anchors and how they work with Aspose.PDF.

link to in page anchor


Once converted to PDF, I would expect clicking on “link to in page anchor” to jump to the anchor defined above it. Instead, Aspose.PDF seems to treat it as an external link and tries to open the URL “in_page_anchor”.

  2. Relative links

If the HTML has relative links of the form “../section/test.aspx” or “/section/test.aspx”, what is the best way to prepend the correct site domain prefix (e.g. http://www.dymaxium.com/app2/) to them in the converted PDF file?

  3. JavaScript links - important

When there is a JavaScript link of the form <a href="javascript:test()">test js link</a>, the resulting PDF file behaves inconsistently, and some of the behaviors are not acceptable for us.

Here are the behaviors I have observed:

a) The text “test js link” is not treated as a link and shows up as normal text. This is probably the expected behavior, and we are okay with it. Although we have not looked into this in great detail, it seems to happen in larger documents that have many such links.

b) This is the behavior we get most of the time: the text is treated as a regular link, and clicking it takes the user to a URL containing the JavaScript. This must be a bug, since the JavaScript is meaningless once the context (the page) is no longer available.

c) The text shows up as a link, and nothing happens when clicking on it.

If you can provide your feedback or workarounds for the above, or estimates on when you plan to address them, I would appreciate it.

Addendum:

An observation our team made that could be valuable in fixing some of the issues:

For issue #1, Aspose.Words’ PDF converter seems to handle this much better, as does the old Aspose PDF generator.

For issue #3, Aspose.Words’ PDF converter again does a much better job.

I assume your PDF conversion logic is shared, or at least similar, between Aspose.Words and Aspose.PDF, so this could be an easier fix for you.

Hi Kuna,

Regarding issue 1 (named anchors): we have tested the scenario and reproduced the reported issue, so we have logged ticket PDFNEWNET-40410 for further investigation. We will notify you as soon as it is resolved.

For points 2 and 3 (relative links and JavaScript links), we would appreciate it if you could share your sample input HTML files here. We will look into them and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

For #3, I had provided the sample in my question. It doesn’t matter anyway, as in our testing it fails for any type of JavaScript function call.

Here is a full sample:

<a href="javascript:test()">test js link</a>

Once converted, you will notice that the link is clickable, but the PDF does not interpret JavaScript. Depending on the PDF viewer you have, you may see different behavior: 1) inline browser viewers will take you to the link containing the JavaScript, which is an invalid link; 2) Adobe Acrobat may show a message to the effect of launching something external but might not launch anything.

When the same type of JavaScript links are used in large quantities, you tend to get different behavior for certain links. Unfortunately, I cannot provide those files here due to non-disclosure agreements on our side. In any case, the issue above is more important than this one.

For #2, I am asking for a best-practices guide; I am not sure whether it is an issue. For example, take a sample like:

<html>
<body>
<a href="/global/absolute.html">Absolute test</a>
<a href="../home/relative.html">Relative test</a>
<a href="http://www.dymaxium.com/full.html">Complete URL</a>
</body>
</html>

When this HTML is stored in a string or a static file, how can I convert it to PDF and properly prepend an actual domain to the URLs, so that the URLs end up like this:

http://www.site.com/global/absolute.html
http://www.site.com/home/relative.html
http://www.dymaxium.com/full.html
Hi Kuna,

Thanks for sharing additional information.

I have tested the HTML to PDF conversion with your JavaScript link sample (#3) and reproduced the issue in the resultant PDF document, so I have logged ticket PDFNEWNET-40415 for further investigation and rectification. We will keep you updated on the progress of its resolution.
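Until that ticket is resolved, one possible workaround is to remove javascript: links from the HTML before conversion, so that their text is rendered as plain text (behavior a) in the original report, which was described as acceptable). The following is only a minimal sketch under stated assumptions: `JsLinkStripper` and `StripJavaScriptLinks` are hypothetical names and not part of Aspose.PDF, and the regular expression covers only simple, quoted href attributes. An HTML parser would be more robust for real pages.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.PDF API): unwraps <a href="javascript:..."> elements.
public static class JsLinkStripper
{
    // Matches an anchor whose quoted href starts with "javascript:" and captures its inner markup.
    private static readonly Regex JsAnchorPattern = new Regex(
        @"<a\b[^>]*\bhref\s*=\s*([""'])\s*javascript:.*?\1[^>]*>(?<inner>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string StripJavaScriptLinks(string html)
    {
        // Replace the whole anchor element with its inner content, leaving other links untouched.
        return JsAnchorPattern.Replace(html, match => match.Groups["inner"].Value);
    }
}
```

The cleaned string can then be converted in the usual way, for example by loading it through a MemoryStream.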

I am looking into point 2 and will update you soon.

Best Regards,

In regards to #3,

Any updates?

I would assume this is a common requirement when converting HTML to PDF, so I am not sure why it is taking so long.

Hi Kuna,

Thanks for your inquiry. I am afraid the issue reported in #3 (PDFNEWNET-40415) was logged only recently and is still pending investigation. We will notify you as soon as we have made significant progress towards its resolution.

Regarding #2: Aspose.Pdf currently does not support updating links during HTML to PDF conversion. We have logged an investigation ticket, PDFNEWNET-40479, for this purpose and will keep you updated on its progress. As a workaround, however, you may update the links in the generated PDF document as suggested in the following documentation.
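Another option, independent of Aspose.Pdf, would be to resolve the links in the HTML string before conversion. The following is a minimal sketch under stated assumptions: `LinkRewriter` and `MakeLinksAbsolute` are hypothetical names, the base address http://www.site.com/app2/ is only an example, and the regular expression handles only simple, quoted href attributes (an HTML parser would be more robust for real pages).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose.Pdf API): rewrites href values against a base address.
public static class LinkRewriter
{
    // Matches href="..." or href='...' and captures the quote character and the URL.
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*([""'])(?<url>.*?)\1",
        RegexOptions.IgnoreCase);

    public static string MakeLinksAbsolute(string html, Uri baseUri)
    {
        return HrefPattern.Replace(html, match =>
        {
            string url = match.Groups["url"].Value;

            // Leave in-page anchors such as "#section" untouched.
            if (url.StartsWith("#")) return match.Value;

            // Resolves "/x" and "../x" against the base; already absolute URLs are kept.
            if (Uri.TryCreate(baseUri, url, out Uri resolved))
            {
                string quote = match.Groups[1].Value;
                return "href=" + quote + resolved.AbsoluteUri + quote;
            }

            // Anything that cannot be parsed is left as it was.
            return match.Value;
        });
    }
}
```

Called with the sample HTML and `new Uri("http://www.site.com/app2/")`, this should resolve the first two links to the www.site.com addresses listed in the question and leave the complete URL as it is. The rewritten string can then be loaded for conversion through a MemoryStream.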

We are sorry for the inconvenience caused.

Hi,


I’m evaluating the product but ran into issue #1. Is there a fix for ticket PDFNEWNET-40410 already?

Please advise.

Hi Alejandro,


Thanks for your inquiry. I am afraid the reported issue (PDFNEWNET-40410) is still not resolved, as our product team is busy resolving other issues in the queue. However, we have recorded your concern and will notify you as soon as we have made significant progress towards its resolution.

We are sorry for the inconvenience caused.

Best Regards,
Hi,
Is issue PDFNEWNET-40410 already resolved? Or maybe there is a workaround?

Best regards,
Gabriel
Hi Gabriel,

Thanks for your patience.

I am afraid the earlier reported issue is not yet resolved. However, please share your input files so that we can test the scenario accordingly. We are sorry for the inconvenience.

Hi Tilal,


The following code generates a failing link:

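using System.IO;
using System.Text;
using Aspose.Pdf;

var options = new HtmlLoadOptions();

// The link on the first page should jump to the named anchor on the second page.
string htmlContent = @"
<a href=""#in_page_anchor"">Jump to second page.</a>
<div style=""page-break-before:always"">
<a name=""in_page_anchor"">here</a>
</div>
";

// Load the HTML from memory and save it as a PDF.
using (var memoryStream = new MemoryStream(Encoding.UTF8.GetBytes(htmlContent)))
{
    var pdfDocument = new Document(memoryStream, options);

    pdfDocument.Save("HTMLToPDF_Anchor.pdf");
}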


Thanks,
Gabriel

Hi Gabriel,


Thanks for sharing your sample code. We have shared it with the product team and asked them to investigate and provide an ETA or update as soon as possible. We will keep you updated on the progress of the issue resolution.

We are sorry for the inconvenience.

Best Regards,

I’m having the same issue as described at the beginning (“#1”, PDFNEWNET-40410).

The anchor tag does not work for navigation in a PDF that was converted from HTML.

Any news on how to make it work? (Java, HTML to PDF)

@giedrius14,

We have assigned this issue to the concerned team member and will share good news with you soon.

Was PDFNEWNET-40410 ever resolved and if not, is there a workaround for this?

@ANDREA.FARRIS

The ticket was initially blocked by other sub-tasks. We have revived it in response to your concerns and will inform you as soon as it is closed. Furthermore, can you please try version 23.12 of the API and, if you face a similar issue, share your sample HTML in .zip format with us? We will include it in our investigation.

Hi, yes, we tried version 23.12 and still had no luck. I have attached a zip with the HTML we are testing. The URL in the HTML file should take you to the section at the bottom.
Test_Bookmark.zip (1.3 MB)

@ANDREA.FARRIS

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-56377

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Is there any update on this? Or is there a workaround?

Being able to use Html->Pdf to create links within a PDF that take you to another part of the PDF seems like a pretty rudimentary requirement. I’m very surprised there isn’t a way to get this to work.

@ANDREA.FARRIS

We apologize for the inconvenience and the delay with this feature. It may look as simple as it is in HTML, but from the perspective of the PDF format, it can be complex functionality. Nevertheless, we are certainly looking into it, and as soon as we release it, we will post a notification here for your reference. Please allow us some time.