Hyperlink issue during Word to PDF conversion by Aspose.Words


#1

Hi,

It seems that the fix for WORDSNET-14728 (17.2.0 release) introduced a new bug.

If a link is already percent-encoded, it becomes decoded after the document is converted to PDF.
The link in the PDF must be identical to the link in the Word document.

Could somebody fix this issue?


Re: Word to PDF conversion "double encodes" hyperlinks
#2

@spectr100,

Please attach your input Word document and Aspose.Words 17.10 generated output PDF file showing the undesired behavior here for testing. We will investigate the issue on our end and provide you more information.


#4

test.zip (40.9 KB)


#5

@spectr100,

I have generated PDF files using the latest licensed version of Aspose.Words, i.e. 17.10, and MS Word 2016, and attached them here for your reference (see 17.10.pdf (18.1 KB) and msw-2016.pdf (157.0 KB)). Please create a comparison screenshot highlighting (encircling) the problematic areas in the Aspose.Words generated PDF and attach it here for our reference. We will investigate the issue further on our end and provide you more information.


#6

The original hyperlink in Word:
https://www.test.com/Test/12345/Abc/Test.html?testPath=%2FTest%2Fv1%2FtestTable%2Fnav%3F123

After converting to PDF:
https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test/v1/testTable/nav?123
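
To make the difference concrete, here is a minimal Python sketch (standard library only). Python merely stands in for the converter, whose internals are not shown in this thread, and the variable names are illustrative. It checks that the PDF link is exactly what one percent-decoding pass over the Word link produces.

```python
from urllib.parse import unquote

# The two links quoted above.
original = ("https://www.test.com/Test/12345/Abc/Test.html"
            "?testPath=%2FTest%2Fv1%2FtestTable%2Fnav%3F123")
converted = ("https://www.test.com/Test/12345/Abc/Test.html"
             "?testPath=/Test/v1/testTable/nav?123")

# One percent-decoding pass turns %2F into "/" and %3F into "?".
decoded = unquote(original)

print(decoded == converted)   # True: the PDF link is the decoded Word link
print(original == converted)  # False: the link text itself has changed
```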

test.zip (42.6 KB)


#7

17.10.png (3.2 KB)
msw-2016.png (3.2 KB)


#8

@spectr100,

Thanks for the additional information. While using the latest version of Aspose.Words i.e. 17.10, we managed to reproduce this issue on our end. We have logged this issue in our bug tracking system. The ID of this issue is WORDSNET-16121. Your thread has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.


#9

Can I expect the fix in the 17.12 release?


#10

@spectr100,

Unfortunately, your issue is not resolved yet and there are no estimates available at the moment. It is currently in the queue, pending analysis. We will inform you via this thread as soon as this issue is resolved. We apologize for the inconvenience.


#11

@spectr100,

Regarding WORDSNET-16121, our product team has completed the analysis of your issue and has come to a conclusion that this issue and the undesired behavior you are observing is actually not a bug in Aspose.Words. So, we will close this issue as ‘Not a Bug’. Please see the following details.

According to the RFC 3986 specification, the “/” character is a “gen-delims” type of reserved character (see section 2.2, “Reserved Characters”). So the “/” character should not be represented in percent-encoded form. If you want the %2F string to be part of the output URI, you should first escape the “%” character (as "%25").

So the input string representing the URI should look like:
"https://www.test.com/Test/12345/Abc/Test.html?testPath=%252FTest"

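The suggested workaround can be sketched as follows, assuming (and this is only an assumption for illustration) that the conversion behaves like a single percent-decoding pass. The snippet uses Python's standard library rather than Aspose.Words, and the variable names are hypothetical.

```python
from urllib.parse import unquote

# The link as the author wants it to appear in the PDF.
encoded_once = "https://www.test.com/Test/12345/Abc/Test.html?testPath=%2FTest"

# Pre-escape every "%" as "%25" before storing the link in the document.
pre_escaped = encoded_once.replace("%", "%25")
print(pre_escaped)       # ...?testPath=%252FTest

# A single decoding pass (the assumed converter behavior) restores %2F.
after_one_decode = unquote(pre_escaped)
print(after_one_decode)  # ...?testPath=%2FTest
```

The trade-off is that the link stored in the Word document is then no longer the link the author wants readers to follow from Word itself.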
Can you please share why you still think the current behavior is a bug (despite the fact that MS Word produces a different result, which may be a bug as well) and why you need that kind of output?


#12

This issue was introduced in WORDSNET-13638 (16.6 version).

According to the RFC 3986 specification, we encode the “/” character as “%2F”, so our hyperlink becomes https://www.test.com/Test/12345/Abc/Test.html?testPath=%2FTest, and this is the correct link.

The Aspose converter shouldn’t decode it back, because that breaks the link.
This is an incorrect link: https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test

Another example:
You can open google.com and type “/test”, the resulting url will be “https://www.google.com/search?q=%2Ftest”.

If you encode the parameter again (% -> %25), you get a broken URL:
https://www.google.com/search?q=%252Ftest
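
The double-encoding problem can be reproduced with a short Python sketch (standard library only; the helper name `query_value` is made up for this example). It shows what a receiver that decodes the query string once, as Python's `parse_qs` does, would read in each case.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "https://www.google.com/search?q="
term = "/test"

# Encode the search term once, then (incorrectly) a second time.
once = BASE + quote(term, safe="")
twice = BASE + quote(quote(term, safe=""), safe="")

def query_value(url):
    """Return the value of q as a receiver decoding once would see it."""
    return parse_qs(urlsplit(url).query)["q"][0]

print(once, "->", query_value(once))    # ...?q=%2Ftest -> /test
print(twice, "->", query_value(twice))  # ...?q=%252Ftest -> %2Ftest
```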

So the link in the PDF must be identical to the link in the Word document, without any changes.

P.S. If you save the DOCX as PDF in Microsoft Word, it does not change the URLs either.

Thanks.


#13

@spectr100,

We have passed these details to our product team for further investigation. We will keep you posted.


#14

@spectr100,

Regarding WORDSNET-16121, we would like to update you that we use a combination of default system classes to produce the correct URI in the output PDF document. For your example:

string escapedUri = System.Uri.EscapeUriString(@"https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test test");

This code produces the string “https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test%20test”. So the current implementation in Aspose.Words seems to be correct.
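
For readers without a .NET environment, the same idea (leave reserved characters such as “/” alone and escape only characters like the space) can be approximated in Python. This is a rough analogue built on `urllib.parse.quote` with an explicit list of reserved characters. It is not the .NET implementation and is not claimed to match `System.Uri.EscapeUriString` for every input.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims + sub-delims), kept unescaped.
RESERVED = ":/?#[]@!$&'()*+,;="

raw = "https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test test"

# Unreserved and reserved characters stay as-is; the space becomes %20.
escaped = quote(raw, safe=RESERVED)
print(escaped)
# https://www.test.com/Test/12345/Abc/Test.html?testPath=/Test%20test
```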

If you are sure that your way of escaping the URI string is correct, we can add a DoNotEscapeUri option to PdfSaveOptions. If this option is enabled, we will skip the escaping step and write the URI value exactly as it appears in the source document. It will then be up to you to properly handle all possible cases.

Please let us know whether adding this new option to the API is acceptable to you.


#15

Yes, it would be great!


#16

The issues you have found earlier (filed as WORDSNET-16121) have been fixed in this Aspose.Words for .NET 18.3 update and this Aspose.Words for Java 18.3 update.