GoToURIAction destination encoding issues

dr.doc · July 19, 2019, 4:04pm

Hi Aspose team,

I have a PDF with some text and a few links. When I open the document in Adobe Reader, I see a URL containing “Démo estFolder”. When I use the attached project, Aspose.PDF returns “/DÃ©mo%20estFolder2”.

What should I do to get what is really written in the document? Can I influence the encoding done by Aspose.PDF?

Aspose.Bugs.PDF.Encoding.zip (77.2 KB)

Thanks,
Oliver

Farhan.Raza · July 19, 2019, 9:52pm

@dr_oli

Thank you for contacting support.

Please always share SSCCE code so that we can investigate efficiently. We used the code below, and the extracted result is almost the same as what appears in Adobe Acrobat. We have attached a comparison screenshot for your reference: Comparison.PNG

// Open the document
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"D:\AsposeFiles\Aspose.Bugs.PDF.Encoding\Aspose.Bugs\bin\Debug\1.pdf");
foreach (Aspose.Pdf.Page objPage in pdfDocument.Pages)
{
    foreach (Aspose.Pdf.Annotations.Annotation objAnnotation in objPage.Annotations)
    {
        // Only link annotations can carry a URI action
        if (objAnnotation is Aspose.Pdf.Annotations.LinkAnnotation)
        {
            Aspose.Pdf.Annotations.LinkAnnotation objLinkAnnotation = (Aspose.Pdf.Annotations.LinkAnnotation)objAnnotation;

            if (objLinkAnnotation.Action != null && objLinkAnnotation.Action.GetType() == typeof(Aspose.Pdf.Annotations.GoToURIAction))
            {
                Aspose.Pdf.Annotations.GoToURIAction objURIAction = (Aspose.Pdf.Annotations.GoToURIAction)objLinkAnnotation.Action;
                // Check for null before calling ToString(), otherwise a missing URI throws
                string address = objURIAction.URI != null ? objURIAction.URI.ToString() : "n.a";
                Console.WriteLine(address);
            }
        }
    }
}

dr.doc · July 20, 2019, 8:48pm

Hi Farhan,

What is SSCCE?

Your screenshot shows exactly what I am saying: in Adobe Acrobat you see Démo, and Aspose.PDF returns DÃ©mo.

You cannot say the results are almost similar; either they are the same or they are not.

If this is a bug, a fix would be needed. If you say that you are reading the encoding differently, then the question is which options can be set in Aspose.PDF so that the encoding is interpreted properly.

Thx,
Oliver

asad.ali · July 21, 2019, 11:18am

@dr_oli

Thanks for writing back.

We are looking into the scenario again and will get back to you shortly.

dr.doc · July 24, 2019, 6:22pm

Hi Asad,

Any news here? Is this a bug, or do you have a workaround for decoding the PDF content properly?

Thx,
Oliver

Farhan.Raza · July 24, 2019, 9:45pm

@dr_oli

Thank you for elaborating further.

Regarding SSCCE (Short, Self Contained, Correct Example), we had hyperlinked the term for your reference. Regarding the encoding issue, we also tried the HttpUtility.HtmlDecode method, but the problem persists. Therefore, a ticket with ID PDFNET-46738 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.
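
For context, HtmlDecode only resolves HTML entities, so it cannot affect this string. “DÃ©mo” is the pattern that appears when the UTF-8 bytes of “é” (0xC3 0xA9) are read as two ISO-8859-1 characters. The sketch below is a possible interim workaround under that assumption; it is not a confirmed fix for PDFNET-46738. UriMojibake and RepairUri are hypothetical names, not part of Aspose.PDF.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of Aspose.PDF.
public static class UriMojibake
{
    // Assumes the input is UTF-8 text that was decoded as ISO-8859-1.
    public static string RepairUri(string raw)
    {
        if (string.IsNullOrEmpty(raw))
            return raw;

        string repaired = raw;

        // Only attempt the repair when every char fits in one Latin-1 byte.
        if (raw.All(c => c <= 0xFF))
        {
            // Map each char back to the byte it was decoded from.
            byte[] bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(raw);
            string candidate = Encoding.UTF8.GetString(bytes);

            // U+FFFD means the bytes were not valid UTF-8,
            // so the string was probably fine; keep the original.
            if (candidate.IndexOf('\uFFFD') < 0)
                repaired = candidate;
        }

        // Resolve percent escapes such as %20 after the repair.
        return Uri.UnescapeDataString(repaired);
    }
}
```

In the loop from the earlier post this would be called as `Console.WriteLine(UriMojibake.RepairUri(objURIAction.URI));`. A string that was already decoded correctly is left alone apart from the percent-unescaping, because a lone “é” byte is not valid UTF-8 and triggers the fallback.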

We are sorry for the inconvenience.

dr.doc · July 31, 2019, 8:14pm

Hi,

I understand the first come, first served approach, but is there any progress regarding the encoding here? I doubt that this is a huge problem.

Thx,
Oliver

Farhan.Raza · July 31, 2019, 10:41pm

@dr_oli

Thank you for getting back to us.

We understand your concerns and realize the significance of this issue. Please note that not every encoding issue has the same cause, as scenarios vary from PDF to PDF. We have recorded your concerns and escalated the issue internally. We will try to schedule it soon and will share our findings with you. Please allow us some time.