How to check a PDF document for hyperlinks pointing to a file that does not exist

kranthireddyr · December 26, 2017, 10:23am

How can I check whether a PDF document has one or more hyperlinks pointing to a file that does not exist?

How can I check for broken hyperlinks?

imran.rafique · December 26, 2017, 5:13pm

You can retrieve the hyperlink file path, and then check whether the file exists with the following code example:

[C#]

using System;
using System.IO;

string curFile = @"c:\temp\test.txt";
Console.WriteLine(File.Exists(curFile) ? "File exists." : "File does not exist.");

Please also refer to this help topic: Add and Get Hyperlink
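
A PDF usually contains several links, and a launch target may be stored as a path relative to the PDF itself, so the single-file check above can be extended as in the following minimal sketch. It uses only System.IO. The helper name FindMissingTargets and the sample file names are hypothetical, the code is written as top-level statements (C# 9 or later), and it assumes the target paths have already been read from the document.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: returns the link targets that do not exist on disk.
// Relative targets are resolved against the folder that contains the PDF.
static List<string> FindMissingTargets(string pdfPath, IEnumerable<string> targets)
{
    string baseDir = Path.GetDirectoryName(Path.GetFullPath(pdfPath)) ?? ".";
    var missing = new List<string>();
    foreach (string target in targets)
    {
        // Path.Combine returns 'target' unchanged when it is already rooted.
        string fullPath = Path.GetFullPath(Path.Combine(baseDir, target));
        if (!File.Exists(fullPath))
        {
            missing.Add(target);
        }
    }
    return missing;
}

// Usage: create one real file in a temporary folder and check two targets.
string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(workDir);
File.WriteAllText(Path.Combine(workDir, "present.txt"), "ok");

List<string> broken = FindMissingTargets(
    Path.Combine(workDir, "HyperLinks.pdf"),
    new[] { "present.txt", "absent.txt" });

foreach (string name in broken)
{
    Console.WriteLine("Broken link target: " + name); // prints: Broken link target: absent.txt
}

Directory.Delete(workDir, recursive: true);
```

Resolving against the PDF's folder is an assumption about how the viewer interprets relative paths; adjust the base folder if your documents are opened from a different location.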

kranthireddyr · January 4, 2018, 2:27pm

Hi Imran,

How can I get the hyperlink text?

Also, how can I get the full path of the target file when the link is a LaunchAction, and the URL when the link is a GoToURIAction?

Here is my code, along with the attached file, which contains 3 links. I want to know the text of each link.

HyperLinks.pdf (43.7 KB)

string path = @"F:\Applications-Names\TestFiles\HyperLinks.pdf";
Aspose.Pdf.Document document = new Aspose.Pdf.Document(path);
string actionType = string.Empty;
foreach (Aspose.Pdf.Page page in document.Pages)
{
    AnnotationSelector selector = new AnnotationSelector(new Aspose.Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));
    page.Accept(selector);
    IList list = selector.Selected;
    foreach (LinkAnnotation a in list)
    {
        actionType = a.Action.GetType().Name;
        if (actionType == "GoToAction")
        {
            GoToAction anAction = (GoToAction)a.Action;
            if (anAction != null)
            {
            }
            else
            {
                // Broken
            }
        }
        //Console.WriteLine("\nDestination: " + (a.Action as Aspose.Pdf.Annotations.GoToURIAction).URI + "\n");
    }
}

imran.rafique · January 5, 2018, 1:41am

@kranthireddyr,

Please modify your code as follows:

[C#]

using System;
using System.Collections;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

string dataDir = @"c:\pdf\test549\";
string path = dataDir + "HyperLinks.pdf";
Aspose.Pdf.Document document = new Aspose.Pdf.Document(path);
string actionType = string.Empty;
foreach (Aspose.Pdf.Page page in document.Pages)
{
    // Select all link annotations on the current page
    AnnotationSelector selector = new AnnotationSelector(
            new Aspose.Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));
    page.Accept(selector);
    IList list = selector.Selected;
    foreach (LinkAnnotation a in list)
    {
        // Skip links that carry no action; calling GetType() on null would throw
        if (a.Action == null)
        {
            continue;
        }
        actionType = a.Action.GetType().Name;

        // Extract the text that lies inside the link rectangle
        TextAbsorber absorber = new TextAbsorber();
        absorber.TextSearchOptions.LimitToPageBounds = true;
        absorber.TextSearchOptions.Rectangle = a.Rect;
        page.Accept(absorber);
        string extractedText = absorber.Text;
        // Print the text associated with hyperlink
        Console.WriteLine(extractedText);

        // Print the target according to the action type
        switch (actionType)
        {
            case "GoToAction":
                Console.WriteLine("\nDestination: " + (a.Action as GoToAction).Destination);
                break;
            case "LaunchAction":
                Console.WriteLine("\nOpen file: " + (a.Action as LaunchAction).File);
                break;
            case "GoToURIAction":
                Console.WriteLine("\nDestination: " + (a.Action as Aspose.Pdf.Annotations.GoToURIAction).URI + "\n");
                break;
        }
    }
}

However, we found that this code example does not retrieve the text of the links. The issue has been logged under the ticket ID PDFNET-43944 in our bug tracking system. We have linked your post to this ticket and will keep you informed of any updates.
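
Once the target strings are printed, a broken-link check needs to treat web addresses and file paths differently: a file path can be tested with File.Exists, while a URL cannot. As a rough sketch (the helper name IsWebLink and the sample targets are hypothetical, and the code is written as top-level statements, C# 9 or later), the two cases can be told apart with Uri.TryCreate:

```csharp
using System;

// Hypothetical helper: true when the target is an absolute http(s) URL,
// false for anything else (relative paths, local files, other schemes).
static bool IsWebLink(string target)
{
    return Uri.TryCreate(target, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}

// Usage: classify two sample targets.
foreach (string sample in new[] { "https://www.example.com/page", "docs/manual.pdf" })
{
    Console.WriteLine(sample + " -> " + (IsWebLink(sample) ? "web URL" : "file path"));
}
```

This only checks that the string is a well-formed http(s) address; it does not contact the server, so a well-formed URL may still lead to a page that no longer exists.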

kranthireddyr · January 5, 2018, 7:00am

Hi Imran,

I am still not able to get the hyperlink text.

Please check my PDF once: HyperLinks.pdf (43.7 KB)
Query1.png (180.8 KB)

imran.rafique · January 5, 2018, 6:35pm

@kranthireddyr,

We have found the problem that prevents the hyperlink text from being extracted, and it has been logged in our bug tracking system. We will let you know once it is resolved.