Free Support Forum - aspose.com

Cannot extract a link from a given PDF

Dear All,

I am trying to extract a link from a PDF (see the attached file below) using the following code (VB.NET):

 Private Function extractUrlsFromPDF(ByVal pdfPath As String) As String()
   Dim doc As Pdf.Document = New Pdf.Document(pdfPath)
   Dim urls As List(Of String) = New List(Of String)
   For Each page As Pdf.Page In doc.Pages
     Dim selector As Pdf.Annotations.AnnotationSelector =
     New Pdf.Annotations.AnnotationSelector(New Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial))
     page.Accept(selector)
     Dim list As IList(Of Pdf.Annotations.Annotation) = selector.Selected
     For Each linkAnnotation As Pdf.Annotations.LinkAnnotation In list
       If linkAnnotation.Action Is Nothing Then Continue For
       Dim url As String = vbNullString
       If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.GoToRemoteAction) Then
         url = CType(linkAnnotation.Action, Pdf.Annotations.GoToRemoteAction).File.Name
       End If
       If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.LaunchAction) Then
         url = CType(linkAnnotation.Action, Pdf.Annotations.LaunchAction).File
       End If
       If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.GoToURIAction) Then
         url = CType(linkAnnotation.Action, Pdf.Annotations.GoToURIAction).URI
       End If
       If url <> vbNullString Then urls.Add(url)
     Next
   Next
   Return urls.OrderBy(Function(obj) obj).ToArray
 End Function

However, the URL is not matched as a GoToRemoteAction, a LaunchAction, or a GoToURIAction. Why? What code could I use to catch this URL? My sample is in VB.NET, but a C# or any other .NET-compatible solution would be fine.

Regards.

Attached file: sample.pdf (64.6 KB)

@monir.aittahar

Thank you for contacting support.

Please try the code snippet below in your environment and share your feedback with us.

using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

// Load the PDF file (dataDir is the folder that contains sample.pdf)
Document document = new Document(dataDir + "sample.pdf");
// Traverse through all the pages of the PDF
foreach (Aspose.Pdf.Page page in document.Pages)
{
    // Get the link annotations from the current page
    AnnotationSelector selector = new AnnotationSelector(new Aspose.Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));

    page.Accept(selector);
    // Create a list holding all the links
    IList<Annotation> list = selector.Selected;
    // Iterate through each individual item in the list
    foreach (LinkAnnotation a in list)
    {
        if (!(a.Action as Aspose.Pdf.Annotations.GoToURIAction is null))
        {
            // Print the destination URL
            Console.WriteLine("\nDestination: " + (a.Action as Aspose.Pdf.Annotations.GoToURIAction).URI + "\n");
        }
    }
}

@Farhan.Raza,

Thanks for your reply. I translated your code sample into this one in VB.NET:

Private Sub extractUrlsFromPDF2(ByVal pdfPath As String)
  Dim document As Pdf.Document = New Pdf.Document(pdfPath)
  For Each page As Pdf.Page In document.Pages
    Dim selector As Pdf.Annotations.AnnotationSelector =
      New Pdf.Annotations.AnnotationSelector(New Aspose.Pdf.Annotations.LinkAnnotation(page, Pdf.Rectangle.Trivial))
    page.Accept(selector)
    Dim list As IList(Of Pdf.Annotations.Annotation) = selector.Selected
    For Each a As Pdf.Annotations.LinkAnnotation In list
      If TypeOf a.Action Is Pdf.Annotations.GoToURIAction Then ' TypeOf...Is returns False (no exception) when Action is Nothing
        Console.WriteLine(vbNewLine & "Destination: " & CType(a.Action, Pdf.Annotations.GoToURIAction).URI)
      End If
    Next
  Next
End Sub

The Console.WriteLine line is never reached, i.e., no action of type GoToURIAction is found.

@monir.aittahar

The same code works on our side in both C# and VB.NET, and the link is printed to the console. Please make sure you are using Aspose.PDF for .NET 19.9, and share a narrowed-down sample application if you still notice the problem.

Moreover, if you do not have a valid license, please consider applying for a free 30-day temporary license in order to test the API in its full capacity.

Hello @Farhan.Raza,

After a while, I tried your sample. When I posted the question, I was using Aspose.PDF 19.6. I retried with 19.11 and C# instead of VB.NET. Strangely, to make it work I had to write:

if (a.Action is Aspose.Pdf.Annotations.GoToURIAction)

Instead of:

if (!(a.Action as Aspose.Pdf.Annotations.GoToURIAction is null))

However, I still cannot reach the Console.WriteLine() line with the same PDF sample (the one added as an attachment). What am I still missing?

Regards.
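For reference, the two conditions quoted in the post above should be equivalent in C# 7.0 and later: `as` and `is` share the same precedence and associate left to right, so `a.Action as GoToURIAction is null` parses as `(a.Action as GoToURIAction) is null`. The `is null` pattern itself requires C# 7.0, so one possible explanation for the difference, not confirmed in the thread, is a project set to an older language version. The self-contained sketch below compares both forms without Aspose.PDF; it uses `System.Uri` as a hypothetical stand-in for `GoToURIAction` and plain objects as stand-ins for `LinkAnnotation.Action`.

```csharp
using System;

// Hypothetical stand-in setup: System.Uri plays the role of GoToURIAction,
// and each object value plays the role of LinkAnnotation.Action.
public static class TypeCheckDemo
{
    // Form from the later post: a plain type test (false for null).
    public static bool WithIs(object action) => action is Uri;

    // Form from the first reply: parsed as !((action as Uri) is null).
    public static bool WithAsIsNull(object action) => !(action as Uri is null);

    // Returns true when both forms give the same answer for every sample.
    public static bool BothFormsAgree(object[] samples)
    {
        foreach (object sample in samples)
        {
            if (WithIs(sample) != WithAsIsNull(sample))
            {
                return false;
            }
        }
        return true;
    }
}
```

For example, `TypeCheckDemo.BothFormsAgree(new object[] { new Uri("https://example.com"), "text", 42, null })` returns `true`: both checks accept only the `Uri` and reject the string, the integer, and `null`.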

@monir.aittahar

Kindly share a sample application containing SSCCE code, as requested in our earlier reply, so that we can try to reproduce your scenario in our environment.

Hello @Farhan.Raza,

Please find attached a (hopefully) SSCCE sample. It opens a file dialog; use it to open the sample PDF file.

Best regards.

TestExtractLinksFromPdfApp.zip (8.9 KB)

@monir.aittahar

We have found that you are running into the evaluation limitation, which allows processing only four items from any collection and adds an evaluation watermark to output files, as explained in Licensing. You can avoid this limitation and test the API in its full capacity by applying for a free 30-day temporary license.
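To see why a correct loop can still miss the URL, here is a simplified model of the limitation described above, in plain C# with no Aspose.PDF dependency. It assumes, purely for illustration, that an unlicensed run behaves as if each collection were cut to its first four items; the `EvaluationCapModel` class, the `licensed` flag, and the sample page contents are hypothetical. If the link carrying the URL is the fifth or later annotation on its page, it is never visited, so no `GoToURIAction` is reported.

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical model of the evaluation limit (not Aspose code).
public static class EvaluationCapModel
{
    // Assumed cap: only the first four items of a collection are processed.
    public const int EvaluationCap = 4;

    // Each entry models one annotation on a page: a URL string stands for a
    // URI action, and null stands for an annotation with some other action.
    public static List<string> ExtractUrls(IList<string> annotationUrls, bool licensed)
    {
        // Without a license, only the first EvaluationCap entries are visible.
        IEnumerable<string> visible = licensed
            ? annotationUrls
            : annotationUrls.Take(EvaluationCap);

        // Keep only the entries that model a URI action.
        return visible.Where(url => url != null).ToList();
    }
}
```

With a page modeled as four non-URI annotations followed by one URL, `ExtractUrls(page, licensed: false)` returns an empty list while `ExtractUrls(page, licensed: true)` returns the URL. In the real application, the cap is lifted by applying a license before the `Document` is opened, for example with `new Aspose.Pdf.License().SetLicense("Aspose.PDF.lic");`, where the file name is a placeholder.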