Dear All,
I tried to extract a link from a PDF (see attached file below). I tried the following code (VB.NET):
Private Function extractUrlsFromPDF(ByVal pdfPath As String) As String()
    Dim doc As Pdf.Document = New Pdf.Document(pdfPath)
    Dim urls As List(Of String) = New List(Of String)
    For Each page As Pdf.Page In doc.Pages
        Dim selector As Pdf.Annotations.AnnotationSelector =
            New Pdf.Annotations.AnnotationSelector(New Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial))
        page.Accept(selector)
        Dim list As IList(Of Pdf.Annotations.Annotation) = selector.Selected
        For Each linkAnnotation As Pdf.Annotations.LinkAnnotation In list
            If linkAnnotation.Action Is Nothing Then Continue For
            Dim url As String = vbNullString
            If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.GoToRemoteAction) Then
                url = CType(linkAnnotation.Action, Pdf.Annotations.GoToRemoteAction).File.Name
            End If
            If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.LaunchAction) Then
                url = CType(linkAnnotation.Action, Pdf.Annotations.LaunchAction).File
            End If
            If linkAnnotation.Action.GetType() Is GetType(Pdf.Annotations.GoToURIAction) Then
                url = CType(linkAnnotation.Action, Pdf.Annotations.GoToURIAction).URI
            End If
            If url <> vbNullString Then urls.Add(url)
        Next
    Next
    Return urls.OrderBy(Function(obj) obj).ToArray
End Function
However, the URL isn’t selected as a GoToRemoteAction, a LaunchAction, or a GoToURIAction. Why? What code could I use to catch this URL? I wrote a VB.NET sample, but C# or any other .NET-compatible solution would be fine.
Regards.
Attached file: sample.pdf (64.6 KB)
@monir.aittahar
Thank you for contacting support.
Please try the code snippet below in your environment and then share your feedback with us.
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

// Load the PDF file (dataDir is the folder that contains sample.pdf)
Document document = new Document(dataDir + "sample.pdf");
// Traverse all the pages of the PDF
foreach (Aspose.Pdf.Page page in document.Pages)
{
    // Get the link annotations from the current page
    AnnotationSelector selector = new AnnotationSelector(new Aspose.Pdf.Annotations.LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));
    page.Accept(selector);
    // Create a list holding all the links
    IList<Annotation> list = selector.Selected;
    // Iterate through each individual item in the list
    foreach (LinkAnnotation a in list)
    {
        if (!(a.Action as Aspose.Pdf.Annotations.GoToURIAction is null))
        {
            // Print the destination URL
            Console.WriteLine("\nDestination: " + (a.Action as Aspose.Pdf.Annotations.GoToURIAction).URI + "\n");
        }
    }
}
@Farhan.Raza,
Thanks for your reply. I translated your code sample into this one in VB.NET:
Private Sub extractUrlsFromPDF2(ByVal pdfPath As String)
    Dim document As Pdf.Document = New Pdf.Document(pdfPath)
    For Each page As Pdf.Page In document.Pages
        Dim selector As Pdf.Annotations.AnnotationSelector =
            New Pdf.Annotations.AnnotationSelector(New Aspose.Pdf.Annotations.LinkAnnotation(page, Pdf.Rectangle.Trivial))
        page.Accept(selector)
        Dim list As IList(Of Pdf.Annotations.Annotation) = selector.Selected
        For Each a As Pdf.Annotations.LinkAnnotation In list
            If a.Action.GetType() Is GetType(Pdf.Annotations.GoToURIAction) Then
                Console.WriteLine(vbNewLine & "Destination: " & CType(a.Action, Pdf.Annotations.GoToURIAction).URI)
            End If
        Next
    Next
End Sub
The Console.WriteLine line is not reached, i.e. no action of type GoToURIAction was found.
@monir.aittahar
The same code works on our side in both C# and VB.NET, and the link is printed to the console. Would you please make sure you are using Aspose.PDF for .NET 19.9, and share a narrowed-down sample application if you still notice the problem?
Moreover, if you do not have a valid license, please consider applying for a free 30-day temporary license in order to test the API at its full capacity.
Hello @Farhan.Raza,
After a while, I tried your sample. When I posted the question, I was using Aspose.PDF 19.6. I retried with 19.11 and C# instead of VB.NET. Strangely, to make it work I had to write:
if (a.Action is Aspose.Pdf.Annotations.GoToURIAction)
Instead of:
if (!(a.Action as Aspose.Pdf.Annotations.GoToURIAction is null))
However, I still cannot reach the Console.WriteLine() line with the same PDF sample (the one added as an attachment). What am I still missing?
Regards.
@monir.aittahar
Kindly share a sample application containing SSCCE code, as requested in our earlier reply, so that we can try to reproduce your scenario in our environment.
Hello @Farhan.Raza,
You’ll find a (hopefully) SSCCE sample attached below. It opens a file dialog; use it to open the sample PDF file.
Best regards.
TestExtractLinksFromPdfApp.zip (8.9 KB)
@monir.aittahar
We have found that you are currently hitting the evaluation limitation, which allows only four items from any collection to be processed and adds an evaluation watermark to output files, as explained in Licensing. You can avoid this limitation and test the API at its full capacity by applying for a free 30-day temporary license.
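For reference, a minimal sketch of applying a license before the document is loaded, so that the page and annotation collections are not truncated. The license file name "Aspose.Pdf.lic", the class name, and taking the PDF path from the first command-line argument are assumptions for illustration; adjust them to your setup.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;

class LicensedLinkExtraction
{
    static void Main(string[] args)
    {
        // Assumption: "Aspose.Pdf.lic" is a placeholder for the path to your
        // (temporary) license file. Set it before creating any Document.
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Pdf.lic");

        // Assumption: the PDF path is passed as the first argument.
        Document document = new Document(args[0]);

        foreach (Page page in document.Pages)
        {
            // Select the link annotations on this page, as in the snippet earlier in this thread
            AnnotationSelector selector = new AnnotationSelector(
                new LinkAnnotation(page, Aspose.Pdf.Rectangle.Trivial));
            page.Accept(selector);
            IList<Annotation> list = selector.Selected;

            foreach (LinkAnnotation a in list)
            {
                // Only URI actions carry a web address
                GoToURIAction uriAction = a.Action as GoToURIAction;
                if (uriAction != null)
                {
                    Console.WriteLine("Destination: " + uriAction.URI);
                }
            }
        }
    }
}
```

Without a license, the same loop runs in evaluation mode, so a link that is not among the first four items of a collection is never visited.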
Hello @Farhan.Raza,
I’m very sorry to reply so late, I just wanted to confirm it was indeed a license issue.
Thank you very much.
@monir.aittahar
It is good to know that your issue has been resolved. Please keep using our API, and if you need any further assistance, please feel free to let us know.