Get comments from a PDF file

jgabriel-ulx · August 29, 2011, 3:18pm

Hi,

How do we extract comments from a PDF document? The fileInfo header has a Comments object, but it is not populated when I use it on my document.

Can someone let me know how to get the comments out of a PDF document?

Thanks,

John

codewarior · August 29, 2011, 9:15pm

Hello John,

Thanks for using our products.

Please see the following link for information on how to Get All Annotations from the Page of a PDF document. If you still face any problem, please share the source PDF document so that we can test the scenario at our end. We apologize for the inconvenience.

FYI, you may also use Aspose.Pdf for .NET to extract the following information:

Get PDF File Information
Get XMP Metadata from PDF File
Get Document Window and Page Display Properties

jgabriel-ulx · August 30, 2011, 11:08am

Hi,

I started using Annotations, but I’m getting the following exception:

System.NullReferenceException: Object reference not set to an instance of an object.
   at Aspose.Pdf.InteractiveFeatures.Annotations.Annotation.( , Page )
   at Aspose.Pdf.InteractiveFeatures.Annotations.AnnotationCollection..get_Current()
   at MyClass.GetCommentsfromDoc(Document PdfDoc)

I’m passing the PDF document as a parameter to this method. A different method is able to get the metadata from the same document, so the document should be good. I have attached the document to this post; please note that the password for the document is “Password”. This is my code:

internal string GetCommentsfromDoc(Document PdfDoc)
{
    string comments = "";
    for (int i = 1; i <= PdfDoc.Pages.Count; i++)
    {
        try
        {
            foreach (MarkupAnnotation annotation in PdfDoc.Pages[i].Annotations)
            {
                // Append this annotation's details to the accumulated text.
                comments = string.Format("{0} Page: {1} ; Title: {2} ; Subject : {3} ; Contents :{4} \n",
                    comments,
                    i,
                    annotation.Title,
                    annotation.Subject,
                    annotation.Contents);
            }
        }
        catch (Exception ex)
        {
            //Handle exception
        }
    }
    return comments;
}

Please let me know.

Thanks,

John.

nausherwan.aslam · August 30, 2011, 1:10pm

Hi John,

Thank you for sharing the sample code and template file.
We were able to reproduce the issue you mentioned in an initial test. It has been registered in our issue tracking system with issue id: PDFNEWNET-30245. You will be notified through this forum thread once the issue is resolved.

Sorry for the inconvenience.

aspose.notifier · September 8, 2011, 8:38am

The issues you have found earlier (filed as 30245) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

jgabriel-ulx · September 8, 2011, 10:12am

Great!

I will test it and let you guys know.

Thanks for the update.

John.

jgabriel-ulx · September 8, 2011, 11:15am

Hey,

I got the latest release and tried extracting the annotations. My document contains hyperlinks, so when I tried "foreach (MarkupAnnotation annotation in PdfDoc.Pages[i].Annotations)":

1) It throws an exception saying that the item is a LinkAnnotation, probably because it encountered a hyperlink.

2) When I changed the loop type to LinkAnnotation, which I should not need to do, the annotation had no data in it.

3) After the hyperlinks, it reported a CaretAnnotation and threw an exception.

The documentation provided with 6.2.0 still shows the old examples using MarkupAnnotation, which no longer work if the page has annotations other than markup annotations. Please let me know how to get the comments from the page with the new Aspose.Pdf for .NET 6.2.0. You can try the document that I provided previously for the evaluation.

Thanks,

John.

nausherwan.aslam · September 9, 2011, 3:17am

Hi John,

Please accept our apologies regarding the re-occurrence of the issue. I have tested the scenarios you mentioned and indeed the issues are still occurring in the latest version of Aspose.Pdf. I have re-opened the issue and informed the development team to further look into it. We will update you as soon as any feedback is shared by the development team.

We are really sorry for the inconvenience caused.

nausherwan.aslam · September 9, 2011, 6:56am

Hi John,

I got the following reply from our development team regarding your issue. I would like to share it with you so you can change your code accordingly to avoid the exceptions you are currently getting while reading the annotations.

As per the development team, when iterating over the collection, the loop variable must be declared with the most general type that covers every element (a C# rule: foreach casts each element to the declared type). PdfDoc.Pages[i].Annotations is a collection of annotations, and Annotation is the base class of all annotation types (MarkupAnnotation, LinkAnnotation, and so on), but LinkAnnotation is not a MarkupAnnotation. These types have different sets of properties; LinkAnnotation and some others, for example, do not have Title and Subject. So for each item in the collection we must identify the exact type before accessing the properties specific to that type.

The following sample code is for your reference:

Document PdfDoc = new Document("PDFTest1.pdf", "Password");
string comments = "";
for (int i = 1; i <= PdfDoc.Pages.Count; i++)
{
    try
    {
        // Iterate with the base type so that every kind of annotation is accepted.
        foreach (Annotation annotation in PdfDoc.Pages[i].Annotations)
        {
            // Title and Subject exist only on MarkupAnnotation; markup is null for other types.
            MarkupAnnotation markup = annotation as MarkupAnnotation;
            string s = string.Format(
                "{0} Page: {1} ; Title: {2} ; Subject : {3} ; Contents :{4} \n",
                comments,
                i,
                markup != null ? markup.Title : string.Empty,
                markup != null ? markup.Subject : string.Empty,
                annotation.Contents
            );
            Console.WriteLine(s);
        }
    }
    catch (Exception ex)
    {
        //Handle exception
    }
}
Console.ReadLine();
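
If only the markup annotations (the comments) are of interest, one possible alternative is to filter the collection by type instead of checking each item inline. The sketch below uses small stand-in classes, not the real Aspose.Pdf types, so that it runs on its own; the names Demo and DescribeMarkup are hypothetical. It also assumes the annotation collection can be treated as an IEnumerable, which has not been verified against Aspose.Pdf here.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Stand-in types for illustration only; these are NOT the Aspose.Pdf classes.
class Annotation { public string Contents = ""; }
class MarkupAnnotation : Annotation { public string Title = ""; public string Subject = ""; }
class LinkAnnotation : Annotation { }

static class Demo
{
    // Builds one line per markup annotation and skips every other annotation type.
    public static List<string> DescribeMarkup(IEnumerable annotations)
    {
        var lines = new List<string>();
        // OfType<T>() yields only the elements that are T, so no cast can fail.
        foreach (MarkupAnnotation markup in annotations.OfType<MarkupAnnotation>())
        {
            lines.Add(string.Format("Title: {0} ; Subject: {1} ; Contents: {2}",
                markup.Title, markup.Subject, markup.Contents));
        }
        return lines;
    }

    static void Main()
    {
        // A mixed "page": one hyperlink and one comment.
        var page = new ArrayList
        {
            new LinkAnnotation(),
            new MarkupAnnotation { Title = "John", Subject = "Note", Contents = "Check this" }
        };

        try
        {
            // Declaring the loop variable as the derived type makes foreach cast
            // every element, which throws when it reaches the LinkAnnotation.
            foreach (MarkupAnnotation markup in page) { Console.WriteLine(markup.Title); }
        }
        catch (InvalidCastException)
        {
            Console.WriteLine("Cast failed on a non-markup annotation");
        }

        // The filtered loop processes only the markup annotation.
        foreach (string line in DescribeMarkup(page)) { Console.WriteLine(line); }
    }
}
```

The first loop reproduces the kind of failure described earlier in this thread; the second shows the filtered approach. Filtering drops link and caret annotations entirely, so if their Contents are also needed, iterating with the base Annotation type as in the sample above is the better fit.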

If you have any further queries, please feel free to contact us.

Thank You & Best Regards,