Extract Footers/Headers from PDF file

Hi,


I can’t find an option in your API to extract footers and headers from a PDF file.
I have seen that it was developed (PDFNEWNET-31086),
but I have not seen any example of how it can be done.
I am using Aspose.Pdf for .NET version 9.5.0.0.

Thanks.
Alex B

Hi Alex,

Thanks for your inquiry. Please note that PdfContentEditor was improved to return the text of footer/header stamps: a Text property was added to StampInfo. For example, the following code prints the text of all stamps on the first page of the document.

PdfContentEditor pce = new PdfContentEditor();
pce.BindPdf("input.pdf");
StampInfo[] infos = pce.GetStamps(1);
foreach (StampInfo si in infos)
{
    Console.WriteLine(si.Text);
}

Note: if you need the text of a header/footer added with Adobe Acrobat (not stamps added by Aspose software), you should use the Page.Artifacts property to read the header and footer artifacts on the page.

foreach (Artifact artifact in doc.Pages[1].Artifacts)
{
    if (artifact.Subtype == Artifact.ArtifactSubtype.Header || artifact.Subtype == Artifact.ArtifactSubtype.Footer)
    {
        Console.WriteLine(artifact.Text);
    }
}

Please feel free to contact us for any further assistance.

Best Regards,

Hi

Thank you for your answer. I need to get text from the header and footer of different PDF files which were created in some way that I really don’t know. It can also be from different applications.

I have tried your code on a PDF file which was created by Word, and I get a NullReferenceException when I try to access:

    pdfDocument.Pages[1].Artifacts

My code:

Document pdfDocument = new Document(@"doc2.pdf");

foreach (Artifact artifact in pdfDocument.Pages[1].Artifacts)
{
    if (artifact.Subtype == Artifact.ArtifactSubtype.Header || artifact.Subtype == Artifact.ArtifactSubtype.Footer)
    {
        Console.WriteLine(artifact.Text);
    }
}

I have attached a link to the tested file: https://nfil.es/rV4sfA/

Thanks,
Alex B

Hi Alex,

I have tested the scenario and was able to reproduce the same problem. I have logged it in our issue tracking system as PDFNEWNET-37367. We will investigate this issue in detail and will keep you updated on the status of a fix.

We apologize for the inconvenience.

Hi,

I have tried other documents with the same code. It did not fail, but I can’t get any text from the headers and footers.

Any advice?

Files for test: https://nfil.es/lHSa9d/

Regards,
Alex B

Hi Alex,

Thanks for sharing the details.

I have tested the scenario and managed to reproduce the issue: text is not being extracted from the header/footer sections of the PDF files you shared. I have logged it separately in our issue tracking system as PDFNEWNET-37371. We will investigate this issue in detail and will keep you updated on the status of a fix.

We apologize for the inconvenience.

Hi Alex,

Thanks for your patience. We have investigated the PDFNEWNET-37371 issue and found that your document does contain artifacts, but they are not of type "Header". The artifacts have the non-standard type "Pagination". You can check for this type with the CustomType property as follows.

if (artifact.CustomType == "Pagination")
{
    Console.WriteLine(artifact.Text);
}
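
For readers who do not know how their PDF files were produced, the two checks discussed in this thread can be combined into one loop. The following is only a minimal sketch under a few assumptions: "input.pdf" is a placeholder path, the class name HeaderFooterDump is made up for illustration, and treating every "Pagination" artifact as a header or footer is a guess based on the files in this thread, not a rule. It uses only the members already shown above (Pages, Artifacts, Subtype, CustomType, Text).

```csharp
using System;
using Aspose.Pdf;

// Hypothetical helper class, named for illustration only.
class HeaderFooterDump
{
    static void Main(string[] args)
    {
        // "input.pdf" is a placeholder, not one of the files shared in this thread.
        string path = args.Length > 0 ? args[0] : "input.pdf";
        Document doc = new Document(path);

        // Page indexes in Aspose.Pdf start at 1.
        for (int pageNumber = 1; pageNumber <= doc.Pages.Count; pageNumber++)
        {
            foreach (Artifact artifact in doc.Pages[pageNumber].Artifacts)
            {
                // Standard header/footer artifacts.
                bool isHeaderOrFooter =
                    artifact.Subtype == Artifact.ArtifactSubtype.Header ||
                    artifact.Subtype == Artifact.ArtifactSubtype.Footer;

                // Non-standard type reported for the documents in this thread.
                bool isPagination = artifact.CustomType == "Pagination";

                // Skip artifacts that carry no text.
                if ((isHeaderOrFooter || isPagination) && !string.IsNullOrEmpty(artifact.Text))
                {
                    Console.WriteLine("Page {0}: {1}", pageNumber, artifact.Text);
                }
            }
        }
    }
}
```

If a document still prints nothing, its header and footer text may not be marked as artifacts at all, in which case this loop cannot find it.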

Please feel free to contact us for any further assistance.

Best Regards,

The issues you found earlier (filed as PDFNEWNET-37367 and PDFNEWNET-37371) have been fixed in Aspose.Pdf for .NET 9.7.0.

