Aspose.PDF Labels

stevoskimk · February 6, 2025, 3:49pm

I have a question about extracting PDF labels with the Aspose.PDF library.

I have a well-formed PDF document that contains page labels specifying the printed page numbers, using a combination of Roman and Arabic numerals. In a PDF viewer the page numbers are rendered correctly, so the “metadata” is there.

For example

image.png (292.3 KB)

I need to extract this number/symbol information and map it to the PDF page number that Aspose.PDF returns when iterating through the pages in a loop, because, for example, PDF page number 10 can have a printed page number of 13, or xiii…

test document

I’m trying to achieve this with the following code snippet, but I’m getting very strange and incomplete results. In some of my documents only a subset of the labels is returned; in others only the label of the first page is returned, or no labels at all.

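using System.IO;

using (Stream stream = File.OpenRead("1.pdf"))
{
    Aspose.Pdf.Document document = new(stream);

    // Iterate over the actual page count rather than a hard-coded 100.
    for (int i = 0; i < document.Pages.Count; i++)
    {
        // Pages is indexed from 1; the label lookup below uses the 0-based index i.
        var pageNumber = i + 1;

        Aspose.Pdf.PageLabel label = document.PageLabels.GetLabel(i);

        // Copy the current page into a new single-page document.
        Aspose.Pdf.Document pageDocument = new();
        var page = document.Pages[pageNumber];
        pageDocument.Pages.Add(page);
        ...
    }
}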

Could this be caused by a licensing issue with Aspose.PDF (I'm currently on a trial license to make sure it works before proceeding with a purchase) or by a bug, or is there another way to extract this information with Aspose.PDF?

iText 7 manages to extract the labels from all my test documents without an issue, so I don’t think there is a problem with the documents.
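
For context, the PDF specification (ISO 32000-1, section 12.4.2) stores page labels as ranges rather than as one entry per page: each entry of the /PageLabels number tree gives a starting page index, a numbering style, an optional prefix, and a first number. A library that exposes only those range entries would return fewer labels than there are pages. Whether that explains the Aspose.PDF results described here is an assumption, not something confirmed in this thread. The sketch below uses no Aspose.PDF API; `LabelRange` and `PageLabelExpander` are hypothetical names, the ranges are assumed to be sorted with the first one starting at index 0, and the alphabetic styles (/A, /a) are left out for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model of one /PageLabels entry: a range that starts at a
// zero-based page index. Style is 'D' (decimal), 'r' / 'R' (lower/upper
// Roman), or null when the range has no numbering style.
public record LabelRange(int StartIndex, char? Style, string Prefix = "", int FirstNumber = 1);

public static class PageLabelExpander
{
    // Expands the range entries into one label per page.
    // Assumes ranges are sorted by StartIndex and the first starts at 0.
    public static List<string> Expand(IReadOnlyList<LabelRange> ranges, int pageCount)
    {
        var labels = new List<string>(pageCount);
        for (int r = 0; r < ranges.Count; r++)
        {
            // A range runs until the next range starts, or to the last page.
            int end = r + 1 < ranges.Count ? ranges[r + 1].StartIndex : pageCount;
            for (int p = ranges[r].StartIndex; p < end && p < pageCount; p++)
            {
                int n = ranges[r].FirstNumber + (p - ranges[r].StartIndex);
                labels.Add(ranges[r].Prefix + Format(n, ranges[r].Style));
            }
        }
        return labels;
    }

    private static string Format(int n, char? style) => style switch
    {
        'D' => n.ToString(),
        'r' => ToRoman(n).ToLowerInvariant(),
        'R' => ToRoman(n),
        _ => "" // no style: the label consists of the prefix only
    };

    private static string ToRoman(int n)
    {
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (n >= values[i])
            {
                sb.Append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.ToString();
    }
}
```

With two ranges, lower-case Roman starting at page index 0 and decimal starting at page index 4, a seven-page document expands to the labels i, ii, iii, iv, 1, 2, 3. Index p in the returned list corresponds to PDF page number p + 1.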

asad.ali · February 6, 2025, 10:32pm

@stevoskimk

We are checking it and will get back to you shortly.

asad.ali · February 7, 2025, 8:12am

@stevoskimk

We have checked your PDF document, and we need to investigate this requirement in detail to check its feasibility. Can you please share how you produced this PDF file, or did you receive it from somewhere?

stevoskimk · February 7, 2025, 8:49am

Hello,

The example document is a book I was given a long time ago as part of some professional courses I was taking.

However, the client I’m working for, and for whom I’m considering Aspose.PDF, has a lot of documents with the same structure as this one, which unfortunately I can’t share because of copyright.

Here’s an example in screenshots from one of their documents, though: Aspose.PDF manages to extract 64 labels, while iText 7 manages to extract 450, one for each page…

results.png (271.4 KB)

code.png (103.0 KB)

We were really hoping to use Aspose.PDF, as the library also offers HTML to PDF conversion, which we will eventually need, but the issue above is a bit of a showstopper…

Looking forward to your response.

asad.ali · February 7, 2025, 7:16pm

@stevoskimk

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59240

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.