I have a question about extracting PDF page labels with the Aspose.PDF library.
I have a well-formed PDF document that contains page labels specifying the printed page numbers, using a combination of Roman and Arabic numerals. A PDF viewer renders these page numbers correctly, so the "metadata" is there.
For example, see the attached screenshot (image.png, 292.3 KB).
I need to extract this number/symbol information and map it to the page number that Aspose.PDF returns when iterating through the pages in a loop, because, for example, PDF page number 10 can have the printed page number 13, or xiii…
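For reference, the mapping being asked for can be sketched as follows. This is a minimal illustration of how label ranges work according to the PDF specification's /PageLabels entry (a range start index, a numbering style /S, an optional prefix /P, and a start value /St). It makes no Aspose.PDF calls; `LabelRange` and `PageLabelSketch` are hypothetical names introduced only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical type: one range from the PDF /PageLabels number tree.
// StartIndex: zero-based index of the first page in the range.
// Style: the /S value ("D", "R", "r", "A", "a"), or null for no numeric part.
// Prefix: the /P value. First: the /St value (defaults to 1).
public sealed record LabelRange(int StartIndex, string? Style, string Prefix = "", int First = 1);

public static class PageLabelSketch
{
    // Returns the label for a zero-based page index.
    // Assumes ranges are sorted by StartIndex in ascending order.
    public static string LabelFor(int pageIndex, IReadOnlyList<LabelRange> ranges)
    {
        // The applicable range is the last one that starts at or before the page.
        LabelRange? range = ranges.LastOrDefault(r => r.StartIndex <= pageIndex);
        if (range is null)
            return (pageIndex + 1).ToString(); // no range applies: fall back to the physical number

        int value = range.First + (pageIndex - range.StartIndex);
        string numeric = range.Style switch
        {
            "D" => value.ToString(),
            "R" => ToRoman(value),
            "r" => ToRoman(value).ToLowerInvariant(),
            "A" => ToLetters(value),
            "a" => ToLetters(value).ToLowerInvariant(),
            _ => "" // no /S entry: the label consists of the prefix only
        };
        return range.Prefix + numeric;
    }

    // Converts a positive integer to uppercase Roman numerals.
    private static string ToRoman(int n)
    {
        if (n <= 0) return n.ToString();
        int[] values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
        string[] symbols = { "M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I" };
        var sb = new StringBuilder();
        for (int i = 0; i < values.Length; i++)
        {
            while (n >= values[i])
            {
                sb.Append(symbols[i]);
                n -= values[i];
            }
        }
        return sb.ToString();
    }

    // Converts 1..26 to A..Z, 27..52 to AA..ZZ, and so on.
    private static string ToLetters(int n)
    {
        if (n <= 0) return n.ToString();
        char letter = (char)('A' + (n - 1) % 26);
        return new string(letter, (n - 1) / 26 + 1);
    }
}
```

With a single lowercase Roman range starting at the first page and a start value of 4, `PageLabelSketch.LabelFor(9, ranges)` (the tenth physical page) would yield "xiii", which is the kind of mapping described above.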
I’m trying to achieve this with the following code snippet, but I’m getting very strange and incomplete results. In some of my documents only a subset of the labels is returned; in others only the label of the first page is returned, or no labels are retrieved at all.
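using System.IO;

using (Stream stream = File.OpenRead("1.pdf"))
{
    Aspose.Pdf.Document document = new(stream);
    // Iterate over the pages actually present instead of a fixed count of 100.
    for (int i = 0; i < document.Pages.Count; i++)
    {
        // document.Pages is 1-based, while the loop index i is 0-based.
        var pageNumber = i + 1;
        Aspose.Pdf.PageLabel label = document.PageLabels.GetLabel(i);
        // Copy the page into its own single-page document.
        Aspose.Pdf.Document pageDocument = new();
        var page = document.Pages[pageNumber];
        pageDocument.Pages.Add(page);
        ...
    }
}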
Could this be caused by a licensing issue with Aspose.PDF (I am currently using a trial license to make sure it works before proceeding with a purchase), or by a bug? Or is there another way to extract this information with Aspose.PDF?
iText 7 extracts the labels from all my test documents without any issue, so I don’t think there is a problem with the documents.