Validate PDF object id cross reference

LionRednose · August 5, 2024, 12:39pm

Hey Aspose,

I am building a project that helps guide our customers through PDF/UA compliance. For this I’m using the Validate method.
Since this method also returns an object ID, I was hoping to cross-reference this object inside the Document object to supply further insight, but these IDs don’t seem to be in there at all.

I found the GetObjectById method in the documentation, but I can only get it to return null.

So in short: Is there any way to find what these object IDs are referring to inside Aspose.PDF?

Kind regards,

Lion

Professionalize.Discourse · August 5, 2024, 12:39pm

To cross-reference object IDs returned by the Validate method in Aspose.PDF, you can utilize the GetObjectById method. However, if you are encountering issues where this method returns NULL, it may be due to the way the objects are structured or referenced within the PDF document.

Here are some steps you can take to troubleshoot and potentially resolve the issue:

1. Ensure Object ID Validity: Verify that the object ID you are trying to reference is valid and exists within the PDF document. Object IDs are typically assigned sequentially, so ensure that the ID you are using corresponds to an actual object in the document.
2. Check Object Type: The GetObjectById method may return NULL if the object ID refers to a type of object that is not directly accessible or is not supported by the method. Make sure that the object you are trying to access is of a type that can be retrieved.
3. Inspect the PDF Structure: Understanding the structure of the PDF can help. PDF files consist of various objects, including dictionaries, streams, and arrays. If the object ID refers to a stream or a complex object, you may need to navigate through the document structure to find the relevant information.
4. Use the Document Object: If you have access to the Document object, you can explore its properties and methods to find more information about the objects contained within. This may provide additional context or help identify the object you are looking for.
5. Debugging: Implement logging or debugging to capture the object IDs being processed and their corresponding types. This can help identify patterns or specific cases where the GetObjectById method fails.

If you continue to experience difficulties, consider reaching out to the Aspose support team with specific details about the object IDs and the context in which you are using them. They may provide further insights or solutions tailored to your situation.

For more detailed information about PDF object structures and methods, you can refer to the Aspose documentation on PDF files.

Sources:
[1]: What is PDF file? | Knowledge Base - Aspose Documentation

LionRednose · August 5, 2024, 2:46pm

Appreciate the quick response.

I’ll try to be more specific, since I couldn’t figure it out after going through your troubleshooting steps.
I’m using the following basic PDF.
test.pdf (102.8 KB)

Running the following validation code returns 6 issues and references ObjectId 18, 21 and 36.

document.Validate(dstStream, PdfFormat.PDF_UA_1);

I would expect to be able to find them running something like this but it returns null.

document.GetObjectById("18");

I can find the objects manually like so, but they don’t have an ID as far as I can tell.

document.Pages[1].Annotations[1];
document.Pages[1].Resources.Images;
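
For anyone following along, the object IDs can be collected from the validation log in code rather than by eye. Below is a minimal sketch, assuming the log written by Validate is XML and that problem entries carry an ObjectID attribute. That attribute name is an assumption (it is matched case-insensitively for that reason), and ValidationLog / ReferencedObjectIds are hypothetical helper names, not part of Aspose.PDF.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class ValidationLog
{
    // Collects the distinct object IDs mentioned anywhere in a validation log.
    // Assumption: the log is XML and problem entries carry an "ObjectID"
    // attribute; the exact spelling is not confirmed in this thread, so the
    // attribute name is compared case-insensitively.
    public static SortedSet<int> ReferencedObjectIds(string logXml)
    {
        var ids = new SortedSet<int>();
        foreach (XAttribute attr in XDocument.Parse(logXml).Descendants().Attributes())
        {
            bool isObjectId = string.Equals(
                attr.Name.LocalName, "ObjectID", StringComparison.OrdinalIgnoreCase);

            // Skip attributes that are not object IDs or do not hold a number.
            if (isObjectId && int.TryParse(attr.Value, out int id))
            {
                ids.Add(id);
            }
        }
        return ids;
    }
}
```

Reading the contents of dstStream back into a string and passing it to ValidationLog.ReferencedObjectIds would, if those assumptions hold, be expected to yield the IDs mentioned above.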

ilyazhuykov · August 5, 2024, 3:57pm

@LionRednose
As far as I remember, object IDs refer to the low-level structure objects of the PDF, which are usually converted into more user-friendly constructs.

Could you share the document you mentioned so we can investigate?

LionRednose · August 5, 2024, 4:56pm

@ilya.zhuykov
Of course, here it is.
test.pdf (102.8 KB)

ilyazhuykov · August 5, 2024, 5:12pm

@LionRednose
Thanks, I’ll write back as soon as I’ve investigated this.

ilyazhuykov · August 6, 2024, 3:39pm

@LionRednose
At first glance, you won’t be able to get these objects with document.GetObjectById using the IDs mentioned in the report.
As I understand it, objects on a page may have a string identifier, e.g. “testImage”, and this method helps to extract objects in that scenario.
That is usually the case for XML-to-PDF import, where such IDs are stated explicitly.
Unfortunately, that is not the case in your situation.
I’ll try to contact the development team.
If there is no solution or workaround for this situation, I’ll add a new ticket.

ilyazhuykov · August 8, 2024, 10:58am

@LionRednose
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-57826

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

In short, any work with low-level PDF objects is closed off, and users are not expected to extract or operate on such information.

It seems that the solution should be something like the following:

pdfDocument.Convert(convert, PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);

It should remove problems that are not content-specific.
In your case, however, the problems are tied to content, so it doesn’t help. That is why I added a task for the development team.

If you want some information about the objects mentioned, you can try using CosEdit to inspect the low-level PDF structure,
but that requires understanding the PDF format documentation to some extent.
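
To illustrate what the low-level PDF structure looks like: in the file itself, each indirect object is written as "18 0 obj ... endobj", where 18 is the object number (presumably the ID shown in the validation report). A rough sketch of looking an object up by number in the raw file text follows. It assumes the object is stored uncompressed (objects packed into object streams will not be found this way) and that the obj/endobj markers do not also occur inside binary stream data. RawPdfObjects and its methods are hypothetical helper names, not an Aspose or CosEdit API.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class RawPdfObjects
{
    // Reads the file as ISO-8859-1 so that every byte maps to exactly one
    // character and binary stream data cannot break the decoding.
    public static string ReadPdfAsText(string path) =>
        Encoding.GetEncoding("ISO-8859-1").GetString(File.ReadAllBytes(path));

    // Finds the text between "<id> <generation> obj" and the next "endobj".
    // Returns false when no uncompressed definition of that object exists.
    public static bool TryFindObjectBody(string pdfText, int objectId, out string body)
    {
        // (?<!\d) keeps "18 0 obj" from matching inside "118 0 obj".
        var pattern = new Regex(
            @"(?<!\d)" + objectId + @"\s+\d+\s+obj\b(.*?)endobj",
            RegexOptions.Singleline);

        MatchCollection matches = pattern.Matches(pdfText);
        if (matches.Count == 0)
        {
            body = string.Empty;
            return false;
        }

        // Incremental updates append newer definitions to the end of the
        // file, so the last match is taken as the current one.
        body = matches[matches.Count - 1].Groups[1].Value.Trim();
        return true;
    }
}
```

Calling RawPdfObjects.TryFindObjectBody(RawPdfObjects.ReadPdfAsText("test.pdf"), 18, out string body) would then give the dictionary text for object 18, if it is stored uncompressed. A viewer such as CosEdit remains the more reliable route, since it also resolves compressed objects and references.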

LionRednose · August 13, 2024, 3:19pm

@ilyazhuykov
Appreciate the response Ilya, thank you.

The CosEdit recommendation ended up being very helpful. I believe they have since rebranded to Apryse/PDFTron.

For anyone else who stumbles upon this thread, I managed to extract the URL by object ID using the following example, specifically the part at the top that begins “The following code snippet traverses all annotations in the document”.

It does require a small amount of setup beyond the example but I found the documentation quite good.
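
In raw PDF terms, a link annotation’s target sits in its action dictionary as a /URI entry, next to /S /URI. Here is a minimal sketch of pulling that value out of an object’s body text. It assumes the URI is written as a plain literal string (no escaped or nested parentheses, no hex string) and that the action is inline rather than an indirect reference to another object. LinkAnnotationUri.TryGetUri is a hypothetical helper for illustration, not the Apryse or Aspose API.

```csharp
using System.Text.RegularExpressions;

public static class LinkAnnotationUri
{
    // Extracts the target of a URI action written as: /URI (https://...)
    // The action type entry "/S /URI" is skipped automatically because it
    // is not followed by an opening parenthesis.
    public static bool TryGetUri(string objectBody, out string uri)
    {
        // [^()\\]* stops at any parenthesis or backslash, so strings with
        // escapes or nested parentheses are deliberately not matched.
        Match match = Regex.Match(objectBody, @"/URI\s*\(([^()\\]*)\)");
        uri = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }
}
```

A library that parses the PDF properly handles the cases this sketch skips, which is a good reason to prefer one for anything beyond a quick inspection.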