How to locate specific Word objects once saved to PDF

We have a use case where we use Aspose.Words to mail merge data into a DOCX template and save the result as an accessible PDF.

We then post-process that PDF, adding more content at targeted locations with other tools.

Today we append these new objects at the end of the Word document, based on some assumptions about the free space left on the page.

It would be far better if we could add several invisible (white or transparent) rectangular placeholders for those extra content items to the Word document, and then locate them precisely later, in the PDF phase.

We need a Word object that takes up rectangular space and flows with the Word content, but also carries a unique ID/tag/identifier so that we can find its exact coordinates later in the PDF.

NOTE: we are NOT using the Aspose.PDF toolkit at this time.

What do you suggest?

Peter

@pcleaveland I am afraid there is no direct way to achieve this. You can try using a shape object with a transparent background and white borders to mark a rectangular area in the output PDF. You can use the shape's alternative text to attach some metadata to the marked area. For example, see the following DOCX and output PDF:
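A minimal sketch of such a marker shape, built with Aspose.Words in code rather than in the template. The helper name InsertPlaceholder and the ID string are hypothetical; the shape is made inline (WrapType.Inline) on the assumption that you want it to flow with the text like a character. It has no fill and a white border, and its ID is stored in AlternativeText. The document is saved with ExportDocumentStructure enabled, since the alt text reaches the PDF only through the tag tree.

```csharp
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Saving;

public static class PlaceholderDemo
{
    // Hypothetical helper: reserves a widthPt x heightPt area in the text flow
    // and tags it with an identifier stored in the shape's alternative text.
    public static Shape InsertPlaceholder(DocumentBuilder builder, string id, double widthPt, double heightPt)
    {
        Shape shape = new Shape(builder.Document, ShapeType.Rectangle);
        shape.WrapType = WrapType.Inline; // flows with the surrounding text
        shape.Width = widthPt;            // size in points
        shape.Height = heightPt;
        shape.Filled = false;             // no fill, so the area stays transparent
        shape.StrokeColor = Color.White;  // white border, invisible on a white page
        shape.AlternativeText = id;       // the marker ID we want to find again in the PDF
        builder.InsertNode(shape);
        return shape;
    }

    public static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.Writeln("Content before the placeholder.");
        InsertPlaceholder(builder, "PLACEHOLDER-01", 200, 50);
        builder.Writeln();
        builder.Writeln("Content after the placeholder.");

        // Export the document structure so the alt text ends up in the PDF tags.
        PdfSaveOptions opt = new PdfSaveOptions();
        opt.ExportDocumentStructure = true;
        doc.Save("out.pdf", opt);
    }
}
```

In a mail merge scenario, the same shape can simply be drawn in the DOCX template with the alt text set in Word; the code above only shows which properties are involved.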
in.docx (15.5 KB)
out.pdf (13.8 KB)

Hi Alexey, it's been a while, but we are finally experimenting with the strategy you suggested above, and I'm having trouble: the alternative text I define for the shapes is NOT present in the resulting PDF.

NDE26125_CASE01 - In.docx (23.3 KB)

NDE26125_CASE01 - Out.PDF (107.6 KB)

All the shape objects in the DOCX have alternative text, but none of them have it once the document is saved as PDF.

I'm currently using Aspose.Words v20.2.0.0 and saving with options.Compliance = PdfCompliance.PdfA1a.

Do I need the latest version, or a different save option?

@pcleaveland You should enable export of the document structure (PdfSaveOptions.ExportDocumentStructure):

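using Aspose.Words;
using Aspose.Words.Saving;

Document doc = new Document(@"C:\Temp\in.docx");
PdfSaveOptions opt = new PdfSaveOptions();
// Write the document structure as PDF tags; shape alt text goes into the tagged figures.
opt.ExportDocumentStructure = true;
opt.Compliance = PdfCompliance.PdfA1a;
doc.Save(@"C:\Temp\out.pdf", opt);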

Sorry, we still get no alt text in the PDF phase.

As you can see from the attached, the Word template shows the alt text and we are setting options.ExportDocumentStructure = true, yet we still get no alt text on the Figure when we look at the resulting PDF.

We are running the latest Aspose.Words v24.9.

What are we doing wrong?

[Screenshot: YesAltTextButNotObvious]

Fascinating: it looks like the alt text IS there if I dig deep enough into the raw tags in Acrobat.

I discovered it because Acrobat's Read Aloud feature read out the alt text, even though I could not see it in the UI where I expected it.

Now I have to check whether our PDF tools (not Aspose.PDF) can FIND those objects programmatically in the PDF phase.
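For the PDF phase, the marker IDs live in the tag tree: each shape becomes a structure element with role Figure and an /Alt entry. The sketch below walks that tree and collects the alt texts. It uses iText 7 for .NET purely as an assumed example of a non-Aspose PDF library; your own tools will have different APIs. It assumes the figures are tagged with the standard Figure role (a custom role mapped to Figure would be missed). It also only finds the placeholders by ID; getting their coordinates would additionally require mapping the element's marked content on the page, or reading a /BBox layout attribute if the producer writes one (not verified here for Aspose.Words output).

```csharp
using System;
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Tagging;

public static class FigureFinder
{
    // Collects the /Alt text of every structure element whose role is Figure.
    public static List<string> FindFigureAltTexts(PdfDocument pdf)
    {
        var results = new List<string>();
        PdfStructTreeRoot root = pdf.GetStructTreeRoot(); // null for an untagged PDF
        if (root == null) return results;
        foreach (IStructureNode kid in root.GetKids())
            Walk(kid, results);
        return results;
    }

    private static void Walk(IStructureNode node, List<string> results)
    {
        // Kids may be marked-content references or null entries; only
        // structure elements can carry a role and alt text.
        if (!(node is PdfStructElem elem)) return;
        if (PdfName.Figure.Equals(elem.GetRole()))
        {
            PdfString alt = elem.GetAlt();
            if (alt != null) results.Add(alt.ToUnicodeString());
        }
        foreach (IStructureNode kid in elem.GetKids())
            Walk(kid, results);
    }

    public static void Main(string[] args)
    {
        // args[0]: path to the PDF produced by Aspose.Words
        using var pdf = new PdfDocument(new PdfReader(args[0]));
        foreach (string alt in FindFigureAltTexts(pdf))
            Console.WriteLine(alt);
    }
}
```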

I’d still like to understand the difference between your suggested approach and the results I got, but I might have the answer I needed.

@pcleaveland It looks like the problem is with your PDF viewer, because I can see the alt text in the PDF document you attached:

I also see the alt text in the tags, as in your screenshot.