Make Text in TextBox of DOCX Header Footer Tagged as an Artifact in Converted PDF | C# .NET

Hi,
I encountered a problem with textboxes inside headers or footers while saving a DOCX file as PDF via Aspose.Words. You can reproduce the problem as follows:

  • open blank document with MS Word
  • add a header
  • insert a textbox with some text inside the header
  • save it
Now convert/save this DOCX as PDF with Aspose.Words. I would have expected the text or textbox to be tagged as an artifact in the resulting PDF file, but it isn’t. If you put “normal” text, without a textbox, inside the header, Aspose.Words works fine. Even when you save the above DOCX as PDF via MS Word itself, the resulting PDF file has the expected artifact tag. So I assume that Aspose.Words has a problem interpreting a textbox inside a header or footer.
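
The manual steps above can also be sketched in code, which may make the case easier to reproduce without MS Word. This is only a rough sketch under assumptions: the file names, the textbox size, and the sample text are made up for illustration, and a document built this way may not be byte-for-byte equivalent to one authored in MS Word, so the tagging behaviour could differ.

```csharp
using Aspose.Words;
using Aspose.Words.Drawing;
using Aspose.Words.Saving;

class TextBoxInHeaderRepro
{
    static void Main()
    {
        // Start from a blank document, as in the manual steps.
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Move the cursor into the primary header of the first section.
        builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);

        // Insert a textbox shape (200 x 40 points is an arbitrary size)
        // and write some text into its paragraph.
        Shape textBox = builder.InsertShape(ShapeType.TextBox, 200, 40);
        builder.MoveTo(textBox.LastParagraph);
        builder.Write("Text inside a header textbox");

        // Hypothetical output paths; adjust as needed.
        doc.Save("TextboxInHeader.docx");

        // Export the document structure so the resulting PDF is tagged.
        PdfSaveOptions options = new PdfSaveOptions();
        options.ExportDocumentStructure = true;
        doc.Save("TextboxInHeader.pdf", options);
    }
}
```

The resulting PDF can then be inspected with a text extraction tool or PDF editor to see whether the header textbox content is marked as an artifact.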

Thanks in advance

Chris

@Chris1010,

I have attached sample Word and PDF files here for your reference:

Which of the PDF files do you see this problem with? The “by aspose.words 20.8 - Tagged.pdf” file was generated using the following C# code with the Aspose.Words for .NET API:

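using Aspose.Words;
using Aspose.Words.Saving;
// Load the DOCX and save it as a tagged PDF (document structure exported).
Document doc = new Document("C:\\Temp\\Textbox in header.docx");
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
pdfSaveOptions.ExportDocumentStructure = true;
doc.Save("C:\\Temp\\20.8-Tagged.pdf", pdfSaveOptions);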

What PDF editor did you use to observe this issue with the PDF files generated by Aspose.Words? Please also provide the complete steps that we can perform on our end to verify that the textbox in the header of the Aspose.Words-generated PDF was not tagged as an artifact. We will then investigate the issue on our end and provide you with more information.

Hi,
thanks for the fast reply. I investigated your 3 PDF files and found artifacts only in “by ms word 2019.pdf”. We extract text from PDF files with “PDFlib TET - Text Extraction Toolkit”. It’s noticeable that only the PDF files saved/generated with Aspose.Words have no artifact attributes in them. When I use your “by ms word 2019.pdf” file, everything is fine and the text in your header textbox has the artifact attributes. So it seems that there must be something wrong in the Aspose code!?

Kind regards
Chris

@Chris1010,

We have logged this problem in our issue tracking system so that it can be corrected in the Aspose.Words API. Your ticket number is WORDSNET-20982. We will look further into the details of this problem and keep you updated on the status of the linked issue. Sorry for the inconvenience.

@awais.hafeez
Thanks for investigating this problem. I hope to hear from you soon.

Kind regards
Chris

The issue you found earlier (filed as WORDSNET-20982) has been fixed in the Aspose.Words for .NET 21.2 update and the Aspose.Words for Java 21.2 update.