Extra tags in table cells in tagged PDF (.NET)

I am working with tagged PDF. When I open my PDF in Acrobat's reading-order view, I see many odd spots in the empty table cells. They seem to be tags created from the end-of-cell characters ("\a"). You can see them in the attached screenshot below.
[Screenshot: Acrobat reading-order view showing extra spots in empty table cells]
Can I remove these blank spots from the PDF in code?

@jackth,

Please ZIP and attach the following resources here for testing:

  • Your simplified source Word document
  • The PDF generated by Aspose.Words 21.3 that shows the undesired behavior
  • Your expected PDF file showing the desired output. You can create this file manually using MS Word or any other editor.
  • A simple standalone console application (source code without compilation errors) that reproduces this problem on our end. Please do not include the Aspose.Words DLL files, to keep the file size small.

As soon as you have these ready, we will start investigating your scenario and provide you with more information.


Here are the sample files; the PDF was generated with Aspose.Words 21.3: tags-in-table-cells-sample-file.zip (31.2 KB)
Please use the reading-order view in Acrobat Pro to see the extra spots.
Code reference:

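            // Assumes: using Aspose.Words; using Aspose.Words.Saving; using myWord = Aspose.Words;
            string patha = "D:/Test/";
            myWord.Document d1 = new myWord.Document(patha + "Doc1.docx");
            var t = d1.GetChildNodes(NodeType.Table, true); // not used further in this repro

            PdfSaveOptions options = new PdfSaveOptions();

            // Write the logical structure (tags) into the PDF; these tags are what Acrobat inspects.
            options.ExportDocumentStructure = true;
            options.PageSet = PageSet.All;
            // PDF/A-1a (the "accessible" level) requires a tagged PDF.
            options.Compliance = PdfCompliance.PdfA1a;

            // Open the PDF with the bookmarks (outline) panel visible.
            options.PageMode = myWord.Saving.PdfPageMode.UseOutlines;
            options.OutlineOptions.DefaultBookmarksOutlineLevel = 1;
            options.HeaderFooterBookmarksExportMode = HeaderFooterBookmarksExportMode.All;

            d1.Save(patha + "Doc1.pdf", options);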

@jackth,

However, MS Word 2016 produces similar output when saving to PDF. Please see the following screenshot and the PDF generated by MS Word:

So this appears to be expected behavior: the end-of-cell characters ("\a") cannot be removed without removing the cells themselves.
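To see which cells carry only an end-of-cell mark, a small sketch like the following may help. It is not an official Aspose workaround. It assumes Aspose.Words for .NET, where `Cell.GetText()` returns the cell text including the trailing end-of-cell character (`ControlChar.CellChar`, '\a'). The file path is a placeholder, and `IsEmptyCellText` is a hypothetical helper written for this example. Note that a cell holding only an image or shape would also be reported as empty, because the check looks at text only.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Tables;

class EmptyCellReport
{
    // Hypothetical helper: true when the cell text is only paragraph marks /
    // whitespace followed by the end-of-cell mark ('\a').
    // Text-only check: a cell containing just an image also counts as empty.
    internal static bool IsEmptyCellText(string cellText)
    {
        string content = cellText.TrimEnd(ControlChar.CellChar); // drop the '\a' mark
        return content.Trim().Length == 0;                       // '\r' and spaces are whitespace
    }

    static void Main()
    {
        // Placeholder path; point it at the document used in the repro.
        var doc = new Document("D:/Test/Doc1.docx");

        int tableIndex = 0;
        // 'true' = deep search, so nested tables are included as well.
        foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
        {
            int rowIndex = 0;
            foreach (Row row in table.Rows)
            {
                int cellIndex = 0;
                foreach (Cell cell in row.Cells)
                {
                    if (IsEmptyCellText(cell.GetText()))
                        Console.WriteLine($"Table {tableIndex}, row {rowIndex}, cell {cellIndex}: no text, only the end-of-cell mark");
                    cellIndex++;
                }
                rowIndex++;
            }
            tableIndex++;
        }
    }
}
```

Every cell reported here still ends with '\a', which is why it still gets a tag in the PDF. Removing such cells would remove the marks, but it also changes the table layout, so this listing is mainly useful for deciding whether those cells are needed at all.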


OK, thanks a lot.