Reading order is not preserved when converting from Word to PDF/UA

Hi Aspose,

I have noticed that reading order is not preserved when converting from DOCX to PDF using Aspose.Words 23.4. Here is the code:

var document = new Aspose.Words.Document(stream);
var outputPdfCompliance = GetComplianceFromSetting(fileConverterProperties.OutputPdfComplianceLevel);
var saveOptions = GetDocumentSaveOptions(outputPdfCompliance);
using (var memoryStream = new MemoryStream())
{
    document.Save(memoryStream, saveOptions);
}

private SaveOptions GetDocumentSaveOptions(PdfCompliance outputPdfCompliance)
{
    string resourceName = "AsposeObjects.hyph_da_DK.dic";
    Hyphenation.RegisterDictionary("da-DK", Util.ReadResourceFileAsStream(resourceName));
    var saveOptions = new PdfSaveOptions
    {
        Compliance = outputPdfCompliance,
        DisplayDocTitle = true,
        CreateNoteHyperlinks = true,
        CustomPropertiesExport = PdfCustomPropertiesExport.Metadata,
    };
    saveOptions.OutlineOptions.HeadingsOutlineLevels = 9;
    return saveOptions;
}

outputPdfCompliance is PdfCompliance.PdfA1a.

I have attached the following three files as reference.
I use test_document.docx as the test file:
test_document.docx (84.5 KB)
When converting with the built-in “Save As PDF” function in Microsoft Word, the following document is produced:
test_document_PDF_saved_with_word.pdf (94.0 KB)
When converting with Aspose.Words, the following output is produced:
test_document_PDF_saved_with_aspose.pdf (96.0 KB)

The problem:

  • When I listen to the Word document with the built-in “Read Aloud” function, the order is correct. The sender and receiver info is read before the first and second headings.
  • When I use PAC2021 to check the “Screen Reader Preview” for the PDF converted using Word’s built-in PDF conversion function, everything is as expected and the order is the same as in the Word document.
  • When I use PAC2021 to check the “Screen Reader Preview” for the PDF converted using Aspose, the order is wrong. Heading 1 and its contents are read first, followed by the sender and receiver details, followed by Heading 2 and its contents.

The document is in Danish, but it is formatted as a standard letter, so I hope the language does not get in the way of understanding the problem.

Links:
Read Aloud function in Word:
https://support.microsoft.com/en-us/office/listen-to-your-word-documents-5a2de7f3-1ef4-4795-b24e-64fc2731b001
PAC2021:
https://pdfua.foundation/en/pdf-accessibility-checker-pac/

@nnyy10
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25355

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Ok, thanks for the fast response.

@nnyy10 The reading order problem is caused by floating tables. As a workaround, you could change the document layout so that it does not use floating tables. In general, using tables for layout purposes is discouraged in accessible documents.
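
If editing the source document by hand is not practical, the workaround could be approximated in code before saving. The following is only a minimal sketch, assuming the floating tables are regular Aspose.Words tables with text wrapping set to "Around". The class and method names (FloatingTableWorkaround, MakeFloatingTablesInline) are hypothetical. Switching a table to inline changes the visual layout, and it is not confirmed here that this alone corrects the reading order for your document, so check the output in PAC2021.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

// Hypothetical helper, not part of Aspose.Words.
public static class FloatingTableWorkaround
{
    // Switches every floating table in the document to an inline table
    // and returns how many tables were changed.
    public static int MakeFloatingTablesInline(Document document)
    {
        int changed = 0;

        // Walk all tables in the document, including nested ones.
        foreach (Table table in document.GetChildNodes(NodeType.Table, true))
        {
            // TextWrapping.Around marks a floating table.
            if (table.TextWrapping == TextWrapping.Around)
            {
                table.TextWrapping = TextWrapping.None;
                changed++;
            }
        }

        return changed;
    }
}
```

It would be called once on the loaded document, before document.Save(memoryStream, saveOptions), for example: FloatingTableWorkaround.MakeFloatingTablesInline(document);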