Hi Aspose,
I have noticed that reading order is not preserved when converting from DOCX to PDF using Aspose.Words 23.4. Here is the code:
var document = new Aspose.Words.Document(stream);
var outputPdfCompliance = GetComplianceFromSetting(fileConverterProperties.OutputPdfComplianceLevel);
var saveOptions = GetDocumentSaveOptions(outputPdfCompliance);
using (var memoryStream = new MemoryStream())
{
    document.Save(memoryStream, saveOptions);
}

private SaveOptions GetDocumentSaveOptions(PdfCompliance outputPdfCompliance)
{
    // Register the Danish hyphenation dictionary from an embedded resource.
    string resourceName = "AsposeObjects.hyph_da_DK.dic";
    Hyphenation.RegisterDictionary("da-DK", Util.ReadResourceFileAsStream(resourceName));

    var saveOptions = new PdfSaveOptions
    {
        Compliance = outputPdfCompliance,
        DisplayDocTitle = true,
        CreateNoteHyperlinks = true,
        CustomPropertiesExport = PdfCustomPropertiesExport.Metadata,
    };

    // Include heading levels 1 through 9 in the PDF outline.
    saveOptions.OutlineOptions.HeadingsOutlineLevels = 9;
    return saveOptions;
}
In this case, outputPdfCompliance is PdfCompliance.PdfA1a.
I have attached the following three files for reference.
I use test_document.docx as the test file:
test_document.docx (84.5 KB)
When converting with the built-in “Save As PDF” function in Microsoft Word, the following document is produced:
test_document_PDF_saved_with_word.pdf (94.0 KB)
When converting with Aspose.Words, the following output is produced:
test_document_PDF_saved_with_aspose.pdf (96.0 KB)
The problem:
- When I have the Word document read aloud with Word’s built-in “Read Aloud” function, the order is correct. The sender and receiver info is read before the first and second headings.
- When I use PAC2021 to check the “Screen Reader Preview” for the PDF converted using Word’s built-in PDF conversion function, everything is as expected and the order is the same as in the Word document.
- When I use PAC2021 to check the “Screen Reader Preview” for the PDF converted using Aspose, the order is wrong. Heading 1 and its contents are read first, followed by the sender and receiver details, followed by Heading 2 and its contents.
The document is in Danish, but it is formatted as a standard letter, so I hope the language does not affect the understanding of the problem.
Links:
Read Aloud function in Word:
https://support.microsoft.com/en-us/office/listen-to-your-word-documents-5a2de7f3-1ef4-4795-b24e-64fc2731b001
PAC2021:
https://pdfua.foundation/en/pdf-accessibility-checker-pac/