Optimize Document Contents for MS Word 2016 and 2019 | DOCX to PDF Conversion using Java

Hello, I have a problem generating PDF and DOCX files from the same generation code.

The problem appears to be a conflict between CompatibilityOptions and StructuredDocumentTag in PDF output when we use non-breaking space characters.
In my project I use:

getCompatibilityOptions().optimizeFor(MsWordVersion.WORD_2016)

My use case is very simple: generate a PDF with predefined text that contains non-breaking spaces and checkboxes. The result should be the same for PDF generation and for DOCX generation.

In PDF output, when we optimize for a Word version later than 2007, non-breaking spaces stop working as expected. For example, take the following Java code:

import com.aspose.words.*;

// Optimizing for Word 2016 is the setting that changes the PDF output.
Document doc = new Document();
doc.getCompatibilityOptions().optimizeFor(MsWordVersion.WORD_2016);
DocumentBuilder builder = new DocumentBuilder(doc);

// First run ends with non-breaking spaces (U+00A0), followed by an inline checkbox.
Run run = new Run(builder.getDocument());
run.setText("aaa aaaaaaaaaaa aa a aaaaa aaaaaaaaa aaaaa a aaaaaa aaaaaa aaaaaa aaaaaa aaaaaa).\u00A0\u00A0\u00A0\u00A0Yes\u00A0\u00A0");
builder.getCurrentParagraph().appendChild(run);
builder.getCurrentParagraph().appendChild(new StructuredDocumentTag(builder.getDocument(), SdtType.CHECKBOX, MarkupLevel.INLINE));

// Second run is also padded with non-breaking spaces, followed by another inline checkbox.
run = new Run(builder.getDocument());
run.setText("\u00A0\u00A0\u00A0\u00A0No\u00A0\u00A0");
builder.getCurrentParagraph().appendChild(run);
builder.getCurrentParagraph().appendChild(new StructuredDocumentTag(builder.getDocument(), SdtType.CHECKBOX, MarkupLevel.INLINE));

// Save the same document in both formats for comparison.
doc.save("output.docx");
doc.save("output.pdf");

We get the following DOCX (with pilcrow marks enabled to show whitespace):
image.png (5.7 KB)

And the following PDF:
image.png (9.5 KB)

I cannot find a compatibility option that restores the non-breaking space behaviour in PDF.

Can you suggest what I can do? Changing the optimization is not an option in my case.

I use com.aspose:aspose-words:21.3 and com.aspose:aspose-pdf:21.3

@Damian_Gronczewski

We have tested the scenario using the latest version of Aspose.Words for Java 21.5 and have not found any issue with the output DOCX and PDF. Please check the attached output documents and use Aspose.Words for Java 21.5.
output.docx (7.9 KB)
output.pdf (19.4 KB)

Hello, in the attached PDF the result is different from the DOCX. In PDF the non-breaking space does not work in the newest version either, so the problem still exists.

As a result, I want EXACTLY the same behaviour in PDF and DOCX.

@Damian_Gronczewski

Please check the attached screenshots of the DOCX and PDF. Both outputs are the same.
DOCX.png (22.9 KB)
PDF.png (72.5 KB)

Could you please share some more detail on this issue, along with screenshots of the problematic sections of the output document?

The screenshots you attached are even worse. The code snippet from the first post renders two text Runs with two StructuredDocumentTags, and the last few characters are \u00A0, a non-breaking space, which should prevent wrapping inside that part of the text. In Word everything works as expected. In PDF, when we render with optimizeFor(MsWordVersion.WORD_2016), we have the problem, but when we omit optimizeFor completely, the PDF also respects non-breaking spaces. So I think there is a compatibility problem between optimizeFor for Word > 2007 and PDF output. I also checked earlier Word versions in optimizeFor, and in that case the PDFs are fine too. I know that combining optimizeFor for Word with PDF rendering looks strange, but the main business requirement is that we get EXACTLY the same results in both files.
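
To rule out the input strings themselves, here is a minimal sketch that uses only the Java standard library and is independent of Aspose.Words. It confirms that the run text really contains U+00A0 and makes those characters visible. The class name NbspInspector and its two helper methods are hypothetical names introduced for illustration, and the check says nothing about how a renderer lays out the text.

```java
public class NbspInspector {
    // U+00A0 NO-BREAK SPACE, the character used for padding in the run text.
    public static final char NBSP = '\u00A0';

    /** Counts the non-breaking spaces in the given text. */
    public static long countNbsp(String text) {
        return text.chars().filter(c -> c == NBSP).count();
    }

    /** Replaces each non-breaking space with a visible ASCII marker. */
    public static String visualize(String text) {
        return text.replace(String.valueOf(NBSP), "[NBSP]");
    }

    public static void main(String[] args) {
        // Tail of the first run and the whole second run from the first post.
        String yes = "\u00A0\u00A0\u00A0\u00A0Yes\u00A0\u00A0";
        String no = "\u00A0\u00A0\u00A0\u00A0No\u00A0\u00A0";

        System.out.println(countNbsp(yes) + " " + visualize(yes));
        System.out.println(countNbsp(no) + " " + visualize(no));

        // Java classifies U+00A0 as a space character but not as
        // breaking whitespace, so it is distinct from a regular ' '.
        System.out.println("isWhitespace: " + Character.isWhitespace(NBSP));
        System.out.println("isSpaceChar:  " + Character.isSpaceChar(NBSP));
    }
}
```

If the counts match what the snippet intends (six per string here), the input text is not the cause, and the difference comes from layout.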

image.png (23.6 KB)

@Damian_Gronczewski

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Please share the MS Word version that you are using.
  • Please attach the output Word file that shows the undesired behavior.
  • Please attach the expected output Word file that shows the desired behavior.
  • Please attach the output PDF file that shows the undesired behavior.
  • Please attach the expected output PDF file that shows the desired behavior.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Hello,

Below you can find all the requested files, together with the Java code that generates them. Both DOCX files are the same; only the PDFs differ. I use Word 2019, but that does not seem to be relevant.
229397.zip (137.0 KB)

@Damian_Gronczewski

Thanks for sharing the documents. We are investigating this issue and will get back to you soon.

@Damian_Gronczewski

When saving a document to PDF, Aspose.Words mimics the behavior of MS Word 2019. So, you are getting the expected behavior of Aspose.Words.

When you use the above line of code, the PDF output should match MS Word 2016. Please perform the same scenario using MS Word 2016; you will get the same output.

No, I did not get the expected behaviour in PDF. The line you mention breaks the PDF export, which should not happen. In the previous attachments you can see what I want to achieve in PDF. This Word compatibility setting breaks the non-breaking space logic in PDF, which must be fixed.

@Damian_Gronczewski

The code expected.java generates the correct output. If you open expected.docx in MS Word 2019, the output is the same as expected.pdf. Please check the screenshots of the expected output.
Expected PDF.png (79.1 KB)
Expected DOCX.png (22.0 KB)

The code invalid.java also generates the correct output. Please check the attached screenshots of the PDF and of MS Word 2016.
MS Word 2016 output.png (27.5 KB)
PDF output.png (69.3 KB)

Please open the document that is generated with this line of code in MS Word 2016. If you open the document in MS Word 2019, the layout should follow MS Word 2019.

Hope this answers your query.