Saving docx as pdf breaks layout

sample.zip (1.6 MB)
Hi,

We are using Aspose.Words to save Word documents as PDF files, like so:

var doc = new Aspose.Words.Document(wordFile);
string pdfFile = wordFile.Replace(".docx", ".pdf");
doc.Save(pdfFile, new Aspose.Words.Saving.PdfSaveOptions
{
    SaveFormat = Aspose.Words.SaveFormat.Pdf,
    Compliance = Aspose.Words.Saving.PdfCompliance.PdfA1b
});

The resulting PDF is not satisfactory. I have attached a sample docx. The whole layout is broken, especially on the last pages. Everything looks fine when using the standard Word conversion. I hope you can help us out. We are using Aspose.Words 18.1.0.0.

Best regards

@david.hofmann.schleu

Thanks for your inquiry. We have tested your shared document using the latest version, Aspose.Words 18.11, and found no issue. Please upgrade to Aspose.Words 18.11. If you still face the problem, please attach the Aspose.Words-generated document showing the undesired behavior.

Please check the document for your reference: sample_18.11.zip (486.3 KB)

Hi mannanfazil,

Thank you for your quick reply.
I noticed the same issues in your generated document.
Please check pages 9-12.

@david.hofmann.schleu

Please note that Aspose.Words mimics the behavior of MS Word. Kindly compare the PDF files generated by MS Word and by Aspose.Words. Both show the same behavior. It seems that there is an issue in the input Word document.

Please check the MS Word generated PDF for your reference: sample_converted_by_MS Word.zip (483.3 KB)

I would indeed expect the same behaviour, but there is a difference on pages 9-12 that appears only in the Aspose conversion.
I am using MS Word 2016. I will also investigate possible issues in the input Word document.

Please check the Word generated PDF for your reference: sample_converted_by_MS Word2016.zip (613.7 KB)

@david.hofmann.schleu

We also used MS Word 2016 to generate the PDF, but some text is missing in the provided input document (e.g. section 4.2). Please check the images for your reference: Images with missing text. Kindly update the Word document with the proper text and then test again. If you still face the issue, please share the updated Word document with us for testing.

Thanks

Hi, and thank you for your response. My document layout is still broken, but it seems to be an MS Word related issue. I noticed the following behaviour: when the docx document is opened in Protected View, the layout is corrupted and looks exactly like it does after converting the document to PDF. However, everything looks fine in edit mode. You can reproduce this with my sample document (e.g. check section 4.2).
Do you have any advice on how to handle this issue?

Best regards

@david.hofmann.schleu,

We are checking this scenario and will get back to you soon.

@david.hofmann.schleu,

Please first check the following outputs, produced on our end using OpenOffice 4.1.6, MS Word 2019 and Aspose.Words 19.5.

All of these have issues, which indicates that something is wrong in the document itself.

Regarding the problem with section 4.2, you can use the following code to fix it:

Document doc = new Document("E:\\sample\\sample.docx");

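// Visit every paragraph in the document, including those nested in tables.
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    // Match the paragraph by its visible text, e.g. the "4.2" heading.
    if (para.ToString(SaveFormat.Text).Trim().StartsWith("4.2"))
    {
        // Indents are in points (72 points = 1 inch).
        para.ParagraphFormat.LeftIndent = 0;
        para.ParagraphFormat.FirstLineIndent = -0.2 * 72;
    }
}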

doc.Save("E:\\sample\\19.5-.pdf");