Free Support Forum - aspose.com

Doc to Pdf - problem with formatting

Hello,

I have the following problem. We need to convert our customers' DOC files to PDF, but the resulting PDF differs from the original. The most serious problem is the indentation of segments, for example the first lines of points II. and III. in the attached document.

I guess the problem is not in creating the PDF, but in the XML. There are FirstlineIndent attributes on the heading tags. If I remove them, there are no indents in the resulting PDF. But I don't know why these attributes appear on those tags.

In addition, I don't fully understand the formatting of the source document. It is generated by some unknown program, and we cannot change anything about it. But the customer wants a PDF that looks like the original document.



Here is the code I use to convert DOC to PDF, but I guess it is not the cause of this problem:



// Step 1: convert the DOC to the intermediate Aspose.Pdf XML.
Aspose.Words.Document doc = new Document(docPath);
doc.Save(xmlPath, SaveFormat.AsposePdf);

// Step 2: configure the PDF generator.
Pdf pdf = new Pdf();
pdf.IsImagesInXmlDeleteNeeded = true;
pdf.IsTruetypeFontMapCached = true;
pdf.TruetypeFontMapPath = AppDomain.CurrentDomain.BaseDirectory + "fonts";

// Step 3: bind the XML and write the PDF.
pdf.BindXML(xmlPath, null);
pdf.SetUnicode();
pdf.Save(pdfPath);



I hope you can help me. Thank you.

Hello!

Thank you for asking. I'll investigate the differences between the DOC and the PDF and get back to you with suggestions shortly.

In general, it is technically impossible to achieve a 100% exact result in DOC-to-PDF conversion. We usually give hints for particular situations to avoid severe differences. Even if the documents come from third parties, we can still try programmatic workarounds. If it is not too much trouble, please specify what else, besides text indentation, is causing problems for you.

Regards,

Hello!

This document is formatted in a really unusual way. The strangest part is that it uses section breaks where ordinary headings followed by text paragraphs would be enough. We could try to drop the section breaks, but that could introduce more pitfalls. I assume the authors had some reason for doing this. If you know that their documents consistently contain such formatting artifacts, you can directly patch the intermediate Aspose.Pdf XML representation. I would use regular expressions for that, experimenting on several documents to determine which substitution is the most reliable.
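
The regular-expression patch described above could look roughly like the following. This is only a minimal sketch under assumptions: the class and method names (PdfXmlPatcher, StripHeadingFirstLineIndent, PatchFile) are hypothetical, and it assumes the headings appear as <Heading ...> elements carrying a FirstLineIndent attribute (matched case-insensitively, so FirstlineIndent is covered too). Check the element and attribute names against the XML your documents actually produce.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose: strips first-line indent
// attributes from heading tags in the intermediate XML.
public static class PdfXmlPatcher
{
    // Assumption: heading elements open with "<Heading ...>" and no
    // attribute value contains a '>' character.
    private static readonly Regex HeadingOpenTag =
        new Regex(@"<Heading\b[^>]*>", RegexOptions.IgnoreCase);

    // Assumption: the attribute is spelled FirstLineIndent (any casing)
    // and its value is quoted with single or double quotes.
    private static readonly Regex FirstLineIndentAttr =
        new Regex(@"\s+FirstLineIndent\s*=\s*(""[^""]*""|'[^']*')",
                  RegexOptions.IgnoreCase);

    // Removes the attribute only inside heading opening tags;
    // all other elements are left untouched.
    public static string StripHeadingFirstLineIndent(string xml)
    {
        return HeadingOpenTag.Replace(
            xml,
            m => FirstLineIndentAttr.Replace(m.Value, string.Empty));
    }

    // Reads the XML file, patches it, and writes it back in place.
    public static void PatchFile(string xmlPath)
    {
        string xml = File.ReadAllText(xmlPath);
        File.WriteAllText(xmlPath, StripHeadingFirstLineIndent(xml));
    }
}
```

In the conversion code from the first post, PdfXmlPatcher.PatchFile(xmlPath) would be called after doc.Save(xmlPath, SaveFormat.AsposePdf) and before pdf.BindXML(xmlPath, null). Restricting the substitution to heading tags keeps intentional indents on ordinary paragraphs, but it is worth testing on several documents first, since a regex is not a full XML parser.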

Regards,