Please see the attached PDF file (Input.pdf), which is simply opened as a Document object and saved again as a PDF using the latest version of Aspose.Words, 22.4.0:
var doc = new Document("Input.pdf");
doc.Save("Output.pdf");
Doing so causes various indentation and spacing issues in centered text and in list items, which can be seen when comparing the two documents side by side. Some examples are shown in the attached pictures.
Input.pdf (126.7 KB)
Output.pdf (105.2 KB)
image.png (10.2 KB)
image.png (28.4 KB)
@ssmolkin1 Thank you for reporting this problem to us. It has been logged as WORDSNET-23708. We will keep you informed and let you know once it is resolved.
You should also note that when importing a PDF document, Aspose.Words converts the fixed-page representation of the PDF into a flow representation, which is the natural model for MS Word documents.
Thanks, Alexey. I think the issues have to do with some gaps in the conversion to flow format. Attached is the Input.pdf file converted to DOCX format with Aspose and with MS Word. The version converted with Word doesn’t have the formatting issues that the Aspose conversion does. A couple of things I notice with the Aspose conversion:
- Text that is centered is not converted to centered format; instead, it is converted to left-aligned text with custom indentation on each line.
- List items are not actually converted to list items; instead, the list numbers are written as plain text and custom indentation is applied per line.
The MS Word conversion is better, in that the centered text in the PDF is formatted as centered in the converted file, rather than indented, and the list items are converted to an actual list in Word.
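For anyone who wants to check the two points above without opening the converted file in Word, here is a minimal sketch that dumps each paragraph's alignment, indents, and list status after the PDF is loaded. It assumes the same Input.pdf as above; the class name and the output layout are only illustrative, and I have not verified the exact values it prints for this file.

```csharp
using System;
using Aspose.Words;

class InspectConvertedParagraphs
{
    static void Main()
    {
        // Loading the PDF converts it to the Aspose.Words flow document model.
        var doc = new Document("Input.pdf");

        // Walk every paragraph in the document, including those nested in tables.
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            string text = para.GetText().Trim();
            if (text.Length == 0)
                continue; // skip empty paragraphs

            ParagraphFormat fmt = para.ParagraphFormat;

            // Short preview so each line of output can be matched to the PDF.
            string preview = text.Length > 40 ? text.Substring(0, 40) : text;

            Console.WriteLine(
                $"{fmt.Alignment,-10} left={fmt.LeftIndent,6:F1}pt " +
                $"first={fmt.FirstLineIndent,6:F1}pt list={para.IsListItem}  {preview}");
        }
    }
}
```

If the behavior is as described, the centered headings would be expected to show up with Left alignment and a nonzero left indent, and the numbered items with list=False. Running the same loop over PdfToDocxWithWord.docx should make the difference easy to compare.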
Input.pdf (126.7 KB)
PdfToDocxWithAspose.docx (21.1 KB)
PdfToDocxWithWord.docx (25.0 KB)
@ssmolkin1 Thank you for the additional information. I have added it to the defect. We will keep you informed and let you know once it is resolved or we have more information for you.