Hello,
I am experiencing significant issues when converting a PDF to Word using Aspose.PDF 25.8.0. The output DOCX introduces formatting and rendering problems that make the converted document difficult to use.
Attached files:
pdf_with_issues.pdf (881.9 KB)
pdf_with_issues_explained.pdf (1.2 MB)
Observed issues include:
- Characters replaced by blocks instead of spaces.
- Symbols cut off or misaligned.
- Text cut off at top/bottom of lines.
- Wrong fonts applied, italics missing, bold/underline not respected.
- Unicode / special characters not handled correctly:
  - Some are incorrectly bolded or missing.
  - In several places, Aspose.PDF inserts invalid Unicode characters into the Word document, which display as boxes and cannot be copied/searched reliably.
- Titles cut off too early or overlapping.
- Incorrect spacing and line height, and missing characters, in multiple places.
What I already tried with Aspose.PDF (v25.8.0):
- Standard conversion with Document.Save(…, SaveFormat.DocX).
- DocSaveOptions with Flow mode → text more editable but introduced other layout problems.
- DocSaveOptions with EnhancedFlow mode → improved structure in some areas, but formatting issues persisted.
- Verified that formatting issues persist regardless of mode or page selection.
- Tried replacing/removing invalid Unicode characters in the Word document, but this proved a dead end since the corruption occurs during conversion, not post-processing.
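For reference, a minimal sketch of what the conversion attempts listed above might look like in code. The file names and the class name are placeholders, and the exact options used in the original tests are not shown in this thread, so treat this as an assumption-laden reproduction rather than the code that was actually run.

```csharp
using Aspose.Pdf;

class ConversionAttempts // hypothetical name, for illustration only
{
    static void Main()
    {
        // Placeholder path; substitute the PDF that shows the problems.
        const string inputPath = "pdf_with_issues.pdf";

        // Attempt 1: standard conversion without explicit save options.
        using (var document = new Document(inputPath))
        {
            document.Save("output_default.docx", SaveFormat.DocX);
        }

        // Attempts 2 and 3: the same conversion in Flow and EnhancedFlow mode.
        var modes = new[]
        {
            DocSaveOptions.RecognitionMode.Flow,
            DocSaveOptions.RecognitionMode.EnhancedFlow
        };

        foreach (var mode in modes)
        {
            // Reload the PDF for each run so every conversion starts from a fresh document.
            using (var document = new Document(inputPath))
            {
                var options = new DocSaveOptions
                {
                    Format = DocSaveOptions.DocFormat.DocX,
                    Mode = mode
                };
                document.Save($"output_{mode}.docx", options);
            }
        }
    }
}
```

Writing one output file per mode makes it easier to compare the results side by side and to show that the formatting problems are not specific to a single recognition mode.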
From my investigation, it does not appear that these issues can be solved simply by setting properties or using different API options — the problems seem to come from the conversion engine itself. That said, if I am mistaken, I would appreciate a concrete code example that demonstrates how to achieve a clean conversion.
Thank you for your assistance.
Best regards,
Dennis van der Pool
@dennisvanderpool
Hi Dennis,
I understand that you’re facing significant challenges with the PDF to Word conversion using Aspose.PDF 25.8.0. The issues you’ve described, such as characters being replaced by blocks, misaligned symbols, and formatting inconsistencies, can indeed be frustrating.
To address these issues, I recommend the following steps:
- Update to the Latest Version: Ensure that you are using the latest version of Aspose.PDF. Newer releases often include improvements and bug fixes that could resolve the issues you’re experiencing.
- Use Different Save Options: Since you’ve already tried various DocSaveOptions, consider experimenting with the following:
  - DocSaveOptions with UseAntiAliasing: This can help with rendering issues.
  - DocSaveOptions with OptimizeOutput: This option can reduce formatting inconsistencies.
- Check Font Availability: Verify that the fonts used in the original PDF are available on the system where the conversion is taking place. Missing fonts can lead to incorrect rendering.
- Post-Processing: If the conversion still results in invalid Unicode characters, implement a post-processing step to clean up the output. This could involve replacing or removing problematic characters programmatically.
- Sample Code: Here’s a basic example of how to set up the conversion with DocSaveOptions:
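    using Aspose.Pdf;

    // Load the source PDF.
    Document pdfDocument = new Document("input.pdf");

    // Configure the DOCX export with the options suggested above.
    DocSaveOptions saveOptions = new DocSaveOptions
    {
        Format = DocSaveOptions.DocFormat.DocX,
        UseAntiAliasing = true,
        OptimizeOutput = true
    };

    // Write the converted Word document.
    pdfDocument.Save("output.docx", saveOptions);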
If these steps do not resolve the issues, I recommend reaching out to Aspose support with the specific PDF files and a detailed description of the problems. They can provide more targeted assistance.
Please let me know if this helps or if you have any further questions. I’m here to assist you!
Best regards!
Thank you for your reply.
Just to clarify:
- I am already using Aspose.PDF 25.8.0 (the latest available at the time of writing).
- I have tested with different DocSaveOptions, including Flow, EnhancedFlow, and Frame modes.
- Flags such as UseAntiAliasing and OptimizeOutput do not resolve the issues.
- Font availability has been checked; the same problems occur even on systems where the required fonts are present.
- Post-processing of the DOCX output (removing/replacing invalid Unicode characters) has also been attempted, but this is a dead end since the corruption is introduced during conversion.
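To document where the invalid characters end up, one option is to scan the text of the converted DOCX for suspicious code points. The following is a minimal sketch that uses only the .NET base class library. The class and method names are hypothetical, the choice of which characters count as "suspicious" (U+FFFD, private-use, unassigned, unpaired surrogates) is an assumption, and it only reads the main body part `word/document.xml`, not headers, footers, or text boxes stored elsewhere in the package.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression; // on .NET Framework this needs a reference to System.IO.Compression.FileSystem
using System.Linq;
using System.Xml.Linq;

static class SuspiciousCharScanner // hypothetical helper, not part of Aspose.PDF
{
    // Returns the UTF-16 index and code point of every suspicious character in the text.
    public static List<(int Index, int CodePoint)> Find(string text)
    {
        var hits = new List<(int Index, int CodePoint)>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];

            // Well-formed surrogate pair: classify the combined code point, then skip its second half.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                UnicodeCategory pairCategory = CharUnicodeInfo.GetUnicodeCategory(text, i);
                if (pairCategory == UnicodeCategory.PrivateUse || pairCategory == UnicodeCategory.OtherNotAssigned)
                {
                    hits.Add((i, char.ConvertToUtf32(c, text[i + 1])));
                }
                i++;
                continue;
            }

            // A surrogate without its partner is never valid text.
            if (char.IsSurrogate(c))
            {
                hits.Add((i, (int)c));
                continue;
            }

            UnicodeCategory category = CharUnicodeInfo.GetUnicodeCategory(c);
            if (c == '\uFFFD' || category == UnicodeCategory.PrivateUse || category == UnicodeCategory.OtherNotAssigned)
            {
                hits.Add((i, (int)c));
            }
        }
        return hits;
    }

    // Concatenates the text runs (w:t elements) from the main document part of a DOCX file.
    public static string ReadDocxText(string path)
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
        using (ZipArchive archive = ZipFile.OpenRead(path))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new InvalidDataException("word/document.xml not found in " + path);
            }
            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                return string.Concat(xml.Descendants(w + "t").Select(t => t.Value));
            }
        }
    }
}
```

A call such as `SuspiciousCharScanner.Find(SuspiciousCharScanner.ReadDocxText("output.docx"))` would then list the affected positions, which can be attached to a bug report alongside the source PDF. Note that a scan like this only locates the characters; it cannot recover what the original text was.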
Because of this, I believe these issues cannot be solved by simply toggling properties or options. They seem to originate in the conversion engine itself.
Could you please escalate this to the Aspose.PDF development team?
I would appreciate either:
- confirmation that this is a known limitation, or
- a concrete code example that demonstrates how to achieve a clean conversion with the current API (if it is possible).
Thank you for your support.
@dennisvanderpool
We have tested the issue in our environment using version 25.8 of the API and were able to reproduce it. Based on initial observations, the problem appears to be related to font processing. However, a deeper investigation is required to determine the root cause and either resolve the issue or provide a suitable workaround. To address this, we have created a ticket (PDFNET-60541) in our issue tracking system and linked it to this forum thread. Rest assured, we will thoroughly examine the matter and keep you updated once a solution is available.
We apologize for any inconvenience this may have caused and appreciate your patience.