Aspose.PDF.Drawing Converting PDF to Docx in Linux word splitting issue

IBurzoEvoRWS · June 20, 2025, 5:38am

Hello.
The library used here is
Aspose.PDF.Drawing [25.6.0], but we also tested with [25.5.0].

We are running Linux containers based on the Amazon Linux 2023 image, and when we convert
a PDF document to Docx, we get behavior that differs from what we see under Windows.
Seemingly random words in the generated Docx get extra spaces inserted mid-word. For example, the text “Thomas Edison” in the PDF becomes “T homas E dison ” in the Docx generated under Linux.

Attached is an archive with the original PDF, “Thomas Edison.pdf”, and the generated Docx files (the Docx file names contain the versions tested, 25.5.0 and 25.6.0).
Edison.7z (1010.0 KB)

The bigger problem is that hyperlinked words are also “split”, which results in a lot of hyperlink duplication in the Docx itself. This causes issues on our end when we extract the content and/or further convert the Docx to HTML using Aspose.Words.

The conversion code used is the following:

using System;
using System.Reflection;
using Aspose.Pdf;

// Read the Aspose.PDF assembly version so it can be embedded in the output file name.
Assembly asposePdfAssembly = Assembly.GetAssembly(typeof(Aspose.Pdf.Document));
Version asposeVersion = asposePdfAssembly.GetName().Version;
var pdfFilePath = "/app/Thomas Edison.pdf";
var asposePdfSavePath = "/app/Thomas Edison";

using (var pdfDocument = new Aspose.Pdf.Document(pdfFilePath))
{
    var saveOptions = new DocSaveOptions
    {
        Format = DocSaveOptions.DocFormat.DocX,
        Mode = DocSaveOptions.RecognitionMode.Flow,
        RecognizeBullets = true,
        AddReturnToLineEnd = false,
        RelativeHorizontalProximity = 2.5f
    };
    try
    {
        pdfDocument.Save($"{asposePdfSavePath}{asposeVersion}.docx", saveOptions);
        Console.WriteLine("<> OK");
    }
    catch (Exception)
    {
        // Rethrow with a bare throw so the original stack trace is preserved.
        throw;
    }
}

Please confirm whether this reproduces on your end, or whether we need to change something in our save options. Thank you!

asad.ali · June 20, 2025, 10:45am

@IBurzoEvoRWS

The code snippet seems fine, and the issue appears to be related to this specific PDF document.

We have opened the following new ticket(s) in our internal issue tracking system for further analysis and investigation. We will deliver the fixes according to the terms mentioned in our Free Support Policies.

Issue ID(s): PDFNET-60150

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.