Issue with saving a PDF file as DOC

Dear Aspose Team,

After we updated Aspose.Pdf, one of our test files exposed an issue.
We use Aspose.Pdf to convert .pdf files to .doc before we process them. Since the update, the spaces between words are replaced by “ꢀ” characters.

The code we are using has not changed, and it produced correct results in the past.

DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.Doc;
saveOptions.RecognizeBullets = RecognizeBullets;
saveOptions.Mode = (ConversionType == PdfConversionType.TextFlow)
    ? DocSaveOptions.RecognitionMode.Flow
    : DocSaveOptions.RecognitionMode.Textbox;
if (ProximityIsSet) saveOptions.RelativeHorizontalProximity = RelativeHorizontalProximity;

using (Document doc = new Document(inputFilePath))
{
    doc.Save(outputFilePath, saveOptions);
}

With Aspose.Pdf 20.6 the export was correct. With Aspose.Pdf 23.10 and the current latest release, 23.11.1, the output contains “ꢀ” characters instead of spaces.

Attachment below:
BUG-12996.ZIP (342.8 KB)

Is there a setting we are missing, or is this a genuine bug?

We look forward to your findings.

Kind regards,
Bordi Tamas

[BUG-12996]

@tbordi
Please tell me which values are set for saveOptions.RecognizeBullets, saveOptions.Mode, and saveOptions.RelativeHorizontalProximity.
(This is not obvious from the code you provided.)

Sorry for the rush. We set them from our own variables, whose values can vary.
To reproduce the issue, please save with the following options:

DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.Doc;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
saveOptions.RecognizeBullets = false;

@tbordi
Thank you for the information provided.
I will check this scenario and write to you.

@tbordi
I ran the attached code (with the saveOptions.RelativeHorizontalProximity assignment commented out) on the attached document, using Aspose.Pdf 23.11.1 on .NET 6. When I opened the converted file in MS Word 2016, I did not see squares instead of symbols. I also opened both attached DOC files, and they looked identical in MS Word in my environment.
However, the preview pane does display squares.
I recommend that you use Mode = DocSaveOptions.RecognitionMode.EnhancedFlow and change the output file format to DOCX. With this approach, there were no squares in my environment.

var saveOptions = new DocSaveOptions()
{
    Format = DocSaveOptions.DocFormat.DocX,
    RecognizeBullets = false,
    Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
    // RelativeHorizontalProximity = RelativeHorizontalProximity;
};

using (var doc = new Document(dataDir + "TestFile.pdf"))
{
    doc.Save(dataDir + "TestFile-out.docx", saveOptions);
}

With the DocSaveOptions.RecognitionMode.EnhancedFlow setting and the file saved as DOCX, the issue seems to be gone. We will test this on multiple cases; if nothing goes wrong, we will consider the issue resolved. Thanks for your time and support :smiley:

@tbordi
Thanks for your feedback.
Yes, please write again if you run into any problems.
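
For anyone who wants to run the EnhancedFlow conversion over a whole folder of test files, here is a minimal sketch rather than a verified solution. It converts each PDF with the options recommended in this thread, then scans the main document part of the resulting .docx (word/document.xml inside the ZIP container) for the “ꢀ” character. The class name, the helper ContainsMarker, and the two command-line folder arguments are hypothetical and not part of Aspose.Pdf. The check also assumes the character is stored as literal text in the XML and does not look at headers, footers, or text boxes stored in other parts.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using Aspose.Pdf;

class ConversionCheck
{
    // The stray character reported in this thread, copied verbatim from the first post.
    const string Marker = "ꢀ";

    // Hypothetical helper: true if the main document part of a .docx contains the marker.
    static bool ContainsMarker(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null) return false; // unexpected package layout, nothing to scan

            using (var reader = new StreamReader(entry.Open()))
            {
                return reader.ReadToEnd().Contains(Marker);
            }
        }
    }

    static void Main(string[] args)
    {
        string inputDir = args[0];  // assumption: folder containing the test PDFs
        string outputDir = args[1]; // assumption: existing folder for the converted files

        foreach (string pdfPath in Directory.GetFiles(inputDir, "*.pdf"))
        {
            string docxPath = Path.Combine(
                outputDir, Path.GetFileNameWithoutExtension(pdfPath) + ".docx");

            // Same options as recommended above in this thread.
            var saveOptions = new DocSaveOptions
            {
                Format = DocSaveOptions.DocFormat.DocX,
                RecognizeBullets = false,
                Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
            };

            using (var doc = new Document(pdfPath))
            {
                doc.Save(docxPath, saveOptions);
            }

            string status = ContainsMarker(docxPath) ? "marker found" : "ok";
            Console.WriteLine(Path.GetFileName(pdfPath) + ": " + status);
        }
    }
}
```

A file reported as "marker found" would be worth opening in MS Word to confirm the problem visually, since this check only looks at the raw XML text.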