Using RecognitionMode.Flow causes text to be cut off

Hi,
I’m using Aspose.PDF to convert a PDF to DOCX, but with RecognitionMode.Flow the top lines of the DOCX get cut off. I believe this is caused by unusual line spacing being applied.
I need the DOCX to look exactly like the PDF, so this isn’t good enough. Is there a way to avoid it?

I have attached the original PDF and an image showing the issue.
Code below:

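using System.IO;
using Aspose.Pdf;

var path = @"C:\test\186288839te.pdf";
var pdfDocument = new Document(path);
var o = new DocSaveOptions
{
    RecognizeBullets = true,                     // detect bulleted lists
    Format = DocSaveOptions.DocFormat.DocX,      // write DOCX rather than DOC
    Mode = DocSaveOptions.RecognitionMode.Flow   // the mode that produces the cut-off lines
};
// Save next to the source PDF with a .docx extension.
var filePath = Path.ChangeExtension(path, "docx");
pdfDocument.Save(filePath, o);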

image.png (3.2 KB)

186288839te.pdf (36.9 KB)

@roberto.silva

Would you please try version 24.8 with the EnhancedFlow option instead of Flow and see if it helps? We tested this in our environment and obtained the attached output.
output.docx (12.4 KB)
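
For reference, the suggested change might look like the sketch below. It assumes Aspose.PDF for .NET 24.8 or later is referenced and reuses the file path from the original post; only the Mode value differs from the code posted above. Whether this resolves the cut-off top lines for this particular PDF is not confirmed here.

```csharp
using System.IO;
using Aspose.Pdf;

// Same input file as in the original post.
var path = @"C:\test\186288839te.pdf";
var pdfDocument = new Document(path);

var options = new DocSaveOptions
{
    RecognizeBullets = true,
    Format = DocSaveOptions.DocFormat.DocX,
    // The only change: EnhancedFlow instead of Flow.
    Mode = DocSaveOptions.RecognitionMode.EnhancedFlow
};

// Write the DOCX next to the source PDF.
var filePath = Path.ChangeExtension(path, "docx");
pdfDocument.Save(filePath, options);
```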

Unfortunately, our paid license doesn’t cover releases up to that version. Even if it did, the formatting in the output document you provided is completely different from the original PDF.

Can you please provide a workaround or a fix? We’ll need to use another pdf conversion tool if you can’t.

Thanks

@roberto.silva

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-58058

We will look into the details of the formatting issues and will notify you via this forum thread as soon as we have updates to share. Please allow us some time.

We are sorry for the inconvenience.