Text layer missing when converting Pdf to Docx

Hello,

I have attached a PDF document whose text layer is not captured when it is opened as an Aspose.Words.Document.

If I decompress the PDF streams, I can see that the text data is present (see the content stream at object 26 0 and the character id key at 27 0).

Why is the text layer missing in this case?

You can reproduce this issue with the snippet below. I am using Aspose.Words 24.10.0.

using Aspose.Words;

// Load the PDF and save it as DOCX; the text is missing from the output.
var document = new Document("./Document1.pdf");
document.Save("./Document1.docx", SaveFormat.Docx);
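
To check for the missing text without opening the resulting DOCX, a rough sketch like the one below may help. It assumes a licensed copy of Aspose.Words (in evaluation mode the library may add its own text, which would skew the count) and reuses the file names from the snippet above. Treating an empty trimmed result as "no text layer captured" is only a heuristic, not an official diagnostic.

```csharp
using System;
using Aspose.Words;

// Print the library version that is actually loaded at run time.
// The thread reports 24.10.0 as affected and 24.11 as fixed.
Console.WriteLine($"{BuildVersionInfo.Product} {BuildVersionInfo.Version}");

// Load the PDF the same way as in the repro.
var document = new Document("./Document1.pdf");

// GetText() includes control characters such as paragraph and section
// breaks, so trim them before measuring how much text was imported.
string text = document.GetText().Trim();
Console.WriteLine($"Extracted {text.Length} characters.");

if (text.Length == 0)
{
    // Heuristic only: nothing but breaks (and possibly images) was imported.
    Console.WriteLine("No text was captured from the PDF.");
}

document.Save("./Document1.docx", SaveFormat.Docx);
```

Running this once against 24.10.0 and once against a later build would show whether the extracted character count changes between versions.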

Document1.pdf (107.1 KB)

Regards,
Draftable.

@draftable
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27512

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

The issue you found earlier (filed as WORDSNET-27512) has been fixed in the Aspose.Words for .NET 24.11 update, which is also available on NuGet.

That’s great news. We already tried it and it works. Thank you very much.
