PDF paragraphs extracted incorrectly

Hi there.

My application needs to extract paragraphs from PDF files.

For some PDF files it works fine, but for others every line of the document is recognized as a separate paragraph.

For example, in this file: PASSE PAG (F&O) 2018.pdf (433.3 KB)

In this Screenshot (226.6 KB), the original paragraph is shown on the right, and the result of extracting it with Aspose.PDF for .NET is shown on the left.

My paragraph extraction code is written exactly as described in option 2 of this documentation page: Extract Paragraph from PDF C#|Aspose.PDF for .NET

Can you guys help me out here?

Thanks.

@dionisioleonardo

Thanks for contacting support.

We were able to replicate the issue in our environment using the PDF document you shared, and we have logged it as PDFNET-45591 in our issue tracking system for detailed investigation. We will look into the issue and keep you informed about the status of its resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.

Hi there.

Any updates on this?

Thanks.

@dionisioleonardo

Thanks for your inquiry.

I am afraid the previously logged issue has not been resolved yet because of other pending issues in the queue. It was logged under the normal support model, so it has a lower priority than issues raised under priority support, which are resolved on an urgent basis. As soon as we have a definite update on the resolution of the issue, we will let you know. Please allow us some time.

We are sorry for the inconvenience.

Is there any update?

@dongsp

Regrettably, the issue has not been resolved yet. We will post an update in this thread as soon as significant progress is made toward resolving the ticket. Please give us some time.

We are sorry for the inconvenience.

The issue you reported earlier (filed as PDFNET-45591) has been fixed in Aspose.PDF for .NET 24.4.