Detect overflow/continuing paragraph

Hello,

I’m currently extracting the text from this sample.pdf (13.9 KB) with ParagraphAbsorber page by page like this:

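Document pdfDocument = new(file);
ParagraphAbsorber paragraphAbsorber = new();
// Visit each page separately, so the absorber processes one page at a time.
foreach (var page in pdfDocument.Pages)
{
    paragraphAbsorber.Visit(page);
}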

The problem I’m running into is that the last paragraph of page 1 overflows onto page 2. I know that the MarkupParagraph class has ContinuationPageNumbers and SecondaryPoints properties, which seem relevant to this issue. However, when I run this against my sample PDF file, both properties are null on the last MarkupParagraph of page 1. I’m not sure how to detect the continuation while using ParagraphAbsorber page by page.

Thank you for any help or advice!

@nnguyen9644

Could you please share the complete code snippet that you are using to extract paragraphs, so that we can reproduce the null properties you are describing? We will then log an issue in our issue management system and share the ID with you.

Sorry for the delay, here is a simple snippet that I used:

Document pdfDocument = new(file);
ParagraphAbsorber paragraphAbsorber = new();
foreach (var page in pdfDocument.Pages)
{
    paragraphAbsorber.Visit(page);
    foreach (MarkupSection section in paragraphAbsorber.PageMarkups[0].Sections)
    {
        foreach (MarkupParagraph paragraph in section.Paragraphs)
        {
            // Both values come back null, even for the paragraph that overflows onto page 2.
            Console.WriteLine("{0},{1}", paragraph.ContinuationPageNumber, paragraph.SecondaryPoints);
        }
    }
}

@nnguyen9644

We need to investigate further whether the feature you require is feasible. For this purpose, an investigation ticket, PDFNET-51709, has been logged in our issue management system. We will look into its details and let you know once it is resolved. Please be patient and allow us some time.

We are sorry for the inconvenience.
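
In the meantime, one possible workaround is to stitch the page-by-page results together on the caller's side. The following is only a minimal sketch of that idea, not an Aspose.PDF feature: it assumes you have already collected the paragraph texts of each page as plain strings, and it guesses that a page's last paragraph continues on the next page when it does not end with sentence-ending punctuation. The names ParagraphStitcher, LooksUnfinished, and Stitch are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ParagraphStitcher
{
    // Heuristic (an assumption, not library behavior): a paragraph is treated
    // as unfinished when its trimmed text does not end with '.', '!' or '?'.
    public static bool LooksUnfinished(string text)
    {
        string trimmed = text.TrimEnd();
        if (trimmed.Length == 0) return false;
        char last = trimmed[trimmed.Length - 1];
        return ".!?".IndexOf(last) < 0;
    }

    // pages[i] holds the paragraph texts extracted from page i, in reading order.
    // Returns the paragraphs of the whole document, with overflowing ones merged.
    public static List<string> Stitch(List<List<string>> pages)
    {
        var result = new List<string>();
        bool joinNext = false;
        foreach (List<string> page in pages)
        {
            for (int i = 0; i < page.Count; i++)
            {
                if (i == 0 && joinNext && result.Count > 0)
                {
                    // Append the first paragraph of this page to the unfinished one.
                    int lastIndex = result.Count - 1;
                    result[lastIndex] = result[lastIndex].TrimEnd() + " " + page[i].TrimStart();
                }
                else
                {
                    result.Add(page[i]);
                }
            }
            // Only the last paragraph of a page can overflow onto the next page.
            if (page.Count > 0)
            {
                joinNext = LooksUnfinished(result[result.Count - 1]);
            }
        }
        return result;
    }
}
```

A heuristic like this will misjudge paragraphs that legitimately end without punctuation (headings, list items) or sentences that happen to end exactly at the page break, so it is best treated as a stopgap until the ticket above is resolved.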