Extract paragraph from PDF in column layout

Hi there,

I have attached a PDF page whose paragraphs are laid out in columns. When I try to read all the paragraphs programmatically, Aspose does not recognize the column layout and instead extracts the paragraphs horizontally, across the columns.

Is there a way to extract a PDF column by column? On this page, I need to extract all the text in the first column, down to the end of the page, and then move on to the second column, instead of reading horizontally.

I’m using the code block below.
// Open an existing PDF file
Document doc = new Document(file.FullName);
// Instantiate ParagraphAbsorber
ParagraphAbsorber absorber = new ParagraphAbsorber();
absorber.Visit(doc);

foreach (PageMarkup markup in absorber.PageMarkups)
{
    int i = 1;
    foreach (MarkupSection section in markup.Sections)
    {
        int j = 1;

        foreach (MarkupParagraph paragraph in section.Paragraphs)
        {
            StringBuilder paragraphText = new StringBuilder();

            foreach (List<TextFragment> line in paragraph.Lines)
            {
                foreach (TextFragment fragment in line)
                {
                    paragraphText.Append(fragment.Text);
                }
                //paragraphText.Append("\r\n");
            }
            paragraphText.Append("\r\n");

            Console.WriteLine("Paragraph {0} of section {1} on page {2}:", j, i, markup.Number);
            Console.WriteLine(paragraphText.ToString());

            j++;
        }
        i++;
    }
}

My current output:

Paragraph 1 of section 1 on page 3:
2021 ESG Report

Paragraph 1 of section 2 on page 3:
Letter from CEO & Board Chair

Paragraph 2 of section 2 on page 3:
At CVS Healthr, we believe the health of our planet and the health of its people are inextricably linked. As we reimagine the health care experience, we are unwavering in our commitment to environmental, social and governance (ESG) priorities and health equity. We are a consumer-focused, purpose-driven company - that is who we are, and it is what sets us apart. As we bring our heart to every moment of your health, we are thinking broadly about what that means for individuals, communities and society.

Paragraph 1 of section 3 on page 3:
Sincerely,

Paragraph 1 of section 4 on page 3:
Our 15th annual ESG Report details key initiatives and our bold long-term goals to hold ourselves accountable to building a healthier and more equitable future for those we serve.

Paragraph 1 of section 5 on page 3:
Karen S. Lynch President and Chief Executive Officer

Paragraph 1 of section 6 on page 3:
our new Health Zones initiative, which aims to advance equity by improving health outcomes in high-risk communities across the country by adding SDOH and providing concentrated, holistic local investments. This includes numerous investments at the local level to increase access to health care, housing, education, food, labor and training and transportation.

Paragraph 1 of section 7 on page 3:
David W. Dorman Chair of the Board

Paragraph 1 of section 8 on page 3:
We’re proud that CVS Health leads the nation in both COVID-19 diagnostic testing and vaccination, having administered more than 32 million tests and more than 59 million vaccines in 2021. We brought life-saving vaccines to residents of long-term care facilities and other priority populations, and then to neighborhoods, campuses and worksites across the country. We also expanded our offerings to provide increased access to mental and virtual health care services.

Paragraph 1 of section 9 on page 3:
We also made great progress this year toward our goals for a sustainable future. Recognized as a corporate leader in climate action, we became one of the first companies to have our net-zero targets validated by the Science-Based Targets Initiative. We continue to reduce resource consumption through digital receipt options at the point of sale and pilots around reusable bag systems and non-plastic pill bottles. We have also introduced new water reduction goals, climate policies and approaches to waste management.

Paragraph 1 of section 10 on page 3:
This year we appointed our first ever Chief Health Equity Officer, who will help us create more solutions to reduce disparities and improve health outcomes. Ensuring equitable access to health care is critical now more than ever.

Paragraph 1 of section 11 on page 3:
Our work to build a Healthy 2030would not be possible without the drive and heart of our approximately 300,000 colleagues. We welcome the opportunity to collaborate with our stakeholders to achieve better outcomes for the people and communities we serve - and the planet.

Paragraph 1 of section 12 on page 3:
We’ve had a long-standing commitment to address the social determinants of health (SDOH) because we know that more than 80 percent of a person’s health is determined outside the doctor’s office. This year, we introduced

PDF with Column layout.pdf (158.1 KB)

@sanjaybk

We have already noticed this issue and logged it in our issue tracking system for the file you shared in your previous topic. In addition, a new ticket, PDFNET-51726, has been logged for the source PDF that you have shared here. We will look into its details and let you know as soon as it is resolved. Please bear with us in the meantime.

We are sorry for the inconvenience.
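
Until the ticket is resolved, one possible interim approach is to reorder the extracted paragraphs yourself. The sketch below is not an Aspose feature or a confirmed fix. It assumes you can obtain the left and top coordinate of each paragraph's bounding box, and it uses a hypothetical `Block` class and `ColumnOrder.Sort` helper, neither of which belongs to Aspose.PDF. Blocks whose left edges lie within `tolerance` points of the start of a column are treated as one column, columns are read left to right, and each column is read top to bottom.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container (not an Aspose type): one extracted paragraph plus the
// top-left corner of its bounding box, in PDF coordinates, where the origin is
// the bottom-left corner of the page and larger Top means higher on the page.
public sealed class Block
{
    public double Left { get; }
    public double Top { get; }
    public string Text { get; }

    public Block(double left, double top, string text)
    {
        Left = left;
        Top = top;
        Text = text;
    }
}

public static class ColumnOrder
{
    // Groups blocks into columns by their left edge, then returns them
    // column by column (left to right), top to bottom within each column.
    public static List<Block> Sort(IEnumerable<Block> blocks, double tolerance)
    {
        var columns = new List<List<Block>>();
        double columnLeft = 0;

        foreach (Block block in blocks.OrderBy(b => b.Left))
        {
            // Start a new column when the left edge jumps by more than
            // the tolerance from where the current column began.
            if (columns.Count == 0 || block.Left - columnLeft > tolerance)
            {
                columns.Add(new List<Block>());
                columnLeft = block.Left;
            }
            columns[columns.Count - 1].Add(block);
        }

        // Larger Top is higher on the page, so sort descending within a column.
        return columns
            .SelectMany(column => column.OrderByDescending(b => b.Top))
            .ToList();
    }
}
```

To use it, build one `Block` per paragraph inside the existing loops and print the result of `ColumnOrder.Sort(blocks, tolerance)` instead of printing in absorber order. Where the coordinates come from is an assumption here: the bounding information exposed on each `MarkupParagraph` or `MarkupSection` is the likely source, but check the API reference for your Aspose.PDF version. The tolerance needs tuning per document, and elements that span several columns, such as the page title, would need separate handling.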