Extracting a single line of text sometimes results in multiple text fragments (C#)

Hi all,

We are currently trying to use Aspose.PDF to extract text from a PDF file on a line-by-line or paragraph basis.
However, we discovered that extracting the text fragments sometimes results in multiple fragments for a single line of text.

For example, the line "Op dit moment ondervind ik veel geluidsoverlast van het bedrijf Jimmie’s Pizza gevestigd aan de " results in these text fragments:

  1. Op dit moment ondervind ik veel geluidsoverlast van het bedrijf
  2. Jimmie
  3. s Pizza
  4. gevestigd aan de

We have tried multiple approaches to extract text on a line-by-line basis, using both the TextFragmentAbsorber and the ParagraphAbsorber, but both yield the same result.

Is there an alternative method we can use to solve this requirement?

The test code I used with the TextFragmentAbsorber:

    byte[] pdfFile = File.ReadAllBytes(@"Voorbeeld brief maskeren.pdf");

    // Convert the byte array to a MemoryStream so it can be processed
    MemoryStream payloadStream = new MemoryStream(pdfFile);

    // Import in Aspose
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(payloadStream);

    // Retrieve all the text fragments inside the PDF
    Aspose.Pdf.Text.TextFragmentAbsorber absorber = new Aspose.Pdf.Text.TextFragmentAbsorber();

    // Run the absorber over every page; without this call TextFragments stays empty
    pdfDocument.Pages.Accept(absorber);

    Aspose.Pdf.Text.TextFragmentCollection textFragmentCollection = absorber.TextFragments;

    int lNr = 0;
    foreach (Aspose.Pdf.Text.TextFragment textFrag in textFragmentCollection)
    {
        // Get the text fragment
        var textFragment = textFrag.Text;
        Console.WriteLine("Line nr: " + lNr.ToString() + " text: " + textFragment);
        lNr++;
    }

I have also attached the test PDF we are processing: Voorbeeld brief maskeren.pdf (70.3 KB)


Please read the following article, which should help you meet this requirement. We hope this helps.
Extract Paragraph from PDF C#
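
If the fragments still come back split, one possible workaround is to regroup them into lines yourself, based on where each fragment sits on the page. The following is only a minimal sketch of that idea, not an Aspose feature: the Fragment record and the LineGrouper class are hypothetical names introduced for illustration, and the tolerance of 2.0 points is an assumed value you would need to tune for your documents. It also assumes each fragment's text keeps its own leading or trailing spaces, so the pieces are concatenated without a separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the data a text fragment carries:
// its text plus the position of its baseline on the page.
public record Fragment(string Text, double X, double Y);

public static class LineGrouper
{
    // Groups fragments whose Y positions lie within `tolerance` of the first
    // fragment of a line, then joins each group left to right into one string.
    public static List<string> GroupIntoLines(
        IEnumerable<Fragment> fragments, double tolerance = 2.0)
    {
        var lines = new List<List<Fragment>>();

        // PDF coordinates grow upward, so a larger Y is higher on the page.
        foreach (var fragment in fragments.OrderByDescending(f => f.Y))
        {
            var current = lines.LastOrDefault();
            if (current != null && Math.Abs(current[0].Y - fragment.Y) <= tolerance)
            {
                current.Add(fragment);                        // same visual line
            }
            else
            {
                lines.Add(new List<Fragment> { fragment });   // start a new line
            }
        }

        // Within each line, order by X and concatenate the texts as-is.
        return lines
            .Select(line => string.Concat(line.OrderBy(f => f.X).Select(f => f.Text)))
            .ToList();
    }
}
```

To use this with the code from the question, you would build one Fragment per TextFragment, for example from textFrag.Text, textFrag.Position.XIndent and textFrag.Position.YIndent, and do so per page, because the coordinates restart on every page. This cannot restore characters that are missing from the extracted fragments themselves, such as the apostrophe in the example above.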