Extracting Text from PDF

Not Flat.pdf (584.5 KB)

The text extracted from the attached PDF is not readable. See the output in Screenshot 2023-08-09 220154.png (215.9 KB).

@mohdr

Would you please also share the code snippet that you are using to extract the text? We will test the scenario in our environment and address it accordingly.

            // Namespaces required by this snippet
            using Aspose.Pdf;
            using Aspose.Pdf.Text;

            Document pdfDocument = new Document(pdfPath);
            pdfDocument.Repair();
            pdfDocument.Flatten();
            TextAbsorber textAbsorber = new TextAbsorber();

            textAbsorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
            // A scale factor of 0.5 is enough to split columns in the majority of documents.
            // A value of zero lets the algorithm choose the scale factor automatically.
            textAbsorber.ExtractionOptions.ScaleFactor = 0.5; /* 0; */
            pdfDocument.Pages.Accept(textAbsorber);
            String extractedText = textAbsorber.Text;

@mohdr

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-55259

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.