Text Extracting from PDF

mohdr · August 9, 2023, 4:29pm

The text extracted from the PDF is not readable. See the output in the attached screenshot: Screenshot 2023-08-09 220154.png (215.9 KB)

asad.ali · August 9, 2023, 10:33pm

Would you please also share the code snippet that you are using to extract the text? We will test the scenario in our environment and address it accordingly.

mohdr · August 10, 2023, 2:28am

    using Aspose.Pdf;
    using Aspose.Pdf.Text;

    // pdfPath holds the path to the source PDF file
    Document pdfDocument = new Document(pdfPath);
    // Repair the document structure and flatten form fields before extraction
    pdfDocument.Repair();
    pdfDocument.Flatten();

    TextAbsorber textAbsorber = new TextAbsorber();
    textAbsorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
    // A scale factor of 0.5 is enough to split columns in the majority of documents.
    // A value of zero lets the algorithm choose the scale factor automatically.
    textAbsorber.ExtractionOptions.ScaleFactor = 0.5; /* 0; */
    pdfDocument.Pages.Accept(textAbsorber);
    string extractedText = textAbsorber.Text;

asad.ali · August 10, 2023, 10:12am

@mohdr

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-55259

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.