Hello,
I'm struggling to read a PDF with two columns per page.
How can I read a PDF file (version 9.3.0.0) that has two columns, excluding the header and footer? I cannot find a code sample!
Hi Marco,
Thanks for your inquiry. We would appreciate it if you could share your sample PDF document and sample code. We will investigate it on our end and provide more information accordingly.
We are sorry for the inconvenience caused.
Best Regards,
Hi, Tilal
I am facing the same problem.
- Can we extract the text content from a two-column PDF file?
- What about highlighted text? If the highlight spans both columns, can we extract it?
The attached file is a sample PDF.
Thank you.
bams-d-15-00314.1.pdf (9.4 MB)
Thank you for contacting support.
You can extract the text from a multi-column PDF document, whether it is highlighted or not. Please try the code snippet below in your environment and share your feedback with us.
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the input PDF
Document pdfDocument = new Document(dataDir + "bams-d-15-00314.1.pdf");
TextAbsorber textAbsorber = new TextAbsorber();
textAbsorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
// A scale factor of 0.5 is enough to split columns in the majority of documents
// A value of zero lets the algorithm choose the scale factor automatically
textAbsorber.ExtractionOptions.ScaleFactor = 0.5;
// Run the absorber over every page of the document
pdfDocument.Pages.Accept(textAbsorber);
string extractedText = textAbsorber.Text;
System.IO.File.WriteAllText(dataDir + "bams-d-15-00314.1.txt", extractedText);
We hope this will be helpful. Please feel free to contact us if you need any further assistance.
Farhan, it works, thanks for your help.
My further question is: can I extract the highlighted text (and only the highlighted text) from the PDF?
Thank you.
Thank you for your kind feedback.
Please always create separate topics for separate inquiries. It helps us address your concerns efficiently. Moreover, please see "Extract Highlighted Text from PDF Document" for reference.
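For readers who land here with the same follow-up question, here is a minimal sketch of one way to pull out only the highlighted text. It assumes the highlights are stored as highlight annotations (not as colored rectangles drawn behind the text) and that your Aspose.PDF for .NET version exposes GetMarkedTextFragments() on text markup annotations. The file name "highlighted.pdf" is a placeholder, and the referenced documentation article remains the authoritative source.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Annotations;
using Aspose.Pdf.Text;

class ExtractHighlightedText
{
    static void Main()
    {
        // Placeholder path (assumption): point this at your own PDF.
        string inputPath = "highlighted.pdf";
        Document pdfDocument = new Document(inputPath);

        foreach (Page page in pdfDocument.Pages)
        {
            foreach (Annotation annotation in page.Annotations)
            {
                // Keep only highlight annotations; skip links, notes, underlines, etc.
                HighlightAnnotation highlight = annotation as HighlightAnnotation;
                if (highlight == null)
                {
                    continue;
                }

                // Text fragments covered by this highlight.
                TextFragmentCollection fragments = highlight.GetMarkedTextFragments();
                foreach (TextFragment fragment in fragments)
                {
                    Console.WriteLine(fragment.Text);
                }
            }
        }
    }
}
```

Because a highlight that crosses two columns may be stored as a single annotation with several regions, the fragments are printed in whatever order the library returns them, so it is worth checking the reading order on your own sample.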