Read PDF line by line with its whitespaces

aliihsandiler · February 18, 2016, 8:21am

Hello,

I want to read a PDF file line by line. The lines contain a lot of whitespace, and I want to extract the text with that whitespace preserved. Can I do this with Aspose, and if so, how?

Thanks,

codewarior · February 21, 2016, 10:15pm

Hi Ali.

Thanks for contacting support.

To accomplish this, please try the code snippet in the documentation article "Extract Text from Pages using Text Device".

If you encounter any issues, please share your source PDF files so that we can look into the matter further.

tilal.ahmad · February 21, 2016, 10:37pm

Hi Ali,

In addition to the above reply, you can extract text from a PDF document and preserve its formatting by using the "Pure" TextFormattingMode. Please check the following sample code. You can also extract text from a specified page region.

// open document
Document pdfDocument = new Document("input.pdf");

// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));

// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

// get the extracted text
string extractedText = textAbsorber.Text;

// create a writer and open the file
TextWriter tw = new StreamWriter("extracted-text.txt");

// write the extracted text to the file
tw.WriteLine(extractedText);

// close the stream
tw.Close();
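
Since the original question asks about reading line by line, here is a minimal sketch of walking through the extracted string one line at a time with the standard .NET StringReader. It does not call Aspose at all. The sample string is a made-up stand-in for textAbsorber.Text, and the class name LineByLineDemo is only illustrative.

```csharp
using System;
using System.IO;

class LineByLineDemo
{
    static void Main()
    {
        // Hypothetical stand-in for textAbsorber.Text (not real extractor output).
        string extractedText = "Name        Qty\r\nWidget        3\r\n";

        using (var reader = new StringReader(extractedText))
        {
            string line;
            int lineNumber = 0;

            // ReadLine returns null once the end of the string is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lineNumber++;
                // Brackets make leading and trailing spaces visible.
                Console.WriteLine($"{lineNumber}: [{line}]");
            }
        }
    }
}
```

ReadLine strips only the line terminator, so the spaces inside each line are left untouched. Avoid calling Trim() on the lines if the whitespace matters to you.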

Please feel free to contact us for any further assistance.

Best Regards,