I want to extract text from a PDF but failed. Does your product extract it?

info_mediatrendit_co · August 18, 2014, 8:24am

Hi,

I copied the DLL to the bin folder, and then the code works. But it cannot retrieve the text as desired. I want to extract text from a PDF but failed. Does your product extract it? If yes, then let us know.

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// open the source PDF document
Document pdfDocument = new Document(@"C:\AA\Maval_Puravani2\Test\A2050015.pdf");

// create a TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();

// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);

// get the extracted text
string extractedText = textAbsorber.Text;

// write the text to a file; the using block closes the stream even if an exception is thrown
using (TextWriter tw = new StreamWriter(@"C:\AA\Maval_Puravani2\Test\extracted-text.txt"))
{
    tw.WriteLine(extractedText);
}

This gives the following output:

Evaluation Only. Created with Aspose.Pdf.

So I cannot test if your component extracts correct text from the PDF.

thx,

M.Irfan.

tilal.ahmad · August 18, 2014, 11:21am

Hi Irfan,

Thanks for your inquiry. Yes, Aspose.Pdf supports extracting text from PDF files. We would appreciate it if you could share your sample PDF document here. We will test the scenario and update you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

codewarior · August 19, 2014, 5:03am

Hi Irfan,

Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for .NET 9.5.0 with the code snippet you shared earlier, and in my observation the text is extracted properly. Please try using a valid license file to extract the text properly. For your reference, I have also attached the text file containing the extracted contents.

You may consider requesting a 30-day temporary license to test the API without evaluation limitations. For more details, please visit Get a temporary license.

info_mediatrendit_co · August 20, 2014, 4:07am

How do I download the attachment?

codewarior · August 21, 2014, 2:33am

mediatrendit:
How do I download the attachment?

Hi Irfan,

From your query above, do you mean the steps to download/get attachments from a PDF file? If so, please follow the instructions in Get All the Attachments from a PDF Document.

However, if you want to download the attachment shared in my earlier post, simply right-click the file and save it to your system. If you encounter any issues, please feel free to contact us.

info_mediatrendit_co · August 27, 2014, 8:53am

As I said before, your extractor failed to extract the correct text from the PDF. When the font is fully embedded in the PDF and uses Ansi or Identity-H encoding, extraction is straightforward, like copying content from Adobe Reader and pasting it into any editor. But when the font is embedded as a subset, you cannot extract the text because the glyphs are wrongly mapped.

Please find the PDF attached. Thanks, Irfan
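The glyph-mapping problem described above can be pictured with a small toy model. It assumes the common explanation that an extractor converts the character codes stored in the page content back to Unicode through the font's ToUnicode map, and that a subset font may renumber its glyphs so the raw codes no longer match any standard encoding. This is a hypothetical sketch, not Aspose.Pdf's implementation; the codes, the map, and the `Decode` helper are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Toy model (hypothetical, NOT Aspose.Pdf's implementation) of turning the
// character codes in a page's content stream back into Unicode text.

// Hypothetical codes for the word "Hello" drawn with a subset font whose
// glyphs were renumbered starting from 1.
ushort[] codes = { 1, 2, 3, 3, 4 };

// Hypothetical ToUnicode map (code -> Unicode text). A PDF producer may or
// may not include such a map for a subset font.
var toUnicode = new Dictionary<ushort, string>
{
    [1] = "H", [2] = "e", [3] = "l", [4] = "o",
};

// Decode using the map when one is available; otherwise fall back to
// treating each code as a Unicode code point, which yields "wrongly
// mapped" output for renumbered subset glyphs.
string Decode(ushort[] input, IReadOnlyDictionary<ushort, string> map)
{
    var sb = new StringBuilder();
    foreach (ushort code in input)
    {
        if (map != null && map.TryGetValue(code, out var text))
            sb.Append(text);
        else
            sb.Append((char)code); // fallback: raw code taken as-is
    }
    return sb.ToString();
}

string withMap = Decode(codes, toUnicode);
string withoutMap = Decode(codes, null);

Console.WriteLine(withMap); // Hello
Console.WriteLine(string.Join(" ", withoutMap.Select(c => $"U+{(int)c:X4}"))); // U+0001 U+0002 U+0003 U+0003 U+0004
```

With the map, the codes decode to "Hello"; without it, the fallback produces the control characters U+0001 to U+0004, which is the kind of unreadable output the post describes.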

codewarior · August 28, 2014, 5:29am

mediatrendit:

As I said before, your extractor failed to extract the correct text from the PDF. When the font
is fully embedded in the PDF and uses Ansi or Identity-H encoding, extraction is straightforward,
like copying content from Adobe Reader and pasting it into any editor. But when the
font is embedded as a subset, you cannot extract the text because the glyphs are wrongly mapped.

Please find the PDF as an attachment. thx. irfan

Hi Irfan,

I am afraid I cannot see any PDF document attached to your previous post. Please double-check on your end, and also share the code snippet you are using so that we can test the scenario on our end. We are sorry for the inconvenience.