I want to extract text from a PDF but failed. Does your product extract it?

Hi,

I copied the DLL to the bin folder and the code now runs, but it does not retrieve the text as desired. I want to extract text from a PDF, but it fails. Does your product extract it? If yes, please let us know.

using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Open the source PDF document
Document pdfDocument = new Document(@"C:\AA\Maval_Puravani2\Test\A2050015.pdf");
// Create a TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// Get the extracted text
string extractedText = textAbsorber.Text;
// Create a writer and open the output file
TextWriter tw = new StreamWriter(@"C:\AA\Maval_Puravani2\Test\extracted-text.txt");
// Write the extracted text to the file
tw.WriteLine(extractedText);
// Close the stream
tw.Close();

This gives the following output:

Evaluation Only. Created with Aspose.Pdf.

So I cannot test whether your component extracts the correct text from the PDF.

thx,

M.Irfan.

Hi Irfan,


Thanks for your inquiry. Yes, Aspose.Pdf supports extracting text from PDF files. We would appreciate it if you could share your sample PDF document here. We will test the scenario and update you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi Irfan,

Thanks for sharing the resource file.

I have tested the scenario with Aspose.Pdf for .NET 9.5.0 using the code snippet you shared earlier, and as per my observations the text is extracted properly. Please try using a valid license file to extract the text properly. For your reference, I have also attached a text file containing the extracted contents.

You may consider requesting a 30-day temporary license to test the API without any limitations. For more details, please visit Get a temporary license
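
For reference, here is a minimal sketch of applying a license before running the same extraction code. The license file name `Aspose.Pdf.lic` is a placeholder (an assumption, not a file from this thread), and the PDF and output paths are the ones from the snippet shared earlier. It is only meant to show where the license call would go, not a definitive setup.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractTextWithLicense
{
    static void Main()
    {
        // Assumption: "Aspose.Pdf.lic" is a placeholder path to your
        // (temporary) license file; replace it with the real location.
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Pdf.lic");

        // Same extraction steps as in the snippet shared earlier
        Document pdfDocument = new Document(@"C:\AA\Maval_Puravani2\Test\A2050015.pdf");
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);

        // Write the extracted text to the output file
        File.WriteAllText(@"C:\AA\Maval_Puravani2\Test\extracted-text.txt", textAbsorber.Text);
    }
}
```

The license is set once, before the document is opened, so that the extraction itself runs in licensed mode.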

How do I download the attachment?

Hi Irfan,

From your query above, do you mean the steps to get attachments from a PDF file? If so, please follow the instructions in Get All the Attachments from a PDF Document

However, if you want to download the attachment shared in my earlier post, simply right-click the file and save it to your system. In case you encounter any issue, please feel free to contact us.

As I told you before, your extractor failed to extract the correct text from the PDF. The reason is that when the font is fully embedded in the PDF and has ANSI or Identity-H encoding, extraction is straightforward, like copying content from Adobe Reader and pasting it into any editor. But when the font is embedded as a subset, you cannot extract the text because the glyphs are wrongly mapped.

Please find the PDF as an attachment. thx. irfan


Hi Irfan,

I am afraid I cannot see any PDF document attached to your previous post. Please double-check at your end, and also share the code snippet you are using so that we can test the scenario at our end. We are sorry for the inconvenience.