Hi,
I copied the DLL to the bin folder and the code now runs, but it does not retrieve the text as desired. I want to extract text from a PDF, but the extraction fails. Can your product extract it? If so, please let us know.
Document pdfDocument = new Document(@"C:\AA\Maval_Puravani2\Test\A2050015.pdf");
// create TextAbsorber object to extract text
TextAbsorber textAbsorber = new TextAbsorber();
// accept the absorber for all the pages
pdfDocument.Pages.Accept(textAbsorber);
// get the extracted text
string extractedText = textAbsorber.Text;
// create a writer and open the file
TextWriter tw = new StreamWriter(@"C:\AA\Maval_Puravani2\Test\extracted-text.txt");
// write a line of text to the file
tw.WriteLine(extractedText);
// close the stream
tw.Close();
This gives the following output:
Evaluation Only. Created with Aspose.Pdf.
So I cannot test whether your component extracts the correct text from the PDF.
thx,
M.Irfan.
Hi Irfan,
Thanks for your inquiry. Yes, Aspose.Pdf supports extracting text from PDF files. We would appreciate it if you could share your sample PDF document here. We will test the scenario and update you accordingly.
We are sorry for the inconvenience caused.
Best Regards,
Hi Irfan,
Thanks for sharing the resource file.
I tested the scenario with Aspose.Pdf for .NET 9.5.0 using the code snippet you shared earlier, and in my tests the text is extracted properly. Please try using a valid license file to extract the text properly. For your reference, I have also attached a text file containing the extracted contents.
You may consider requesting a 30-day temporary license to test the API without any issues. For more details, please visit Get a temporary license.
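To illustrate the suggestion above, here is a minimal sketch of applying a license before running the extraction code shared earlier in this thread. The license file path is hypothetical and should point to your own .lic file. The sketch assumes that TextAbsorber lives in the Aspose.Pdf.Text namespace, as in recent Aspose.Pdf for .NET releases.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractWithLicense
{
    static void Main()
    {
        // Apply the license before creating any Document object.
        // Hypothetical path: replace it with the location of your own license file.
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense(@"C:\AA\Aspose.Pdf.lic");

        // Same extraction steps as in the original snippet.
        Document pdfDocument = new Document(@"C:\AA\Maval_Puravani2\Test\A2050015.pdf");
        TextAbsorber textAbsorber = new TextAbsorber();
        pdfDocument.Pages.Accept(textAbsorber);

        // Write the extracted text to a file in one call.
        File.WriteAllText(@"C:\AA\Maval_Puravani2\Test\extracted-text.txt", textAbsorber.Text);
    }
}
```

If the license is applied successfully, the output file should contain the document text rather than only the evaluation notice.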
How do I download the attachment?
Hi Irfan,
If you want to download the attachment shared in my earlier post, simply right-click the file and save it to your system. If you encounter any issue, please feel free to contact us.
As I told you before, your component failed to extract the correct text from the PDF. The reason is that when the font is fully embedded in the PDF with ANSI or Identity-H encoding, extraction is straightforward, like copying content from Adobe Reader and pasting it into any editor. But when the font is embedded as a subset, the text cannot be extracted because the glyphs are wrongly mapped.
Please find the PDF attached. thx. irfan
Hi Irfan,
I am afraid I cannot see any PDF document attached to your previous post. Please double-check on your end, and also share the code snippet you are using so that we can test the scenario on our side. We are sorry for the inconvenience.