Free Support Forum - aspose.com

Testing the ExtractText() method

Hi,

I am testing Aspose.Pdf.Kit to see whether it can do what I need.

I need to split a PDF file and extract data from it.

When I call the ExtractText method on the original PDF file, I get an exception saying that it tried to read past the end of the stream.

That is not a problem so far, because ExtractText works on the split files. But no matter how many files I generate, I always get the same text file with the following content:

Warning:This is the evaluation version of Aspose.Pdf.Kit. Some garbage text will be added randomly to your extracted text. Please purchase your license to extract text correctly.

So far I can't tell whether I can get the information I need from the PDF file.

Please let me know whether there is a better way to test this function.

Or is this perhaps the normal output when Pdf.Kit cannot read any text in the PDF file?

Best Regards

Sören Kohls

Thanks for considering Aspose.

Please see our wiki pages for details on how to split a PDF and extract PDF data.

Splitting a PDF: http://www.aspose.com/wiki/default.aspx/Aspose.Pdf.Kit/SplitPDFtoSinglePages.html

Extracting PDF data: http://www.aspose.com/wiki/default.aspx/Aspose.Pdf.Kit/ExtractText.html

By the way, our components add some garbage text randomly to your extracted text because you are using the evaluation version. If you are satisfied with our components, please visit our products center to purchase a license.

This is my source code for testing:

string inFile = "c:\\AR.PDF";
string outPrefix = "c:\\AR";

PdfFileInfo pfi = new PdfFileInfo(inFile);
int pageCount = pfi.NumberofPages;

// Pages are numbered starting at 1.
for (int i = 1; i <= pageCount; i++)
{
    string pagePdf = outPrefix + i + ".pdf";
    string pageTxt = outPrefix + i + ".txt";

    // Split: copy page i into its own PDF file.
    // The using blocks close both streams even if Extract throws.
    using (FileStream inStream = new FileStream(inFile, FileMode.Open, FileAccess.Read))
    using (FileStream outputStream = new FileStream(pagePdf, FileMode.Create))
    {
        PdfFileEditor editor = new PdfFileEditor();
        editor.Extract(inStream, new int[] { i }, outputStream);
    }

    // Extract the text of the single-page PDF into a text file.
    PdfExtractor extractor = new PdfExtractor();
    extractor.Password = "";
    extractor.BindPdf(pagePdf);
    extractor.ExtractText();
    extractor.GetText(pageTxt);
}

But after that, the AR{0}.txt files contain only:

Warning:This is the evaluation version of Aspose.Pdf.Kit. Some garbage text will be added randomly to your extracted text. Please purchase your license to extract text correctly.

and no other text.
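One way to confirm that a generated file really holds nothing beyond the warning is a small check like the sketch below. It uses only the .NET base library, not Aspose.Pdf.Kit. The class and method names are made up for illustration, the warning string is copied from the output above, and the check assumes the warning appears in the file as one unbroken line.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Pdf.Kit.
public static class EvaluationOutputCheck
{
    // Warning text copied from the extracted files shown above.
    public const string Warning =
        "Warning:This is the evaluation version of Aspose.Pdf.Kit. " +
        "Some garbage text will be added randomly to your extracted text. " +
        "Please purchase your license to extract text correctly.";

    // Returns true when nothing but whitespace remains after
    // removing every occurrence of the warning from the text.
    public static bool ContainsOnlyWarning(string extractedText)
    {
        if (extractedText == null)
        {
            throw new ArgumentNullException(nameof(extractedText));
        }

        string remainder = extractedText.Replace(Warning, string.Empty);
        return remainder.Trim().Length == 0;
    }

    // Convenience wrapper that reads the text file first.
    public static bool FileContainsOnlyWarning(string path)
    {
        return ContainsOnlyWarning(File.ReadAllText(path));
    }
}
```

Calling EvaluationOutputCheck.FileContainsOnlyWarning("c:\\AR1.txt") for each generated file would show whether any page produced text other than the warning.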

Now I want to know whether there should be more text in the text file, because I can't see any garbage text!

I don't want to buy a license only to find out that the text can't be extracted.

Best Regards

S. Kohls

Thanks for considering Aspose.

At present we can't find any fault in the code above. Could you attach one of your PDF files for us to test, at your convenience, or mail it directly to kevin.zuo@aspose.com?

It's hard for us to find out what leads to such an error without the original PDF file.

Many thanks.

Hi,

I will check whether I can get a version that doesn't contain internal information.

I will mail it to you when I have the file.

Thanks

Sören Kohls

Thanks for considering Aspose.

I have received your mail. The attached document will be tested. We will reply as soon as possible.

Anything new? Could you read the text from the document?

Thanks for considering Aspose.

I've reproduced the error you had. GeorgieYuan is now working on this issue. We will send you a response soon.

Dear S.Kohls,

We need about one to two weeks to solve this problem.

Dear programcsharp,

We have released a new version that fixes this bug. Please download it here:

http://www.aspose.com/Downloads/Aspose.Pdf.Kit/2.0.0.0/Default.aspx