Testing the extracttext() method

S.Kohls · March 13, 2006, 10:34am

Hi,

I am testing Aspose.Pdf.Kit to see whether it can do what I need.

I need to split a PDF file and extract data from it.

When I call the ExtractText method on the original PDF file, I get an exception saying that I am trying to read past the end of the stream.

That is no problem so far, because the ExtractText method works on the split files. But no matter how many files I generate, I always get the same text file with the following content:

Warning:This is the evaluation version of Aspose.Pdf.Kit. Some garbage text will be added randomly to your extracted text. Please purchase your license to extract text correctly.

So far I don't know whether I can get the information I need from the PDF file.

Please let me know whether there is a better way to test this function.

Or is this perhaps the normal output when Pdf.Kit cannot read any text in the PDF file?

Best Regards

Sören Kohls

KevinZuo · March 13, 2006, 6:16pm

Thanks for considering Aspose.

Please see our wiki pages for details on how to split a PDF and extract PDF data.

Splitting a PDF: Split PDF programmatically | Aspose.PDF for .NET

Extracting PDF data: Search and Get Text from Pages of PDF Document with C# (Aspose Documentation)

BTW, our components add some garbage text randomly to your extracted text because you are using the evaluation version. If you are happy with our components, please go to our product page and purchase a license.

S.Kohls · March 14, 2006, 3:25am

This is my source code for testing:

    string inFile = "c:\\AR.PDF";
    string outFile1 = "c:\\AR";
    string outFile2 = ".pdf";

    PdfFileInfo pfi = new PdfFileInfo(inFile);
    int nop = pfi.NumberofPages;

    // Loop over pages 1..nop, writing one PDF and one text file per page.
    for (int i = 1; i <= nop; i++)
    {
        // Split page i into its own file, e.g. c:\AR1.pdf.
        // The using blocks close both streams before the file is reopened below.
        using (FileStream inStream = new FileStream(inFile, FileMode.Open))
        using (FileStream outputStream = new FileStream(outFile1 + i + outFile2, FileMode.Create))
        {
            PdfFileEditor editor = new PdfFileEditor();
            int[] pages = new int[] { i };
            editor.Extract(inStream, pages, outputStream);
        }

        // Extract the text of the single-page file into c:\AR<i>.txt.
        PdfExtractor extractor = new PdfExtractor();
        extractor.Password = "";
        extractor.BindPdf(outFile1 + i + outFile2);
        extractor.ExtractText();
        extractor.GetText(outFile1 + i + ".txt");
    }

But after that, the AR{0}.txt files contain only:

Warning:This is the evaluation version of Aspose.Pdf.Kit. Some garbage text will be added randomly to your extracted text. Please purchase your license to extract text correctly.

and no other text.

Now I want to know whether there should be more text in the text file, because I can't see any garbage text!

I don't want to buy a license just to find out that the text could not be extracted.

Best Regards

S. Kohls

KevinZuo · March 14, 2006, 6:25am

Thanks for considering Aspose.

At present we can't find any fault in the code above, so could you attach one of your PDF files to your reply, at your convenience, so that we can test it?

It's tough for us to find out what leads to such an error without the original PDF file.

Many thanks.

S.Kohls · March 14, 2006, 6:56am

Hi,

I will check whether I can get a version that doesn't contain internal information.

I will mail this to you when I get the file.

Thanks

Sören Kohls

KevinZuo · March 15, 2006, 3:39am

Thanks for considering Aspose.

I have received your mail. The attached document will be tested. We will reply ASAP.

S.Kohls · March 20, 2006, 5:31am

Anything new? Could you read the text from the document?

KevinZuo · March 20, 2006, 9:20am

Thanks for considering Aspose.

I've reproduced the error you had. GeorgieYuan is now working hard on this issue. We will send you a response soon.

GeorgieYuan · March 26, 2006, 8:47pm

Dear S.Kohls,

We need about 1~2 weeks to solve this problem.

GeorgieYuan · April 18, 2006, 5:50pm

Dear programcsharp,

We have released a new version, which fixes this bug. Please download it here: