This is a question about how to efficiently use Aspose.Pdf and Aspose.Pdf.Kit. In my app, I need to do a couple of things:
1. I need to check if the file is encrypted. (I use Aspose.Pdf.PdfFileInfo)
2. I need to extract text and attachments. (I use Aspose.Pdf.Kit.PdfExtractor)
3. I need to extract annotations. (I use Aspose.Pdf.Kit.PdfContentEditor)
This means I will need to load the same file 3 times, which could lead to performance issues if we need to process a lot of PDFs.
Is there a way that this can be improved? For example, being able to extract annotations via PdfExtractor, and being able to see if the file is encrypted via PdfExtractor as well?
It is possible to combine them, but that would add complexity. We have tried to keep things simple so that developers can learn and use our library painlessly. In any case, I will discuss your concerns with the developers, and if we plan to offer these functions together in one class, we will let you know. For now, please use the classes as they are. Thanks for the suggestion.
Thanks.
Adeel Ahmad
Support Developer
Aspose Changsha Team http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html
Hi Becky, after discussing the problem in detail, we found that merging several functions into one is not something we can support, since it would not improve performance.
However, you can use the stream parameter of the functions, since this opens the file only once. You should reset the position of the input stream before each operation, like the following:
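The code from the original reply is not available here. As a rough sketch of the idea only, the helper below loads the file into memory once and rewinds the stream before each operation. The names `SharedPdfStream`, `LoadOnce` and `RunAll` are hypothetical and not part of Aspose.Pdf.Kit; each callback stands in for whichever stream-accepting Aspose call you use (encryption check, text extraction, annotation extraction).

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of Aspose.Pdf.Kit.
public static class SharedPdfStream
{
    // Reads the whole file into memory so the disk is touched only once.
    public static MemoryStream LoadOnce(string path)
    {
        return new MemoryStream(File.ReadAllBytes(path));
    }

    // Passes the same stream to each operation, rewinding it first so that
    // every operation starts reading from the beginning of the PDF.
    public static void RunAll(Stream pdf, params Action<Stream>[] operations)
    {
        if (!pdf.CanSeek)
            throw new ArgumentException("Stream must be seekable.", "pdf");

        foreach (Action<Stream> operation in operations)
        {
            pdf.Position = 0; // reset before every operation
            operation(pdf);
        }
    }
}
```

Each callback would wrap one of the three steps from the first post. Note that this only works if none of the operations closes or disposes the shared stream; if one does, the next rewind will throw.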
Hi, I have reproduced this error using Aspose.Pdf.Kit 2.5.0.0, and I will fix this bug within two days. The next hotfix will support extracting “FreeText” annotations.
What we want to achieve is to see whether text elements exist on a page or not. I tried the ExtractText() function, but it takes a while to run if the PDF is rather large. Is there (or could there be) a more efficient way to check for text on a page?
Another question: the ExtractText() function throws an exception when extracting text from the attached PDF. Can you see why?
This is the only way to extract text from a PDF page right now. We are working on the text-per-page issue. Right now there is no property to check whether a page contains text or not.
About the second issue, I have reproduced this error. We will try to fix it soon.
Thanks.
Adeel Ahmad
Support Developer
Aspose Changsha Team http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html
I was using PdfExtractor.ExtractText() to extract text from a PDF that has only one sentence in it, and it took about 20-30 seconds. Is the performance of this method a known issue? The PDF I tested is attached.
That will definitely be too much of a performance hit for us. Are you going to be able to come up with a solution for this within the next 2 weeks?
Is extracting text per page going to be supported anytime soon? Is the bug I sent you in an earlier post still being investigated (the bug where ExtractText() throws an exception on the example file I gave you)?
We certainly plan to support it in the near future, but right now extracting text per page is still in development. You can try it, but it has a few limitations at the moment. You can use it like:
About the second issue, the bug, our developers are working hard to find the root cause. As Georgie already mentioned, it is difficult to give an ETA for this problem, but I will check again and get back to you.
Thanks.
Adeel Ahmad
Support Developer
Aspose Changsha Team http://www.aspose.com/Wiki/default.aspx/Aspose.Corporate/ContactChangsha.html
We will provide a .NET 2.0 version of Aspose.Pdf.Kit that supports extracting text per page before tomorrow.
The ExtractText bug with PDF files that don’t contain text hasn’t been fixed yet. We are working hard to solve this problem, but we cannot give an ETA right now.
It seems to be working well for getting the text. It seems, though, that PdfExtractor.HasNextPageText() only works if you extract the text for the current page. Is this true? I.e. it seems we should be able to do the following:
// Starting at 0 because we want to know if pages 1 - pageCount have text
for (int i = 0; i < pageCount; i++)
{
    extractor.StartPage = i;
    extractor.EndPage = i;
    bool hasText = extractor.HasNextPageText();
}
but this only seems to work if we call ExtractText() before calling HasNextPageText(). We have cases where we only want to know if there is text but don't need to extract it. Let me know if I am just setting it up incorrectly.
2. Documents with no text:
We have some documents that have no text, but Extract() and GetText() are returning “blanks” and HasNextPageText() is returning true. I have attached an example.
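Until the library tells blank output apart from real text, one possible caller-side workaround is to treat a page as containing text only if the extracted string has at least one visible character. This is just a sketch under that assumption: `PageTextCheck` and `HasVisibleText` are hypothetical names, not Aspose.Pdf.Kit API, and the page text is assumed to have already been read into a string.

```csharp
using System;

// Hypothetical helper, not part of Aspose.Pdf.Kit.
public static class PageTextCheck
{
    // Returns true only if the extracted text contains at least one
    // character that is neither whitespace nor a control character.
    public static bool HasVisibleText(string extracted)
    {
        if (extracted == null)
            return false;

        foreach (char c in extracted)
        {
            if (!char.IsWhiteSpace(c) && !char.IsControl(c))
                return true;
        }
        return false;
    }
}
```

This does not avoid the cost of extraction; it only filters out the “blank” results described above.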