Partial Loading

huseyincandan · January 7, 2015, 3:43am

Hi,

I want to get information from big PDF files, and memory usage is important for us. Examples include getting the page count of a PDF file, getting the text of a specific page, or exporting one page to a format such as an image or HTML.

As I understand it, creating a document instance with the following code

doc = new Aspose.Pdf.Document(fileName)

loads the entire document into memory. Is there a way to get such information without loading the whole document?

The same is required for other document types (Word, Cells, etc.), but I have not tested them yet.

Thank you.

codewarior · January 7, 2015, 4:46am

huseyincandan:

I want to get information from big PDF files, and memory usage is important for us. Examples include getting the page count of a PDF file, getting the text of a specific page, or exporting one page to a format such as an image or HTML.

Hi Huseyin,

Thanks for contacting support.

To get PDF file information, the document needs to be loaded into a Document object. However, we have already logged the requirement to get PDF information/properties without loading the entire document as PDFNEWNET-35033 in our issue tracking system.

Besides this, to extract text or export certain PDF pages to image or HTML format, you need to load the complete document, because the specific pages are accessed through the pages collection of the Document instance.

huseyincandan:

The same is required for other document types (Word, Cells, etc.), but I have not tested them yet.

My colleagues from the Aspose.Words, Aspose.Cells and Aspose.Slides teams will reply accordingly.

muhammad.ijaz · January 13, 2015, 9:47am

Hi Huseyin,

Aspose.Cells allows you to load all sheets or only specific ones, as you can see in this article. The other APIs do not support partial loading, because they need to build the document layout in memory before extracting any information or converting to other formats.

Best Regards,