Loading a single page for a Linearized PDF

snewby · April 12, 2017, 1:23pm

I need to perform some processing on a very large number of PDFs with thousands of pages. Ideally, we'd like to request only the byte range for the page we're interested in. If I know the byte range for a given page inside a linearized PDF, can I load that page with the Aspose API without having to supply the entire document's bytes?
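Fetching only the bytes of a range is the easy half of this question and needs nothing PDF-specific. Below is a minimal, standard-library-only Java sketch that reads just the bytes in [start, end) from a local file with RandomAccessFile. The class and method names (ByteRangeReader, readRange) are hypothetical. For remote storage, an HTTP Range request would play the same role. The sketch only fetches bytes. Whether a PDF library can open such a partial buffer is the actual question being asked.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Minimal sketch (hypothetical helper): read only the bytes in [start, end) of a file. */
public class ByteRangeReader {
    public static byte[] readRange(String path, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range: " + start + ".." + end);
        }
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Clamp the end of the range to the actual file length
            long stop = Math.min(end, file.length());
            long length = Math.max(0L, stop - start);
            if (length > Integer.MAX_VALUE) {
                throw new IllegalArgumentException("range too large for one array");
            }
            byte[] buf = new byte[(int) length];
            file.seek(start);    // jump straight to the first requested byte
            file.readFully(buf); // read exactly buf.length bytes
            return buf;
        }
    }
}
```

For example, ByteRangeReader.readRange("big.pdf", 0, 1024) returns at most the first 1,024 bytes of the file. For a linearized PDF, the specification requires the linearization parameter dictionary to lie entirely within those bytes.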

asad.ali · April 13, 2017, 11:09am

Hi Sid,

Thanks for contacting support.

You can determine the byte size of the complete document by loading it into a ByteArrayInputStream. To get only a single page in memory, you can generate a new in-memory document containing just the desired page(s) from the large input file(s) and then load that newly generated document. This way, the bytes in memory contain only that particular page or set of pages.

Please check the following code snippet. It copies a single page into a new document, saves that document to a byte array, and supplies those bytes for further processing. Here, the bytes are reloaded only to show the page count of the newly generated document.

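import com.aspose.pdf.Document;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Open the large source PDF (dataDir is the folder containing input.pdf)
Document doc = new Document(dataDir + "input.pdf");
// Copy page 1 (page indexes are 1-based) into a new, empty document
Document doc2 = new Document();
doc2.getPages().add(doc.getPages().get_Item(1));
// Save the single-page document to an in-memory byte array
ByteArrayOutputStream dstStream = new ByteArrayOutputStream();
doc2.save(dstStream);
ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
// Supply the bytes of the particular page for further processing
doc2 = new Document(srcStream);
System.out.println("Pages in new document: " + doc2.getPages().size());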

If you need any further assistance, please feel free to contact us.

Best Regards,

codewarior · April 13, 2017, 11:27am

Hi Sid,

Thanks for contacting support.

Adding to Asad's comments: the approach he shared extracts a certain page from the main PDF file and then instantiates a new Document from it, which still requires loading the complete input file. As I understand it, you want to load only certain pages instead of the complete input file. If so, I am afraid Aspose.Pdf for Java does not currently support this feature. We are sorry for the inconvenience.

snewby · April 13, 2017, 3:35pm

Thanks, guys. I was referring to loading just the bytes for the page (along with, perhaps, the linearization dictionary). I was afraid that it wasn't possible. We'll explore other routes. Thanks again for the reply!
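For reference, the linearization parameter dictionary is defined in Annex F of the PDF specification (ISO 32000-1) and sits near the top of a linearized file. Its entries include /L (file length), /O (object number of the first page's page object), /E (offset of the end of the first page), /N (page count), and /T (offset of the main cross-reference table). Offsets for pages other than the first come from the hint stream located by /H. Below is a minimal, standard-library-only Java sketch that pulls the integer entries out of that dictionary. It assumes the dictionary is plain, uncompressed text within the supplied bytes. The class name LinearizationInfo is hypothetical, and this is not a full PDF parser.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch (hypothetical helper): read integer entries of a linearization dictionary. */
public class LinearizationInfo {
    // Matches "<< /Linearized 1 ... >>"; assumes no nested dictionaries inside it
    private static final Pattern DICT =
            Pattern.compile("<<\\s*/Linearized\\s+[0-9.]+(.*?)>>", Pattern.DOTALL);
    // Matches simple "/Key 123" integer entries; array entries such as /H are skipped
    private static final Pattern INT_ENTRY =
            Pattern.compile("/([A-Za-z]+)\\s+(\\d+)");

    /** Returns entries such as L, O, E, N, T, or an empty map if no dictionary is found. */
    public static Map<String, Long> parse(byte[] head) {
        // ISO-8859-1 maps each byte to exactly one char, so binary bytes cannot break decoding
        String text = new String(head, StandardCharsets.ISO_8859_1);
        Map<String, Long> entries = new LinkedHashMap<>();
        Matcher dict = DICT.matcher(text);
        if (!dict.find()) {
            return entries;
        }
        Matcher m = INT_ENTRY.matcher(dict.group(1));
        while (m.find()) {
            entries.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return entries;
    }
}
```

Pass it the first 1,024 bytes of the file. An empty map means no linearization dictionary was found there. Note that /E only bounds the first page. Locating any other page would require decoding the hint tables, which this sketch does not attempt.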

asad.ali · April 14, 2017, 11:07am

Hi Sid,

Thanks for your kind feedback. Please keep using our API, and if you need any further assistance, please feel free to contact us. We will be happy to extend our support.

Best Regards,