Split Word's pages into separate documents using Java

Hi,
How can I get pages from a Word document?
I checked the page count, and it shows fewer pages than the document actually has.
There are no page breaks in our document, but it does contain page numbers.
Could you please provide sample code or the relevant API classes/methods to retrieve the actual pages separately?
Thanks.

Hi Sonali,

Thanks for your query. It would be great if you could share your document for investigation purposes.

Hi Sonali,

Thanks for your inquiry.

Sonali:
How can I get pages from a Word document?

Please note that MS Word documents are flow documents and are not natively laid out into lines and pages, so there is currently no direct way to extract the content of a given page. However, I think you can achieve this by using the utility methods available in the attached ‘PageNumberFinder’ class.

Sonali:
I checked the page count, and it shows fewer pages than the document actually has.

Could you please attach the input Word document that shows this problem here for testing? I will investigate the issue on my side and provide you with more information.

Please let me know if I can be of any further assistance.

Best Regards,

There is no specific document. We want to separate the pages directly for any Word document. We are looking for an Aspose.Words for Java solution, so could you please provide a similar utility in Java (for the older 4.0.3 version as well as the latest 11.3 version)?
Thanks.

Hi Sonali,

I regret to share with you that the PageFinder utility is available for the .NET version only at the moment. However, I will write this utility in Java and update you as soon as possible.

We apologize for the inconvenience.

Hi Tahir,

Is the PageFinder utility available in Java now? I would like to take a look at it to see whether it can help solve a few of the issues we are facing.

Thank you.

Hi Sonali,

Please accept my apologies for the inconvenience. I will work on the PageFinder utility for Java and share it with you by the end of this weekend.

We appreciate your patience.

Hi Sonali,

Thanks for your patience. I have completed the PageFinder utility for Java. Please find the utility attached. I hope this helps. Please let us know if you have any more queries.

The issues you have found earlier (filed as WORDSNET-2978) have been fixed in this .NET update and this Java update.


I have been to this link:
https://releases.aspose.com/words/java
and I have downloaded the latest Aspose.Words for Java 14.2.0, but I still couldn’t find a solution to my problem in it. Am I missing anything?
P.S.: My problem is that I can’t select and extract a certain page from a Word document and write it out as another Word document.

Hi,

Thanks for your inquiry. I have already answered a similar query of yours here in this post. I would suggest you please follow that thread for further proceedings. If we can help you with anything else, please feel free to ask.

Best regards,

Hi there,

Thanks for your inquiry. Please see the attached classes and try executing the following code:

Document doc = new Document("C:\\Temp\\in.doc");
LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.updatePageLayout();
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
Document pageDoc = splitter.GetDocumentOfPage(5);
pageDoc.save("C:\\Temp\\out.doc");

Hope this helps you.

Thanks a lot for your reply.
It works perfectly, but I’m afraid I have another problem.
I am using this code in order to change the order of pages in a Word document, but adding it makes loading really slow, especially if the document has many pages.
Even with caching, the issue remains: loading big documents is really slow, at least the first time.
I hope you have some ideas that might help me.
Can I switch the page order of a Word document using Aspose.Words?
Thanks,

Hi,

Thanks for your inquiry. The LayoutCollector class invokes page layout, which builds the document layout in memory, so please note that this can take time with large documents. Secondly, what do you mean by switching page order? Please clarify this requirement in more detail.

Best regards,

Hello,

Thanks for your reply.
What I meant by switching page order is the following:
I have a case where I split a Word document into separate pages and show those pages in a tree as thumbnails, so that the user can tell which page is which. So if I have a Word document with pages 1 to 10, and the user clicks on page 1 and hits the down arrow, page 1 becomes page 2 and page 2 becomes page 1.

I know this might be a long shot, but I thought I would give it a try, because with large documents my approach takes some time. Maybe there is something already available that might help.

Thank you,

Hi,

Thanks for the additional information. In this case, you simply need to generate one Document instance per page of your main document (for example, as an array of Document objects). Once your user selects a page number (e.g. the 2nd page), you can send the second Document instance to the user.
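
To illustrate the suggestion above, here is a minimal sketch in plain Java of keeping one object per page and swapping neighbours when the user presses the up or down arrow, so the layout does not have to be rebuilt for every move. The class `PageOrder` and its methods `moveDown`, `moveUp` and `pages` are hypothetical names made up for this illustration; they are not part of Aspose.Words. The extractor is assumed to wrap a call such as `splitter.GetDocumentOfPage(n)` from the attached classes, with 1-based page numbers as the calls in this thread suggest. If that call throws a checked exception, it would need to be wrapped inside the lambda.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper: holds one object per page and reorders them in memory. */
final class PageOrder<T> {
    private final List<T> pages = new ArrayList<>();

    /**
     * Extracts every page once, up front.
     * pageCount is assumed to come from something like doc.getPageCount();
     * extractor is assumed to map a 1-based page number to a per-page object.
     */
    PageOrder(int pageCount, IntFunction<T> extractor) {
        for (int page = 1; page <= pageCount; page++) {
            pages.add(extractor.apply(page));
        }
    }

    /** Swaps the page at the 1-based position with the next one; false if there is no next page. */
    boolean moveDown(int position) {
        if (position < 1 || position >= pages.size()) {
            return false;
        }
        Collections.swap(pages, position - 1, position);
        return true;
    }

    /** Swaps the page at the 1-based position with the previous one; false if there is no previous page. */
    boolean moveUp(int position) {
        return moveDown(position - 1);
    }

    /** Read-only view of the pages in their current order. */
    List<T> pages() {
        return Collections.unmodifiableList(pages);
    }
}
```

With the attached classes this might be used as `new PageOrder<Document>(doc.getPageCount(), n -> ...)`, where the lambda body calls the splitter. The first load still pays the layout cost mentioned earlier in the thread; only the later reordering becomes cheap.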

Best regards,

Thank you for your code. I tried it for splitting a document that was imported from HTML.

doc.getPageCount() returns 2. I want to get the second page.
I used GetDocumentOfPage(2), but the resulting document was not just page 2; it was the whole document, the same as before splitting.

Could you help me solve the problem?

Thanks

Hi there,

Thanks for your inquiry. In case you are using an older version of Aspose.Words, I would suggest you please upgrade to the latest version (v14.8.0) from here and let us know how it goes on your side.

Please call the Document.updatePageLayout method before using DocumentPageSplitter, as shown below.

Document doc = new Document("C:\\Temp\\in.doc");
LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.updatePageLayout();
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
Document pageDoc = splitter.GetDocumentOfPage(2);
pageDoc.save("C:\\Temp\\out.doc");

I hope this helps. If the problem persists, please attach your input Word document here for testing. I will investigate the issue on my side and provide you with more information.

Thanks for your reply.

I did not use a Word document as input.
I got the document by opening the URL of a form and then converting the response to a Document object.

Here is the code:

URL url = new URL("http://oa.ptpjb.com/" + paramDb + "/0/" + paramDocUNID + "?OpenDocument");
URLConnection webClient = url.openConnection();

InputStream inputStream = webClient.getInputStream();

int pos;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
while ((pos = inputStream.read()) != -1)
    bos.write(pos);

byte[] dataBytes = bos.toByteArray();
ByteArrayInputStream byteStream = new ByteArrayInputStream(dataBytes);
LoadOptions loadOptions = new LoadOptions();
loadOptions.setBaseUri("http://oa.ptpjb.com");
Document doc = new Document(byteStream, loadOptions);

LayoutCollector layoutCollector = new LayoutCollector(doc);
doc.updatePageLayout();
DocumentPageSplitter splitter = new DocumentPageSplitter(layoutCollector);
Document pageDoc = splitter.GetDocumentOfPage(2);

response.setContentType("application/vnd.ms-word");
response.setHeader("content-disposition", "inline; filename=" + paramFileName);
ServletOutputStream outputStream = response.getOutputStream();
pageDoc.save(outputStream, SaveFormat.DOCX);

Is there a way to split a document that is imported by opening a URL?

Best Regards

Hi there,

Thanks for your inquiry. Yes, you can split a document that is imported by opening a URL. Please share the URL if it is public, or you may share your input HTML document here for testing. We will investigate the issue and provide you with more information about your query.
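
As a side note on the download loop in the code above: it reads the response one byte at a time, which may be slow for large pages, although it is unlikely to be related to the splitting problem. A minimal sketch of a buffered read using only java.io is shown below. The class name `StreamBytes` and the method `readAll` are hypothetical names chosen for this illustration. On Java 9 and later, `InputStream.readAllBytes()` does the same job.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for reading a whole stream into a byte array. */
final class StreamBytes {
    private StreamBytes() {
    }

    /** Reads the stream to the end in 8 KB chunks and closes it afterwards. */
    static byte[] readAll(InputStream in) throws IOException {
        try (InputStream source = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            // read(buffer) returns the number of bytes placed in the buffer, or -1 at end of stream
            while ((read = source.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        }
    }
}
```

In the code above, the loop and the `bos` variable could then be replaced by `byte[] dataBytes = StreamBytes.readAll(webClient.getInputStream());`.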