Free Support Forum - aspose.com

Looping through pages in a Word document

Hi. I am building an application that will let the user extract text from a Word document. I want to let the user specify which page numbers to extract.

How can I access specific pages in a Word document? How can I find the page breaks? (see attached screenshot)
So far, in all the Word documents I have tested, I found only one Section in the sections collection of the document object. Is it true that document.Section.Body.Count is always the same as the PageCount?
Please help.
Thanks.

Hi

Thanks for your inquiry. An MS Word document is a flow document and does not contain any information about its layout into lines and pages; MS Word lays out the content of a document into lines and pages on the fly. That is why there is no way to get a particular page using Aspose.Words.

However, you can extract content separated by page breaks. Please see the following link for more information:

http://www.aspose.com/community/blogs/aspose.words-for-.net-java-reporting-services-and-jasperreports/archive/2007/04/16/73244.aspx
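To illustrate the general idea of splitting at page breaks, here is a minimal sketch in Python. It does not use the Aspose.Words API: the `Node` class and the `split_at_page_breaks` function are hypothetical stand-ins, and the body is assumed to be a flat sequence of paragraphs, tables and explicit page breaks.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-in for a document node; NOT an Aspose.Words class.
@dataclass
class Node:
    kind: str        # "paragraph", "table" or "page_break"
    text: str = ""


def split_at_page_breaks(nodes: List[Node]) -> List[List[Node]]:
    """Group nodes into parts, starting a new part at each explicit page break."""
    parts: List[List[Node]] = [[]]
    for node in nodes:
        if node.kind == "page_break":
            parts.append([])          # the break itself is dropped
        else:
            parts[-1].append(node)    # paragraphs and tables are both kept
    return parts


# Assumed sample content: text and a table before the first break.
body = [
    Node("paragraph", "Intro"),
    Node("table", "Table 1"),
    Node("page_break"),
    Node("paragraph", "Second part"),
]
parts = split_at_page_breaks(body)
```

Note that this only finds explicit page breaks. Pages that Word starts automatically when the text overflows are not stored in the document, so they cannot be detected this way.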

Regarding Body.Count: it returns the number of immediate children of this node. For instance, if the body contains 4 paragraphs and 3 tables, Body.Count returns 7. It is not related to the number of pages.
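The counting rule can be sketched with a small Python model. This is not Aspose.Words code: `Node` and `immediate_child_count` are hypothetical names, and the two rows inside each table are an assumption added only to show that nested nodes are not counted.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical stand-in for a document node; NOT an Aspose.Words class.
@dataclass
class Node:
    kind: str
    children: List["Node"] = field(default_factory=list)


def immediate_child_count(node: Node) -> int:
    """Count direct children only; grandchildren (e.g. table rows) are ignored."""
    return len(node.children)


# A body with 4 paragraphs and 3 tables; each table has 2 rows (assumed).
paragraphs = [Node("paragraph") for _ in range(4)]
tables = [Node("table", [Node("row"), Node("row")]) for _ in range(3)]
body = Node("body", paragraphs + tables)

count = immediate_child_count(body)
```

Here `count` is 7 no matter how many pages the content would take up when laid out.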

Best regards.

Hi. I have been trying to use your method of extracting pages from a Word document using page breaks, as detailed here:

http://www.aspose.com/community/blogs/aspose.words-for-.net-java-reporting-services-and-jasperreports/archive/2007/04/16/73244.aspx

But if any of the pages has a table on it, the code simply skips all the pages before the table. Is this a known issue or a bug? Please take a look at the attached Word document. When I run the ExtractPages function, the code ignores everything before the first table.
Please help.

Hi

Thanks for your inquiry. The code works fine on my side. It extracts four pages from your document as expected. Please attach your output documents.

Best regards.

Thanks, Alexey. But as you will see in the attached file, in Part1.doc the document starts at the table, whereas in my source document there was a lot of content before that table, which has simply disappeared in the extraction process. See the zip file.

Hi

Thank you for the additional information. I got a different result on my side. Please make sure you did not change anything in the code. If you changed something, please attach your code here for testing.

Best regards.

The issues you have found earlier (filed as WORDSNET-2978) have been fixed in this .NET update and this Java update.

