Free Support Forum - aspose.com

Get plain text for a page (Render to text functionality)

I'm searching for a way to get the plain text for a given page of a Word document.

When I use the GetText method I can recognise the 'hard' page breaks but not the soft page breaks.

Isn't there a method to render the document to plain text or RTF, including all page breaks?

Hi

Thanks for your request. An MS Word document is a flow document and does not contain any information about its layout into lines and pages. So there is no way to determine where a page starts or ends from the document itself.

In Aspose.Words we use our own rendering engine, which lays out a Word document into pages. This rendering engine is used to convert documents to fixed-page formats such as PDF, XPS, and images.

In one of the future versions, we are going to expose layout information for nodes. This will help you determine on which page, and where on that page, a particular node is located. I will notify you as soon as this feature is supported.

Best regards.

Hi Alexey,

Thank you for your answer.

I will explain what functionality we are trying to implement: Our product displays the pages of a Word document as images. We provide a search function with which the user can search for a specific text. We then want to display the page(s) that contain a search hit and highlight the matching text. When the user clicks 'Search next' we want to display the next page that has one or more search hits.

We managed to implement the highlighting part (for all matches in the document), but finding out which page a match is located on is a problem. So, when the user presses 'Search next', we do not know which page to present.

I understand that MS Word is a flow document and that the pages are determined when the document is actually rendered. A solution to our problem would be that you would implement a SaveToTxt method that renders one or more pages and outputs plain text with specific markers on the page boundaries similar to the SaveToImage and SaveToPdf methods.
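To make the request concrete, here is a minimal sketch of how we would use such page-marked text. It assumes a hypothetical plain-text export in which every page boundary (hard or soft) is marked with a form feed character; no such export exists in the product today, and the name `pages_with_hits` and the `page_marker` parameter are ours, for illustration only.

```python
def pages_with_hits(marked_text, query, page_marker="\f"):
    """Return the 1-based numbers of the pages whose text contains `query`.

    `marked_text` is assumed to be plain text with `page_marker` inserted
    at every page boundary (a hypothetical export format).
    The comparison is case-insensitive.
    """
    needle = query.lower()
    hits = []
    # Splitting on the marker yields one chunk of text per page.
    for page_number, page_text in enumerate(marked_text.split(page_marker), start=1):
        if needle in page_text.lower():
            hits.append(page_number)
    return hits


sample = "Intro text\fThe quick brown fox\fNothing here\fAnother fox appears"
print(pages_with_hits(sample, "fox"))  # [2, 4]
```

Note that this simple approach would miss a match that spans a page boundary, because each page is searched separately.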

I'm not sure we could use this new functionality regarding the location of a particular node. But maybe you have an idea of how we could implement the search functionality described above using your product?

Hi

Thank you for the additional information. I fully understand your requirements. However, I cannot suggest any way to implement this at the moment.

This will be possible once we expose layout information for nodes. In that case, during the replace process, you will be able to determine on which page the matched text is located and show the correct page. I will notify you as soon as this feature is available.
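Once the page of each match is known, the 'Search next' step itself is simple. The following is only a sketch under stated assumptions: `hit_pages` is a hypothetical sorted list of the page numbers that contain at least one match (how it gets filled depends on the layout information that is not available yet), and `next_hit_page` is an illustrative name, not part of Aspose.Words.

```python
from bisect import bisect_right


def next_hit_page(hit_pages, current_page):
    """Return the first page after `current_page` that has a match, or None.

    `hit_pages` must be sorted in ascending order.
    """
    # bisect_right finds the position of the first page number > current_page.
    index = bisect_right(hit_pages, current_page)
    return hit_pages[index] if index < len(hit_pages) else None


hit_pages = [2, 4, 7]  # hypothetical pages containing matches
print(next_hit_page(hit_pages, 1))  # 2
print(next_hit_page(hit_pages, 4))  # 7
print(next_hit_page(hit_pages, 7))  # None
```

Returning None when there are no further hits leaves it to the application to decide whether to stop or wrap around to the first hit.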

Best regards.

The issues you have found earlier (filed as WORDSNET-2978) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.