Can I find the location of text in a Word document?

I would like to put some known text in a Word doc and programmatically find its x,y location with respect to the document or page using .NET. Is that possible? I was looking at using the OpenXML SDK but I don't see a path to getting that to work. Or could I use something like an image or bookmark to do this? I'm evaluating Aspose and am curious whether it can help with this.

Ultimately I'm trying to save the Word doc as a PDF or image (using a library like Aspose) and overlay some content at that location. Any ideas are appreciated.

Hi Bryan,

Thanks for your inquiry. Unfortunately, Aspose.Words does not support this feature at the moment. However, we have already logged the feature request as WORDSNET-2978 in our issue tracking system. You will be notified via this forum thread once the feature is available.

In the meantime, you can use an image or a bookmark in your document as a marker. You can move the DocumentBuilder cursor to different locations in a document using the various MoveToXXX methods.

We apologize for the inconvenience.

Thanks. If we were to convert the Word doc to a PDF and then try to locate text or a marker of some kind, would that be supported by Aspose.PDF? Would it need to be a barcode or bookmark or something else to make sure it is found reliably? Maybe this is better posted in the other forums.

Thanks

Hi Bryan,

Thanks for your inquiry. Your query relates more to the Aspose.PDF component. I am moving this thread to the Aspose.PDF forum, and my colleagues from the Aspose.PDF team will reply to you shortly.

Hi Bryan,


Thanks for contacting support.

I am pleased to share that Aspose.Pdf for .NET supports searching for text inside a PDF document and getting its coordinates and dimensions. For further details, please see Search and Get Text Segments from All Pages of PDF Document.

You can also find and replace text inside a PDF document, and add an image or text at a particular location. Please visit the following link for further information on

The Search and Get Text Segments from All Pages of PDF Document example seemed to be what I am looking for. Can you clarify one thing for me? Is the YIndent of the TextSegment relative to the bottom of the page? I assumed x,y would be top,left based and I wasn’t sure if it would be relative to the document or the page. It is looking like it is relative to the bottom left of the page it is on. Is that correct? I’m obviously new to the API so I’m not sure if I’m missing something. The documentation for both XIndent and YIndent say that they are the X coordinate.

And would these be the way to get the size of each page:

Console.WriteLine("pdfDocument.Pages[1].Rect.Height : {0} ", pdfDocument.Pages[1].Rect.Height);

Console.WriteLine("pdfDocument.Pages[1].ArtBox.Height : {0} ", pdfDocument.Pages[1].ArtBox.Height);

Console.WriteLine("pdfDocument.Pages[1].Rect.Width : {0} ", pdfDocument.Pages[1].Rect.Width);

Console.WriteLine("pdfDocument.Pages[1].ArtBox.Width : {0} ", pdfDocument.Pages[1].ArtBox.Width);

Thanks

Hi Bryan,

Thank you for the feedback.

bdoc7aspose:

The Search and Get Text Segments from All Pages of PDF Document example seemed to be what I am looking for. Can you clarify one thing for me? Is the YIndent of the TextSegment relative to the bottom of the page? I assumed x,y would be top,left based and I wasn’t sure if it would be relative to the document or the page. It is looking like it is relative to the bottom left of the page it is on. Is that correct?

Yes. The YIndent of the TextSegment is measured from the bottom of the page. Both XIndent and YIndent are measured from the bottom-left corner of the page the text is on, not from the start of the document.
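
If you need top-left based coordinates for the overlay, the conversion is a simple flip against the page height. Below is a minimal sketch in plain C# with no Aspose calls. The names PdfCoordinates, ToTopLeftY and PointsToPixels are hypothetical helpers, not part of any Aspose API. It assumes the page's lower-left corner is at (0, 0) and that lengths are in points (1/72 inch), which is the default PDF unit.

```csharp
using System;

static class PdfCoordinates
{
    // Converts a y-coordinate measured up from the bottom of the page into
    // one measured down from the top. Pass the height of the text box to get
    // the top edge of the box rather than the flipped position of its bottom edge.
    public static double ToTopLeftY(double pageHeight, double yFromBottom, double boxHeight = 0)
    {
        return pageHeight - (yFromBottom + boxHeight);
    }

    // Scales a length in points (1/72 inch) to pixels at the given DPI,
    // which is useful when the page is rendered to an image.
    public static double PointsToPixels(double points, double dpi)
    {
        return points * dpi / 72.0;
    }
}

class Program
{
    static void Main()
    {
        double pageHeight = 792; // sample value: US Letter height in points
        double yIndent = 700;    // sample value: distance from the bottom of the page

        double top = PdfCoordinates.ToTopLeftY(pageHeight, yIndent);
        Console.WriteLine(top);                                      // prints 92
        Console.WriteLine(PdfCoordinates.PointsToPixels(top, 144));  // prints 184
    }
}
```

In practice pageHeight would come from something like pdfDocument.Pages[1].Rect.Height, and yIndent from the Position of the text you found.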

bdoc7aspose:

The documentation for both XIndent and YIndent say that they are the X coordinate.

http://www.aspose.com/docs/display/pdfnet/Position+XIndent+Property

http://www.aspose.com/docs/display/pdfnet/Position+YIndent+Property

Thank you for pointing out the issue. We have fixed the documentation.

bdoc7aspose:

And would these be the way to get the size of each page:

Console.WriteLine("pdfDocument.Pages[1].Rect.Height : {0} ", pdfDocument.Pages[1].Rect.Height);

Console.WriteLine("pdfDocument.Pages[1].ArtBox.Height : {0} ", pdfDocument.Pages[1].ArtBox.Height);

Console.WriteLine("pdfDocument.Pages[1].Rect.Width : {0} ", pdfDocument.Pages[1].Rect.Width);

Console.WriteLine("pdfDocument.Pages[1].ArtBox.Width : {0} ", pdfDocument.Pages[1].ArtBox.Width);

Your understanding is correct. For further details on page properties, please see the following documentation article.

Get Page Properties

Please feel free to contact support in case you need any further assistance.

Thank You & Best Regards,

The issues you have found earlier (filed as WORDSNET-2978) have been fixed in this .NET update and this Java update.



Hello,

Can I find the exact position of a word or piece of text in an MS Word or PDF document using Aspose? Is there a property for that?

@mahima92

Thank you for contacting support.

You can find any text in a PDF document and retrieve its position with the TextFragment.Position property, as explained in Search and Get Text from All the Pages of PDF Document.
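
As a rough illustration, a minimal sketch could look like the following. It requires the Aspose.Pdf for .NET package. The file name "input.pdf" and the search string "MARKER" are placeholder assumptions, so please adjust them to your document.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FindTextPosition
{
    static void Main()
    {
        // Placeholder input file (assumption).
        Document pdfDocument = new Document("input.pdf");

        // Collects every occurrence of the placeholder search string.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("MARKER");

        // Run the search over all pages of the document.
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Position is reported relative to the bottom-left corner of the page.
            Console.WriteLine("Page {0}: XIndent = {1}, YIndent = {2}",
                fragment.Page.Number,
                fragment.Position.XIndent,
                fragment.Position.YIndent);
        }
    }
}
```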

If you also want to find the position of text in a Word document, please create a separate topic in the Aspose.Words forum and we will guide you accordingly.

@mahima92

You can get the position of text in a Word document by using Aspose.Words. We suggest the following approach:

  1. Implement the IReplacingCallback interface to find the desired text. Please read the following article.
    Find and Replace

  2. In IReplacingCallback.Replacing, move the cursor to the matched node (text) and insert a bookmark.

  3. Use the following code snippet to find the position of the inserted bookmark, which sits just before the desired text.

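    // Load the document that already contains the inserted bookmark.
    Document doc = new Document(MyDir + "in.docx");
    LayoutCollector collector = new LayoutCollector(doc);
    LayoutEnumerator enumerator = new LayoutEnumerator(doc);

    // Point the enumerator at the layout entity of the bookmark start.
    Bookmark bookmark = doc.Range.Bookmarks["bookmark_name"];
    enumerator.Current = collector.GetEntity(bookmark.BookmarkStart);
    // Bounding rectangle of that entity on its page.
    Console.WriteLine(enumerator.Rectangle);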
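
For steps 1 and 2, a minimal sketch of the callback could look like the following. It requires the Aspose.Words for .NET package. The class name BookmarkInserter, the "match_N" bookmark names, the search string "MARKER" and the file names are all hypothetical placeholders. Note that this sketch puts the bookmark in front of the node where the match begins. If the match starts in the middle of a run, you would also need to take ReplacingArgs.MatchOffset into account, which is not shown here.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Replacing;

// Hypothetical callback: marks each match with an empty bookmark and leaves the text unchanged.
class BookmarkInserter : IReplacingCallback
{
    private int count;

    public ReplaceAction Replacing(ReplacingArgs e)
    {
        // Move the cursor to just before the node in which the match begins.
        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        builder.MoveTo(e.MatchNode);

        // Insert an empty bookmark at the cursor (names are placeholders).
        string name = "match_" + count++;
        builder.StartBookmark(name);
        builder.EndBookmark(name);

        // Skip the replacement itself; we only wanted to mark the location.
        return ReplaceAction.Skip;
    }
}

class Program
{
    static void Main()
    {
        Document doc = new Document("in.docx"); // placeholder input file

        FindReplaceOptions options = new FindReplaceOptions();
        options.ReplacingCallback = new BookmarkInserter();

        // The replacement string is never used because the callback returns Skip.
        doc.Range.Replace("MARKER", "", options);

        doc.Save("out.docx"); // placeholder output file
    }
}
```

The bookmarks created this way can then be looked up by name with the LayoutCollector and LayoutEnumerator snippet from step 3.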

A post was split to a new topic: Search a text in a word document (.doc or .docx)