Extracting Text Content from Linked Textboxes in a Word Document (C# .NET)

officedev · July 7, 2021, 10:37pm

I have a document with two textboxes, one linked to the other. Inspecting the contents using the DOM (or OpenXML for that matter) shows all of the text in one of the boxes. However, at runtime Word is flowing some of the text over to the linked textbox. My goal is to extract the text from each textbox separately, as it would be rendered on the page. Is this possible?

I’ve gotten as far as using a LayoutCollector and LayoutEnumerator to determine the position on the page of each paragraph contained within the populated textbox, and it appears that I can compare that against the position/size of the textbox shapes. This feels brittle though, so I’m wondering if there is a better way.

Does the text property of a linked textbox ever get populated after layout is done? Or is it just overlaid over the linked textbox at render time, even though the other textbox “owns” that text from a DOM point of view?

awais.hafeez · July 8, 2021, 6:28am

@officedev,

Please ZIP and upload a simplified Word DOCX document containing the linked textboxes here for testing. We will then investigate the scenario on our end and provide you with more information.

officedev · July 8, 2021, 3:46pm

test2.docx (17.3 KB)

Please see attached. In Word the “eeee” text at number 5 on that list gets split into the second linked textbox. However, inspecting the shapes in the DOM shows all of the text in the first textbox.

Thanks.

awais.hafeez · July 8, 2021, 7:10pm

@officedev,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-22473. We will look further into the details of this requirement and keep you updated here on the status of the ticket. We apologize for any inconvenience.

awais.hafeez · August 6, 2021, 7:57am

@officedev,

Regarding WORDSNET-22473, we have completed our analysis of this issue and concluded that we will not be able to implement this feature in the Aspose.Words API.

Unfortunately, there is no simple way to achieve this. The only option is to use LayoutCollector and LayoutEnumerator, as you are already doing.

No, the text of a linked textbox is not populated after the layout is built. The layout engine only converts the Aspose.Words DOM into an internal layout model that the Aspose.Words renderers can consume.
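
For readers following the position-comparison approach discussed in this thread, here is a minimal sketch of the matching step only. It assumes you have already collected a rectangle and page index for each laid-out line of text and for each textbox shape (for example from LayoutEnumerator's Rectangle and PageIndex properties); that collection step is not shown. The types Rect and LaidOutLine and the method AssignLines are hypothetical helpers written for this illustration, not part of Aspose.Words, and the code assumes .NET 5 or later for record types. It works on lines rather than paragraphs because a single paragraph can be split between the two linked textboxes, as in the attached test2.docx.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper types for illustration; none of these are Aspose.Words classes.
// Coordinates are assumed to share one unit and origin (e.g. points from the page's top-left).
public sealed record Rect(int Page, float X, float Y, float Width, float Height)
{
    // True when the point is on the same page and inside this rectangle (edges included).
    public bool Contains(int page, float px, float py) =>
        page == Page && px >= X && px <= X + Width && py >= Y && py <= Y + Height;
}

// One rendered line of text together with its bounds on the page.
public sealed record LaidOutLine(string Text, Rect Bounds);

public static class LinkedTextBoxSplitter
{
    // Returns one list per textbox, in the same order as `boxes`, holding the text
    // of every line whose center point falls inside that box.
    // Lines that match no box are skipped.
    public static List<List<string>> AssignLines(
        IReadOnlyList<Rect> boxes, IEnumerable<LaidOutLine> lines)
    {
        var result = boxes.Select(_ => new List<string>()).ToList();

        foreach (var line in lines)
        {
            Rect b = line.Bounds;
            // Testing the center is more tolerant of small rounding differences
            // than requiring the whole line rectangle to fit inside the box.
            float centerX = b.X + b.Width / 2f;
            float centerY = b.Y + b.Height / 2f;

            for (int i = 0; i < boxes.Count; i++)
            {
                if (boxes[i].Contains(b.Page, centerX, centerY))
                {
                    result[i].Add(line.Text);
                    break; // a line is assigned to at most one box
                }
            }
        }

        return result;
    }
}
```

Calling LinkedTextBoxSplitter.AssignLines(boxes, lines) with the two textbox rectangles and the collected lines returns the per-box line texts, which can then be joined into one string per textbox. Overlapping textboxes and rotated shapes are not handled; in this sketch a line inside an overlap goes to whichever box comes first in the list.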