Hi everyone,
I am wondering if there is a way to get the bounding box coordinates for every character of the rendered document using Aspose.Words.
To my knowledge, the smallest node is a “Run”, which can hold multiple characters. I am able to get its bounding box using the example provided here.
However, I would like to get a bounding box for each character inside that run.
The only topic related to this was in the link provided above but I didn’t find it that helpful. I would really appreciate some extra details.
Please advise.
Thanks
@Mike1992 You can split the Run nodes in your document so that each Run contains only one character. For example, see the following code:
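using System.Collections.Generic;
using System.Linq;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");
// Snapshot the runs first, because splitting inserts new Run nodes into the document.
List<Run> runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();
foreach (Run r in runs)
{
    Run currentRun = r;
    // Peel off one character at a time until the remainder is a single character.
    while (currentRun.Text.Length > 1)
    {
        currentRun = SplitRun(currentRun, 1);
    }
}

// Splits the run at the given position and returns the new run that holds the trailing text.
private static Run SplitRun(Run run, int position)
{
    // The clone keeps the formatting of the original run.
    Run afterRun = (Run)run.Clone(true);
    run.ParentNode.InsertAfter(afterRun, run);
    afterRun.Text = run.Text.Substring(position);
    run.Text = run.Text.Substring(0, position);
    return afterRun;
}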
Thank you very much @alexey.noskov for your detailed answer.
Do you have any thoughts on how to access each character bounding box after splitting the Runs?
I tried using this example, but the LayoutEnumerator.Rectangle is not showing the split Runs. I am wondering if there is a way to access the bounding box of a Run without the LayoutEnumerator?
For context, I am trying to render the page, draw the bounding box for each character and save that to an image.
@Mike1992 The document object model is different from the layout model. A Run node in a flow document can span several lines or even pages. That is why the text content of the document is split into spans (see LayoutEntityType.Span) in the layout model. Using LayoutEnumerator you can get the bounding box of an individual span, but not of an individual character.
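As a rough sketch of the span enumeration described above, the following walks the layout tree and records the page index, text and rectangle of every span. The class name SpanCollector and the method CollectSpans are hypothetical names chosen for illustration; the traversal relies on LayoutEnumerator members (Type, Text, Rectangle, PageIndex, MoveFirstChild, MoveNext, MoveParent) as I understand them, so please check it against your Aspose.Words version.

```csharp
using System.Collections.Generic;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

public static class SpanCollector
{
    // Hypothetical helper: collects (page index, text, rectangle) for every span in the layout.
    public static List<(int Page, string Text, RectangleF Box)> CollectSpans(Document doc)
    {
        var spans = new List<(int Page, string Text, RectangleF Box)>();
        LayoutEnumerator enumerator = new LayoutEnumerator(doc);
        Walk(enumerator, spans);
        return spans;
    }

    // Depth-first traversal: visit the current entity, then its children, then its siblings.
    private static void Walk(LayoutEnumerator e, List<(int Page, string Text, RectangleF Box)> spans)
    {
        do
        {
            // Text is only read for span entities.
            if (e.Type == LayoutEntityType.Span)
                spans.Add((e.PageIndex, e.Text, e.Rectangle));

            if (e.MoveFirstChild())
            {
                Walk(e, spans);
                e.MoveParent();
            }
        } while (e.MoveNext());
    }
}
```

Calling SpanCollector.CollectSpans(doc) on a loaded document should give one entry per span. The rectangles are expected to be in points relative to the page, so they need scaling to pixels before drawing them on a rendered page image.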
That said, you can approximate the bounding box of each character within a span: take the number of characters in the span's text and divide the span's bounding box into that many equal slices. The actual width of each character depends on the font applied, so this approach gives only an approximate position of each character within the span.
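A minimal sketch of that equal-width split is shown below. It is plain C# with no Aspose.Words calls; CharBoxes and SplitEvenly are hypothetical names, and the code assumes horizontal left-to-right text.

```csharp
using System.Collections.Generic;
using System.Drawing;

public static class CharBoxes
{
    // Splits a span rectangle into equal-width slices, one per character.
    // Assumption: horizontal, left-to-right text. Real glyph widths vary with the font,
    // so the result is only an approximation.
    public static List<RectangleF> SplitEvenly(RectangleF spanBox, string spanText)
    {
        var boxes = new List<RectangleF>();
        if (string.IsNullOrEmpty(spanText))
            return boxes;

        // Length counts UTF-16 code units, so a surrogate pair gets two slices.
        float charWidth = spanBox.Width / spanText.Length;
        for (int i = 0; i < spanText.Length; i++)
        {
            boxes.Add(new RectangleF(spanBox.X + i * charWidth, spanBox.Y, charWidth, spanBox.Height));
        }
        return boxes;
    }
}
```

In practice the rectangle and text would come from LayoutEnumerator.Rectangle and LayoutEnumerator.Text while the enumerator is positioned on a span. For example, a span at (10, 20) that is 40 points wide with the text "abcd" is divided into four slices of 10 points each.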