Hi, Support:
Is there a way to read the text of a body range line by line? It seems the text of a range can only be read paragraph by paragraph or character by character.
Thanks for your help.
@ducaisoft As you know, there is no concept of page or line in MS Word documents due to their flow nature. Consumer applications build the document layout on the fly, and Aspose.Words does the same using its own layout engine. The LayoutCollector and LayoutEnumerator classes provide limited access to document layout information.
For example, the following code demonstrates the basic technique of splitting document content into lines:
using System;
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Layout;

Document doc = new Document(@"C:\Temp\in.docx");

// Split all Run nodes in the document so that each contains no more than one word.
List<Run> runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();
foreach (Run r in runs)
{
    Run current = r;
    while (current.Text.IndexOf(' ') >= 0)
        current = SplitRun(current, current.Text.IndexOf(' ') + 1);
}

// Wrap all runs in the document with bookmarks to make it possible to work with LayoutCollector and LayoutEnumerator.
runs = doc.GetChildNodes(NodeType.Run, true).Cast<Run>().ToList();
List<string> tmpBookmarks = new List<string>();
int bkIndex = 0;
foreach (Run r in runs)
{
    // LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or in textboxes.
    if (r.GetAncestor(NodeType.HeaderFooter) != null || r.GetAncestor(NodeType.Shape) != null)
        continue;

    BookmarkStart start = new BookmarkStart(doc, string.Format("r{0}", bkIndex));
    BookmarkEnd end = new BookmarkEnd(doc, start.Name);
    r.ParentNode.InsertBefore(start, r);
    r.ParentNode.InsertAfter(end, r);
    tmpBookmarks.Add(start.Name);
    bkIndex++;
}

// Now we can use the collector and enumerator to get the runs per line in the MS Word document.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
object currentLine = null;
foreach (string bkName in tmpBookmarks)
{
    Bookmark bk = doc.Range.Bookmarks[bkName];

    // Move from the bookmark's layout entity up to the line that contains it.
    enumerator.Current = collector.GetEntity(bk.BookmarkStart);
    while (enumerator.Type != LayoutEntityType.Line)
        enumerator.MoveParent();

    // A different line entity means a new line has started.
    if (currentLine != enumerator.Current)
    {
        currentLine = enumerator.Current;
        Console.WriteLine();
        Console.WriteLine("-------=========Start Of Line=========-------");
    }

    Run run = bk.BookmarkStart.NextSibling as Run;
    if (run != null)
        Console.Write(run.Text);
}

private static Run SplitRun(Run run, int position)
{
    Run afterRun = (Run)run.Clone(true);
    run.ParentNode.InsertAfter(afterRun, run);
    afterRun.Text = run.Text.Substring(position);
    run.Text = run.Text.Substring(0, position);
    return afterRun;
}
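If the lines are needed as a list of strings rather than console output, the same loop can feed a small grouping helper. The following is only a sketch: `LineGrouper` and `GroupIntoLines` are hypothetical names, not part of Aspose.Words, and the helper assumes that fragments belonging to the same line carry the same line object (compared by reference, as the `currentLine != enumerator.Current` check above does).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Aspose.Words.
public static class LineGrouper
{
    // Concatenates consecutive text fragments that share the same line key
    // (compared by reference) into one string per line.
    public static List<string> GroupIntoLines(IEnumerable<KeyValuePair<object, string>> fragments)
    {
        List<string> lines = new List<string>();
        StringBuilder current = new StringBuilder();
        object currentKey = null;
        bool started = false;

        foreach (KeyValuePair<object, string> fragment in fragments)
        {
            // A different key means the previous line is complete.
            if (started && !ReferenceEquals(fragment.Key, currentKey))
            {
                lines.Add(current.ToString());
                current.Clear();
            }

            currentKey = fragment.Key;
            started = true;
            current.Append(fragment.Value);
        }

        // Flush the last line, if any fragment was seen.
        if (started)
            lines.Add(current.ToString());

        return lines;
    }
}
```

To use it with the code above, collect `new KeyValuePair<object, string>(enumerator.Current, run.Text)` into a list inside the last loop instead of writing to the console, then pass that list to `LineGrouper.GroupIntoLines`.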
It looks like this cannot split each paragraph line by line.
@ducaisoft You can process a particular paragraph using the provided code - the approach is the same.
Yes! I tested it, but it only extracts many runs into the list; it does not extract each line into the list.
Maybe this code works correctly in C++ but fails in .NET. Would you translate it into a .NET demo for me to test?
Could you translate the demo code into VB.NET?
@ducaisoft Here is the same code in VB.NET:
Imports System
Imports System.Collections.Generic
Imports Aspose.Words
Imports Aspose.Words.Layout

Dim doc As Document = New Document("C:\Temp\in.docx")

' Split all Run nodes in the document so that each contains no more than one word.
Dim runs As NodeCollection = doc.GetChildNodes(NodeType.Run, True)
For Each r As Run In runs
    Dim current As Run = r
    While (current.Text.IndexOf(" ") >= 0)
        current = SplitRun(current, current.Text.IndexOf(" ") + 1)
    End While
Next

' Wrap all runs in the document with bookmarks to make it possible to work with LayoutCollector and LayoutEnumerator.
runs = doc.GetChildNodes(NodeType.Run, True)
Dim tmpBookmarks As List(Of String) = New List(Of String)
Dim bkIndex As Integer = 0
For Each r As Run In runs
    ' LayoutCollector and LayoutEnumerator do not work with nodes in headers/footers or in textboxes.
    If (r.GetAncestor(NodeType.HeaderFooter) IsNot Nothing OrElse r.GetAncestor(NodeType.Shape) IsNot Nothing) Then
        Continue For
    End If

    Dim bkStart As BookmarkStart = New BookmarkStart(doc, String.Format("r{0}", bkIndex))
    Dim bkEnd As BookmarkEnd = New BookmarkEnd(doc, bkStart.Name)
    r.ParentNode.InsertBefore(bkStart, r)
    r.ParentNode.InsertAfter(bkEnd, r)
    tmpBookmarks.Add(bkStart.Name)
    bkIndex = bkIndex + 1
Next

' Now we can use the collector and enumerator to get the runs per line in the MS Word document.
Dim collector As LayoutCollector = New LayoutCollector(doc)
Dim enumerator As LayoutEnumerator = New LayoutEnumerator(doc)
Dim currentLine As Object = New Object()
For Each bkName As String In tmpBookmarks
    Dim bk As Bookmark = doc.Range.Bookmarks(bkName)

    ' Move from the bookmark's layout entity up to the line that contains it.
    enumerator.Current = collector.GetEntity(bk.BookmarkStart)
    While (enumerator.Type <> LayoutEntityType.Line)
        enumerator.MoveParent()
    End While

    ' A different line entity means a new line has started.
    If (currentLine IsNot enumerator.Current) Then
        currentLine = enumerator.Current
        Console.WriteLine()
        Console.WriteLine("-------=========Start Of Line=========-------")
    End If

    Dim run As Run = TryCast(bk.BookmarkStart.NextSibling, Run)
    If (run IsNot Nothing) Then
        Console.Write(run.Text)
    End If
Next

Private Function SplitRun(run As Run, position As Integer) As Run
    Dim afterRun As Run = CType(run.Clone(True), Run)
    run.ParentNode.InsertAfter(afterRun, run)
    afterRun.Text = run.Text.Substring(position)
    run.Text = run.Text.Substring(0, position)
    Return afterRun
End Function
Thank you very much!
I tested it and it works well!
One more question: how can I split a given paragraph instead of the whole document?
@ducaisoft Use the same code, but process the Run nodes from a single Paragraph:
Dim runs As NodeCollection = someParagraph.GetChildNodes(NodeType.Run, True)