I am using LayoutCollector and LayoutEnumerator to get the positions of paragraphs in a document. Some paragraphs contain a mix of hidden and visible content, so they are visible overall, but the GetEntity method returns null for them.
For example, see the paragraphs containing the text “To search for”. They are visible in MS Word, but LayoutCollector returns nothing for them.
I attached the document.
Code sample:
var doc = new Document(inPath);
var lc = new LayoutCollector(doc);
var paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
foreach (var para in paragraphs)
{
    var res = lc.GetEntity(para);
    if (res == null)
    {
        Console.WriteLine($"Hidden paragraph: {para.ToString(SaveFormat.Text).Trim()}");
    }
}
I use Aspose.Words.dll 25.2.0.0 and Microsoft® Word for Microsoft 365 MSO (Version 2501 Build 16.0.18429.20132) 64-bit.
test34.zip (17.5 KB)
@licenses The behavior is correct. The entity returned by the LayoutCollector.GetEntity method for a Paragraph node is a paragraph break span. If the paragraph break is hidden, there is nothing to return. You can modify your code like this:
Document doc = new Document(@"C:\Temp\in.docx");
LayoutCollector lc = new LayoutCollector(doc);
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    // The paragraph break is hidden, so there is no layout entity to return for it.
    if (para.ParagraphBreakFont.Hidden)
        continue;
    object res = lc.GetEntity(para);
    if (res == null)
    {
        Console.WriteLine($"Hidden paragraph: {para.ToString(SaveFormat.Text).Trim()}");
    }
}
So there is no way to get the location of the visible content inside such paragraphs?
@licenses You can wrap the content of the paragraph in a temporary bookmark and then get the coordinates of the start and end of this bookmark:
Document doc = new Document(@"C:\Temp\in.docx");
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
int bookmarkIndex = 0;
foreach (Paragraph p in paragraphs)
{
    // Skip paragraphs in header/footer and in shapes.
    if (p.GetAncestor(NodeType.HeaderFooter) != null || p.GetAncestor(NodeType.Shape) != null)
        continue;
    // Wrap the whole paragraph content in a uniquely named temporary bookmark.
    string bkName = string.Format("tmp_bk_{0}", bookmarkIndex++);
    p.PrependChild(new BookmarkStart(doc, bkName));
    p.AppendChild(new BookmarkEnd(doc, bkName));
}
// Now that each paragraph is wrapped in a temporary bookmark, build the layout
// and read the positions of the bookmark start and end.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
foreach (Paragraph p in paragraphs)
{
    // Find the temporary bookmark that wraps this paragraph, if any.
    Bookmark wrappingBookmark = null;
    foreach (Bookmark bk in p.Range.Bookmarks)
    {
        if (bk.Name.StartsWith("tmp_bk_"))
        {
            wrappingBookmark = bk;
            break;
        }
    }
    if (wrappingBookmark == null)
        continue;
    // GetEntity can return null when a node has no layout entity, so check before using the result.
    object startEntity = collector.GetEntity(wrappingBookmark.BookmarkStart);
    object endEntity = collector.GetEntity(wrappingBookmark.BookmarkEnd);
    if (startEntity != null && endEntity != null)
    {
        enumerator.Current = startEntity;
        RectangleF start = enumerator.Rectangle;
        enumerator.Current = endEntity;
        RectangleF end = enumerator.Rectangle;
        Console.WriteLine("{0} - {1}", start, end);
    }
    // Remove the temporary bookmark.
    wrappingBookmark.Remove();
}
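If you need a single rectangle rather than the two separate ones, a minimal sketch is to take the union of the start and end rectangles. The BookmarkBounds class and its Combine method below are hypothetical helpers, not part of Aspose.Words; they only use System.Drawing.RectangleF.Union. This assumes both rectangles are on the same page, and for a paragraph that spans several lines the result covers the vertical extent but only the horizontal range between the start and end positions, not necessarily the full line width.

```csharp
using System.Drawing;

// Hypothetical helper, not part of Aspose.Words.
public static class BookmarkBounds
{
    // Returns the smallest rectangle that contains both the bookmark start
    // rectangle and the bookmark end rectangle.
    // Assumption: both rectangles come from the same page.
    public static RectangleF Combine(RectangleF start, RectangleF end)
    {
        return RectangleF.Union(start, end);
    }
}
```

In the loop above it could be called as BookmarkBounds.Combine(start, end) right after both rectangles have been read.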
Thanks, I got the idea.
I thought there was a built-in feature in LayoutCollector for this, but I will try the proposed workaround.