Missing entities for Paragraph objects with hidden elements in LayoutCollector

I am using LayoutCollector and LayoutEnumerator to get the positions of paragraphs in a document. Some paragraphs mix hidden and visible content, so they are visible overall, but LayoutCollector.GetEntity returns null for them.
For example, the paragraphs with the text “To search for” are visible in MS Word but cannot be located with LayoutCollector.
I attached the document.

Code sample:

var doc = new Document(inPath);
var lc = new LayoutCollector(doc);

var paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
foreach (var para in paragraphs)
{
	var res = lc.GetEntity(para);
	if (res == null)
	{
		Console.WriteLine($"Hidden paragraph: {para.ToString(SaveFormat.Text).Trim()}");
	}
}

I use Aspose.Words.dll 25.2.0.0 and Microsoft® Word for Microsoft 365 MSO (Version 2501 Build 16.0.18429.20132) 64-bit.
test34.zip (17.5 KB)

@licenses The behavior is correct. For a Paragraph node, the entity returned by the LayoutCollector.GetEntity method is the paragraph break span. If the paragraph break is hidden, there is nothing to return. You can modify your code to skip paragraphs whose paragraph break is hidden:

Document doc = new Document(@"C:\Temp\in.docx");
LayoutCollector lc = new LayoutCollector(doc);

foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.ParagraphBreakFont.Hidden)
        continue;

    object res = lc.GetEntity(para);
    if (res == null)
    {
        Console.WriteLine($"Hidden paragraph: {para.ToString(SaveFormat.Text).Trim()}");
    }
}
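To see which paragraphs fall into the case from the question (hidden paragraph mark, visible text), a small check can compare the paragraph mark with the runs it contains. This is a sketch for illustration, not part of the answer above; the class `HiddenBreakCheck` and its methods are hypothetical names. It relies only on `Paragraph.ParagraphBreakFont.Hidden` and `Run.Font.Hidden`, and treats a paragraph as a match when its break is hidden but at least one non-hidden run has non-whitespace text.

```csharp
using System;
using System.Linq;
using Aspose.Words;

static class HiddenBreakCheck
{
    // Hypothetical helper: true when the paragraph mark is hidden (so
    // LayoutCollector.GetEntity(paragraph) returns null) while at least one
    // run inside the paragraph still carries visible, non-whitespace text.
    public static bool HasHiddenBreakButVisibleText(Paragraph para)
    {
        if (!para.ParagraphBreakFont.Hidden)
            return false;

        // Deep search so runs nested in inline content controls are
        // included, not only the paragraph's direct children.
        return para.GetChildNodes(NodeType.Run, true)
            .Cast<Run>()
            .Any(r => !r.Font.Hidden && r.Text.Trim().Length > 0);
    }

    // Prints every paragraph of the document that matches the check above.
    public static void Report(Document doc)
    {
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (HasHiddenBreakButVisibleText(para))
                Console.WriteLine($"Hidden paragraph mark, visible text: {para.ToString(SaveFormat.Text).Trim()}");
        }
    }
}
```

Usage: `HiddenBreakCheck.Report(new Document(@"C:\Temp\in.docx"));`. Field-code runs are counted as text as well, so a paragraph whose only visible runs form a field code could be reported as a false positive.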

So there is no way to get the location of the visible content inside such paragraphs?

@licenses You can wrap the content of each paragraph in a temporary bookmark and then get the coordinates of the start and end of this bookmark:

using System;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;

Document doc = new Document(@"C:\Temp\in.docx");

NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
int bookmarkIndex = 0;
foreach (Paragraph p in paragraphs)
{
    // Skip paragraphs in headers/footers and in shapes
    // (LayoutCollector does not work for nodes in headers/footers).
    if (p.GetAncestor(NodeType.HeaderFooter) != null || p.GetAncestor(NodeType.Shape) != null)
        continue;

    // Wrap the whole paragraph content in a temporary bookmark.
    string bkName = string.Format("tmp_bk_{0}", bookmarkIndex++);
    p.PrependChild(new BookmarkStart(doc, bkName));
    p.AppendChild(new BookmarkEnd(doc, bkName));
}

// Now that every paragraph is wrapped in a bookmark, build the layout and
// get the positions of each bookmark's start and end.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
foreach (Paragraph p in paragraphs)
{
    Bookmark wrappingBookmark = null;
    foreach (Bookmark bk in p.Range.Bookmarks)
    {
        if (bk.Name.StartsWith("tmp_bk_"))
        {
            wrappingBookmark = bk;
            break;
        }
    }

    if (wrappingBookmark == null)
        continue;

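    // GetEntity returns null when a node has no layout entity,
    // so check both ends before moving the enumerator.
    object startEntity = collector.GetEntity(wrappingBookmark.BookmarkStart);
    object endEntity = collector.GetEntity(wrappingBookmark.BookmarkEnd);
    if (startEntity != null && endEntity != null)
    {
        enumerator.Current = startEntity;
        RectangleF start = enumerator.Rectangle;

        enumerator.Current = endEntity;
        RectangleF end = enumerator.Rectangle;

        Console.WriteLine("{0} - {1}", start, end);
    }

    // Remove the temporary bookmark.
    wrappingBookmark.Remove();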
}

Thanks, I got the idea.
I thought there was a built-in feature in LayoutCollector for this, but I will try the proposed workaround.
