LayoutCollector returning the incorrect page number for a node

Hi,

I am using LayoutCollector to determine which page a node is on in a Document, and I remove the node if it appears on a page before a specified page number. Here is the code:

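// Namespaces required by the method below
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Layout;

public Stream RemoveUnwantedPages(Stream file, int selectedStartPage = 1)
{
    // Load the document from the input stream
    Document doc = new(file);

    // Attach the collector first, then build the layout so it records which page each node is on
    LayoutCollector layoutCollector = new(doc);
    doc.UpdatePageLayout();

    // First pass: collect nodes that start before selectedStartPage without modifying the document,
    // so removals cannot affect the page indexes reported for later nodes
    List<Node> nodesToRemove = new();
    foreach (Section section in doc.Sections.Cast<Section>())
    {
        // GetChildNodes returns a live collection, so take a snapshot before iterating
        Node[] nodes = section.Body.GetChildNodes(NodeType.Any, true).ToArray();
        foreach (Node node in nodes)
        {
            int pageIndex = layoutCollector.GetStartPageIndex(node);
            int endIndex = layoutCollector.GetEndPageIndex(node);        // inspected while debugging
            int pagesSpanned = layoutCollector.GetNumPagesSpanned(node); // inspected while debugging

            // GetStartPageIndex returns 0 for nodes it cannot map to a page, so those are removed too
            if (pageIndex < selectedStartPage) // remove content before the selected start page
            {
                nodesToRemove.Add(node);
            }
        }
    }

    // Second pass: remove the collected nodes (descendants of an already removed node
    // are simply detached from their removed parent)
    foreach (Node node in nodesToRemove)
    {
        node.ParentNode?.RemoveChild(node);
    }

    // Save the modified document to a MemoryStream
    MemoryStream outputStream = new();
    doc.Save(outputStream, SaveFormat.Docx);

    // Reset the stream position to the beginning
    outputStream.Position = 0;

    // Return the modified document as a stream
    return outputStream;
}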

When I choose page 2 as the selectedStartPage, the first node on page 2 (containing the text “Recall a challenging project and how you overcame obstacles.”) is incorrectly attributed to page 1 of the document, so it is removed in error. I checked whether the node might start on page 1 and end on page 2, but both GetStartPageIndex and GetEndPageIndex return 1 for the node in question.
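To see exactly where the mapping diverges from what Word shows, one option is to dump the start and end page that LayoutCollector reports for every body paragraph and compare the list with the document opened in Word. The sketch below assumes that approach; the class and method names (PageMappingDump, Dump) are hypothetical, and it skips header/footer paragraphs because those repeat across pages.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Layout;

// Hypothetical diagnostic helper, not part of Aspose.Words.
public static class PageMappingDump
{
    // Returns (and prints) "start-end: text" for every body paragraph, as reported by LayoutCollector.
    public static List<string> Dump(Document doc)
    {
        // Attach the collector before building the layout so it records node-to-page mapping.
        LayoutCollector collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        List<string> lines = new List<string>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Header/footer paragraphs are repeated on many pages, so leave them out.
            if (para.GetAncestor(NodeType.HeaderFooter) != null)
                continue;

            // Shorten long paragraphs so the output stays readable.
            string text = para.GetText().Trim();
            if (text.Length > 40)
                text = text.Substring(0, 40) + "...";

            string line = $"{collector.GetStartPageIndex(para)}-{collector.GetEndPageIndex(para)}: {text}";
            lines.Add(line);
            Console.WriteLine(line);
        }
        return lines;
    }
}
```

Calling PageMappingDump.Dump(new Document(file)) on the attached document would show whether only the “Recall a challenging project...” paragraph is off, or whether every paragraph after a certain point is shifted by a page (which would point to a layout difference rather than a per-node problem).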

Here is the document: parsingPhrasesdocx.docx (17.0 KB)

Is there anything I’m doing wrong or should do differently to ensure that the nodes are attributed to the correct pages?

Thank you!

@dgoodspeed Such problems might occur because the fonts used in your input document are not available on the machine where the document is processed. The fonts are required to build the document layout. If Aspose.Words cannot find a font used in the document, the font is substituted. This can lead to a font mismatch and layout differences due to the different font metrics and, as a result, incorrect page detection. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
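A minimal sketch of the IWarningCallback approach might look like the following. It assumes the warnings of interest are those with WarningType.FontSubstitution and that they are raised while UpdatePageLayout builds the layout, so the callback is attached beforehand. The names FontSubstitutionWarningCollector, FontCheck, BuildLayoutWithWarnings and the fontsFolder parameter are hypothetical; fontsFolder stands for a folder you would supply that contains the fonts the document uses.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Fonts;

// Hypothetical callback that records every font-substitution warning.
public class FontSubstitutionWarningCollector : IWarningCallback
{
    public List<string> Substitutions { get; } = new List<string>();

    public void Warning(WarningInfo info)
    {
        // Ignore all other warning types; only font substitution affects the metrics here.
        if (info.WarningType == WarningType.FontSubstitution)
        {
            Substitutions.Add(info.Description);
            Console.WriteLine("Font substitution: " + info.Description);
        }
    }
}

public static class FontCheck
{
    // Builds the page layout with the callback attached and returns the substitutions seen.
    // fontsFolder is optional; pass null to use the default font sources.
    public static List<string> BuildLayoutWithWarnings(Document doc, string fontsFolder)
    {
        if (fontsFolder != null)
        {
            FontSettings fontSettings = new FontSettings();
            fontSettings.SetFontsFolder(fontsFolder, true); // true = also search subfolders
            doc.FontSettings = fontSettings;
        }

        FontSubstitutionWarningCollector warnings = new FontSubstitutionWarningCollector();
        doc.WarningCallback = warnings;

        // Building the layout is when missing fonts are resolved and substituted.
        doc.UpdatePageLayout();

        return warnings.Substitutions;
    }
}
```

If the returned list is not empty, installing the missing fonts on the processing machine (or pointing FontSettings at a folder that contains them) before creating the LayoutCollector should bring the computed layout closer to what Word shows.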