Hi,
I am using LayoutCollector to determine which page each node in a Document is on, and removing the node if it is on a page before a specified page number. Here is the code:
public Stream RemoveUnwantedPages(Stream file, int selectedStartPage = 1)
{
    // Load the document from the input stream
    Document doc = new(file);
    LayoutCollector layoutCollector = new(doc);
    layoutCollector.Clear();
    doc.UpdatePageLayout();

    // Iterate through all nodes and remove content that exists before the selectedStartPage
    foreach (Section section in doc.Sections.Cast<Section>())
    {
        var nodes = section.Body.GetChildNodes(NodeType.Any, true);
        foreach (Node node in nodes)
        {
            int pageIndex = layoutCollector.GetStartPageIndex(node);
            int endIndex = layoutCollector.GetEndPageIndex(node);
            int pagesSpanned = layoutCollector.GetNumPagesSpanned(node);
            if (pageIndex < selectedStartPage) // remove content before the selected start page
            {
                var parent = node.ParentNode;
                parent.RemoveChild(node);
            }
        }
    }

    // Save the modified document to a MemoryStream
    MemoryStream outputStream = new();
    doc.Save(outputStream, SaveFormat.Docx);

    // Reset the stream position to the beginning
    outputStream.Position = 0;

    // Return the modified document as a stream
    return outputStream;
}
When I choose page 2 as the selectedStartPage, the first node on page 2 (containing the text “Recall a challenging project and how you overcame obstacles.”) is incorrectly being attributed to page 1 of the document, so it is being removed in error. I checked to see if maybe the start index was on page 1, and the end index was on page 2, but both indexes returned 1 for the node in question.
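The check described above can be sketched as a small read-only diagnostic that prints the start and end page reported for every paragraph, without removing anything. This is only a minimal sketch: the file path is a placeholder for wherever the attached document is saved, and it assumes the Aspose.Words for .NET package is referenced.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

// Placeholder path: point this at a local copy of the attached document.
Document doc = new("parsingPhrasesdocx.docx");

// Attach the collector first, then build the layout so the node-to-page mapping is recorded.
LayoutCollector layoutCollector = new(doc);
doc.UpdatePageLayout();

// Print the 1-based start and end page reported for each paragraph, followed by its text.
foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
{
    int startPage = layoutCollector.GetStartPageIndex(paragraph);
    int endPage = layoutCollector.GetEndPageIndex(paragraph);
    Console.WriteLine($"{startPage}-{endPage}: {paragraph.GetText().Trim()}");
}
```

Because nothing is modified while the pages are read, the printed values reflect the layout of the document as loaded.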
Here is the document: parsingPhrasesdocx.docx (17.0 KB)
Is there anything I’m doing wrong or should do differently to ensure that the nodes are attributed to the correct pages?
Thank you!