Free Support Forum - aspose.com

Remove content from word document based on selected TOC

Hello,

We need to remove parts of a Word document based on selected TOC headers:

- in the first step, we need to display the TOC (I found how to get the TOC here: https://forum.aspose.com/t/60963v)

- in the second step, the user marks the TOC headers to be removed, and the selected headers are sent back to the service

- in the third step, we should remove the selected parts from the document (based on the TOC headers)

Can you advise how to implement the third step in code?

Thanks.

Mirek

Hi Mirek,


Thanks for your inquiry. Could you please attach the following Word documents here for testing?

Input document: The document in which the user has marked TOC headers in the second step.
Target document: The document showing the desired behaviour (with the selected parts removed). It is the document generated after the execution of the third step, and you can create it manually using Microsoft Word. We just need to see how you want the final document to look.

I will then investigate the problem on my side and provide you with more information.

Best regards,

Hello,

Attached are two documents:

- The input document contains all TOC headers

- The target document shows how the document should look after:

---- the header 2 "Benefits Environment in Canada" has been removed

---- the subheader "Rating" has been removed from header 3 "Underwriting and Rating Arrangements"

This means the user should be able to remove different parts based on header or subheader selection.

Thanks very much for your help.

Mirek

Hi Mirek,


Thanks for your inquiry.

The table of contents (TOC field) in your document is built from the content that is formatted with Heading styles (Heading 1, 2, 3 and so on). In this case, all you need to do is find the required headings, remove them together with the content that follows them, and update the TOC field. For example, please see the following code:

using System.Collections.Generic;
using Aspose.Words;

// Load the document
Document doc = new Document(@"C:\Temp\InputDocument.doc");

// Stores the nodes that are to be removed from the document
List<Node> nodesToBeDeleted = new List<Node>();

// Get a snapshot of all paragraph nodes in the document
Node[] paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();

foreach (Paragraph paragraph in paragraphs)
{
    // We are interested only in Heading 1 paragraphs that start with the search string
    if (paragraph.ParagraphFormat.StyleIdentifier != StyleIdentifier.Heading1 ||
        !paragraph.Range.Text.StartsWith("Benefits Environment in Canada"))
        continue;

    // Collect the heading and all following sibling nodes (paragraphs and tables)
    // up to, but not including, the next Heading 1 paragraph. NextSibling returns
    // null at the end of the parent node, which also ends the loop.
    Node current = paragraph;
    while (current != null)
    {
        nodesToBeDeleted.Add(current);
        current = current.NextSibling;

        Paragraph next = current as Paragraph;
        if (next != null && next.ParagraphFormat.StyleIdentifier == StyleIdentifier.Heading1)
            break;
    }

    // Stop searching, as we now have the list of nodes to be removed
    break;
}

// Remove all collected nodes
foreach (Node node in nodesToBeDeleted)
    node.Remove();

// Rebuild the TOC field
doc.UpdateFields();

// Save the final document
doc.Save(@"C:\Temp\out.doc");
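
The code above handles only Heading 1 sections. Since the request also covers subheaders (such as "Rating"), here is a minimal sketch of how the same idea might be generalized: remove the matched heading and everything after it, up to the next heading of the same or a higher level. The names HeadingSectionRemover, RemoveSection and HeadingLevel are hypothetical helpers, not part of Aspose.Words. The sketch assumes that the headings use the built-in Heading 1 to Heading 9 styles, that a prefix match on the heading text is specific enough, and that the content to remove sits in the same parent node as its heading.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper class, shown for illustration only.
public static class HeadingSectionRemover
{
    private static readonly StyleIdentifier[] HeadingStyles =
    {
        StyleIdentifier.Heading1, StyleIdentifier.Heading2, StyleIdentifier.Heading3,
        StyleIdentifier.Heading4, StyleIdentifier.Heading5, StyleIdentifier.Heading6,
        StyleIdentifier.Heading7, StyleIdentifier.Heading8, StyleIdentifier.Heading9
    };

    // Returns 1..9 for paragraphs with a built-in heading style, 0 for any other node.
    private static int HeadingLevel(Node node)
    {
        Paragraph para = node as Paragraph;
        if (para == null)
            return 0;

        // Array.IndexOf returns -1 when the style is not a heading, which maps to 0.
        return Array.IndexOf(HeadingStyles, para.ParagraphFormat.StyleIdentifier) + 1;
    }

    // Removes the first heading whose text starts with headingText, plus all following
    // sibling nodes up to the next heading of the same or a higher level.
    // Returns true if a matching heading was found.
    public static bool RemoveSection(Document doc, string headingText)
    {
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true).ToArray())
        {
            int level = HeadingLevel(para);
            if (level == 0 || !para.Range.Text.StartsWith(headingText))
                continue;

            // Collect first, remove afterwards, so the sibling walk is not disturbed.
            List<Node> toDelete = new List<Node>();
            Node current = para;
            while (current != null)
            {
                toDelete.Add(current);
                current = current.NextSibling;

                int nextLevel = HeadingLevel(current);
                if (nextLevel != 0 && nextLevel <= level)
                    break;
            }

            foreach (Node node in toDelete)
                node.Remove();

            // Rebuild the TOC field so that it reflects the removed headings.
            doc.UpdateFields();
            return true;
        }

        return false;
    }
}
```

With this sketch, calling HeadingSectionRemover.RemoveSection(doc, "Benefits Environment in Canada") and then HeadingSectionRemover.RemoveSection(doc, "Rating") before doc.Save should cover both cases from the target document. Note that the prefix match picks the first heading of any level that starts with the given text, so passing the full heading text is safer.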


I hope this helps.

Best regards,