Search and Replace Paragraph Breaks, Text Box Content, and Header/Footer Content in a Word Document and Apply Font Formatting using C# .NET

Hi, Support:
How can I use this DLL to perform some special search-and-replace operations?
For example:
Task1: Search for and delete words whose font size is less than 6 points, whose font color is a light color, gray, or white, or which are hidden.

Task2: Search for each paragraph break and replace it with given text.

Task3: Search for each paragraph and give it a 2-character indent.

Task4: Search for each word in the main body and in text boxes and apply new formatting to it, such as a new font name, font size, and font color.

Task5: Search for each word in the headers and footers and apply new formatting to it, such as a new font name, font size, and font color.

Task6: Search for paragraphs that have a background color and remove the background color.

Task7: Search for highlighted words and remove the highlighting.

Task8: Search for words in tables and apply a new font name, font size, and font color to them.


@ducaisoft,

First, please note that formatting is applied at several levels. Take simple text as an example: text in a document is represented by Run nodes, and a Run can only be a child of a Paragraph. You can apply formatting 1) to Run nodes through a character style, 2) to their parent Paragraph node, typically through a paragraph style, and 3) directly to Run nodes through the Run's Font properties ("direct formatting"). A Run inherits the formatting of the paragraph style, then the character style, and finally its own direct formatting, with each later level overriding the earlier ones.
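To make the precedence concrete, the following plain C# toy model resolves a font size from the three levels. The `FormattingModel` class and `ResolveFontSize` method are hypothetical illustrations, not part of Aspose.Words; a `null` means the level does not set the property, and a later level overrides an earlier one.

```csharp
using System;

// Toy model only: this is NOT the Aspose.Words API. It mimics the layering
// paragraph style -> character style -> direct formatting.
public static class FormattingModel
{
    // null = "this level does not set a font size".
    public static double? ResolveFontSize(double? paragraphStyle, double? characterStyle, double? direct)
    {
        // Direct formatting has the highest priority, then the character style,
        // and the paragraph style is the fallback.
        return direct ?? characterStyle ?? paragraphStyle;
    }
}
```

For example, `ResolveFontSize(11, null, 5)` yields 5 because direct formatting wins, while `ResolveFontSize(11, null, null)` falls back to the paragraph style's 11.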

Assuming that the document contains only direct formatting, you can do Task1 with the following code (see the sample document: input.zip (19.0 KB)):

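using System.Drawing;
using Aspose.Words;

Document doc = new Document("E:\\Temp\\input.docx");

// ToArray() takes a snapshot of the runs, so removing nodes inside the loop
// does not disturb the live NodeCollection returned by GetChildNodes.
foreach (Run run in doc.GetChildNodes(NodeType.Run, true).ToArray())
{
    Color c = run.Font.Color;
    // Color.Empty means "Automatic". The thresholds (very light colors including white,
    // or low-saturation mid/light grays) are assumptions; tune them for your documents.
    bool lightGrayOrWhite = !c.IsEmpty &&
        (c.GetBrightness() >= 0.75f || (c.GetSaturation() <= 0.1f && c.GetBrightness() >= 0.4f));

    if (run.Font.Hidden || run.Font.Size < 6 || lightGrayOrWhite)
    {
        run.Remove();
    }
}

doc.Save("E:\\temp\\20.6.docx");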

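For Task2, the following code simulates this by moving the cursor to just before the paragraph mark (the paragraph break character) of a paragraph and then writing new content there:

using Aspose.Words;

Document doc = new Document("E:\\Temp\\input.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
// MoveToParagraph(paragraphIndex, characterIndex): index 0 is the first paragraph of the
// current section, and -1 places the cursor at its end, just before the paragraph mark.
builder.MoveToParagraph(0, -1);
builder.Write(" new text at the end of paragraph");
doc.Save("E:\\temp\\20.6.docx");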

If this is not what you are looking for, please ZIP and upload sample input and expected Word documents here for our reference.
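The builder approach above only inserts text before a paragraph mark; it does not remove the break. If the goal is to replace every paragraph break with text (merging the paragraphs around it), one sketch is `Range.Replace` with the `&p` metacharacter, assuming your Aspose.Words version supports metacharacters in search patterns (newer releases document `&p` for paragraph breaks). `ParagraphBreakReplacer` is a hypothetical helper name.

```csharp
using Aspose.Words;
using Aspose.Words.Replacing;

public static class ParagraphBreakReplacer
{
    // Assumption: "&p" is interpreted as a paragraph break by Range.Replace
    // in the installed Aspose.Words version.
    public static int ReplaceParagraphBreaks(Document doc, string replacement)
    {
        // Each matched paragraph break is replaced by the given text, which merges
        // the two paragraphs around it. Returns the number of replacements made.
        return doc.Range.Replace("&p", replacement, new FindReplaceOptions());
    }
}
```

Note that `doc.Range` also covers headers and footers; use `sec.Body.Range` for each section to limit the change to the main text. Because `&` introduces metacharacters, a literal `&` in the replacement text may need escaping; check the documentation for your version.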

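For Task3, please try running the following code. It only changes the paragraphs in the main body of each section, so the indentation of header/footer paragraphs remains intact:

using Aspose.Words;

Document doc = new Document("E:\\Temp\\input.docx");
foreach (Section sec in doc.Sections)
{
    // sec.Body excludes the section's HeadersFooters, which are also children of the Section.
    foreach (Paragraph para in sec.Body.GetChildNodes(NodeType.Paragraph, true))
        para.ParagraphFormat.CharacterUnitLeftIndent = 2;
}
doc.Save("E:\\temp\\20.6.docx");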
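In many documents (for example in Chinese typesetting), a "2-character indent" means a first-line indent rather than a left indent. If that is what Task3 needs, a sketch using `ParagraphFormat.CharacterUnitFirstLineIndent` (available in recent Aspose.Words versions alongside `CharacterUnitLeftIndent`) could look like this; `FirstLineIndenter` is a hypothetical name.

```csharp
using Aspose.Words;

public static class FirstLineIndenter
{
    // Sets a first-line indent, measured in character units, on every paragraph
    // in the main body of each section. Headers and footers are not touched.
    public static void IndentFirstLine(Document doc, double chars)
    {
        foreach (Section sec in doc.Sections)
        {
            foreach (Paragraph para in sec.Body.GetChildNodes(NodeType.Paragraph, true))
                para.ParagraphFormat.CharacterUnitFirstLineIndent = chars;
        }
    }
}
```

Usage: `FirstLineIndenter.IndentFirstLine(doc, 2);` before `doc.Save(...)`. Positive values give a first-line indent; negative values would give a hanging indent.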

For Task4, please try running the following code:

using System.Drawing;
using Aspose.Words;

string fontName = "Tahoma";
int fontSize = 14;
Color fontColor = Color.Green;

Document doc = new Document("E:\\Temp\\input.docx");
// To apply formatting to everything (including textboxes) in Body
foreach (Section sec in doc.Sections)
{
    foreach (Paragraph para in sec.Body.GetChildNodes(NodeType.Paragraph, true))
    {
        para.ParagraphBreakFont.Size = fontSize;
        para.ParagraphBreakFont.Name = fontName;
        para.ParagraphBreakFont.Color = fontColor;
        foreach (Run run in para.GetChildNodes(NodeType.Run, true))
        {
            run.Font.Size = fontSize;
            run.Font.Name = fontName;
            run.Font.Color = fontColor;
        }
    }
}

//// To apply formatting to textboxes only
//foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
//{
//    if (shape.ShapeType == ShapeType.TextBox)
//    {
//        foreach (Run run in shape.GetChildNodes(NodeType.Run, true))
//        {
//            run.Font.Size = fontSize;
//            run.Font.Name = fontName;
//            run.Font.Color = fontColor;
//        }
//    }
//}

doc.Save("E:\\temp\\20.6.docx");
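If Task4 is about re-formatting the occurrences of a particular word rather than all text, `Range.Replace` can do the search and the formatting in one call: replacing the word with itself while `FindReplaceOptions.ApplyFont` is set applies the new font to each match. A sketch follows; the `WordFormatter` class and its parameters are hypothetical. Text boxes anchored in the body are child nodes of body paragraphs, so `Body.Range` should include them, but please verify this on your own documents.

```csharp
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Replacing;

public static class WordFormatter
{
    // Re-formats every whole-word, case-sensitive occurrence of searchWord in the
    // main body of each section and returns the number of matches.
    public static int FormatWordInBody(Document doc, string searchWord,
        string fontName, double fontSize, Color fontColor)
    {
        var options = new FindReplaceOptions { MatchCase = true, FindWholeWordsOnly = true };
        // ApplyFont is the formatting given to the replacement text.
        options.ApplyFont.Name = fontName;
        options.ApplyFont.Size = fontSize;
        options.ApplyFont.Color = fontColor;

        int count = 0;
        foreach (Section sec in doc.Sections)
            count += sec.Body.Range.Replace(searchWord, searchWord, options);
        return count;
    }
}
```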

For Task5, you need to make a slight modification to the Task4 code so that it visits the headers and footers of each section. The commented-out block at the end covers Task8 (table contents):

using System.Drawing;
using Aspose.Words;

string fontName = "Tahoma";
int fontSize = 14;
Color fontColor = Color.Green;
Document doc = new Document("E:\\Temp\\input.docx");
foreach (Section sec in doc.Sections)
{
    foreach (HeaderFooter headerFooter in sec.HeadersFooters)
    {
        foreach (Paragraph para in headerFooter.GetChildNodes(NodeType.Paragraph, true))
        {
            para.ParagraphBreakFont.Size = fontSize;
            para.ParagraphBreakFont.Name = fontName;
            para.ParagraphBreakFont.Color = fontColor;
            foreach (Run run in para.GetChildNodes(NodeType.Run, true))
            {
                run.Font.Size = fontSize;
                run.Font.Name = fontName;
                run.Font.Color = fontColor;
            }
        }
    }
}

//// Task8: to change formatting of Table contents. Run this once, outside the
//// section loop, because doc.GetChildNodes already returns the tables of all sections.
//foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
//{
//    foreach (Paragraph para in table.GetChildNodes(NodeType.Paragraph, true))
//    {
//        para.ParagraphBreakFont.Size = fontSize;
//        para.ParagraphBreakFont.Name = fontName;
//        para.ParagraphBreakFont.Color = fontColor;
//        foreach (Run run in para.GetChildNodes(NodeType.Run, true))
//        {
//            run.Font.Size = fontSize;
//            run.Font.Name = fontName;
//            run.Font.Color = fontColor;
//        }
//    }
//}
doc.Save("E:\\temp\\20.6.docx");
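Header and footer paragraphs usually take their font from the built-in Header and Footer styles. When the text does not carry its own direct formatting, changing those two styles re-formats it without visiting any run. A sketch, assuming that indexing `doc.Styles` with `StyleIdentifier.Header` or `StyleIdentifier.Footer` returns the built-in style; `HeaderFooterStyles` is a hypothetical name.

```csharp
using System.Drawing;
using Aspose.Words;

public static class HeaderFooterStyles
{
    // Changes the built-in Header and Footer styles. Runs that have their own direct
    // formatting keep it, because direct formatting overrides the style.
    public static void RestyleHeadersFooters(Document doc, string fontName, double fontSize, Color fontColor)
    {
        foreach (StyleIdentifier id in new[] { StyleIdentifier.Header, StyleIdentifier.Footer })
        {
            Style style = doc.Styles[id];
            if (style == null)
                continue; // defensive: skip if the style is not available

            style.Font.Name = fontName;
            style.Font.Size = fontSize;
            style.Font.Color = fontColor;
        }
    }
}
```

This costs one update per style instead of one per run, which is why it can complement the per-run loop above when only part of the text uses direct formatting.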

For Task6, the following code detects any shading background color applied to paragraphs and removes it:

using System.Drawing;
using Aspose.Words;

Document doc = new Document("E:\\Temp\\input.docx");
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (!para.ParagraphFormat.Shading.BackgroundPatternColor.IsEmpty)
    {
        para.ParagraphFormat.Shading.BackgroundPatternColor = Color.Empty;
    }
}
doc.Save("E:\\temp\\20.6.docx");
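A background color can also be applied to individual runs rather than to the whole paragraph, and the code above does not touch that. A sketch that clears both `ParagraphFormat.Shading` and the run-level `Font.Shading` (`ShadingCleaner` is a hypothetical name):

```csharp
using System.Drawing;
using Aspose.Words;

public static class ShadingCleaner
{
    // Clears background shading at the paragraph level and at the run level,
    // since Word can apply a background color at either.
    public static void ClearShading(Document doc)
    {
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
            para.ParagraphFormat.Shading.BackgroundPatternColor = Color.Empty;

        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
            run.Font.Shading.BackgroundPatternColor = Color.Empty;
    }
}
```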

For Task7, please try the following code:

using System.Drawing;
using Aspose.Words;

Document doc = new Document("E:\\Temp\\input.docx");
foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
{
    if (!run.Font.HighlightColor.IsEmpty)
    {
        run.Font.HighlightColor = Color.Empty;
    }
}
doc.Save("E:\\temp\\20.6.docx");

You may also want to look into the following article:

Please let us know if you run into any trouble implementing the above tasks, and we will be glad to look into this further for you.

Thanks for your demos.
I had already worked through these tasks with that demo code, but it turned out to be very inefficient, so I am looking for a more efficient method.
The demo code works by traversing each paragraph or run, which takes an intolerably long time.
For example:
For a document with more than 200 pages and more than 3000 paragraphs, traversing each paragraph or run takes more than 1 minute, whereas MS Word needs only several seconds to finish the same task.

Is there a more efficient way to perform these tasks? Can the DLL perform them as quickly as MS Word?
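Without profiling it is not certain where the time goes, but one avoidable cost is calling `GetChildNodes` and walking the whole document once per task. A sketch of a single pass that handles several tasks in one loop, assuming direct formatting as in the earlier replies (`SinglePassCleanup` is a hypothetical name, and the Task1 check is shortened to hidden and tiny text):

```csharp
using System.Drawing;
using Aspose.Words;

public static class SinglePassCleanup
{
    // Collects the runs once and applies Task1, Task7 and the font change of
    // Task4/5/8 in the same loop, instead of one full traversal per task.
    public static void Apply(Document doc, string fontName, double fontSize, Color fontColor)
    {
        // ToArray() takes a snapshot, so removing runs inside the loop is safe.
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true).ToArray())
        {
            // Task1 (hidden or tiny text only): remove the run and move on.
            if (run.Font.Hidden || run.Font.Size < 6)
            {
                run.Remove();
                continue;
            }

            // Task7: remove highlighting.
            run.Font.HighlightColor = Color.Empty;

            // Task4/5/8: apply the new font to every remaining run.
            run.Font.Name = fontName;
            run.Font.Size = fontSize;
            run.Font.Color = fontColor;
        }
    }
}
```

Paragraph-break fonts (`ParagraphBreakFont`) are not changed here; handle them in one separate paragraph loop if you need them.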

@ducaisoft,

Please ZIP and upload a sample Word document and your Aspose.Words code (a simplified console application without compilation errors) that will help us reproduce this performance issue on our end. We will then investigate the scenario and provide you with more information.