Extract content between two placeholders and convert the extracted content to HTML

Hi there. I am new to Aspose and need some help.
Requirement: I have a DOCX file (please find attached source.docx) that contains the placeholders ==START== and ==END== (plain text, as in the attachment). I want to extract the content between these two placeholders, get the HTML of that extracted body content, and replace everything from ==START== through ==END==, placeholders included, with BODY_CONTENT in the original DOCX file. The rest of the document should remain unchanged (as in the attached required.docx).

My idea is to use ExtractContentBetweenParagraphs to get the content, pass it to a new in-memory Document, and save that document as HTML.

Questions:

  1. How do I get the paragraph number of ==START== and ==END==?
  2. How do I replace the text ==START== with BODY_CONTENT without changing its formatting, and remove the content up to and including ==END==?

Attchmt.zip (17.1 KB)

@parthu,

You can achieve this by implementing the following workflow:

using Aspose.Words;

// MyDir is the folder that holds the sample documents.
Document doc = new Document(MyDir + @"source.docx");

// 1. Find start and end Paragraphs
Paragraph start = null;
Paragraph end = null;

foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    string text = para.ToString(SaveFormat.Text);

    if (text.StartsWith("==START=="))
    {
        start = para;
        // Assumes the whole placeholder sits in the first run, so that run's formatting is kept.
        start.Runs[0].Text = "BODY_CONTENT";
    }
    else if (text.StartsWith("==END=="))
    {
        end = para;
        // Same assumption: the placeholder is the first run of the paragraph.
        end.Runs[0].Text = "";
    }
}

if (start != null && end != null)
{
    // 2. Extract content
}

// 3. Remove content between start and end paragraphs

doc.Save(MyDir + @"18.1.docx");
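
Step 2 is left empty above. As a rough sketch only, and not necessarily what the ExtractContentBetweenParagraphs sample does, the nodes between the two paragraphs could be imported into a new in-memory Document and saved as HTML. The class and method names here (PlaceholderHtml, ExtractHtmlBetween) are hypothetical, and the sketch assumes that start and end are siblings in the same Body, so content nested in tables or spread over several sections is not handled.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

static class PlaceholderHtml
{
    // Hypothetical helper, not part of Aspose.Words.
    // Assumes start and end are direct children of the same Body.
    public static string ExtractHtmlBetween(Paragraph start, Paragraph end)
    {
        Document srcDoc = (Document)start.Document;

        // A new Document starts with one empty paragraph; clear it first.
        Document dstDoc = new Document();
        dstDoc.FirstSection.Body.RemoveAllChildren();

        NodeImporter importer =
            new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

        // Copy every block-level node strictly between the two placeholders.
        for (Node node = start.NextSibling; node != null && node != end; node = node.NextSibling)
        {
            dstDoc.FirstSection.Body.AppendChild(importer.ImportNode(node, true));
        }

        // Make sure the body still contains at least one paragraph.
        dstDoc.EnsureMinimum();

        // Embed images in the HTML so nothing is written to disk.
        HtmlSaveOptions options = new HtmlSaveOptions { ExportImagesAsBase64 = true };

        using (MemoryStream stream = new MemoryStream())
        {
            dstDoc.Save(stream, options);
            stream.Position = 0;
            using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }
}
```

Inside the "2. Extract content" block this would be called as string html = PlaceholderHtml.ExtractHtmlBetween(start, end); and it has to run before step 3 removes the nodes.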

Thanks for the solution, @awais.hafeez. One more question based on your answer: how do I remove the content between the start and end paragraphs in one shot?

@parthu,

Please try using the following code:

...
...
...
if (start != null && end != null)
{
    // 2. Extract content
}

// 3. Remove content between start and end paragraphs
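if (start != null && end != null)
{
    // Position the builder at the start paragraph and insert an empty bookmark there
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.MoveTo(start);

    BookmarkStart bmStart = builder.StartBookmark("bm");
    BookmarkEnd bmEnd = builder.EndBookmark("bm");

    // Move the bookmark end into the end paragraph so the bookmark spans the whole range
    end.AppendChild(bmEnd);

    // Clear the bookmarked content, then remove the bookmark itself
    bmStart.Bookmark.Text = string.Empty;
    bmStart.Bookmark.Remove();
}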
...
...
...

Thanks for the solution, @awais.hafeez.

Hey @awais.hafeez, the bookmark deletion is also deleting the BODY_CONTENT text that we set earlier while finding the start paragraph. I also tried builder.MoveTo(start.getNextSibling()); but that didn’t work for me either.
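
One possible way around this, offered as an untested sketch and not as a confirmed fix, is to skip the bookmark and remove the sibling nodes directly, leaving the start paragraph (and its BODY_CONTENT run) untouched. The class and method names here (PlaceholderCleanup, RemoveAfterStartThroughEnd) are hypothetical, and the sketch assumes that start and end are siblings in the same Body.

```csharp
using Aspose.Words;

static class PlaceholderCleanup
{
    // Hypothetical helper, not part of Aspose.Words.
    // Removes every node after start up to and including end,
    // assuming both paragraphs are children of the same Body.
    public static void RemoveAfterStartThroughEnd(Paragraph start, Paragraph end)
    {
        Node node = start.NextSibling;

        while (node != null && node != end)
        {
            // Read the next sibling before Remove() detaches the node.
            Node next = node.NextSibling;
            node.Remove();
            node = next;
        }

        // Drop the end paragraph only if the walk actually reached it.
        if (node == end)
        {
            end.Remove();
        }
    }
}
```

Calling PlaceholderCleanup.RemoveAfterStartThroughEnd(start, end); in step 3 would leave the start paragraph in place, because the walk begins at its next sibling.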

@parthu,

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document
  • Aspose.Words generated output document showing the undesired behavior
  • Your expected Word document showing the correct output
  • Please create a standalone simplified console application (source code without compilation errors) that helps us reproduce your specific problem on our end and attach it here for testing.

As soon as you have these pieces of information ready, we’ll start further investigation into your issue and provide you with more information. Thanks for your cooperation.