How to remove the caption in a document

Hi Team,

I have extracted the image from a DOCX file. How can I remove the caption in the document? Kindly help me.

Document: Bookmark1.docx (161.2 KB)

Thanks in advance

Regards,
Mahi

@Mahesh39 Could you please give more details about what you want to achieve? And please provide the desired output if possible.

Hi @Konstantin.Kornilov,

Thanks for your response

Sorry for the delayed response. Kindly find the required output below.

Input: Bookmark1.docx (161.2 KB)

Sample output: sample_output.docx (165.7 KB)

Regards,
Mahesh

@Mahesh39 The general solution to your request depends on the structure of the documents you want to process. In particular, you need a way to identify the paragraphs that contain the caption. For the provided document, the following options could be used:

1. You could identify the caption paragraphs as the two paragraphs immediately preceding the first shape node. Please note, however, that in the provided document the image actually consists of several shape nodes spread across several paragraphs, so it may be difficult to identify the caption this way in other documents with similar formatting.
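```csharp
// doc is an Aspose.Words Document, e.g. new Document("Bookmark1.docx").
var firstShape = doc.GetChild(NodeType.Shape, 0, true);
var firstShapePara = firstShape.GetAncestor(NodeType.Paragraph);
// The caption is the two paragraphs directly before the first shape's paragraph.
firstShapePara.PreviousSibling.Remove();
firstShapePara.PreviousSibling.Remove();
```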
2. You could identify the caption paragraphs as paragraphs with OutlineLevel=0. Please note that other documents may contain additional paragraphs with OutlineLevel=0 that are not related to the caption.
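```csharp
// doc is an Aspose.Words Document; requires using System.Linq;
var paraToRemove = doc.GetChildNodes(NodeType.Paragraph, true)
    .Cast<Paragraph>()
    .Where(para => para.ParagraphFormat.OutlineLevel == 0)
    .ToList(); // materialize first so removing nodes does not disturb the live collection
foreach (var para in paraToRemove)
    para.Remove();
```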

Also, if you have control over how the document is created, you could make caption paragraphs easier to identify, for example by applying a specific style to them or by marking them with bookmarks.
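A rough sketch of the style-based option, assuming Aspose.Words for .NET: the document generator applies a dedicated paragraph style to every caption, and the processing code later removes all paragraphs that use that style. The style name `ImageCaption`, the class name and the sample text are hypothetical placeholders, not something defined in the original documents.

```csharp
using System;
using System.Linq;
using Aspose.Words;

public static class CaptionStyleDemo
{
    // Hypothetical style name; use whatever style your generator applies to captions.
    public const string CaptionStyleName = "ImageCaption";

    static void Main()
    {
        // Creation side: mark the caption paragraph with the dedicated style.
        var doc = new Document();
        var builder = new DocumentBuilder(doc);
        doc.Styles.Add(StyleType.Paragraph, CaptionStyleName);

        builder.ParagraphFormat.StyleName = CaptionStyleName;
        builder.Writeln("Figure 1. Sample caption");
        // Switch back to Normal so following paragraphs are not treated as captions.
        builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Normal;
        builder.Writeln("Body text that must be kept.");

        // Processing side: remove every paragraph that uses the caption style.
        int removed = RemoveCaptions(doc);
        Console.WriteLine($"Removed {removed} caption paragraph(s).");
    }

    public static int RemoveCaptions(Document doc)
    {
        // Collect matches into a list first, then remove them from the tree.
        var captions = doc.GetChildNodes(NodeType.Paragraph, true)
            .Cast<Paragraph>()
            .Where(p => p.ParagraphFormat.StyleName == CaptionStyleName)
            .ToList();
        foreach (var para in captions)
            para.Remove();
        return captions.Count;
    }
}
```

Matching on a style name avoids the positional and outline-level heuristics above, which can misfire when other paragraphs happen to share the same position or outline level.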