Free Support Forum - aspose.com

Remove Paragraph Breaks

I have a set of documents, each of which has an unwanted page break followed by two empty paragraph breaks. I found the sample code for removing page breaks, and that works great. However, I cannot figure out how to remove the trailing paragraph breaks.

Can you help? Attached is a sample document.

My code (as you can see, I tried setting RemoveEmptyParagraphs and executing a MailMerge, but that didn't work):

foreach (FileInfo fi in diSourceDir.GetFiles("*.doc"))
{
    Document langDoc = new Document(fi.FullName);
    NodeCollection runs = langDoc.GetChildNodes(NodeType.Run, true);

    foreach (Run run in runs)
    {
        while (run.Text.IndexOf(ControlChar.PageBreakChar) >= 0)
            run.Text = run.Text.Remove(run.Text.IndexOf(ControlChar.PageBreakChar), 1);
    }

    //langDoc.MailMerge.RemoveEmptyParagraphs = true;
    //DataTable dtFake = new DataTable();
    //dtFake.Columns.Add("Col1");
    //dtFake.Rows.Add("junk");
    //dtFake.AcceptChanges();
    //langDoc.MailMerge.Execute(dtFake);

    langDoc.Save(txtChargeLanugageDir.Text + @"\Converted\" + fi.Name);
}

Hi Joe,

Is the document you attached your template or your output? If it’s your template, you could simply remove the page break and the paragraphs in Microsoft Word, which would save you from doing it programmatically. To do this, turn on formatting marks and then delete the page break and the empty paragraphs.

If that’s not the case then you could try to remove the empty paragraphs in your code by using something like this:

NodeList paras = doc.SelectNodes("//Paragraph");

// Walk backwards so that removing a paragraph does not shift the indexes still to be visited.
for (int i = paras.Count - 1; i >= 0; i--)
{
    Paragraph para = (Paragraph)paras[i];

    // A paragraph with no runs is treated as empty.
    if (para.Runs.Count == 0)
        para.Remove();
}

Please ask if you have any further queries.

Thanks,

Thanks for the quick reply, Adam.

The doc attached is a template used in my application, but the purpose here is to run a one-time fix on a directory of several hundred of these.

Is there a way to start removing empty paragraphs only after the page break is found and removed? I'm quite certain that in each of these docs I can simply truncate everything after the page break. I've searched for ways to do that with no luck.

I will work with your sample, thanks again!

Hi Joe,

I see. It’s understandable that you don’t want to do this manually.

You can try a code structure like the one below to delete paragraphs only after the page break.

// Take a snapshot of the nodes so that removing paragraphs does not disturb the iteration.
Node[] nodes = doc.GetChildNodes(NodeType.Any, true).ToArray();
bool delete = false;

foreach (Node node in nodes)
{
    if (node.NodeType == NodeType.Run)
    {
        Run run = (Run)node;

        // Strip every page break character from the run.
        while (run.Text.IndexOf(ControlChar.PageBreakChar) >= 0)
        {
            run.Text = run.Text.Remove(run.Text.IndexOf(ControlChar.PageBreakChar), 1);
            // From this point on, every paragraph that follows is removed.
            delete = true;
        }
    }
    else if (node.NodeType == NodeType.Paragraph)
    {
        if (delete)
        {
            Paragraph para = (Paragraph)node;
            para.Remove();
        }
    }
}
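
If you would rather keep any real content that follows the page break and drop only the empty paragraphs, a variation along these lines might work. This is only a minimal sketch: the class and method names (PageBreakCleanup, StripPageBreakAndTrailingEmptyParagraphs) are hypothetical and not part of Aspose.Words. It also assumes that "empty" means a paragraph with no child nodes at all, and that only paragraphs sitting directly in the main body should be touched.

```csharp
using Aspose.Words;

// Hypothetical helper for illustration; not part of the Aspose.Words API.
public static class PageBreakCleanup
{
    // Strips page break characters from all runs and removes only the empty
    // body paragraphs that come after the first page break found.
    // Returns the number of paragraphs removed.
    public static int StripPageBreakAndTrailingEmptyParagraphs(Document doc)
    {
        bool breakFound = false;
        int removed = 0;
        string pageBreak = ControlChar.PageBreak;

        // Snapshot in document order so that removals do not affect the loop.
        foreach (Node node in doc.GetChildNodes(NodeType.Any, true).ToArray())
        {
            if (node.NodeType == NodeType.Run)
            {
                Run run = (Run)node;
                if (run.Text.Contains(pageBreak))
                {
                    run.Text = run.Text.Replace(pageBreak, "");
                    breakFound = true;
                }
            }
            else if (breakFound && node.NodeType == NodeType.Paragraph)
            {
                Paragraph para = (Paragraph)node;

                // Assumption: "empty" means no child nodes at all, and only
                // paragraphs directly inside the main body are candidates.
                bool inBody = para.ParentNode != null
                              && para.ParentNode.NodeType == NodeType.Body;
                if (inBody && !para.HasChildNodes)
                {
                    para.Remove();
                    removed++;
                }
            }
        }

        return removed;
    }
}
```

In your loop you would call PageBreakCleanup.StripPageBreakAndTrailingEmptyParagraphs(langDoc) just before langDoc.Save(...). Note that the paragraph that held the page break itself is kept, because it is visited before the run inside it.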

Thanks,