I need to remove all blank lines from the top of each page of a
Word document generated with Aspose.Words.
Hi Nitin,
Thanks for your inquiry. You can first check whether a Paragraph has any child nodes and remove it if it does not. Please try running the following code snippet:
Document doc = new Document(@"C:\Temp\in.docx");
// Take a snapshot so paragraphs can be removed while looping
Node[] nodes = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
foreach (Paragraph paragraph in nodes)
    if (!paragraph.HasChildNodes)
        paragraph.Remove();
doc.Save(@"C:\Temp\out.docx");
I hope this helps.
Best regards,
Hi Awais,
See the attached document.
This document was generated entirely with Aspose.Words (v13.2.0.0).
I am still getting a blank line above the heading text (Heading 1 and Heading 2),
and also above a paragraph that starts a new page.
Hi Nitin,
Thanks for the additional information. There are two types of blank lines in your attached document.
- Blank lines caused by page breaks at the end of each page: When a page ends with a page break, the paragraph at the bottom of that page and the one at the top of the next page are the same paragraph (a page break is just a character, so the paragraph spans the two pages). Paragraph.HasChildNodes therefore returns ‘true’, which is why the code in my previous post doesn’t remove it. To overcome this problem, please try running the following code snippet:
// Requires: using System.Collections; using Aspose.Words;
Document doc = new Document(@"C:\Temp\remove_blank_lines_from_word_doc_test.docx");
// Paragraphs that follow a page break and will receive it back
ArrayList nextParagraphs = new ArrayList();
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
foreach (Paragraph para in paragraphs)
{
    if (para.ParagraphFormat.PageBreakBefore)
        para.ParagraphFormat.PageBreakBefore = false;
    foreach (Run run in para.Runs)
    {
        if (run.Text.Contains(ControlChar.PageBreak))
        {
            run.Text = run.Text.Replace(ControlChar.PageBreak, string.Empty);
            // "as" yields null when the sibling is missing or is not a Paragraph (e.g. a Table)
            nextParagraphs.Add(para.NextSibling as Paragraph);
        }
    }
}
// Re-insert each page break at the start of the following paragraph
foreach (Paragraph para in nextParagraphs)
    if (para != null)
        para.Runs.Insert(0, new Run(doc, ControlChar.PageBreak));
doc.Save(@"C:\Temp\out.docx");
- Blank lines caused by the “\r” character: These are relatively easy to spot. Please use the following code to remove such Paragraphs:
Document doc = new Document(@"C:\Temp\remove_blank_lines_from_word_doc_test.docx");
// Take a snapshot so paragraphs can be removed while looping
Node[] nodes = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
foreach (Paragraph paragraph in nodes)
    if (!paragraph.HasChildNodes && paragraph.Range.Text.Equals(ControlChar.Cr))
        paragraph.Remove();
doc.Save(@"C:\Temp\out.docx");
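The two kinds of blank line described above can be told apart from a paragraph's text alone. The following is only a minimal sketch in plain C#, with no Aspose.Words dependency: LineKind, BlankLineClassifier and Classify are hypothetical names, and the constants assume that ControlChar.Cr is "\r" and ControlChar.PageBreak is "\f". Any form feed is treated as a page break here, so the result is a rough diagnostic rather than a definitive test.

```csharp
using System;

public enum LineKind { Content, EmptyParagraph, ContainsPageBreak }

public static class BlankLineClassifier
{
    // Assumed values of ControlChar.Cr and ControlChar.PageBreak.
    private const string Cr = "\r";
    private const string PageBreak = "\f";

    // Classifies the text of one paragraph, e.g. the value of paragraph.Range.Text.
    public static LineKind Classify(string paragraphText)
    {
        // A paragraph carrying a page break spans two pages.
        if (paragraphText.Contains(PageBreak))
            return LineKind.ContainsPageBreak;

        // A paragraph whose text is only the paragraph mark is an empty line.
        if (paragraphText == Cr)
            return LineKind.EmptyParagraph;

        return LineKind.Content;
    }
}
```

Calling Classify on the text of each paragraph before removing anything gives a dry run of which paragraphs each snippet would touch.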
I hope this helps.
Best regards,
Hi Awais,
This code removes ALL blank lines in the document,
even the blank lines between paragraphs.
What I need is to remove ONLY the blank lines at the top of each page (if they exist).
Please see attached images,
Thanks
Hi Nitin,
Thanks for your inquiry. Could you please attach this Word document here for testing? I will investigate the problem further and provide you with code to remove those lines.
Best regards,
Hi Awais,
Here’s the attachment.
I want to remove all blank lines at the top of each page, which cause unnecessary space between the top edge of the page and the start of the content.
As you can see in the attached doc, sometimes there is one blank line
and sometimes there are more.
Thanks
Hi Nitin,
Thanks for the additional information. I am working on your query and will get back to you soon.
Best regards,
How about something like this?
public Document RemoveLeadingBlanks(Document doc)
{
    foreach (Section sec in doc.Sections)
    {
        // Iterate over snapshots so nodes can be removed while looping
        foreach (Paragraph p in sec.Body.GetChildNodes(NodeType.Paragraph, false).ToArray())
        {
            foreach (Run r in p.GetChildNodes(NodeType.Run, false).ToArray())
            {
                foreach (char c in r.GetText())
                    if (!Char.IsWhiteSpace(c))
                        return doc; // first visible character found: stop
                r.Remove(); // the run holds only white space
            }
            p.Remove(); // every run in the paragraph was white space
        }
    }
    return doc;
}
Awais - any chance you can look into my issue?
Thanks
Hi Nitin,
Thank you for being patient. Please try running the following code snippet to achieve what you’re looking for:
// Requires: using System.Collections; using Aspose.Words;
Document doc = new Document(@"C:\Temp\test.docx");
// pageBreakParagraphs holds the Paragraphs containing the PageBreaks
ArrayList pageBreakParagraphs = new ArrayList();
Node[] paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
foreach (Paragraph para in paragraphs)
{
    if (para.ParagraphFormat.PageBreakBefore)
        para.ParagraphFormat.PageBreakBefore = false;
    foreach (Run run in para.Runs)
    {
        if (run.Text.Contains(ControlChar.PageBreak))
        {
            run.Text = run.Text.Replace(ControlChar.PageBreak, string.Empty);
            pageBreakParagraphs.Add(para);
        }
    }
}
// paragraphsToRemove contains the empty Paragraphs at the start of each page that you want to remove
ArrayList paragraphsToRemove = new ArrayList();
foreach (Paragraph para in pageBreakParagraphs)
{
    // "as" yields null at the end of the body or when the sibling is not a Paragraph (e.g. a Table)
    Paragraph tempPara = para.NextSibling as Paragraph;
    while (tempPara != null && tempPara.Range.Text.Equals(ControlChar.Cr))
    {
        paragraphsToRemove.Add(tempPara);
        tempPara = tempPara.NextSibling as Paragraph;
    }
}
// Remove these Paragraphs
foreach (Paragraph para in paragraphsToRemove)
    para.Remove();
// Insert the PageBreak at the next non-empty Paragraph
// (the break is not re-inserted when the next node is not a Paragraph)
foreach (Paragraph para in pageBreakParagraphs)
{
    Paragraph next = para.NextSibling as Paragraph;
    if (next != null)
        next.ChildNodes.Insert(0, new Run(doc, ControlChar.PageBreak));
}
doc.Save(@"C:\temp\out.docx");
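To make the logic of the snippet above easier to check without a document at hand, here is a rough model of it in plain C# that works on a list of strings instead of Aspose.Words nodes. It is only a sketch under stated assumptions: TopOfPageBlanks and RemoveTopOfPageBlanks are hypothetical names, each string stands in for one paragraph's text ending in "\r", and "\f" stands in for ControlChar.PageBreak.

```csharp
using System;
using System.Collections.Generic;

public static class TopOfPageBlanks
{
    private const string Cr = "\r";        // assumed value of ControlChar.Cr
    private const string PageBreak = "\f"; // assumed value of ControlChar.PageBreak

    // Returns the paragraphs with the empty ones at the top of each page removed.
    public static List<string> RemoveTopOfPageBlanks(IEnumerable<string> paragraphs)
    {
        var result = new List<string>();
        bool pendingBreak = false; // a stripped page break waits to be re-inserted

        foreach (string text in paragraphs)
        {
            bool hadBreak = text.Contains(PageBreak);
            // Strip the page break from the paragraph that carries it.
            string current = text.Replace(PageBreak, string.Empty);

            if (pendingBreak)
            {
                // Empty paragraphs right after a page break sit at the top of the page.
                if (current == Cr)
                    continue;
                // Re-insert the break at the start of the first non-empty paragraph.
                current = PageBreak + current;
            }

            pendingBreak = hadBreak;
            result.Add(current);
        }

        // As in the snippet above, a break with no following paragraph is dropped.
        return result;
    }
}
```

For the input "Intro\f\r", "\r", "\r", "Heading 1\r", "\r", "Body\r" the model returns "Intro\r", "\fHeading 1\r", "\r", "Body\r": the two empty paragraphs after the page break are gone, while the blank line between Heading 1 and Body is kept.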
I hope this helps.
Best regards,
Hi Awais,
That worked!!
Thank you so much…
You guys are the best!
Hi Nitin,
Thanks for your feedback. Please let us know whenever you have further queries. We are always glad to help.
Best regards,