Hi
Thanks for your inquiry. I think you can use a DocumentVisitor to achieve what you need. For example, please try the following code:
Document doc = new Document(@"Test001\source.doc");
ParagraphResolver resolver = new ParagraphResolver();
doc.Accept(resolver);
doc.Save(@"Test001\out.doc");
doc.Save(@"Test001\out.epub");
=======================================================
private class ParagraphResolver : DocumentVisitor
{
    public override VisitorAction VisitParagraphEnd(Paragraph paragraph)
    {
        // Get the next node after the paragraph (null if there is none or it is not a composite node).
        CompositeNode nextNode = paragraph.NextSibling as CompositeNode;
        bool nextIsParagraph = nextNode != null && nextNode.NodeType == NodeType.Paragraph;

        // If the paragraph is empty and the next node is also an empty paragraph, remove the paragraph.
        if (!paragraph.HasChildNodes && nextIsParagraph && !nextNode.HasChildNodes)
        {
            paragraph.Remove();
        }
        // If both paragraphs are not empty, concatenate them.
        else if (paragraph.HasChildNodes && nextIsParagraph && nextNode.HasChildNodes)
        {
            // If the next paragraph starts with a tab, remove the tab.
            if (nextNode.FirstChild.NodeType == NodeType.Run)
            {
                Run run = (Run)nextNode.FirstChild;
                if (run.Text.StartsWith("\t"))
                    run.Text = run.Text.Substring(1);
            }

            // AppendChild moves a node out of its old parent, so take the first child
            // repeatedly instead of enumerating a collection that is being modified.
            while (nextNode.HasChildNodes)
                paragraph.AppendChild(nextNode.FirstChild);
        }

        return VisitorAction.Continue;
    }
}
Hope this helps.
Best regards,
Thank you for the prompt reply.
It worked great for the paragraphs.
I am trying to take the same approach for other characters (see the attached image, iding chars.JPG).
I am going the route of VisitSpecialChar and GetText.
However, I need to know whether there is documentation that maps the values GetText returns in Aspose
to the corresponding Word characters (as in iding chars.JPG).
Thanks in advance...
Hi Brian,
I believe you are looking for the constants defined in the ControlChar class.
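As a rough illustration of that mapping, here is a minimal sketch that turns each character of a returned string into a readable name. It is plain C# with no Aspose reference, so it can be tried on its own. The ControlCharNames class and its Describe method are hypothetical names made up for this example, and the character codes are the ones I understand the ControlChar constants to have; please check them against the API reference before relying on them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.Words.
public static class ControlCharNames
{
    // Assumption: these codes match the ControlChar constants of the same name.
    private static readonly Dictionary<char, string> Names = new Dictionary<char, string>
    {
        { '\t', "Tab" },
        { '\r', "ParagraphBreak" },
        { '\v', "LineBreak" },
        { '\f', "PageBreak / SectionBreak" },
        { '\u000E', "ColumnBreak" },
        { '\u0007', "Cell" },
        { '\u00A0', "NonBreakingSpace" },
        { '\u001E', "NonBreakingHyphen" },
        { '\u001F', "OptionalHyphen" },
    };

    // Returns a comma-separated name for each character in the text;
    // characters without a known name are shown as U+XXXX.
    public static string Describe(string text)
    {
        var parts = new List<string>();
        foreach (char c in text)
        {
            string name;
            parts.Add(Names.TryGetValue(c, out name) ? name : "U+" + ((int)c).ToString("X4"));
        }
        return string.Join(", ", parts);
    }
}
```

For example, ControlCharNames.Describe("a\tb") returns "U+0061, Tab, U+0062". Inside a visitor, the string passed in would be whatever GetText returns for the node being visited.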
Thanks,
I added the following function to the above class as an initial test…
Hi
Thanks for your request. Tabs are not considered special characters in Word documents. If you need to remove all tabs, you can try the find-and-replace method, as shown below:
Document doc = new Document(@"Test001\in.doc");
doc.Range.Replace("\t", "", false, false);
doc.Save(@"Test001\out.doc");
Best regards,