My requirement is to convert the line breaks in a document to paragraph breaks so that the numbering is applied properly.
private static void NumberAllParagraphs(Node beginingNode, Node endingNode)
{
    Node currentNode = beginingNode;
    Paragraph paragraph = currentNode as Paragraph;
    Regex regex = new Regex(ControlChar.LineBreak);
    ReplaceEvaluator repEval = new ReplaceEvaluator(ReplaceActionPerformed);
    while (currentNode != endingNode)
    {
        if (currentNode is Paragraph)
        {
            //paragraph.Range.Replace(ControlChar.LineBreak, ControlChar.ParagraphBreak , true, false);
            //paragraph.Range.Replace(regex, repEval, true);
            (currentNode as Paragraph).Range.Replace(regex, repEval, true);
        }
        currentNode = currentNode.NextSibling;
    }
    //if (currentNode is Paragraph) (currentNode as Paragraph).Range.Replace(ControlChar.LineBreak, ControlChar.ParagraphBreak, true, false);
    // if (currentNode is Paragraph) (currentNode as Paragraph).Range.Replace(regex, repEval, true);
}

static ReplaceAction ReplaceActionPerformed(object sender, ReplaceEvaluatorArgs e)
{
    DocumentBuilder builder = new DocumentBuilder(e.MatchNode.Document);
    builder.MoveTo(e.MatchNode);
    // builder.InsertHtml("");
    builder.Write("/r");
    e.MatchNode.Remove();
    return ReplaceAction.Skip;
    //int index = e.MatchNode.Document.FirstSection.Body.Paragraphs.IndexOf(builder.CurrentParagraph);
    ////insert next paragraph content to current paragraph.
    //foreach (Run run in e.MatchNode.Document.FirstSection.Body.Paragraphs[index + 1].Runs)
    //{
    //    builder.CurrentParagraph.AppendChild(run);
    //}
    ////remove next paragraph
    //e.MatchNode.Document.FirstSection.Body.Paragraphs[index + 1].Remove();
    // return ReplaceAction.Replace;
}
As you can see, I tried several different things in ReplaceActionPerformed, but none of them works. When I change the argument passed to builder.Write to a simple text string, it works, so the problem is specific to paragraph breaks and, as far as I can tell, to any special character.
Thanks for your inquiry. Your code does not account for the fact that the line break is not necessarily at the beginning of the matched run, so you need to split the matched node at the match offset. Please try the following code:
// Open source document.
Document doc = new Document(@"Test001\in.doc");
// Search for line breaks
doc.Range.Replace(new Regex(ControlChar.LineBreak), new ReplaceEvaluator(ReplaceActionPerformed), false);
// Save output document.
doc.Save(@"Test001\out.doc");
static ReplaceAction ReplaceActionPerformed(object sender, ReplaceEvaluatorArgs e)
{
    // Create document builder.
    DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
    // This is a Run node that contains either the beginning or the complete match.
    Node currentNode = e.MatchNode;
    // The first (and may be the only) run can contain text before the match,
    // in this case it is necessary to split the run.
    if (e.MatchOffset > 0)
        currentNode = SplitRun((Run)currentNode, e.MatchOffset);
    // We should remove LineBreak.
    Run currentRun = (Run)currentNode;
    currentRun.Text = currentRun.Text.Substring(1);
    // Move to the run.
    builder.MoveTo(currentRun);
    // Insert paragraph break.
    builder.Writeln();
    return ReplaceAction.Skip;
}

/// <summary>
/// Splits text of the specified run into two runs.
/// Inserts the new run just after the specified run.
/// </summary>
private static Run SplitRun(Run run, int position)
{
    Run afterRun = (Run)run.Clone(true);
    afterRun.Text = run.Text.Substring(position);
    run.Text = run.Text.Substring(0, position);
    run.ParentNode.InsertAfter(afterRun, run);
    return afterRun;
}
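To make the split step easier to follow, here is a minimal sketch that performs the same text handling on plain strings, without any Aspose types. The SplitAt helper, the SplitDemo class, and the sample text are hypothetical and only for illustration; the sketch also assumes that ControlChar.LineBreak is the vertical tab character.

```csharp
using System;

public static class SplitDemo
{
    // Hypothetical helper: mirrors the text handling of SplitRun on plain strings.
    public static (string Before, string After) SplitAt(string text, int position)
    {
        return (text.Substring(0, position), text.Substring(position));
    }

    public static void Main()
    {
        // Assumption: ControlChar.LineBreak is the vertical tab character.
        const char lineBreak = '\v';
        string runText = "First line" + lineBreak + "Second line";

        // Plays the role of e.MatchOffset: where the line break sits inside the run.
        int matchOffset = runText.IndexOf(lineBreak);

        var (before, after) = SplitAt(runText, matchOffset);

        // The second part still starts with the line break, so drop one character,
        // just as currentRun.Text.Substring(1) does in the evaluator.
        string afterWithoutBreak = after.Substring(1);

        Console.WriteLine(before);
        Console.WriteLine(afterWithoutBreak);
    }
}
```

In the evaluator, the paragraph break is then written at the start of the second run, which is why the text before the match has to be moved into a run of its own first.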
However, even this is not working for me. I also tried converting all the line breaks to paragraph breaks using InsertHtml, and then using span tags as well, but the formatting is still broken.
Thank you for the additional information. Could you please attach your input, output, and expected documents here? I will investigate the problem on my side and provide you with more information.
Thank you for the additional information. First of all, I think you should remove the inserted paragraphs if they are empty. I modified the code:
static ReplaceAction ReplaceActionPerformed(object sender, ReplaceEvaluatorArgs e)
{
    // Create document builder.
    DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
    // This is a Run node that contains either the beginning or the complete match.
    Node currentNode = e.MatchNode;
    // The first (and may be the only) run can contain text before the match,
    // in this case it is necessary to split the run.
    if (e.MatchOffset > 0)
        currentNode = SplitRun((Run)currentNode, e.MatchOffset);
    // We should remove LineBreak.
    Run currentRun = (Run)currentNode;
    currentRun.Text = currentRun.Text.Substring(1);
    // Move to the run.
    builder.MoveTo(currentRun);
    // Insert paragraph break.
    builder.Writeln();
    if (string.IsNullOrEmpty(builder.CurrentParagraph.ToTxt().Trim()))
        builder.CurrentParagraph.Remove();
    return ReplaceAction.Skip;
}
However, this modification will not do everything you need. In your desired result, the list level changes after the “Definitions” item, but in the other places where line breaks are replaced with paragraph breaks the list level does not change. How are you going to determine whether the list level needs to change or not?
Another thing: the text starting with “If multiple Force Account Equipment Rates apply” is not a list item, because the source paragraph is not a list item.
Thank you for the additional information. Could you also show me the code where you use this method, or (this would be better) create a simple application that will allow me to reproduce the problem on my side? I will check the issue and provide you with more information.
Thank you for the additional information. Unfortunately, I cannot run your code on my side, since it uses external assemblies. It seems you use HTML Agility Pack or something similar to process your HTML. Could you please simplify your code and create a simple application that I can run on my side? Sorry for the inconvenience.