How to replace hard line breaks (paragraph) with soft line breaks (line feed)

I am converting RTF documents to HTML. I don't have control over the source customer system providing the RTF. The RTF has paragraph breaks where it really should have line breaks. These appear as \par in the RTF and as <p> elements in the HTML output. When this is rendered to text it results in 2 line feeds, whereas the browser suppresses the second line feed due to the style attribute. I need these <p> elements to be rendered as simply <br>.


What is the recommended way to convert all paragraphs in the source RTF where p.ParagraphFormat.SpaceAfter == 0 to line breaks? Should I use VisitorAction VisitParagraphStart or some other method? Thanks.

This is more or less the reverse of the following post.
https://forum.aspose.com/t/61934

Hi Derek,


Thanks for your inquiry. Aspose.Words exports a line break character as a <br> tag, and paragraphs in a Word document are exported as <p> tags in HTML. I think you can post-process the HTML file and replace the <p> tags with <br>. Please let me know if I can be of any further assistance.
Best regards,
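
The post-processing idea in the reply above could look roughly like the sketch below. It is only an illustration under stated assumptions: the exact string to replace did not survive in the reply, so the sketch assumes that each paragraph boundary in the exported HTML is a closing </p> followed by an opening <p ...> tag, and it replaces that boundary with <br />. The names HtmlParagraphJoiner and JoinParagraphs are hypothetical and are not part of Aspose.Words.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlParagraphJoiner
{
    // Matches a closing </p>, optional whitespace, and the next opening <p> tag
    // (with or without attributes). Tags such as <pre> are not matched.
    private static readonly Regex ParagraphBoundary =
        new Regex(@"</p>\s*<p(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Assumption: every adjacent pair of paragraphs should be joined.
    // This sketch does not look at the style attribute or at paragraph spacing.
    public static string JoinParagraphs(string html)
    {
        return ParagraphBoundary.Replace(html, "<br />");
    }
}
```

Calling JoinParagraphs on two adjacent paragraphs such as <p>Line one</p><p>Line two</p> yields <p>Line one<br />Line two</p>. Paragraphs separated by other markup, for example </td><td> between table cells, are left alone because the pattern only matches directly adjacent tags. A regular expression is a blunt tool for HTML, so this is only reasonable for output whose shape is known in advance.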

My question was how can I do this with Aspose.Words. I prefer to do the processing within the RTF.


Do you see any issue with the code below to do this?

private static void RemoveHardParagraphBreaks(Document asposeDoc)
{
    var paragraphCollection = asposeDoc.GetChildNodes(NodeType.Paragraph, true).ToArray();
    Paragraph previousParagraph = null;
    foreach (Paragraph paragraph in paragraphCollection)
    {
        if (paragraph.ParentNode.NodeType == NodeType.Table || paragraph.ParentNode.NodeType == NodeType.Cell || paragraph.ParentNode.NodeType == NodeType.Row)
        {
            previousParagraph = null;
            continue;
        }
        //Remove blank lines.
        //if (paragraph.ChildNodes.Count == 0 && paragraph.ParentNode != null)
        //{
        //    paragraph.Remove();
        //    continue;
        //}
        if (previousParagraph != null)
        {
            if ((int)paragraph.ParagraphFormat.SpaceBefore == 0 && (int)paragraph.ParagraphFormat.SpaceAfter == 0)
            {
                foreach (Node node in paragraph.ChildNodes)
                {
                    previousParagraph.AppendChild(node.Clone(true));
                    node.Remove();
                }
                previousParagraph.AppendChild(new Run(asposeDoc, ControlChar.LineBreak));
                paragraph.Remove();
            }
            else
                previousParagraph = paragraph;
        }
        else
            previousParagraph = paragraph;
    }
}

Hi Derek,


Thanks for the additional information. Please attach your sample Word document and the expected HTML file for our reference. We will investigate the issue on our end and provide you with code to achieve this.

Best regards,

Sample files attached to original message. Thanks.

Hi Derek,


Thanks for sharing the details. Please use the following code example to replace paragraph breaks with line breaks. Hope this helps you. Please let us know if you have any more queries.


Document doc = new Document(MyDir + "SampleLineBreakIssue.rtf");

// The target paragraph that receives the content of the paragraphs that follow it.
Paragraph firstParagraph = doc.FirstSection.Body.FirstParagraph;

// Take a snapshot, because paragraphs are removed from the document inside the loop.
Node[] paras = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();

foreach (Paragraph para in paras)
{
    if (firstParagraph == para)
        continue;

    // Leave paragraphs inside table cells untouched.
    if (para.GetAncestor(NodeType.Cell) != null)
    {
        Cell cell = (Cell)para.GetAncestor(NodeType.Cell);

        if (cell.ParentRow.IsLastRow && cell.IsLastCell && cell.LastParagraph == para)
        {
            // End of the table: the next paragraph in document order becomes the new target.
            Node node = para.NextPreOrder(doc);

            // Check for null before reading NodeType, so the end of the document does not throw.
            while (node != null && node.NodeType != NodeType.Paragraph)
                node = node.NextPreOrder(doc);

            if (node == null)
                break;

            firstParagraph = (Paragraph)node;
            continue;
        }

        firstParagraph = cell.FirstParagraph;
        continue;
    }

    // Create a run with a line break.
    Run lineBreak = new Run(doc, "\v");

    // Insert it at the end of the target paragraph.
    firstParagraph.AppendChild(lineBreak);

    // Move the content of the current paragraph into the target paragraph.
    while (para.HasChildNodes)
        firstParagraph.AppendChild(para.FirstChild);

    // Remove the now-empty current paragraph.
    para.Remove();
}

doc.Save(MyDir + "Out.html");
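
The original question asked specifically about paragraphs whose SpaceAfter is 0, which the sample above does not check. A minimal sketch of one way to add that filter follows. It rests on several assumptions: a break is treated as convertible only when the preceding paragraph has SpaceAfter == 0 and the current one has SpaceBefore == 0, and a paragraph is only ever merged into its immediately preceding sibling paragraph, so content never crosses a table, cell, or section boundary. The names ParagraphMerger and MergeTightParagraphs are hypothetical and are not part of Aspose.Words, and the sketch has not been run against the attached sample files.

```csharp
using Aspose.Words;

public static class ParagraphMerger
{
    // Merges a paragraph into the sibling paragraph directly before it,
    // separated by a line break, when there is no spacing between the two.
    public static void MergeTightParagraphs(Document doc)
    {
        // Snapshot the paragraphs, because the loop removes nodes from the live tree.
        Node[] paragraphs = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();

        foreach (Paragraph para in paragraphs)
        {
            // Null when para is the first node in its container or follows a table.
            Paragraph target = para.PreviousSibling as Paragraph;
            if (target == null)
                continue;

            // Assumption: only breaks with no spacing on either side are converted.
            if (target.ParagraphFormat.SpaceAfter != 0 || para.ParagraphFormat.SpaceBefore != 0)
                continue;

            // Put the line break first, so it lands between the two pieces of text.
            target.AppendChild(new Run(doc, ControlChar.LineBreak));

            // AppendChild moves each node out of para and into target.
            while (para.HasChildNodes)
                target.AppendChild(para.FirstChild);

            // The merged paragraph now ends where para ended, so it takes over para's trailing spacing.
            target.ParagraphFormat.SpaceAfter = para.ParagraphFormat.SpaceAfter;
            para.Remove();
        }
    }
}
```

Unlike the sample above, this version also merges tight paragraphs that sit inside the same table cell, because the sibling check applies there too. If paragraphs in cells must be left alone, an extra check such as para.IsInCell could skip them.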