I am having a problem importing HTML documents that contain the PRE tag. PRE is a block element in HTML, so a browser puts a line break before and after it. When I load an HTML file into Aspose.Words to convert it to a Word document, the PRE tags are treated as inline and no line break is created between them.
Is there a way to enforce the PRE tag behavior as it should be within HTML when loading through Aspose.Words?
Look at the HTML below in a browser, and then at how it looks after processing:
<html>
<head />
<body>
<P>Before test<PRE>first PRE block</PRE><PRE>second PRE block</PRE>after test</P>
</body>
</html>
Thanks for your request. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is resolved.
As a workaround, you can try pre-processing your HTML. For example, you can use code like the following:
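// Requires: using System.IO; using Aspose.Words;
string html = File.ReadAllText("yourHtml.html");
// Turn each PRE block into its own paragraph (String.Replace is case-sensitive)
html = html.Replace("<PRE>", "<P>");
// Close the paragraph opened above so the markup stays balanced
html = html.Replace("</PRE>", "</P>");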
// Create new document
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Insert modified HTML
builder.InsertHtml(html);
// Save the output as DOC
doc.Save("out.doc");
Thank you for your prompt answer. For a workaround, we’re doing something similar to what you recommended. We can’t lose the rest of the PRE behavior, so we’re doing this:
html = html.Replace("<PRE>", "<P><PRE>");
html = html.Replace("</PRE>", "</PRE></P>");
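One caveat with the string replacements above: String.Replace is case-sensitive and matches only the exact text "<PRE>", so lowercase tags or tags with attributes are skipped. A minimal sketch of the same wrapping done with regular expressions follows. The class name PreTagWrapper and the method name WrapPreBlocks are made up for illustration and are not part of Aspose.Words; the sketch assumes PRE tags are not nested and do not appear inside comments or scripts.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of Aspose.Words.
public static class PreTagWrapper
{
    // Opening <pre> tag in any letter case, with optional attributes.
    // The \s after "pre" keeps other tag names that start with "pre" from matching.
    private static readonly Regex OpenPre =
        new Regex(@"<pre(\s[^>]*)?>", RegexOptions.IgnoreCase);

    // Closing </pre> tag in any letter case.
    private static readonly Regex ClosePre =
        new Regex(@"</pre\s*>", RegexOptions.IgnoreCase);

    // Wraps every PRE element in its own paragraph, keeping the PRE tags intact.
    public static string WrapPreBlocks(string html)
    {
        // $0 stands for the whole matched tag, so its case and attributes are preserved.
        html = OpenPre.Replace(html, "<P>$0");
        html = ClosePre.Replace(html, "$0</P>");
        return html;
    }
}
```

It would be called in place of the two Replace lines, for example html = PreTagWrapper.WrapPreBlocks(html); before builder.InsertHtml(html).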