Unwanted Empty p nodes during HTML conversion

Goldenking · September 25, 2008, 6:49am

Hi,

I’m using the Aspose.Words API for Java.

I’m writing a single line of text using DocumentBuilder.write(),
but the output HTML has a few empty paragraphs around that line.

How can I avoid that?
Is there an API for this, so that the output contains only the single line I actually wrote?

Kindly help me.

Thanks & regards
Thangaraj

alexey.noskov · September 25, 2008, 7:32am

Hi
Thanks for your inquiry. I tested this on my side and everything seems to work fine. Here is my code:

// Create the document and DocumentBuilder
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Write text into the document
builder.write("Hello World!!!");
// Save output doc
doc.save("C:\\Temp\\out.html", SaveFormat.HTML);

Here is the output HTML:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Aspose.Words for Java 2.4.2" />
<title></title>
</head>
<body>
<div class="Section1">
<p style="margin-left:0pt; margin-right:0pt; margin-top:0pt; margin-bottom:0pt; "><span style="font-family:'Times New Roman'; font-size:12pt; ">Hello World!!!</span></p>
</div>
</body>
</html>

Could you please provide your code and attach the output document? I will investigate the problem and provide more information.
Best regards.

Goldenking · September 25, 2008, 8:18am

Thanks for your prompt reply,

Actually, my scenario is this: I need to write the footer content alone to a separate HTML file.

When I tried the following code, I got an out.html with unwanted extra p nodes.

Document doc = new Document("onlyFooter.doc");
// Create temporary document
Document tempDoc = new Document();
// Get child nodes from the primary footer
NodeCollection footerChildren = doc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY).getChildNodes();
// Loop through all nodes inside the footer
for (int i = 0; i < footerChildren.getCount(); i++)
{
    // Import node
    Node dstNode = tempDoc.importNode(footerChildren.get(i), true, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    tempDoc.getFirstSection().getBody().appendChild(dstNode);
}
// Save temporary document in HTML format
tempDoc.save("out.html", SaveFormat.HTML);

I have attached the DOC file.

alexey.noskov · September 25, 2008, 9:50am

Hi

Thank you for the additional information. An empty document always contains one empty paragraph in its body (this is by design in MS Word documents). Your document also contains an empty header, so the output HTML will contain two empty paragraphs. You can simply remove these empty paragraphs from the output HTML, for example with regular expressions.
Best regards.
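
For reference, here is a minimal sketch of the regular-expression cleanup suggested above, using only java.util.regex. The class EmptyParagraphStripper and the method stripEmptyParagraphs are hypothetical names, not part of Aspose.Words. It assumes an empty paragraph appears in the saved HTML as a p element containing nothing but whitespace, non-breaking spaces, or empty span elements; check your actual out.html and adjust the pattern if the markup differs.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not an Aspose.Words class): post-processes saved HTML text.
public class EmptyParagraphStripper {

    // Content treated as "blank": whitespace, a literal non-breaking space,
    // or one of the common non-breaking space entities.
    private static final String BLANK = "(?:\\s|\\u00A0|&nbsp;|&#xa0;|&#160;)";

    // Assumption: an empty paragraph is a <p> whose content is only blanks
    // and/or <span> elements that themselves contain only blanks.
    private static final Pattern EMPTY_P = Pattern.compile(
            "<p\\b[^>]*>(?:" + BLANK + "|<span\\b[^>]*>" + BLANK + "*</span>)*</p>\\s*",
            Pattern.CASE_INSENSITIVE);

    // Returns the HTML with every empty paragraph (and the whitespace
    // directly after it) removed; other markup is left untouched.
    public static String stripEmptyParagraphs(String html) {
        return EMPTY_P.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Made-up sample resembling the structure of the output shown earlier.
        String sample =
                "<div class=\"Section1\">\n"
              + "<p style=\"margin-top:0pt; \"><span style=\"font-size:12pt; \">&#xa0;</span></p>\n"
              + "<p style=\"margin-top:0pt; \"><span style=\"font-size:12pt; \">Footer text</span></p>\n"
              + "<p style=\"margin-top:0pt; \"></p>\n"
              + "</div>";
        System.out.println(stripEmptyParagraphs(sample));
    }
}
```

To use it on the real file, read out.html into a String, pass it through stripEmptyParagraphs, and write the result back. A regular expression is adequate here because the generated markup is flat and predictable; for arbitrary HTML, an HTML parser would be the safer choice.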

Goldenking · September 26, 2008, 12:26am

Thanks for your prompt reply and solution.

Regards
Thangaraj