Hello, I’m converting HTML to a Word document with Aspose.Words for Java using a .dotx template, and the first page of the resulting Word document is always empty; the content starts on the second page. Do you know why this might happen?
Thank you in advance!
The issue of the first page being empty after transforming HTML to a DOCX document using Aspose.Words may stem from the content structure in your HTML or the template (.dotx) you are using. Here are several aspects to investigate:
- HTML Content: Verify that your HTML does not contain leading empty elements or unnecessary line breaks that could result in an empty first page.
- Template Settings: If you are using a .dotx template, check it for content or settings that might leave the first page empty, such as an existing empty section or paragraph. Templates can also define margins, headers, or footers that affect the layout.
- Page Breaks: Check your HTML for page breaks that could force the content to start on the second page.
- Document Structure: Try creating the document without a template to see whether the issue persists. This helps determine whether the problem lies in the template or in the HTML content itself.
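For the HTML Content and Page Breaks points, a quick scan of the HTML before loading it can rule out the most common culprits. The following is a minimal sketch in plain Java, assuming a regex heuristic is good enough for a first check (it is not a substitute for a real HTML parser). The class `HtmlPageBreakCheck` and its two checks are hypothetical and not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic pre-check (regex based, not a full HTML parser):
// reports markup that commonly pushes converted content onto a new page.
public class HtmlPageBreakCheck {

    // CSS declarations that force a page break, e.g. "page-break-before: always".
    private static final Pattern CSS_BREAK = Pattern.compile(
            "(page-break-(before|after)|break-(before|after))\\s*:\\s*(always|page)",
            Pattern.CASE_INSENSITIVE);

    // An empty <p> (optionally holding only &nbsp;) or a bare <br> right after
    // <body>, possibly nested inside wrapper <div> elements.
    private static final Pattern LEADING_EMPTY = Pattern.compile(
            "<body(\\s[^>]*)?>\\s*(<div(\\s[^>]*)?>\\s*)*"
                    + "(<p(\\s[^>]*)?>\\s*(&nbsp;)?\\s*</p>|<br\\s*/?>)",
            Pattern.CASE_INSENSITIVE);

    // Returns one human-readable entry per suspicious construct found.
    public static List<String> findSuspects(String html) {
        List<String> suspects = new ArrayList<>();
        Matcher m = CSS_BREAK.matcher(html);
        while (m.find()) {
            suspects.add("CSS page break: " + m.group());
        }
        if (LEADING_EMPTY.matcher(html).find()) {
            suspects.add("Empty paragraph or <br> at the start of <body>");
        }
        return suspects;
    }

    public static void main(String[] args) {
        String html = "<body><div><p class=\"main-title\">Ordonnance</p></div></body>";
        System.out.println(findSuspects(html)); // prints [] for this snippet
    }
}
```

An empty result does not prove the HTML is innocent; it only narrows the search toward the template.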
If the problem continues, please share a minimal example of your HTML and the code you are using for the transformation, so that further assistance can be provided.
@Thomai_Chelidoni Could you please provide your input and output documents along with the code that will allow us to reproduce the problem? We will check the issue and provide you with more information.
Yes, of course. I’m also sharing the .dotx with you.
Input:
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>fedlex-data-test</title>
</head>
<body>
<div>
<div>
<p class="main-title">Ordonnance <br>sur la responsabilité civile en matière nucléaire</p>
<p class="second-title">(ORCN)</p>
</div>
<div id="preamble">
<div>
<p class="man-author">Le Conseil fédéral suisse,</p>
<p class="ingress-man">vu l’art. 31, al. 1, de la loi du 13 juin 2008 sur la responsabilité civile en matière nucléaire (LRCN)<a name="_ftnref1"/>
<sup>
<a href="#_ftn1" id="fnbck-d21664e25">1</a>
</sup>,</p>
<p class="man-verb">arrête:</p>
<div class="footnotes">
<div id="_ftn1" style="-aw-footnote-isauto:1">
<p class="footnotes">
<sup>
<a href="#_ftnref1">1</a>
</sup>
<sup/>
<a href="#_ftnRS %0A " data-rs-uri="https://fedlex.data.admin.ch/eli/cc/2022/43" data-rs="732.44" data-lang="fr">RS <b>732.44</b>
</a>
</p>
</div>
</div>
</div>
</div>
<main id="maintext">
<section id="sec_1">
<h1> Section 1 Montant total de la couverture </h1>
<div class="collapseable">
<article id="art_1">
<h2>
<b>Art. 1</b> En général </h2>
<div class="collapseable">
<p class="man-referenz"> (art. 8, al. 2, LRCN)</p>
<p class="absatz-man">Le montant total de la couverture est de 1200 millions d’euros, auxquels s’ajoutent 10 % de ce montant pour les intérêts et pour les coûts alloués par une autorité judiciaire:<a name="_ftnref2"/>
<sup>
<a href="#_ftn2" id="fnbck-d21664e52">2</a>
</sup>
</p>
<p class="Struktur-1">a. ως «Φορολογική Διοίκηση» νοείται η Ανεξάρτητη Αρχή Δημοσίων Εσόδων (Α.Α.Δ.Ε.);</p>
<p class="Struktur-1">c. par transport de: </p>
<p class="Struktur-2">1. combustibles nucléaires irradiés dont le poids total des substances nucléaires est supérieur à 100 kg,</p>
<p class="Struktur-2">2. Το π.δ. 142/2017 «Οργανισμός του Υπουργείου Οικονομικών», (Α ́181).</p>
<div class="footnotes">
<div id="_ftn2" style="-aw-footnote-isauto:1">
<p class="footnotes">
<sup>
<a href="#_ftnref2">2</a>
</sup>
<sup/> Nouvelle teneur selon le ch. I de l’O du 23 nov. 2022, en vigueur depuis le 1<sup>er</sup> janv. 2023 (<a href="#_ftnRO%C2%A0%C2%A0812">RO <b>2022</b> 812</a>).</p>
</div>
</div>
</div>
</article>
</div>
</section>
</main>
</body>
</html>
Output:
TestCase_final.docx (24.8 KB)
Java Code:
package org.example;

import com.aspose.words.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        try {
            // Load Aspose license
            License license = new License();
            license.setLicense("licenses/Aspose.Words.Java.lic");

            // Load your template (DOTX with custom styles)
            Document template = new Document("template/TestCase_Template.dotx");

            // Load HTML as a separate document
            HtmlLoadOptions options = new HtmlLoadOptions();
            options.setLoadFormat(LoadFormat.HTML);
            Document htmlDoc = new Document("input/TestCase.html", options);

            // Append HTML into template first
            template.appendDocument(htmlDoc, ImportFormatMode.USE_DESTINATION_STYLES);

            // Define your mapping between HTML classes and DOTX styles
            Map<String, String> classToStyle = new HashMap<>();
            classToStyle.put("main-title", "Main Title");
            classToStyle.put("second-title", "Second Title");
            classToStyle.put("man-author", "Man author");
            classToStyle.put("ingress-man", "Ingress");
            classToStyle.put("man-verb", "Man Verb");
            classToStyle.put("footnotes", "Footnotes");
            classToStyle.put("Heading 1", "Heading 1");
            classToStyle.put("man-referenz", "Man Referenz");
            classToStyle.put("absatz-man", "Absatz Man");
            classToStyle.put("Struktur-1", "Str 1");
            classToStyle.put("Struktur-2", "Str 2");
            classToStyle.put("Heading 2", "Heading 2");
            classToStyle.put("man-tab", "Man Tab");

            // Replace each mapped style name with the matching template style, if it exists
            for (Paragraph para : (Iterable<Paragraph>) template.getChildNodes(NodeType.PARAGRAPH, true)) {
                String className = para.getParagraphFormat().getStyleName();
                if (classToStyle.containsKey(className)) {
                    String styleName = classToStyle.get(className);
                    Style targetStyle = template.getStyles().get(styleName);
                    if (targetStyle != null) {
                        para.getParagraphFormat().setStyle(targetStyle);
                    }
                }
            }

            // Save DOCX: save the template, since it holds the appended content and remapped styles
            template.save("output/Testing3.docx", SaveFormat.DOCX);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Dotx Template:
TestCase_Template.docx (17.2 KB)
Thank you in advance!
@Thomai_Chelidoni You should remove the empty section from your template before appending the document loaded from HTML. Please try modifying your code like this:
// Append HTML into template first
template.removeAllChildren();
template.appendDocument(htmlDoc, ImportFormatMode.USE_DESTINATION_STYLES);
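A fuller sketch of this fix, assuming the file paths from the question, might look like the code below. It also includes a second variant that may be worth testing: instead of deleting the template's section, mark the appended section as continuous (`SectionStart.CONTINUOUS`) so it does not begin on a new page. That variant keeps the template's own section, including its page setup and headers/footers, which `removeAllChildren()` discards together with the section; an empty paragraph from the template may remain at the top, though. The class and method names are illustrative only.

```java
import com.aspose.words.Document;
import com.aspose.words.HtmlLoadOptions;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.SaveFormat;
import com.aspose.words.SectionStart;

// Illustrative helper; names are hypothetical, the Aspose.Words calls are standard.
public class AppendIntoTemplate {

    // Suggested fix: remove the template's (empty) sections, then append.
    // Section-level settings of the template (page setup, headers/footers)
    // are removed together with the sections.
    public static Document appendAfterClearing(Document template, Document content) throws Exception {
        template.removeAllChildren();
        template.appendDocument(content, ImportFormatMode.USE_DESTINATION_STYLES);
        return template;
    }

    // Alternative to try: keep the template's section and let the appended
    // section continue on the same page instead of starting a new one.
    // Note: this changes the page setup of the passed-in content document.
    public static Document appendContinuous(Document template, Document content) throws Exception {
        content.getFirstSection().getPageSetup().setSectionStart(SectionStart.CONTINUOUS);
        template.appendDocument(content, ImportFormatMode.USE_DESTINATION_STYLES);
        return template;
    }

    public static void main(String[] args) throws Exception {
        // Paths taken from the question; adjust as needed.
        Document template = new Document("template/TestCase_Template.dotx");
        Document htmlDoc = new Document("input/TestCase.html", new HtmlLoadOptions());
        Document result = appendAfterClearing(template, htmlDoc);
        // Save the document that received the content (the template), not htmlDoc.
        result.save("output/Testing3.docx", SaveFormat.DOCX);
    }
}
```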
Thank you for your response. This fixes the empty first page, but the custom styling from the .dotx is no longer applied.
@Thomai_Chelidoni Have you tried using the DocumentBuilder.insertHtml method to insert the HTML into the template instead of loading it into a separate document and appending it?
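A minimal sketch of that approach follows, assuming the template and HTML paths from the question and reusing the class-to-style mapping idea from the original code. Because the HTML is written into the template's existing first section rather than appended as a new section, it should not produce a blank leading page. `InsertHtmlIntoTemplate` and `insertInto` are hypothetical names. Whether the imported paragraphs carry style names that match the mapping keys depends on how Aspose.Words imports the HTML, so it is worth printing `getStyleName()` for a few paragraphs before relying on the mapping.

```java
import com.aspose.words.Document;
import com.aspose.words.DocumentBuilder;
import com.aspose.words.NodeType;
import com.aspose.words.Paragraph;
import com.aspose.words.SaveFormat;
import com.aspose.words.Style;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: insert HTML directly into the template, then remap styles.
public class InsertHtmlIntoTemplate {

    @SuppressWarnings("unchecked")
    public static Document insertInto(Document template, String html, Map<String, String> classToStyle)
            throws Exception {
        // A new DocumentBuilder starts with its cursor at the beginning of the document,
        // i.e. inside the template's first section.
        DocumentBuilder builder = new DocumentBuilder(template);
        // false: formatting comes from the HTML, not from the builder's current formatting.
        builder.insertHtml(html, false);

        // Same idea as the question's loop: swap mapped style names for template styles.
        for (Paragraph para : (Iterable<Paragraph>) template.getChildNodes(NodeType.PARAGRAPH, true)) {
            String target = classToStyle.get(para.getParagraphFormat().getStyleName());
            Style style = (target == null) ? null : template.getStyles().get(target);
            if (style != null) {
                para.getParagraphFormat().setStyle(style);
            }
        }
        return template;
    }

    public static void main(String[] args) throws Exception {
        Document template = new Document("template/TestCase_Template.dotx");
        String html = new String(Files.readAllBytes(Paths.get("input/TestCase.html")), StandardCharsets.UTF_8);
        Map<String, String> classToStyle = new HashMap<>();
        classToStyle.put("main-title", "Main Title"); // add the remaining entries from the question
        insertInto(template, html, classToStyle).save("output/Testing3.docx", SaveFormat.DOCX);
    }
}
```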