I am creating a Word document. I need to insert HTML content (a string read from a file) wrapped in hidden start and end tags. The start tag sits directly before the HTML content, but an extra blank line appears between the HTML content and the end tag, and I don't want it. This is causing me a lot of trouble. Please help me remove that extra blank line.
I am attaching my source code along with the actual output and the expected output. Extra space issue poc.zip (38.5 KB)
Please note that a minimal valid Body node must contain at least one Paragraph. So when you create a document from HTML, an empty paragraph exists at the end of the document. You can remove it with the following modified code before calling the DocumentBuilder.insertDocument method.
public static Document generateDocument(Document document) throws Exception {
    // dstDoc.protect(ProtectionType.READ_ONLY);
    // Create a builder for the document.
    DocumentBuilder builder = new DocumentBuilder(document);
    try {
        // Hidden start tag.
        insertHiddenWord(builder, "t", false);
        // Load the HTML into a temporary document.
        ByteArrayInputStream bais = new ByteArrayInputStream(description().getBytes());
        LoadOptions opts = new LoadOptions();
        opts.setLoadFormat(LoadFormat.HTML);
        Document tempDoc = new Document(bais, opts);
        // Remove the trailing paragraph only if it has no child nodes.
        Paragraph lastParagraph = tempDoc.getLastSection().getBody().getLastParagraph();
        if (!lastParagraph.hasChildNodes()) {
            lastParagraph.remove();
        }
        builder.insertDocument(tempDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        // Hidden end tag.
        insertHiddenWord(builder, "t", true);
    } catch (Exception e) {
        System.out.println("Error while inserting HTML into the document: " + e.getMessage());
    }
    return document;
}
My HTML content comes from the customer, so it may contain an extra line break at the end.
If I run the given code, I may lose the last empty paragraph that the customer added. I want the Word document to reproduce exactly the HTML that the customer provided.
In this case, we suggest using the DocumentBuilder.InsertHtml(String, HtmlInsertOptions) method to insert the HTML, passing HtmlInsertOptions.RemoveLastEmptyParagraph as the second parameter. This option removes the empty paragraph that is normally inserted after HTML that ends with a block-level element.
The code you suggested removes the extra line break at the end. But as you can see in my code, I insert the HTML with builder.insertDocument to avoid a line break at the start (between the start tag and the HTML). If I use this solution, the HTML will start with an extra blank line. I am attaching the updated source code with your solution, and the output: Extra space issue poc (2).zip (27.5 KB)
Please share the input document that your customers are using along with the problematic output document. We will check your documents and write a code example according to your requirements.
I am uploading updated sample code that imports Word content as HTML. I am attaching the input document input.docx (15.1 KB), in which I have added two pieces of content wrapped in tags. The content between the start tag (|t1| or |t2|) and the end tag (|/t1| or |/t2|) is extracted, converted to HTML, and stored in the database. The same HTML is later used to create a document for download, so the downloaded document should be the same as the uploaded one. But with my code, an extra line break is generated after the table: output.docx (8.7 KB) (please refer to the first case). If I use your solution above, the line break after the table in the second case, which I do want in the downloaded document, is removed. My input and output documents should be the same.
I tried your solution, and I am attaching my output document output.docx (8.6 KB). In the second case of the input document there is a line break after the table, but it is missing in the output. I am attaching the sample code wrdHtmlWithReplacePoc (2).zip (51.0 KB), updated with your solution.
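The tag convention described in this post can be illustrated on a plain string. This is only a rough sketch, not the code from the attached project: the real workflow operates on hidden text inside a Word document, while the class name TagExtractSketch, the method extractTagged and the regular expression here are hypothetical and assume tag names of the form t1, t2, and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: illustrates the |t1| ... |/t1| convention on plain text only.
class TagExtractSketch {
    // The back-reference \1 forces the end tag to carry the same name as the start tag.
    // DOTALL lets the content span several lines.
    private static final Pattern TAGGED =
            Pattern.compile("\\|(t\\d+)\\|(.*?)\\|/\\1\\|", Pattern.DOTALL);

    // Returns tag name -> content between the start and end tag, in document order.
    static Map<String, String> extractTagged(String text) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = TAGGED.matcher(text);
        while (m.find()) {
            // The content is stored untrimmed, so trailing line breaks survive.
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "|t1|First block\n|/t1|between|t2|Second block|/t2|";
        extractTagged(text).forEach((tag, content) ->
                System.out.println(tag + " -> [" + content + "]"));
    }
}
```

The content is deliberately not trimmed: a line break just before the end tag is part of what the customer entered, which is the same round-trip requirement this thread is about.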
We have reviewed your code and noticed that the document generated after extracting contents does not contain the last empty paragraph. We are investigating this issue and will get back to you soon.
Please use the following modified method to get the desired output. The modified code is between the //Modified code... comments. We have attached the output document to this post for your reference. 21.9 output.docx (8.7 KB)
public static Document generateDocument(List<String> htmlList) throws Exception {
    Document document = new Document();
    DocumentBuilder builder = new DocumentBuilder(document);
    for (int i = 0; i < 2; i++) {
        String html = htmlList.get(i);//.replace("-aw-import:ignore", "");
        System.out.println(html);
        try {
            insertHiddenWord(builder, "t" + (i + 1), false);
            ByteArrayInputStream bais = new ByteArrayInputStream(html.getBytes());
            LoadOptions opts = new LoadOptions();
            opts.setLoadFormat(LoadFormat.HTML);
            Document tempDoc = new Document(bais, opts);
            //Modified code...
            if (tempDoc.getLastSection().getBody().getLastParagraph().toString(SaveFormat.TEXT).trim().length() == 0)
                tempDoc.getLastSection().getBody().getLastParagraph().remove();
            //Modified code...
            builder.insertDocument(tempDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            insertHiddenWord(builder, "t" + (i + 1), true);
        } catch (Exception e) {
            System.out.println("Error while insert html to the doc");
        }
    }
    return document;
}
In our system we use CKEditor in the UI, so when creating a document the HTML sometimes comes from CKEditor. My problem is that if a table is the last item in the HTML, the downloaded document does not apply the paragraph "space after" following the table. The content that follows sits right against it with no spacing. If the table is in the middle, the spacing is correct. I am uploading the sample output.docx (8.8 KB) and attaching the sample code with one CKEditor output HTML: wrdHtmlWithReplacePoc (3).zip (41.6 KB)
Please note that Aspose.Words mimics the behavior of MS Word. If you perform the same scenario using MS Word, you will get the same output.
The paragraph space after is set for paragraphs in HTML. You can use ParagraphFormat.SpaceAfter property as shown below to get the desired output. Hope this helps you.
I tried the above solution and set a default space-after value of 12, but I could not see any change in my output. I am attaching the output output.docx (8.8 KB) and the sample code updated with the above solution: wrdHtmlWithReplacePoc (4).zip (41.4 KB)
Please check the attached screenshot. The paragraph space after is 0.0 for the desired paragraph. You can use the same approach to set the paragraph properties for all other paragraphs. space after.png (73.3 KB)
I think you have not understood my problem. In the first case of the output (between the hidden t1 tags), the table at the end has no paragraph spacing after it. You can see in output.docx (8.8 KB) that only the end of that particular table is congested. Everything else keeps its paragraph spacing. This happens only when the table is at the end. I want the table at the end to keep the same space-after value as well.