Html to Pdf using DocumentBuilder (Aspose.Words)

Hi,
I am using DocumentBuilder to create a PDF file from HTML, but the generated PDF does not look the same as the HTML.
If I convert the HTML to PDF directly, without using DocumentBuilder, the PDF is correct.
Test HTML attached.
Regards,
Divesh Salian

Hi Divesh,

Thanks for your inquiry. You may be using an older version of Aspose.Words; with Aspose.Words v11.11.0, I am unable to reproduce this problem on my side. Please upgrade to the latest version of Aspose.Words, i.e. v11.11.0, and let us know how it goes on your side. I hope this helps.

I used the following code snippet to generate the PDF files. Please find the output PDF files attached.

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.InsertHtml(File.ReadAllText(MyDir + "Test07.html"));
doc.Save(MyDir + "out.pdf");

Document doc2 = new Document(MyDir + "Test07.html");
doc2.Save(MyDir + "out2.pdf");

Hi, thanks for the update. I am using the latest version…
I am not reading the file directly; because we need to do some intermediate processing, we go through it line by line…
Here is the code. Let me know if I am missing something.

string w = Server.MapPath("~/") + @"Styles\Aspose.Total.lic";
Aspose.Words.License licenseWord = new Aspose.Words.License();
licenseWord.SetLicense(w);
List<string> HtmlLines = new List<string>();
StreamReader srNew = new StreamReader(@"C:\Users\divesh\Desktop\Desktop\DummyTest.html");
string line;
string strNew = string.Empty;
while ((line = srNew.ReadLine()) != null)
{
    strNew += line;
    HtmlLines.Add(line);
}
srNew.Close();
Aspose.Words.Document doc = new Aspose.Words.Document();
DocumentBuilder builder = new DocumentBuilder(doc);
for (int i = 0; i < HtmlLines.Count; i++)
{
    builder.InsertHtml(HtmlLines[i]);
} 
doc.Save(@"C:\Users\divesh\Desktop\Desktop\TestCreateHtmlViaBuilder.pdf");

attached pdf

Hi Divesh,
Thanks for sharing the details. The DocumentBuilder.InsertHtml method behaves correctly in this scenario. It inserts an HTML string into the document, and you can use it to insert either an HTML fragment or a whole HTML document.
If you read the HTML file line by line, a single line may contain only "<TR>", a "<TD>" tag, or "font-family: Verdana; font-size: 10pt; color: #333333; line-height: 1.5em;" (see the attached image), which is not a complete HTML fragment. Please pass a complete HTML fragment to the DocumentBuilder.InsertHtml method, as shown below:

builder.InsertHtml("<P align='right'>Paragraph right</P>" + 
    "<b>Implicit paragraph left</b>" + 
    "<div align='center'>Div center</div>" + 
    "<h1 align='left'>Heading 1 left.</h1>");
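To illustrate what "not a complete HTML fragment" means for a single line, here is a rough, hypothetical check that is not part of Aspose.Words: it only tests whether every opening tag in a string has a matching closing tag. The class name `FragmentCheck`, the method `LooksBalanced`, and the short list of void tags are all assumptions made for this sketch. It is a heuristic and not an HTML parser, so it reports valid HTML with optional end tags (for example an unclosed `<p>`) as unbalanced and does not notice a line that holds only a dangling attribute value.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FragmentCheck
{
    // Captures: optional leading slash, tag name, optional trailing slash (self-closing).
    private static readonly Regex TagPattern =
        new Regex(@"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*?(/?)>");

    // A few elements that never take a closing tag (list is illustrative, not exhaustive).
    private static readonly HashSet<string> VoidTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "br", "hr", "img", "input", "meta", "link" };

    // Returns true when every opening tag is closed in the right order.
    public static bool LooksBalanced(string fragment)
    {
        var open = new Stack<string>();
        foreach (Match m in TagPattern.Matches(fragment))
        {
            string name = m.Groups[2].Value.ToLowerInvariant();
            bool closing = m.Groups[1].Value == "/";
            bool selfClosing = m.Groups[3].Value == "/";

            if (VoidTags.Contains(name) || selfClosing)
                continue;

            if (!closing)
            {
                open.Push(name);
                continue;
            }

            // A closing tag must match the most recently opened tag.
            if (open.Count == 0 || open.Pop() != name)
                return false;
        }
        return open.Count == 0;
    }
}
```

A line such as "<TR>" on its own fails this check, while the four-element string passed to InsertHtml above passes it.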

If your scenario requires reading the HTML line by line, you can combine the lines and call InsertHtml once, as shown in the following code snippet.

List<string> HtmlLines = new List<string>();
StreamReader srNew = new StreamReader(MyDir + @"DummyTest.html");
string line;
string strNew = string.Empty;

while ((line = srNew.ReadLine()) != null)
{
    strNew += line;
    HtmlLines.Add(line);
}
srNew.Close();

Aspose.Words.Document doc = new Aspose.Words.Document();
DocumentBuilder builder = new DocumentBuilder(doc);

StringBuilder sb = new StringBuilder();
for (int i = 0; i < HtmlLines.Count; i++)
{
    sb.Append(HtmlLines[i]);
}

builder.InsertHtml(sb.ToString()); 

doc.Save(MyDir + "out.pdf");

Hope this answers your query. Please let us know if you have any more queries.

Thanks for the input…
We also need to add a comment (MS Word comment) on particular text, so we are using the line-by-line approach.
The logic is as follows…
Consider a string…
Hello world
Aspose helps me a lot
File conversion is very easy.
So if I need to add a comment on “helps”, we use logic like this…
Insert the first line, “Hello world”, directly into the builder, as it has no comment.
Since we need to add a comment on “helps”, we split the string as

  1. Aspose
  2. helps
  3. me a lot

Add “Aspose” directly into the builder.
Add a comment on “helps”, then add it to the builder.
And then add “me a lot”…
Because of the problem you mentioned above, we are not able to get a proper PDF.
Kindly help us accomplish this logic.
Regards,
Divesh Salian

Hi Divesh,

Thanks for sharing the details. In this case, I suggest inserting the complete HTML into your main document, then finding the Paragraph node to which you want to add a comment, and adding the comment to that node. The code example at the end of this post shows how to add a comment to a specific Paragraph.

You can use the same approach shared at the following documentation link to find specific text.
https://docs.aspose.com/words/java/find-and-replace/

Please let us know if you have any more queries.

// Open an existing document to add comments to a paragraph.
Document doc = new Document(MyDir + "in.docx");
Node[] nodes = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
// For example, this is the Paragraph to which the comment will be added.
Paragraph paragraph = (Paragraph) nodes[2];
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a Comment.
Comment comment = new Comment(doc);
// Insert some text into the comment.
Paragraph commentParagraph = new Paragraph(doc);
commentParagraph.AppendChild(new Run(doc, "This is comment!!!"));
comment.AppendChild(commentParagraph);
// Move to paragraph where comments will be added
builder.MoveTo(paragraph);
// Insert comment
builder.InsertNode(comment);
// Save output document.
doc.Save(MyDir + "AsposeOut.docx");

Hi,
With the above-mentioned logic I will get only the text… Is there any way I can find the HTML tags along with it?
Regards,
Divesh Salian

Hi Divesh,

Thanks for your inquiry. Aspose.Words does not provide an HTML parser. In your case, you need to write your own HTML parser and insert correct HTML into the document using the DocumentBuilder.InsertHtml method.

*divesh_iris:
Logic is such that…
Consider a string…

Hello world

Aspose helps me a lot

File conversion is very easy.
So if I need to add a comment on “helps” we use a logic like…
Insert first line hello world directly into builder as there is no comment*

You can also achieve your requirement by following the logic below:

  • Insert a specific tag / text such as <Comment your comments> next to the text on which you want to comment, e.g. “helps”

  • The output text will look like “helps” <Comment your comments>

  • Combine the whole HTML as described above

  • Insert the complete HTML into your document using the DocumentBuilder.InsertHtml method

  • At this point you will have text like <Comment your comments> next to each piece of text to which you want to add a comment.

  • Finally, use the approach shared at the following documentation link to find that specific text, and add the comments using the code shared earlier in this thread.

https://docs.aspose.com/words/java/find-and-replace/
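The string-handling part of the steps above can be sketched as follows. This is a minimal illustration that uses no Aspose.Words calls; the class `CommentMarkers`, its methods, and the `[[COMMENT:...]]` marker syntax are all hypothetical names chosen for this example. Square brackets are used instead of angle brackets on the assumption that an HTML importer may treat `<Comment ...>` as an unknown tag rather than as visible text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class CommentMarkers
{
    // Hypothetical marker syntax: [[COMMENT:comment text]]
    private static readonly Regex MarkerPattern = new Regex(@"\[\[COMMENT:(.*?)\]\]");

    // Appends a marker right after the first occurrence of target in the line.
    // Note: a plain IndexOf can also match inside a tag or attribute value,
    // so real input would need a more careful search.
    public static string Tag(string line, string target, string commentText)
    {
        int index = line.IndexOf(target, StringComparison.Ordinal);
        if (index < 0)
            return line;

        int end = index + target.Length;
        return line.Substring(0, end) + "[[COMMENT:" + commentText + "]]" + line.Substring(end);
    }

    // Joins all lines into one string, ready for a single InsertHtml call.
    public static string Combine(IEnumerable<string> lines)
    {
        var sb = new StringBuilder();
        foreach (string line in lines)
            sb.Append(line);
        return sb.ToString();
    }

    // Splits imported text at the first marker into the text before it,
    // the comment text, and the text after it. Returns false if no marker is found.
    public static bool TrySplit(string text, out string before, out string comment, out string after)
    {
        Match m = MarkerPattern.Match(text);
        if (!m.Success)
        {
            before = text;
            comment = string.Empty;
            after = string.Empty;
            return false;
        }

        before = text.Substring(0, m.Index);
        comment = m.Groups[1].Value;
        after = text.Substring(m.Index + m.Length);
        return true;
    }
}
```

In this sketch, `Tag` would be applied while going through the HTML line by line, `Combine` produces the string passed to DocumentBuilder.InsertHtml, and `TrySplit` would be applied to the text found in the document afterwards, so that the marker can be removed and a comment created with the code shared earlier in this thread.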

Moreover, I suggest you check the code in the following forum threads for finding and replacing text.
https://forum.aspose.com/t/29656
https://forum.aspose.com/t/49898

Hope this answers your query. Please let us know if you have any more queries.

Thanks for the help…
As per your previous comment, I added special text where I need to add a comment.
The comment that gets added to the document ends up at the end. It does not appear on that paragraph text.
Do I need to set the text for a new paragraph on which the comment needs to be added?
I made use of the Run class to set the text, but that didn't work.

string serverPath = Server.MapPath("~/") + @"Styles\Aspose.Total.lic";
Aspose.Words.License licenseWord = new Aspose.Words.License();
licenseWord.SetLicense(serverPath);
Aspose.Words.Document doc = new Aspose.Words.Document(@"C:\Users\divesh\Desktop\Desktop\sample - Copy.html");
Node[] nodes = doc.GetChildNodes(NodeType.Paragraph, true).ToArray();
// For example, this is the Paragraph to which the comment will be added.
Aspose.Words.Paragraph paragraph = (Aspose.Words.Paragraph) nodes[12];
DocumentBuilder builder = new DocumentBuilder(doc);
// Create a Comment.
Aspose.Words.Comment comment = new Aspose.Words.Comment(doc);
// Insert some text into the comment.
Aspose.Words.Paragraph commentParagraph = new Aspose.Words.Paragraph(doc);
Run run = new Run(doc);
run.Text ="HeeloWorld";
// paragraph.AppendChild(run);
commentParagraph.AppendChild(run);
commentParagraph.AppendChild(new Run(doc, "This is comment!!!"));
comment.AppendChild(commentParagraph);
// Move to paragraph where comments will be added
builder.MoveTo(paragraph);
// Insert comment
builder.InsertNode(comment);
// Save output document.
doc.Save(@"C:\Users\divesh\Desktop\Desktop\AsposeOut.docx");

Regards,
Divesh Salian

Hi Divesh,

Thanks for your inquiry. Please share the sample input HTML to which you want to add comments. Please also create your expected Word document manually in Microsoft Word and attach it here for our reference. We will investigate how you want the final Word output to be generated. Once I have the documents, I will share code according to your requirements.
