HTML-to-PDF Line Wrapping Issue

jobrown5 · January 5, 2016, 9:38am

Hello!

We are seeing some odd behavior around line wrapping during HTML-to-PDF conversion. Here is the issue:

Say we have the following HTML:

of blank is approved. Said attorney’s fee shall

Note the space after “approved” and the lack of a space before “Said”; that’s important. If, during the PDF conversion, a line break falls anywhere between the first letter of “approved” and the last letter of “Said”, the conversion breaks one of the two words in half. For instance, if the end of the line comes after the “e” in “approved”, we get this in the output PDF:

…of blank is approve

d. Said attorney’s fee shall…

If we change the HTML to the following (correct spacing) and rerun the conversion, it moves “approved.” to the next line appropriately:

of blank is approved. Said attorney’s fee shall

…of blank is

approved. Said attorney’s fee shall…

Unfortunately, we are using a rich text editor, so we don’t have much control over the HTML that is generated. Any help or guidance anyone can provide would be greatly appreciated. Thanks!

tilal.ahmad · January 6, 2016, 9:43am

Hi there,

We are sorry for the inconvenience caused. I am afraid I am not clear about the issue. I would appreciate it if you could share your input HTML and the resulting PDF along with your sample code. We will look into it and guide you accordingly.

Best Regards,

jobrown5 · January 6, 2016, 10:24am

Thanks for following up! I’m attaching two examples. Order_preview_1.pdf is formatted correctly, with the proper line wrapping. Order_preview_2.pdf is incorrect. As you can see, the word “approved” is broken between two lines.

The only difference between the markup used to generate each is the spacing around the closing tag. In order_preview_2.html there is a space before the tag, and in order_preview_1.html there is not.
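
Since the rich text editor controls the markup, one possible workaround is to pre-process the HTML before conversion and move any whitespace that sits just before a closing inline tag to just after it. The following is only a minimal sketch of that idea, not an Aspose feature: the class name `WhitespaceFixer`, the list of inline tags, and the `<u>` tag in the sample string are all assumptions for illustration, because the actual tags in the attached files are not shown in this thread. A regular expression is not a full HTML parser, so it should be tested against real editor output.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceFixer
{
    // Matches whitespace followed by one or more closing inline tags.
    // The tag list is hypothetical; adjust it to whatever the editor emits.
    private static readonly Regex TrailingSpace = new Regex(
        @"(\s+)((?:</(?:b|i|u|em|strong|span)\s*>)+)",
        RegexOptions.IgnoreCase);

    // Moves the whitespace from before the closing tag(s) to after them.
    public static string MoveTrailingWhitespaceOutside(string html)
    {
        return TrailingSpace.Replace(html, "$2$1");
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical markup: the <u> tag is an assumption for illustration.
        string input = "of blank is <u>approved. </u>Said attorney's fee shall";

        // Prints: of blank is <u>approved.</u> Said attorney's fee shall
        Console.WriteLine(WhitespaceFixer.MoveTrailingWhitespaceOutside(input));
    }
}
```

The cleaned string could then be passed to whichever conversion call is in use, in place of the raw editor output.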

jobrown5 · January 7, 2016, 9:39am

Hi Tilal… any thoughts on this issue given the information that I provided yesterday? Thanks!

tilal.ahmad · January 7, 2016, 9:42am

Hi there,

I am sorry for the inconvenience. I am looking into it and will update you accordingly.

Best Regards,

tilal.ahmad · January 7, 2016, 10:20am

Hi there,

Thanks for your patience. I have tested the scenario using the new DOM approach, both converting the HTML file to PDF and adding the HTML string to a PDF document, but I am afraid I could not reproduce the formatting issue. The only problem I noticed was with the rendering of some special characters.

As requested above, please share your sample code as well, so that we can look into it and provide you with information accordingly.

Sample Code:

// Assumes: using System.IO; using System.Text; using Aspose.Pdf;
// myDir is the folder that contains the input HTML file.

// Instantiate Document object
Document doc1 = new Document();

// Add a page to the pages collection of the PDF file
Aspose.Pdf.Page page = doc1.Pages.Add();

// Instantiate HtmlFragment with the HTML contents
HtmlFragment title = new HtmlFragment(File.ReadAllText(myDir + "order_preview_2.html"));

// Add the HTML fragment to the paragraphs collection of the page
page.Paragraphs.Add(title);

// Save PDF file
doc1.Save(myDir + "order_preview_2_htmlfragment.pdf");

//////HTML file to PDF//////
// Read the whole HTML file; File.ReadAllText closes the file when done
string html = File.ReadAllText(myDir + "order_preview_2.html");
HtmlLoadOptions options = new HtmlLoadOptions();
Document doc = new Document(new MemoryStream(Encoding.UTF8.GetBytes(html)), options);
doc.Save(myDir + "order_preview_2.pdf");

We are sorry for the inconvenience caused.

Best Regards,

jobrown5 · January 7, 2016, 11:06am

We were using a different method to convert the HTML to PDF: we were loading the HTML into a Text object and adding that to the Sections of a Pdf object. I guess that was incorrect, because I changed it to the method used in the article you referenced and it worked! Everything seems to be wrapping properly now. However, the images now fail to render in the output PDF. We were previously using the following line of code to render images in the PDF:

pdf.HtmlInfo.ImgUrl = String.Format("{0}\\", HostingEnvironment.MapPath("~/Content/Images/PDF/Orders"));

How do we do this same thing using the above approach? Here’s the code I have now…

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(60, 60, 60, 60);
htmlLoadOptions.PageInfo.DefaultTextState = new Aspose.Pdf.Text.TextState(12);

Document doc = new Document(new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)), htmlLoadOptions);

jobrown5 · January 7, 2016, 11:29am

Never mind, I figured out that the HtmlLoadOptions constructor takes the base path to the images. Here’s my code now…

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions(HostingEnvironment.MapPath("~/Content/Images/PDF/Orders"));
htmlLoadOptions.PageInfo.Margin = new Aspose.Pdf.MarginInfo(60, 60, 60, 60);
htmlLoadOptions.PageInfo.DefaultTextState = new Aspose.Pdf.Text.TextState(12);

using (var htmlStream = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(html)))
{
    Document doc = new Document(htmlStream, htmlLoadOptions);

    using (var ms = new MemoryStream())
    {
        doc.Save(ms);
        return ms.ToArray();
    }
}

I think I’m all set. Thanks so much for your help!!

codewarior · January 7, 2016, 12:23pm

Hi Jonathan,

Thanks for the acknowledgement. We are glad to hear that your problem is resolved. Please continue using our APIs, and if you have any further queries, please feel free to contact us.