HTML to PDF Issues

mihai.runcan · March 3, 2015, 10:06am

Hi, we need to convert some HTML files to PDF and add headers and footers to every PDF page, but we have run into a few problems.

There seem to be some issues with the Aspose conversion from HTML to PDF. We first used Aspose version 9.9, but the same issues appeared after updating to 10.1.0.

We used the Aspose.Pdf.Document class with the code example from http://www.aspose.com/docs/display/pdfnet/Convert+HTML+to+PDF+Format
to load a stream into the PDF, but the third line of the code below throws the error "Value cannot be null. Parameter name: path1".

HtmlLoadOptions options = new HtmlLoadOptions(basePath);
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(content.Content));

Document pdfDocument = new Document(stream1, options);

My question is: how can we load an HTML string without getting this error? Can you explain why the error appears?

The second approach was to use the Aspose.Pdf.Generator.Pdf class, but the text loses all formatting: it all appears at the same size and is too big for the page.

Aspose.Pdf.Generator.Pdf pdf = new Pdf();

// specify the character encoding for the HTML file
pdf.HtmlInfo.CharSet = "UTF-8";
pdf.HtmlInfo.CharsetApplyingLevelOfForce = HtmlInfo.CharsetApplyingForceLevel.UseWhenImpossibleDetectFromContent;

pdf.BindHTML(stream, basePath);

using (MemoryStream ms = new MemoryStream())
{
    pdf.Save(ms);
    return ms.ToArray();
}

I suspect that the CSS is not loading, but I’m not sure. Do you have any insights that might help us with this?

I’ve also read that CSS floats are not supported. We used them in the footer, where some text needs to be aligned left and some right. Is there another way to accomplish this with Aspose?

Could you help us with this?

Thanks in advance

codewarior · March 4, 2015, 5:34am

mihai.runcan:
We used the Aspose.Pdf.Document class with the code example from http://www.aspose.com/docs/display/pdfnet/Convert+HTML+to+PDF+Format to load a stream into the PDF, but the third line of the code below throws the error "Value cannot be null. Parameter name: path1".

HtmlLoadOptions options = new HtmlLoadOptions(basePath);
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(content.Content));

Document pdfDocument = new Document(stream1, options);

My question is: how can we load an HTML string without getting this error? Can you explain why the error appears?

Hi Mihai,

Thanks for contacting support. I tested the scenario with Aspose.Pdf for .NET using the following code snippet and could not reproduce the issue: the HTML file is converted to PDF correctly. Could you please share the source HTML you are using so that we can test the conversion in our environment?

[C#]

// Read the contents of HTML file into StreamReader object
StreamReader r = File.OpenText(@"c:/pdftest/Untitled1_Filled.html");

HtmlLoadOptions options = new HtmlLoadOptions("c:/pdftest/");

MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(r.ReadToEnd()));
Document pdfDocument = new Document(stream1, options);
Console.WriteLine(pdfDocument.Pages.Count);

mihai.runcan:
The second approach was to use the Aspose.Pdf.Generator.Pdf class, but the text loses all formatting: it all appears at the same size and is too big for the page.

Aspose.Pdf.Generator.Pdf pdf = new Pdf();
// specify the character encoding for the HTML file
pdf.HtmlInfo.CharSet = "UTF-8";
pdf.HtmlInfo.CharsetApplyingLevelOfForce = HtmlInfo.CharsetApplyingForceLevel.UseWhenImpossibleDetectFromContent;

pdf.BindHTML(stream, basePath);

using (MemoryStream ms = new MemoryStream())
{
    pdf.Save(ms);
    return ms.ToArray();
}

I suspect that the CSS is not loading, but I’m not sure. Do you have any insights that might help us with this?

Aspose.Pdf.Generator is a legacy approach, and we recommend using the newer Document Object Model (DOM) of the Aspose.Pdf namespace. Please note that all enhancements and bug fixes are made in this model.
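
As a rough illustration of that recommendation, the Generator-based snippet quoted above could be moved to the DOM along these lines. This is only a sketch: the class name HtmlToPdf and the method name Convert are hypothetical, and it assumes that basePath points to the folder holding the CSS and images the HTML refers to.

```csharp
using System.IO;
using System.Text;
using Aspose.Pdf;

public static class HtmlToPdf
{
    // Hypothetical helper: converts an HTML string to PDF bytes with the DOM API.
    public static byte[] Convert(string html, string basePath)
    {
        // basePath is assumed to be the folder used to resolve relative resources.
        HtmlLoadOptions options = new HtmlLoadOptions(basePath);

        using (MemoryStream input = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        using (Document pdfDocument = new Document(input, options))
        using (MemoryStream output = new MemoryStream())
        {
            // Save into memory and hand the bytes back, as the Generator snippet did.
            pdfDocument.Save(output);
            return output.ToArray();
        }
    }
}
```

It returns a byte array like the original method, so the calling code should not need to change.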

mihai.runcan:
I’ve also read that CSS floats are not supported. We used them in the footer, where some text needs to be aligned left and some right. Is there another way to accomplish this with Aspose?

Could you please share some resource files that would help us replicate the issue in our environment? We are sorry for the inconvenience.

mihai.runcan · March 4, 2015, 11:05am

Thanks for your answer. I can’t share the HTML document. Anyway, I’ve looked further into this issue and found other issues:

1) It seems that the path1 error comes from adding the following CSS to the page in <head><style> tags.

@font-face {
    font-family: 'entypo';
    src: url('entypo.eot');
    src: url('entypo.eot?#iefix') format('embedded-opentype'),
         url('entypo.woff') format('woff'),
         url('entypo.ttf') format('truetype'),
         url('entypo.svg#entypo') format('svg');
    font-weight: normal;
    font-style: normal;
}

Am I missing something needed to load those fonts? In any case, you should give developers a more specific error.

2) Another issue is that creating the document takes about 1 minute.

The HTML content is held as a string in the variable content.Content.
The string contains ~4000 lines, of which about 3000 are CSS in <head><style>, and it also contains some embedded images, for example
background: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAGXRFWHRTb2Z0d2FyZ to get an idea.
Is that time expected and, if so, can you suggest some ways to reduce it?

Below is the code to create the document from the string.
HtmlLoadOptions htmloptions = new HtmlLoadOptions(basePath1);
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(content.Content));
Document doc = new Document(stream1, htmloptions);

3) Regarding the footer, the HTML section looks like this:

Name of the application

Report created on: 05/03/2015 ($p of $P)

($p of $P) is the current page out of the total number of pages.
Could you help me put this in the footer of the pages?

Thanks,

codewarior · March 5, 2015, 12:26pm

mihai.runcan: Thanks for your answer. I can’t share the html document. Anyway I’ve looked further into this issue and found other issues:
It seems that the path1 error comes from adding the following CSS to the page in <head><style> tags.
@font-face {
    font-family: 'entypo';
    src: url('entypo.eot');
    src: url('entypo.eot?#iefix') format('embedded-opentype'),
         url('entypo.woff') format('woff'),
         url('entypo.ttf') format('truetype'),
         url('entypo.svg#entypo') format('svg');
    font-weight: normal;
    font-style: normal;
}
Am I missing something needed to load those fonts? In any case, you should give developers a more specific error.

Hi Mihai,

Thanks for sharing the details.

I have tested the scenario and was able to reproduce the problem. I have logged this issue as PDFNEWNET-38331 in our issue tracking system. We will investigate further and keep you updated on the status of the fix. We are sorry for the inconvenience.

[C#]

// Read the contents of HTML file into StreamReader object
StreamReader r = File.OpenText(@"c:/pdftest/51x.html");
HtmlLoadOptions options = new HtmlLoadOptions(@"c:/pdftest/");
options.PageInfo.Height = 600;
options.PageInfo.Width = 400;
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(r.ReadToEnd()));
Document pdfDocument = new Document(stream1, options);
pdfDocument.Save(@"c:/pdftest/51x.pdf");
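
Until a fix is available, one possible workaround is to strip the @font-face rules from the HTML string before it is handed to the Document constructor. This is only a sketch: it assumes the web font is not essential to the output, and the class name HtmlCleanup and the method name StripFontFaceRules are hypothetical helpers, not part of the Aspose API.

```csharp
using System.Text.RegularExpressions;

public static class HtmlCleanup
{
    // Matches one @font-face block. Such a block holds only declarations,
    // so the first closing brace ends the rule.
    private static readonly Regex FontFaceRule =
        new Regex(@"@font-face\s*\{[^}]*\}", RegexOptions.IgnoreCase);

    // Returns the HTML with every @font-face rule removed; all other CSS is untouched.
    public static string StripFontFaceRules(string html)
    {
        return FontFaceRule.Replace(html, string.Empty);
    }
}
```

The cleaned string can then be encoded with System.Text.Encoding.UTF8.GetBytes and loaded exactly as in the snippet above. Text that used the removed font will fall back to another font in the resulting PDF.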

codewarior · March 5, 2015, 1:13pm

mihai.runcan:

Another issue is that creating the document takes about 1 minute.

I got the content of the HTML into a string in the variable content.Content.

The string contains ~4000 lines, of which about 3000 are CSS in <head><style>, and it also contains some embedded images, for example
background: url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAGXRFWHRTb2Z0d2FyZ to get an idea.

Is that time expected and, if so, can you suggest some ways to reduce it?

Below is the code to create the document from the string.
HtmlLoadOptions htmloptions = new HtmlLoadOptions(basePath1);
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(content.Content));
Document doc = new Document(stream1, htmloptions);

Hi Mihai,

The time the API takes to convert HTML to PDF depends on the content being converted. To investigate, we need the input HTML, so please share it so that we can test the conversion in our environment.
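
If the document cannot be shared, it may help to measure on your side which step takes the time, for example by timing the Document constructor once with the full HTML and once with the embedded images or the large CSS block removed. The helper below is a minimal sketch; the class name Timing and the method name Measure are hypothetical and it uses only the standard Stopwatch class.

```csharp
using System;
using System.Diagnostics;

public static class Timing
{
    // Runs the given action once and returns how long it took.
    public static TimeSpan Measure(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.Elapsed;
    }
}

// Possible usage with the snippet quoted above (variables as in that snippet):
//   Document doc = null;
//   TimeSpan loadTime = Timing.Measure(() => doc = new Document(stream1, htmloptions));
//   Console.WriteLine("Load took " + loadTime.TotalSeconds + " s");
```

Comparing the timings of the reduced variants should show whether the CSS or the embedded images account for most of the delay.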

mihai.runcan:

Regarding the footer, the HTML section looks like this:

Name of the application
Report created on: 05/03/2015 ($p of $P)

($p of $P) is the current page out of the total number of pages.

Could you help me put this in the footer of the pages?

$p and $P are replaceable symbols, but they only work when using the Aspose.Pdf.Generator.Text object. If you need to place the page number and page count in the footer of the document, remove the tag containing the page numbering information and try using a PageNumberStamp instance instead.

[C#]

// Read the contents of HTML file into StreamReader object
StreamReader r = File.OpenText(@"c:/pdftest/51x.html");
HtmlLoadOptions options = new HtmlLoadOptions(@"c:/pdftest/");
MemoryStream stream1 = new MemoryStream(System.Text.Encoding.UTF8.GetBytes(r.ReadToEnd()));
Document pdfDocument = new Document(stream1, options);
Console.WriteLine(pdfDocument.Pages.Count);

foreach (Aspose.Pdf.Page current_page in pdfDocument.Pages)
{
    //create page number stamp
    PageNumberStamp pageNumberStamp = new PageNumberStamp();
    pageNumberStamp.Format = $"Report created on: 05/03/2015 ({current_page.Number} of {pdfDocument.Pages.Count})";
    pageNumberStamp.BottomMargin = 10;
    pageNumberStamp.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Center;
    pageNumberStamp.StartingNumber = 1;

    //add stamp to particular page
    pdfDocument.Pages[current_page.Number].AddStamp(pageNumberStamp);
}
pdfDocument.Save("c:/pdftest/51x.pdf");
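
For the footer layout where some text is aligned left and some right, a similar approach with two TextStamp objects per page may work in place of CSS floats. The following is a sketch only: the class name FooterStamps, the method name AddFooter, its parameters, and the margin values are hypothetical and would need adjusting to the actual page layout.

```csharp
using Aspose.Pdf;

public static class FooterStamps
{
    // Hypothetical helper: stamps a left-aligned and a right-aligned footer text on every page.
    public static void AddFooter(Document pdfDocument, string appName, string createdOn)
    {
        int totalPages = pdfDocument.Pages.Count;

        foreach (Page page in pdfDocument.Pages)
        {
            // Left part of the footer, e.g. "Name of the application"
            TextStamp left = new TextStamp(appName);
            left.HorizontalAlignment = HorizontalAlignment.Left;
            left.VerticalAlignment = VerticalAlignment.Bottom;
            left.LeftMargin = 20;    // assumed margin
            left.BottomMargin = 10;  // assumed margin

            // Right part of the footer with the current page and the page count
            TextStamp right = new TextStamp(
                "Report created on: " + createdOn + " (" + page.Number + " of " + totalPages + ")");
            right.HorizontalAlignment = HorizontalAlignment.Right;
            right.VerticalAlignment = VerticalAlignment.Bottom;
            right.RightMargin = 20;   // assumed margin
            right.BottomMargin = 10;  // assumed margin

            page.AddStamp(left);
            page.AddStamp(right);
        }
    }
}
```

Call it after loading the HTML and before saving the document, for example FooterStamps.AddFooter(pdfDocument, "Name of the application", "05/03/2015").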

aspose.notifier · March 12, 2020, 8:55pm

The issues you have found earlier (filed as PDFNET-38331) have been fixed in Aspose.PDF for .NET 20.3.