Converting HTML to a PDF Document

Problem Statement

Use Aspose.PDF for Java to convert an HTML document that references several external CSS resources into a PDF document.

Background

We currently use PrinceXML to convert HTML and CSS into a PDF document. The process is relatively simple and works as follows:

  • Generate HTML
  • Feed said HTML into the PrinceXML engine
  • Get a PDF
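
The three steps above can be sketched in Java roughly as follows. This is only an illustration, assuming the Prince command-line executable is installed and reachable on the PATH as `prince`. The class name, method names, and file names are hypothetical and are not taken from the pipeline described here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class PrincePipeline {
  // Step 1: generate HTML (a trivial placeholder document for illustration).
  public static String generateHtml(String title, String body) {
    return "<!DOCTYPE html><html><head><title>" + title + "</title></head>"
        + "<body><p>" + body + "</p></body></html>";
  }

  // Step 2: build the command line handed to the Prince executable.
  // Assumes the executable is named "prince" and is on the PATH.
  public static List<String> princeCommand(Path html, Path pdf) {
    return Arrays.asList("prince", html.toString(), "-o", pdf.toString());
  }

  // Step 3: run Prince and wait for it to finish.
  // Returns the process exit code; 0 is taken to mean the PDF was written.
  public static int convert(Path html, Path pdf) throws IOException, InterruptedException {
    Process process = new ProcessBuilder(princeCommand(html, pdf)).inheritIO().start();
    return process.waitFor();
  }

  public static void main(String[] args) throws Exception {
    Path html = Paths.get("report.html");
    Files.write(html, generateHtml("Test", "Hello World").getBytes(StandardCharsets.UTF_8));
    System.exit(convert(html, Paths.get("report.pdf")));
  }
}
```

Keeping the command construction in its own method makes it easy to check without having Prince installed.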

Since PrinceXML supports both CSS 2.1 and CSS3, we are able to leverage the following at-rules to control how the document looks:

  • @page
  • @media

PrinceXML also implements custom directives (a DSL) on top of the CSS engine, providing support for controlling margins, pagination, headers, and footers in a declarative fashion. Support for standard CSS makes it simple to generate a PDF using only HTML and CSS.
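
To make the declarative approach concrete, here is a minimal sketch of the kind of HTML we would like to feed to the converter. It uses only at-rules and margin boxes from the CSS Paged Media specification (`@page`, `@top-center`, `@bottom-right`, `counter(page)`), not any PrinceXML-specific extension. The class name, the `screen-only` CSS class, and the sample text are hypothetical.

```java
public class PagedHtml {
  // CSS that controls page geometry and running headers/footers declaratively.
  // Assumes headerText contains no double quotes or backslashes (it is not escaped).
  public static String pageCss(String headerText) {
    return "@page {\n"
        + "  size: A4 portrait;\n"
        + "  margin: 2cm;\n"
        + "  @top-center { content: \"" + headerText + "\"; }\n"
        + "  @bottom-right { content: \"Page \" counter(page) \" of \" counter(pages); }\n"
        + "}\n"
        + "@media print {\n"
        + "  .screen-only { display: none; }\n"
        + "}\n";
  }

  // Wraps the body markup in a complete document with the page CSS inlined.
  public static String document(String headerText, String bodyHtml) {
    return "<!DOCTYPE html>\n"
        + "<html lang=\"en-us\">\n"
        + "<head>\n"
        + "<title>" + headerText + "</title>\n"
        + "<style>\n" + pageCss(headerText) + "</style>\n"
        + "</head>\n"
        + "<body>\n" + bodyHtml + "\n</body>\n"
        + "</html>\n";
  }

  public static void main(String[] args) {
    System.out.print(document("Quarterly Report", "<h1>Hello World</h1>"));
  }
}
```

With a converter that honors these rules, margins, orientation, headers, and page numbers all come from the stylesheet, with no post-processing of the PDF.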

Challenges with Aspose

Aspose.PDF supports the workflow described above; however, there are enough deficiencies to make it impossible to generate PDFs from HTML and CSS alone. These include:

  • No support for @page CSS at-rule
  • No support for @media CSS at-rule
  • No support for controlling margins and orientation using CSS alone (must be done after creating the document)

The lack of support for the first two rules makes it impossible to define headers and footers in CSS (for example, using @media print). Instead, one has to generate the content first and then inject the headers and footers manually on each page. Likewise, page numbers can only be added after the document has been created.

Additionally, multi-byte languages don’t render correctly in the final document.

Question(s)

  • Will support for @page and @media be added any time in the near future?
  • Will support for multi-byte languages be improved?

@stevechikwaya

Thank you for contacting support.

Would you please share the source HTML file, as well as the PDF file generated with the third-party tool if it depicts your expected output, along with the PDF file generated by the Aspose.PDF API? We will log a feature request for your requirements. However, we will only be able to share an ETA for these features after the tickets have been logged and investigated.

Moreover, please create a separate topic for the problem with multi-byte languages and share the relevant data there; we will investigate it in our environment to help you out.

Furthermore, you may try converting the HTML file to PDF with the Aspose.HTML for .NET API, as the CSS media queries screen and print are already supported in that API.

@Farhan.Raza,

Would you please share the source HTML file, as well as the PDF file generated with the third-party tool if it depicts your expected output, along with the PDF file generated by the Aspose.PDF API? We will log a feature request for your requirements. However, we will only be able to share an ETA for these features after the tickets have been logged and investigated.

I’ll have to clean the output and send the details. I’ll do this in a few days.

Moreover, please create a separate topic for the problem with multi-byte languages and share the relevant data there; we will investigate it in our environment to help you out.

Will do.

Furthermore, you may try converting the HTML file to PDF with the Aspose.HTML for .NET API, as the CSS media queries screen and print are already supported in that API.

I was excited to try this, but unfortunately it doesn’t work either. I took the example from the front page and ran it locally. It’s quite shocking that an example on the front page doesn’t work at all. Below is the simple Java code I’m running:

import com.aspose.html.HTMLAnchorElement;
import com.aspose.html.HTMLDocument;
import com.aspose.html.collections.NodeList;
import com.aspose.html.dom.Node;

public class Html {
  public static void main(String[] args) {
    // create an instance of HTMLDocument & load HTML from URL
    HTMLDocument document = new HTMLDocument("https://www.aspose.com");
    // get all nodes of type anchor
    NodeList nodelist = document.getDocumentElement().querySelectorAll("a");
    // display anchor text & href values for all nodes
    for (Node node : nodelist) {
      HTMLAnchorElement anchor = (HTMLAnchorElement) node;
      System.out.println("Text: " + node.getTextContent() + " Href: " + anchor.getHref());
    }
  }
}

Here’s how I’m compiling the class:

$ javac -cp .:aspose-html-18.8.jar Html.java

Here’s how I’m running the class:

$ java -cp .:aspose-html-18.8.jar Html

Below is the result I get:

Exception in thread "main" class com.aspose.html.internal.p67.z4: Failed to parse base URL.
com.aspose.html.internal.p65.z1.m42(Unknown Source)
com.aspose.html.Url.<init>(Unknown Source)
com.aspose.html.HTMLDocument.<init>(Unknown Source)
Html.main(Html.java:9)
	at com.aspose.html.internal.p65.z1.m42(Unknown Source)
	at com.aspose.html.Url.<init>(Unknown Source)
	at com.aspose.html.HTMLDocument.<init>(Unknown Source)
	at Html.main(Html.java:9)

Note: this happens with every single HTMLDocument constructor. I’d expect that at the very least, a simple program displayed on the marketing page runs correctly. Also note, I am using the latest jar, 18.8.
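
As a sanity check that the input itself is well formed, the URL can be parsed with the JDK alone. The sketch below uses only java.net.URI and does not touch the Aspose API; the class and method names are hypothetical. It only rules out a malformed argument and says nothing about what the library does internally when it reports "Failed to parse base URL."

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UrlCheck {
  // Returns true when the value parses as an absolute http(s) URL with a host.
  public static boolean isAbsoluteHttpUrl(String value) {
    try {
      URI uri = new URI(value);
      String scheme = uri.getScheme();
      return uri.isAbsolute()
          && uri.getHost() != null
          && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme));
    } catch (URISyntaxException e) {
      // The string is not a syntactically valid URI at all.
      return false;
    }
  }

  public static void main(String[] args) {
    // The URL passed to the HTMLDocument constructor in the program above.
    System.out.println(isAbsoluteHttpUrl("https://www.aspose.com"));
    // An empty string parses as a relative reference, so it is not absolute.
    System.out.println(isAbsoluteHttpUrl(""));
  }
}
```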

@stevechikwaya

Please take all the time you need and get back to us at your convenience. Moreover, we have tried the same code snippet and it is working fine in our environment. Please try executing it in a console application and then share your kind feedback with us.

If you still face the issue, then please share a narrowed down sample application so that we may try to reproduce and investigate it in our environment.

Moreover, we have tried the same code snippet and it is working fine in our environment. Please try executing it in a console application and then share your kind feedback with us.
If you still face the issue, then please share a narrowed down sample application so that we may try to reproduce and investigate it in our environment.

The code snippet I shared is the whole application. Below are all the details:

Environment:

$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Application folder layout:

$ ls -la
./
../
Html.java
aspose-html-18.8.jar

Html.java:

import com.aspose.html.HTMLAnchorElement;
import com.aspose.html.HTMLDocument;
import com.aspose.html.collections.NodeList;
import com.aspose.html.dom.Node;

public class Html {
  public static void main(String[] args) {
    // create an instance of HTMLDocument & load HTML from URL
    HTMLDocument document = new HTMLDocument("https://www.aspose.com");
    // get all nodes of type anchor
    NodeList nodelist = document.getDocumentElement().querySelectorAll("a");
    // display anchor text & href values for all nodes
    for (Node node : nodelist) {
      HTMLAnchorElement anchor = (HTMLAnchorElement) node;
      System.out.println("Text: " + node.getTextContent() + " Href: " + anchor.getHref());
    }
  }
}

I cloned the repo (GitHub - aspose-html/Aspose.HTML-for-Java: Aspose.HTML for Java examples) and noticed that it worked fine on my machine; however, once I updated the jar from version 18.5.1 to version 18.8, I got the dreaded error. It seems that everything works in version 18.5.1 (it doesn’t work in previous versions), and in subsequent versions nothing works.

To replicate, simply clone the repo and change the version in the pom from 18.5.1 to 18.8.

@stevechikwaya

Thank you for elaborating it.

We have managed to reproduce it with JDK 1.8 and Eclipse whereas it did not occur with JDK 1.7 and NetBeans. Therefore, a ticket with ID HTMLJAVA-162 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

@Farhan.Raza,

Thanks for escalating the issue. One other thing to note: when using Aspose.HTML 18.5.1, it appears that fonts aren’t recognized. Below is an example file along with the error received.

Operating System:

$ uname -v
Darwin Kernel Version 16.7.0: Thu Jun 21 20:07:39 PDT 2018; root:xnu-3789.73.14~1/RELEASE_X86_64
(macOS Sierra v10.12.6)

Environment:

$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)

Application folder layout:

$ ls -la
./
../
Html.java
aspose-html-18.5.1.jar

Html.java:

import com.aspose.html.HTMLDocument;

public class Html {
  public static void main(String[] args) {
    String html = "<!DOCTYPE html><html lang=\"en-us\"><head><title>Test</title></head><body><span>Hello World</span></body></html>";
    HTMLDocument htmlDocument = new HTMLDocument(html, "");
  }
}

Compile and run the program:

$ javac -cp .:aspose-html-18.5.1.jar Html.java
$ java -cp .:aspose-html-18.5.1.jar Html

Resulting Error:

java.lang.AssertionError: Cannot read a name from the name table in a font.
	at com.aspose.html.internal.ms.System.Diagnostics.Debug.fail(Unknown Source)
	at com.aspose.html.internal.p30.z26.m298(Unknown Source)
	at com.aspose.html.internal.p30.z26.m663(Unknown Source)
	at com.aspose.html.internal.p30.z26.m2(Unknown Source)
	at com.aspose.html.internal.p30.z11.m1(Unknown Source)
	at com.aspose.html.internal.p30.z11.<init>(Unknown Source)
	at com.aspose.html.internal.p30.z8.m614(Unknown Source)
	at com.aspose.html.internal.p30.z8.m613(Unknown Source)
	at com.aspose.html.internal.p30.z8.m7(Unknown Source)
	at com.aspose.html.internal.p30.z10.m2(Unknown Source)
	at com.aspose.html.internal.p30.z10.m1(Unknown Source)
	at com.aspose.html.internal.p30.z10.m3(Unknown Source)
	at com.aspose.html.internal.p181.z2.m1(Unknown Source)
	at com.aspose.html.internal.p180.z10.m1(Unknown Source)
	at com.aspose.html.internal.p203.z6.m6(Unknown Source)
	at com.aspose.html.internal.p138.z33.<init>(Unknown Source)
	at com.aspose.html.internal.p138.z5.m1(Unknown Source)
	at com.aspose.html.internal.p138.z5.<init>(Unknown Source)
	at com.aspose.html.internal.p202.z21.m2(Unknown Source)
	at com.aspose.html.internal.p151.z1.<init>(Unknown Source)
	at com.aspose.html.internal.p136.z3.m2296(Unknown Source)
	at com.aspose.html.internal.p137.z1.m1(Unknown Source)
	at com.aspose.html.internal.p230.z1.m1(Unknown Source)
	at com.aspose.html.rendering.HtmlRenderer.render(Unknown Source)

I looked for, but didn’t find, any resolutions to the above error. Is this something macOS-related or a user error?

@stevechikwaya

We have tested the code snippet you shared but have not been able to reproduce the issue in our environment. It does not occur with either older versions or the latest version of the API. Since support is provided based on the latest available version, please upgrade to the latest version and share your kind feedback with us.

If you still face this problem then please share a narrowed down sample project reproducing this issue, for our reference.

@Farhan.Raza

We have tested the code snippet you shared but have not been able to reproduce the issue in our environment. It does not occur with either older versions or the latest version of the API. Since support is provided based on the latest available version, please upgrade to the latest version and share your kind feedback with us.
If you still face this problem then please share a narrowed down sample project reproducing this issue, for our reference.

I’m not able to validate using aspose-html 18.8 because of the previously mentioned error which prevents me from instantiating an HTMLDocument. Until that’s resolved, I won’t be able to test.

@stevechikwaya

We have recorded your comments under the same ticket ID and will let you know as soon as we have any significant update. We appreciate your patience and understanding in this regard.