Free Support Forum - aspose.com

Slow html -> PDF conversion

I’m using the trial version of Aspose.PDF for Java (aspose.pdf-17.7.jar) with Java 8 on Windows 10 x64, 16 GB memory.

I’m trying to figure out why PDF generation is taking a long time. The following code takes ~4 seconds to run, even though it’s an extremely simple HTML document. Is there something I’m missing or not setting up correctly, or is it really just that slow?

 public static void main(String[] args) {
    long startTime = System.nanoTime();
    String processedHTML = "My Test HTML<BR><BR>Couple lines<br><BR>Nothing insane<BR><BR>";
    try {
        InputStream htmlStream = new ByteArrayInputStream(processedHTML.getBytes(StandardCharsets.UTF_8.name()));
        Document doc = new Document(htmlStream, new HtmlLoadOptions());
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        doc.save(output);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

    long endTime = System.nanoTime();

    long duration = (endTime - startTime);  //divide by 1000000 to get milliseconds.
    System.out.println(duration);

}

@bfinleyui,
We managed to replicate the problem of slow conversion in our environment. It has been logged under the ticket ID PDFJAVA-37085 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.

Is there someplace I can see that ticket? I’m having trouble convincing the bosses to drop $25k without more information about the slow conversion happening here…

@bfinleyui,
Performance issues are usually complex and take longer to resolve than ordinary defects. You can ask for an update in this thread, and we will notify you once ticket PDFJAVA-37085 is resolved.

We are seeing the same thing. Has anything been done to speed it up?

It seems to have got slower when we updated our libraries.

@nonoandy,
The linked ticket PDFJAVA-37085 is still pending analysis and is not resolved yet. Improving performance is a continuous process on our side. We will let you know once significant progress has been made.

Was this issue ever resolved? I was not sure where I could find the status of ticket PDFJAVA-37085.

@mwojno

Sadly, the issue is not yet resolved. It is logged in our internal issue management system, so you cannot track it directly. However, we will notify you in this forum thread as soon as it is resolved. Please bear with us in the meantime.

We are sorry for the inconvenience.

@mwojno, @nonoandy, @bfinleyui

We have investigated the earlier logged ticket and found that this is not a bug. This is how the JVM environment works. Please note that the first time the application runs, it has to initialize fonts, caches, system parameters, classes, static data, and so on.

That is why real performance can only be measured starting from the second or third iteration. After that, performance keeps improving slightly until it settles at stable values.

For example, we placed your code in a loop that runs nine times.

for (int i = 1; i < 10; i++) {   // runs 9 times: i = 1..9
    System.out.println("Iteration: " + i);
    long startTime = System.currentTimeMillis();
    String processedHTML = "My Test HTML<BR><BR>Couple lines<br><BR>Nothing insane<BR><BR>";
    try {
        InputStream htmlStream = new ByteArrayInputStream(processedHTML.getBytes(StandardCharsets.UTF_8.name()));

        // Time the HTML import (load) step on its own.
        Document doc = new Document(htmlStream, new HtmlLoadOptions());
        long duration1 = System.currentTimeMillis() - startTime;  // already in milliseconds
        System.out.println("import ms: " + duration1);

        ByteArrayOutputStream output = new ByteArrayOutputStream();

        doc.save(output);
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }

    // Total time for import plus save.
    final long endTime = System.currentTimeMillis();

    long duration = endTime - startTime;
    System.out.println("import + save ms: " + duration);
}

The first iteration took about 4.3 seconds to import the HTML, and the second took about 0.5 seconds. Each subsequent iteration took roughly 80-190 ms to import the HTML, which is on average about 30 times faster than iteration 1.

Java\jdk1.8.0_211
Is licensed = true

Iteration: 1
import ms: 4335
import + save ms: 4361
Iteration: 2
import ms: 543
import + save ms: 556
Iteration: 3
import ms: 117
import + save ms: 126
Iteration: 4
import ms: 131
import + save ms: 145
Iteration: 5
import ms: 146
import + save ms: 159
Iteration: 6
import ms: 137
import + save ms: 146
Iteration: 7
import ms: 190
import + save ms: 198
Iteration: 8
import ms: 125
import + save ms: 132
Iteration: 9
import ms: 78
import + save ms: 184

Done for version 20.8
Taken time is: 7401 ms.
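
As a quick check of the numbers in the log above, here is a small sketch that averages the "import ms" values for iterations 3 to 9 and compares them with iteration 1. The class and method names are only illustrative; the values are copied from the log. It works out to a steady-state mean of about 132 ms, or roughly 33 times faster than the first iteration.

```java
import java.util.Arrays;
import java.util.Locale;

final class SpeedupFromLog {
    // "import ms" values copied from the log above, iterations 1..9.
    static final long[] IMPORT_MS = {4335, 543, 117, 131, 146, 137, 190, 125, 78};

    /** Mean of the samples from index `from` (inclusive) to the end of the array. */
    static double meanFrom(long[] samples, int from) {
        return Arrays.stream(samples, from, samples.length).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        // Index 2 is iteration 3: skip the two slowest start-up iterations.
        double steady = meanFrom(IMPORT_MS, 2);
        double speedup = IMPORT_MS[0] / steady;
        System.out.println(String.format(Locale.ROOT,
                "steady-state mean: %.1f ms, speedup vs. iteration 1: %.1fx", steady, speedup));
    }
}
```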

Also, compared with version 17.7, the current version improves performance and quality in many aspects of HTML->PDF conversion.
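
For anyone who wants to measure along the lines described in this thread (run the conversion a few times first, then time only the later runs), here is a minimal sketch. `WarmupTimer`, `measure`, and the string-building workload are hypothetical stand-ins and not part of Aspose.PDF; the workload would be replaced with the actual `new Document(...)` / `doc.save(...)` calls.

```java
import java.util.Arrays;

final class WarmupTimer {
    /**
     * Runs the task `warmup` times without timing it, then runs it
     * `measured` more times and returns those durations in milliseconds.
     */
    static long[] measure(Runnable task, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            task.run();  // warm-up run: timing is deliberately discarded
        }
        long[] millis = new long[measured];
        for (int i = 0; i < measured; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000;  // ns -> ms
        }
        return millis;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in workload; swap in the HTML -> PDF conversion here.
        Runnable task = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("unexpected empty result");
            }
        };
        long[] ms = measure(task, 3, 5);
        System.out.println("measured ms: " + Arrays.toString(ms));
    }
}
```

In a long-running service the start-up cost is paid once per JVM, so the later runs are the more representative figure. For a short-lived command-line tool the first-run time is what users actually see.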