java.lang.StringIndexOutOfBoundsException converting HTML to PDF

This code throws a StringIndexOutOfBoundsException:

Document doc = new Document("input.html");
doc.save("PDFFromHtmlInJava.pdf", SaveFormat.PDF);

when input.html contains:

<html>
    <a href="/">test</a>
</html>

The bug occurs in version 21.12 and in every version from 21.1 onwards. If I go back to version 20.12, it works.

@damonh,
Unfortunately, we were unable to reproduce the issue on our side. Please check the following document, produced using version 21.12 of Aspose.Words for Java:

Could you please share your problem document? We will investigate the issue and provide you with more information.

Hi,

Thanks for the quick response.

Please try the attached example. I’m running with Java 11.0.12.

test.zip (1.8 KB)

Regards, Damon

java.lang.StringIndexOutOfBoundsException: String index out of range: 1
	at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
	at java.base/java.lang.String.charAt(String.java:693)
	at com.aspose.words.internal.zzZm4.zzWoH(Unknown Source)
	at com.aspose.words.internal.zzZm4.zzXcd(Unknown Source)
	at com.aspose.words.internal.zzZm4.zzVZE(Unknown Source)
	at com.aspose.words.internal.zzZm4.<init>(Unknown Source)
	at com.aspose.words.internal.zz9p.zzZzo(Unknown Source)
	at com.aspose.words.internal.zz9p.zzZlq(Unknown Source)
	at com.aspose.words.internal.zz9p.zzZNw(Unknown Source)
	at com.aspose.words.internal.zzUu.<init>(Unknown Source)
	at com.aspose.words.internal.zz5q.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz5q.<init>(Unknown Source)
	at com.aspose.words.internal.zzW3i.zzWKk(Unknown Source)
	at com.aspose.words.internal.zzW3i.zzYdw(Unknown Source)
	at com.aspose.words.internal.zzZkq.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz3m.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzYBj.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz3m.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzYBj.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzXie.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzYBj.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz3m.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzYBj.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz3m.zzZp8(Unknown Source)
	at com.aspose.words.internal.zzYBj.zzZp8(Unknown Source)
	at com.aspose.words.internal.zz8P.zzZp8(Unknown Source)
	at com.aspose.words.zzZs7.zzXvE(Unknown Source)
	at com.aspose.words.zzZjP.zzZp8(Unknown Source)
	at com.aspose.words.zzZjP.zzZqK(Unknown Source)
	at com.aspose.words.zzZjP.zzZp8(Unknown Source)
	at com.aspose.words.zzVWM.zzZp8(Unknown Source)
	at com.aspose.words.Document.zzZqK(Unknown Source)
	at com.aspose.words.Document.zzZp8(Unknown Source)
	at com.aspose.words.Document.save(Unknown Source)
	at com.aspose.words.Document.save(Unknown Source)
	at AsposeTest.test(AsposeTest.java:12)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
	at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
	at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
	at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
	at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
	at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
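
A note on reading the trace: the top frames show String.charAt failing with index 1, and the href in the sample is the one-character string "/". The sketch below is only an illustration of how that kind of failure arises in general. The class and method names are hypothetical, and this is not Aspose code (the library frames are obfuscated, so the real cause cannot be read from the trace).

```java
class CharAtIllustration {
    // Hypothetical helper, NOT Aspose code: inspects the second character
    // of a path without first checking that the path is long enough.
    static boolean secondCharIsSlashUnchecked(String path) {
        return path.charAt(1) == '/';
    }

    // The same check with a length guard, so short inputs return false
    // instead of throwing.
    static boolean secondCharIsSlashChecked(String path) {
        return path.length() > 1 && path.charAt(1) == '/';
    }

    public static void main(String[] args) {
        System.out.println(secondCharIsSlashChecked("/"));    // false
        System.out.println(secondCharIsSlashUnchecked("/a")); // false
        try {
            // A one-character string has no index 1, so charAt throws.
            secondCharIsSlashUnchecked("/");
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message text varies between JDK versions.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Whether the library does anything like this is an assumption; the sketch only shows why a one-character value can produce an out-of-range index of 1.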

@damonh,
To avoid the issue, please use the absolute path of the document when you open it.
For example, please use Document doc = new Document("C:\\Temp\\test\\input.html") instead of Document doc = new Document("input.html")
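
If hard-coding a drive path is not convenient, the absolute path can be derived from the relative name at run time. This is a minimal sketch using only the standard library; the class and method names are hypothetical, and the Aspose call is left as a comment so the snippet runs without the library.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class AbsolutePathExample {
    // Resolves a possibly relative file name against the current working
    // directory and returns it as an absolute path string.
    static String toAbsolute(String fileName) {
        Path p = Paths.get(fileName).toAbsolutePath().normalize();
        return p.toString();
    }

    public static void main(String[] args) {
        String absolute = toAbsolute("input.html");
        System.out.println(absolute);
        // Assumed use with the constructor from this thread:
        // Document doc = new Document(absolute);
    }
}
```

The result depends on the working directory of the JVM, so it is worth printing the value once to confirm it points at the intended file.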

Thanks Sergey. That works. However my real code is using the InputStream constructor of Document. The following example also fails from 21.1 onwards but works in 20.12. Is there a workaround for this aside from writing to a temporary file and using the other constructor with an absolute path? Also do you know when the fix for this will be available? Regards, Damon

String s = "<html>\n<a href=\"/\">test</a>\n</html>";
Document doc = new Document(new ByteArrayInputStream(s.getBytes(Charset.forName("UTF-8"))));
doc.save("PDFFromHtmlInJava.pdf", SaveFormat.PDF);

P.S. Please note that this problem only occurs with some documents. For example, changing href to “/a” in my example makes it work, and so does removing the html tag. So using an absolute path may also fail for other files, and I’m reluctant to rely on that as the workaround unless you can confirm from the source code what exactly the issue is. It looks to me (without having seen the code) like it may not always be reading the entire file/stream.
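
For reference, the temporary-file route mentioned above can be sketched with the standard library alone. The class and method names are hypothetical, the Aspose call is left as a comment, and the caveat from the post still applies: it is not confirmed that an absolute path avoids the exception for every document.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class TempFileWorkaround {
    // Copies the stream into a new temporary .html file and returns its
    // absolute path. The caller is responsible for deleting the file.
    static Path copyToTempHtml(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("aspose-input-", ".html");
        // createTempFile already created the file, so it must be replaced.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp.toAbsolutePath();
    }

    public static void main(String[] args) throws IOException {
        String s = "<html>\n<a href=\"/\">test</a>\n</html>";
        Path tmp = copyToTempHtml(
                new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
        try {
            System.out.println(tmp);
            // Assumed use with the calls from this thread:
            // new Document(tmp.toString()).save("PDFFromHtmlInJava.pdf", SaveFormat.PDF);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```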

@damonh,
Thank you for reporting this problem to us. We have logged it in our issue tracking system as WORDSJAVA-2679. You will be notified via this forum thread once the issue is resolved.

We apologize for your inconvenience.

Hi Sergey,

Do you know if this will be fixed in the 22.02 release, and what date that will be released? Or if not then do you know which release the fix is scheduled for?

Thanks, Damon

@damonh,
We wish to inform you that WORDSJAVA-2679 is currently in the queue, pending analysis. Unfortunately, there is no ETA available for this issue at the moment. We will inform you via this forum thread as soon as there are any updates on it.

Hi Sergey, do you know if this bug (WORDSJAVA-2679) has been fixed yet? Thanks, Damon

@damonh Unfortunately, the issue is still not resolved. As a temporary workaround, you can set the base URI to a non-empty string in the document load options:

String s = "<html>\n<a href=\"/\">test</a>\n</html>";
LoadOptions options = new LoadOptions();
options.setBaseUri("notemptystring");
Document doc = new Document(new ByteArrayInputStream(s.getBytes(Charset.forName("UTF-8"))), options);
doc.save("C:\\Temp\\PDFFromHtmlInJava.pdf", SaveFormat.PDF);

The issue you found earlier (filed as WORDSJAVA-2679) has been fixed in the Aspose.Words for Java 23.5 update.