Free Support Forum - aspose.com

HTML to PDF Conversion with JavaScript using C# hangs .NET host process indefinitely

Hello,
I was trying to convert some HTML to PDF, but the whole process hung indefinitely during the document load phase (inside the Document constructor).

This is the code to reproduce the issue with the 21.12.0.0 NuGet package (.NET Framework 4.8):

using (var htmlStream = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(@"
	<html>
	<head></head>
	<body>
		<div id=""jstext""></div>
		<script type=""text/javascript"">
			document.getElementById('jstext').innerHTML = '>JS TEXT<';
		</script>
	</body>
	</html>
")))
{
	new Aspose.Pdf.Document(htmlStream, new Aspose.Pdf.HtmlLoadOptions());
}

If I HTML-encode the string, as in '&gt;JS TEXT&lt;', the conversion works correctly.

To work around the issue, I had to wrap the call in Task.Factory.StartNew and impose a timeout on the Wait call.
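A minimal sketch of that kind of workaround is shown here. The helper name LoadWithTimeout and the 30-second limit in the usage comment are illustrative assumptions, not part of Aspose.Pdf. Note that the timeout only unblocks the caller: the hung task cannot be aborted this way and keeps running in the background.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not an Aspose API): runs a blocking load on a
// separate task and gives up waiting after the given timeout.
public static class LoadWithTimeout
{
    public static T Run<T>(Func<T> load, TimeSpan timeout)
    {
        // LongRunning hints the scheduler to use a dedicated thread,
        // so a hung load does not occupy a thread-pool worker.
        Task<T> task = Task.Factory.StartNew(load, TaskCreationOptions.LongRunning);

        // Wait returns false if the task did not finish in time.
        // If the load itself throws, Wait rethrows it as an AggregateException.
        if (!task.Wait(timeout))
        {
            throw new TimeoutException(
                "Document load did not complete within " + timeout + ".");
        }

        return task.Result;
    }
}

// Usage with the repro above (30 seconds is an arbitrary choice):
//
// var document = LoadWithTimeout.Run(
//     () => new Aspose.Pdf.Document(htmlStream, new Aspose.Pdf.HtmlLoadOptions()),
//     TimeSpan.FromSeconds(30));
```

Because the abandoned task still holds its thread and the stream, the stream should not be disposed or reused while that task may still be reading it. Reliably reclaiming those resources would require running the conversion in a separate process that can be killed.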

Since I’m processing untrusted input, I don’t need JavaScript at all. How can I disable, remove, or skip the loading of all scripts?

I didn’t find any relevant option in the HtmlLoadOptions class apart from ResourceLoadingStrategy, which is useful for preventing network calls.
Something like Aspose.Html.Sandbox would be useful.

Thanks,
SM

@usernameisalreadyinuse

You can use the PdfJavaScriptStripper.Strip method to remove JavaScript from the document. However, it also throws an exception in your case.

We have logged this problem in our issue tracking system as PDFNET-51140. You will be notified via this forum thread once the issue is resolved. We apologize for the inconvenience.