Bug in Aspose.Words when opening an HTML file

I am using Aspose.Words to load and print HTML files and have encountered a potential parsing bug.

If I load the HTML file using Word, it works fine, but if I try to load it using Aspose.Words, I get the following error:

Aspose.Words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.

I have narrowed the HTML down to this:

<IFRAME WIDTH=468 HEIGHT=60 NORESIZE SCROLLING=No FRAMEBORDER=0 MARGINHEIGHT=0 MARGINWIDTH=0
	SRC="http://something.com/?queryVariable|2.0|107|182639|1|1|blah">
	<script language=javascript src="http://soemthing.com/script.js;">
	</script>
</IFRAME>

Specifically, the problem is the query string on this URL: http://something.com/?queryVariable|2.0|107|182639|1|1|blah

It’s the pipes in the query string that cause the issue. Although it’s a strange query string, it’s valid; I have parsed it successfully.

The reason I think this is a bug is that I can open the HTML file successfully in MS Word but not with Aspose.Words.
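
Since the pipes in the SRC query string appear to be the trigger, one possible workaround is to percent-encode them (`|` becomes `%7C`) in the HTML before loading it. The following is only a minimal sketch: `PipeEncoder` and `EncodePipesInUrls` are hypothetical names, it handles only double-quoted `src` and `href` attributes, and it has not been verified to avoid the FileCorruptedException in Aspose.Words.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.Words.
public static class PipeEncoder
{
    // Matches double-quoted src="..." or href="..." attribute values (case-insensitive).
    private static readonly Regex UrlAttribute = new Regex(
        "(?<prefix>\\b(?:src|href)\\s*=\\s*\")(?<url>[^\"]*)(?<suffix>\")",
        RegexOptions.IgnoreCase);

    // Returns the HTML with every '|' inside a matched URL replaced by "%7C".
    // Text outside src/href attribute values is left untouched.
    public static string EncodePipesInUrls(string html)
    {
        return UrlAttribute.Replace(html, m =>
            m.Groups["prefix"].Value
            + m.Groups["url"].Value.Replace("|", "%7C")
            + m.Groups["suffix"].Value);
    }
}
```

The cleaned string could then be written to a stream and passed to the Document constructor in place of the original file. Whether the server behind the URL treats `%7C` the same as a literal pipe is a separate assumption that would need checking.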

Thanks

Ed

@ejt66,

Thanks for your inquiry. Please ZIP and attach your HTML file here for testing. We will investigate the issue on our end and provide you with more information.

Best regards,

Simply loading the document in Aspose.Words causes it to throw. Here is the code I used to load it:

var loadOptions = new HtmlLoadOptions()
{
    LoadFormat = LoadFormat.Html
};
var document = new Document(file, loadOptions);

BadQueryString.zip (466 Bytes)

@ejt66,

Thanks for your inquiry.

Using the latest version of Aspose.Words (17.8), we reproduced this issue on our end and have logged it in our bug tracking system as WORDSNET-15779. Your thread has been linked to this issue, and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

@ejt66,

The issue you found earlier (filed as WORDSNET-15779) has been fixed in the Aspose.Words for .NET 17.10 update and the Aspose.Words for Java 17.10 update.