HTML conversion of a file named with URL-encoded characters breaks the URL to embedded images

Dear all,

I am encountering an issue with files named like a%20file%20with%20url%20entities.docx. They shouldn’t be named this way, but we assume users will submit them.

If the document contains one embedded (JPEG) image, the HTML conversion produces

  • a%20file%20with%20url%20entities.htm
  • a%20file%20with%20url%20entities.001.jpeg

Everything is fine at this step. However, inside the HTML file, the URL pointing to the image is not escaped, so a browser decodes each %20 to a space and fails to find the file:

<!-- cleaned HTML code -->
<p>
  <img src="a%20file%20with%20url%20entities.001.jpeg"/>
</p>

Microsoft Word escapes the URL:

<!-- cleaned HTML code -->
<p>
  <img src="a%2520file%2520with%2520url%2520entities_fichiers/image002.jpg"/>
</p>
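
To illustrate the escaping I would expect, here is a minimal Python sketch. It is not Aspose.Words code and says nothing about how either product is implemented; `escape_image_src` is a hypothetical helper name, and it assumes the image file name is used as a relative URL. Percent-encoding the literal file name turns each `%` into `%25`, which matches the pattern in the Microsoft Word output above.

```python
from urllib.parse import quote, unquote


def escape_image_src(file_name: str) -> str:
    """Percent-encode a literal file name for use as a relative URL.

    Hypothetical helper for illustration only. The '%' character is not
    in quote()'s safe set, so it is encoded as '%25'.
    """
    return quote(file_name)


# Literal name of the image file on disk (it really contains "%20").
image_file = "a%20file%20with%20url%20entities.001.jpeg"

# Escaped value that should appear in the src attribute.
src = escape_image_src(image_file)
print(src)  # a%2520file%2520with%2520url%2520entities.001.jpeg

# Decoding the escaped URL gives back the real file name.
print(unquote(src) == image_file)  # True

# Decoding the unescaped name gives a file that does not exist.
print(unquote(image_file))  # a file with url entities.001.jpeg
```

With the escaped value in `src`, decoding the URL once yields the actual file name on disk, so the image resolves.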

I’ve also attached a ZIP of the sample document and the conversions performed by both Aspose.Words and Microsoft Word. The version used is Aspose 18.10.

Best regards,
Monir

filename_with_url_encoding_characters.zip (364.2 KB)

@monir.aittahar

We have tested the scenario and managed to reproduce the same issue on our side. We have logged this problem in our issue tracking system as WORDSNET-18721. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.