HTML to docx missing images

Hi,

I tested the same HTML file under two different file names:

  • With Chinese characters in the file name: failed (images are missing)
  • Without Chinese characters in the file name: succeeded

SDK: Aspose.Words for C++ 21.6
Test files: html-to-docx-chs.zip (996.2 KB)

Thanks

@kngstr

Could you please ZIP and attach your expected and problematic output documents here for our reference? We will then provide you more information about your query.

@tahir.manzoor

Sorry, I didn’t make that clear.
baidu.html: this file converts correctly.
百度一下,你就知道.html: this file is missing images.

@kngstr

Please use the latest version of Aspose.Words.Cpp 21.7.0 and let us know how it goes on your side. If you still face problem, please ZIP and attach the problematic output document here for our testing. Thanks for your cooperation.

@tahir.manzoor

Sorry, my license has expired, so I cannot test with 21.7.

@kngstr

Please get a 30-day temporary license from here:

Please note that Aspose.Words mimics the behavior of MS Word. If you convert your HTML to DOCX using MS Word, you get the attached output: ms word.docx (139.8 KB). This is why we asked for both the problematic and the expected output Word documents.

Please share the requested documents. We will investigate the issue and log it in our issue tracking system.

@tahir.manzoor

What I’m talking about is the difference between an English file name and a Chinese file name.
The content of the two sample files is identical.
The only difference is the file name.

@kngstr

Please check the attached image. The paths to the images and .js files must be correct.
source path.png (53.5 KB)

We suggest importing the document with the Encoding property of LoadOptions. Please refer to the following article. You can set UTF-8 encoding in the load options to get the desired output. We hope this helps.
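Some background on why the encoding setting matters: a file or folder name containing Chinese characters is stored as bytes, and the same name maps to different bytes in UTF-8 than in a legacy code page such as GBK. For example, 百 (U+767E) is E7 99 BE in UTF-8 but B0 D9 in GBK. If the HTML is decoded with an encoding other than the one it was saved in, the src paths presumably no longer match the folder on disk. The hypothetical helper `EncodeUtf8` below is only a sketch of the UTF-8 rules (RFC 3629) to make that byte difference concrete; it is not part of Aspose.Words.

```cpp
#include <string>

// Hypothetical helper (not an Aspose.Words API): encodes one Unicode code
// point as UTF-8 following RFC 3629. Surrogates (U+D800..U+DFFF) and values
// above U+10FFFF are not valid scalar values and yield an empty string.
std::string EncodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes, e.g. most CJK characters
        if (cp >= 0xD800 && cp <= 0xDFFF) return {};
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {           // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Encodes a whole UTF-32 string, e.g. a file name such as U"\u767E\u5EA6" (百度).
std::string EncodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) out += EncodeUtf8(cp);
    return out;
}
```

With this, `EncodeUtf8(U'\u767E')` produces the three bytes E7 99 BE, while a GBK-encoded file would contain B0 D9 for the same character, so a name read under the wrong encoding cannot match the name on disk.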

@tahir.manzoor

These files were saved by Firefox browser.
We can’t ask customers to save their files again.

@kngstr

Please note that the images path should be correct. If it is incorrect the images will not be visible.

Please download the attached files from your first post and open them in a browser: the images are not shown. If your HTML has correct image paths and the images are still missing after HTML to DOCX conversion, import the document with the LoadOptions.Encoding property as suggested in my previous post.

@tahir.manzoor

OK.
PS: I tried with Chrome and it works fine.

@kngstr

Have you tried setting UTF-8 encoding in the load options? Please let us know whether your problem has been solved.

@tahir.manzoor

Yes. Same result.

@kngstr

Please make sure that you are using the latest version, Aspose.Words.Cpp 21.7.0. We tested the scenario with Aspose.Words.Cpp 21.7.0 using the following code example and could not reproduce the issue. Please check the attached output document: HtmltoDOCX.docx (59.5 KB)

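```cpp
// Read the HTML as UTF-8 so non-ASCII characters, including the Chinese
// file and folder names in the src attributes, are decoded correctly.
auto loadOptions = MakeObject<LoadOptions>();
loadOptions->set_Encoding(System::Text::Encoding::get_UTF8());

// MyDir points to the folder holding the HTML file and its resources folder.
auto doc = MakeObject<Document>(MyDir + u"百度一下,你就知道.html", loadOptions);
doc->Save(MyDir + u"HtmltoDOCX.docx");
```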

We used the following steps to test your case:

  • Renamed the HTML file to ‘百度一下,你就知道.html’. The shared ZIP file has a different name. Please check the attached image: ZIP file.jpg (21.0 KB)
  • Renamed the HTML file’s resource folder to match: Folder name.png (5.8 KB)
  • Opened the HTML file in Notepad and replaced the paths in src=“xxxx” with 百度一下,你就知道_files.
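The third step can be done with a plain text replacement instead of editing by hand. This is a minimal sketch, assuming the HTML is read as raw UTF-8 bytes and that the folder appears in the src/href values as `<folder>/` without percent-encoding; `RetargetResourceFolder` is a hypothetical helper name. Renaming the file and folder themselves could be done with `std::filesystem::rename`.

```cpp
#include <string>

// Hypothetical helper: replaces every occurrence of `oldFolder + "/"` with
// `newFolder + "/"` in the HTML text, e.g. after the *_files folder has been
// renamed on disk. Both names are UTF-8 strings, matching a UTF-8 HTML file.
// This is a plain substring replacement, not an HTML-aware rewrite.
std::string RetargetResourceFolder(std::string html,
                                   const std::string& oldFolder,
                                   const std::string& newFolder)
{
    const std::string from = oldFolder + "/";
    const std::string to = newFolder + "/";
    // Resume searching after the inserted text so a replacement that itself
    // contains `from` is never matched again.
    for (std::size_t pos = html.find(from); pos != std::string::npos;
         pos = html.find(from, pos + to.size()))
    {
        html.replace(pos, from.size(), to);
    }
    return html;
}
```

In practice the HTML would be read in binary mode into a `std::string`, passed through this function with the old and new folder names, and written back.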

To see the incorrect file name and src paths, please download the attached ZIP file and unzip it on your end.

@tahir.manzoor

I see. The SDK cannot handle this situation.
Thanks.