Japanese Text is overlapped after DOCX to HtmlFixed Conversion using Java

Dear support,

we are experiencing severe formatting issues when we convert Word documents with Japanese text to HTML.

We are using fairly straightforward code:

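import com.aspose.words.Document;
import com.aspose.words.HtmlFixedSaveOptions;

// Embed images, CSS, fonts and SVG so the result is a single HTML file.
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
options.setExportEmbeddedImages(true);
options.setExportEmbeddedCss(true);
options.setExportEmbeddedFonts(true);
options.setExportEmbeddedSvg(true);
options.setUpdateFields(false);

// inputStream and outputStream are opened elsewhere in our application.
Document doc = new Document(inputStream);
doc.save(outputStream, options);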

Here is a screenshot from a test document in Word (DOCX) and one from the resulting HTML:
word_document.png (63.8 KB)
html_result.jpg (48.8 KB)
It seems that most paragraphs are rendered on a single line and therefore overlap themselves.

It would be great if you could help us - maybe there are save option flags that are necessary to solve those issues?

Thanks a lot,
Stefan

@stefan.raubal

Please note that Aspose.Words requires TrueType fonts when rendering a document to fixed-page formats (JPEG, PNG, PDF or XPS). The fonts used in your document need to be installed on the machine where you convert the documents. Please refer to the following articles:
Using TrueType Fonts
Manipulating and Substitution TrueType Fonts

If you still face the problem, please ZIP and attach your input Word document here for testing. We will investigate the issue and provide you with more information on it.

Hello @tahir.manzoor,

thanks for your reply - please note that this topic is about HTML, not PDF or other fixed-page formats.

We have the NotoSansCJK fonts as OTF versions registered in the FontRepository (to be precise: we set the folder where those fonts are located as fonts folder), but the output HTML shows that “DejaVu Sans” is defined as font-family.
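In case it helps others check the same thing: a minimal sketch of how the font families declared in a generated HTML file can be listed, so that an unexpected fallback such as “DejaVu Sans” shows up quickly. It uses only the Java standard library; the class and method names are made up for illustration, and the regular expression assumes the declarations appear as ordinary CSS rules terminated by `;` or `}`.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class FontFamilyScanner {
    // Captures the value of a CSS font-family declaration up to ';' or '}'.
    private static final Pattern FONT_FAMILY =
            Pattern.compile("font-family\\s*:\\s*([^;}<>]+)", Pattern.CASE_INSENSITIVE);

    // Returns the distinct font family names in order of first appearance.
    static Set<String> findFontFamilies(String htmlOrCss) {
        Set<String> families = new LinkedHashSet<>();
        Matcher m = FONT_FAMILY.matcher(htmlOrCss);
        while (m.find()) {
            // A declaration may list several comma-separated families.
            for (String part : m.group(1).split(",")) {
                // Trim whitespace and strip one leading/trailing quote character.
                String name = part.trim().replaceAll("^['\"]|['\"]$", "");
                if (!name.isEmpty()) {
                    families.add(name);
                }
            }
        }
        return families;
    }
}
```

Reading the output file into a string and passing it to `FontFamilyScanner.findFontFamilies(...)` gives the set of declared families; if the expected Noto family is missing from that set, the font was substituted during conversion.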

Here is an example Word document where the output is messed up:
JA-SC-TC dummy text.docx (25.4 KB)

And here is a screenshot of the output:
html_result_2.png (56.7 KB)

Kind regards,
Stefan

@stefan.raubal

You are saving the document to HtmlFixed, which is a fixed-page file format like PDF or XPS. Flow file formats are, for example, DOCX and DOC.

We have tested the scenario using the latest version of Aspose.Words for Java 21.9 with Windows fonts and could not reproduce the issue. Please try Aspose.Words for Java 21.9 and check the attached output document. html-fixed 21.9.zip (46.6 KB)

If you still face the problem, please attach the following resources here for testing:

  • Please attach fonts that you are using.
  • Please create a simple Java application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing.

As soon as you get these pieces of information ready, we will start investigation into your issue and provide you more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Thanks for the quick response!
Good to see that it works fine on your side.

In your output, “MS Gothic” is the font of choice. That’s a good hint!

Kind regards,
Stefan

Hello again @tahir.manzoor,

when “MS Gothic” is provided in the fonts folder, the conversion also works fine with our version (21.4).
The problem is that this font is not free - customers on Linux servers won’t have it available.

When you wrote in your first response that we need TrueType fonts, does that mean that OpenType fonts like NotoSans won’t work?
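For anyone unsure which flavour a given font file actually is, here is a small sketch that classifies a file by its first four bytes, the sfnt version tag defined in the OpenType specification. It cannot tell whether a particular converter accepts the font; it only shows whether the file carries TrueType or CFF outlines. The class name and the returned labels are made up for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Aspose.Words.
final class FontFlavor {
    // Maps the big-endian 4-byte tag at the start of a font file to a label.
    static String describe(int tag) {
        switch (tag) {
            case 0x00010000: return "TrueType";      // TrueType outlines
            case 0x74727565: return "TrueType";      // 'true' (Apple TrueType)
            case 0x4F54544F: return "OpenType/CFF";  // 'OTTO' (CFF outlines)
            case 0x74746366: return "Collection";    // 'ttcf' (font collection)
            default:         return "Unknown";
        }
    }

    // Reads the tag from the given font file and classifies it.
    static String describe(Path fontFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(fontFile))) {
            return describe(in.readInt()); // readInt() is big-endian
        }
    }
}
```

Calling `FontFlavor.describe(path)` on each file in the fonts folder shows at a glance whether the folder holds TrueType files, CFF-based OpenType files, or collections.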

And a general question on font handling for HTML conversion: are the HTML files created by Aspose based on web font kits, so that they render independently of the fonts installed on the client?

Thanks again for your support,
Stefan

@stefan.raubal

Aspose.Words does support OpenType fonts. Please ZIP and attach the fonts that you want to use along with the code example that you are using. We will investigate the issue and provide you with more information on it.

The fonts used in your document need to be installed for correct HTML output. If the fonts are not installed on the client machine, the output may not display correctly.

You can use the HtmlFixedSaveOptions.ExportEmbeddedFonts property to embed the fonts into the output HtmlFixed document, so your client can view the correct output without having the fonts installed on their machine.
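One way to verify that the fonts were really embedded is to inspect the `@font-face` rules in the generated HTML. The following is a rough sketch using only the Java standard library; it assumes embedded fonts appear as `data:` URIs inside `@font-face` rules, which should be confirmed against the actual output. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class EmbeddedFontCheck {
    // Captures the body of each @font-face rule.
    private static final Pattern FONT_FACE =
            Pattern.compile("@font-face\\s*\\{([^}]*)\\}", Pattern.CASE_INSENSITIVE);
    // Detects a url(...) source that is an inline data: URI.
    private static final Pattern DATA_URI =
            Pattern.compile("url\\(\\s*['\"]?data:", Pattern.CASE_INSENSITIVE);

    // Returns {embedded, external}: the number of @font-face rules
    // with and without an inline data: URI source.
    static int[] countFontFaces(String htmlOrCss) {
        int embedded = 0;
        int external = 0;
        Matcher m = FONT_FACE.matcher(htmlOrCss);
        while (m.find()) {
            if (DATA_URI.matcher(m.group(1)).find()) {
                embedded++;
            } else {
                external++;
            }
        }
        return new int[] {embedded, external};
    }
}
```

If the second count is non-zero, or both counts are zero, the page still depends on font files or on fonts installed on the viewer's machine.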

Hello @tahir.manzoor,

please find attached the source example including the input test document.
The Regular version of the free NotoSans fonts (which should be put into the fonts folder!) can be downloaded from:

  • https://noto-website-2.storage.googleapis.com/pkgs/notosanscjkjp-hinted.zip
  • https://noto-website-2.storage.googleapis.com/pkgs/notosanscjksc-hinted.zip
  • https://noto-website-2.storage.googleapis.com/pkgs/notosanscjktc-hinted.zip

(The files are big - the Regular version of each font is ~16 MB.)

asian-chars-doc-to-html.zip (247.0 KB)

The NotoSansCJK* fonts normally work nicely when we convert Asian text, e.g. to PDF, using Aspose, so we wonder why they are not taken into account for HTML conversion.

Thanks for caring,
Stefan

@stefan.raubal

We have tested the scenario using the latest version of Aspose.Words for Java 21.9 and could not reproduce the issue. Please use Aspose.Words for Java 21.9. We have attached the output document to this post for your reference.
html-fixed 21.9.zip (1.2 MB)