Word content misplaced in the result of conversion to HTML

Hi there

I am using Aspose.Words for Java 17.6 to convert Word files to HTML.

Here is my test code:

try {

	String path = "menu only.docx";
	String password = "";

	Document doc = null;

	Document pageDoc;
	LayoutCollector layoutCollector;
	DocumentPageSplitter splitter;
	ByteArrayOutputStream output = new ByteArrayOutputStream();
	HtmlSaveOptions saveOp = new HtmlSaveOptions();
	saveOp.setExportImagesAsBase64(true);
	saveOp.setExportTextInputFormFieldAsText(false);
	saveOp.setExportTocPageNumbers(true);
	saveOp.setExportPageSetup(true);
	saveOp.setExportDocumentProperties(true);
	saveOp.setExportRelativeFontSize(false);

	if (StringUtils.isEmpty(password)) {
		doc = new Document(path);
	} else {
		LoadOptions loadOps = new LoadOptions(password);
		doc = new Document(path, loadOps);
	}

	layoutCollector = new LayoutCollector(doc);
	doc.updatePageLayout();
	splitter = new DocumentPageSplitter(layoutCollector);

	String blockId = UUID.randomUUID().toString();

	File outputDir = new File(blockId + "/");
	if (!outputDir.exists())
		outputDir.mkdir();

	for (int page = 1; page <= doc.getPageCount(); page++) {
		Document onepageDoc = splitter.getDocumentOfPage(1);
		System.out.println("page:" + page);
		pageDoc = splitter.getDocumentOfPage(page);
		
		// Page number to display
		int pagenumber = page;

		if (onepageDoc.getFirstSection().getPageSetup().getDifferentFirstPageHeaderFooter()) {
			// saveOp.setExportHeadersFootersMode(ExportHeadersFootersMode.FIRST_SECTION_HEADER_LAST_SECTION_FOOTER);
			pagenumber -= 1;
		}
		for (HeaderFooter headerFooter : pageDoc.getFirstSection().getHeadersFooters()) {
			if (!headerFooter.isHeader() && headerFooter.getText().contains("PAGE   \\* MERGEFORMAT")) {
				headerFooter.remove();
				if (pagenumber > 0) {
					DocumentBuilder builder = new DocumentBuilder(pageDoc);
					builder.moveToHeaderFooter(HeaderFooterType.FOOTER_PRIMARY);
					builder.getParagraphFormat().setAlignment(ParagraphAlignment.CENTER);
					builder.write(String.valueOf(pagenumber));
				}
			}
		}

		output.reset();
		pageDoc.save(output, saveOp);

		// Write the page HTML to disk and close the file stream when done
		try (FileOutputStream fileOut = new FileOutputStream(blockId + "/" + page + ".html")) {
			output.writeTo(fileOut);
		}
	}
} catch (Exception e) {
	e.printStackTrace();
}

The content is misplaced when this Word file is converted in a Linux environment.
I have uploaded the Word file, the result, and a comparison image.
Please check the attachments and look into this issue. Thank you.

menu only.docx.zip (24.2 KB)
result.zip (4.2 KB)
comparison.JPG (201.9 KB)

Craig

@craig.w.su,

Thanks for your inquiry. We tested the scenario with the latest version, Aspose.Words for Java 17.8, and could not reproduce the issue. Please upgrade to Aspose.Words for Java 17.8. We have attached our output HTML to this post for your reference.
output1.zip (3.4 KB)

Hi
@tahir.manzoor

I used 17.8 and this issue still exists.
Please check the result in the attachment.

result_with17.8.zip (4.2 KB)

@craig.w.su,

Thanks for your inquiry. We tested the scenario on Ubuntu and reproduced the issue on our side. We have logged this problem in our issue tracking system as WORDSNET-15777. You will be notified via this forum thread once it is resolved.

We apologize for the inconvenience.

@craig.w.su,

Thanks for your patience. After investigation, the behavior you are seeing is not a bug in Aspose.Words, so we have closed WORDSNET-15777 as 'Not a Bug'. You need to install the fonts used in your document on the machine where you convert the document to HTML.

The issue occurs because Aspose.Words cannot find document fonts on the machine where you are converting DOCX to HTML. Tab stop filler spans generated by Aspose.Words only work correctly if fonts specified in the source document are available both to Aspose.Words during conversion and to browsers on client machines. Otherwise, tab stop positions and tab lengths calculated by Aspose.Words will not match actual values rendered on client machines, and text will get misaligned.
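
As a rough illustration of that check, the sketch below compares a list of font names a document needs against the font families visible to the JVM. `FontCheck` and `missingFonts` are hypothetical names and are not part of Aspose.Words. The JVM font list from `GraphicsEnvironment` is only an approximation, since Aspose.Words may look for fonts in different locations, so treat the output as a hint rather than a definitive answer.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class FontCheck {

    // Returns the required names that have no case-insensitive match in the available names.
    static List<String> missingFonts(Collection<String> required, Collection<String> available) {
        Set<String> known = new HashSet<>();
        for (String name : available) {
            known.add(name.trim().toLowerCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!known.contains(name.trim().toLowerCase(Locale.ROOT))) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumption: the JVM font list approximates what the conversion machine has installed.
        String[] jvmFonts = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // Example names only; replace them with the fonts your document uses.
        List<String> required = Arrays.asList("Calibri", "Cambria", "Times New Roman");
        System.out.println("Possibly missing: " + missingFonts(required, Arrays.asList(jvmFonts)));
    }
}
```

Running this on the conversion machine prints the fonts from the example list that the JVM cannot see there.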

Hi
@tahir.manzoor

I used the following code to get the names of all fonts used in this document.

Document doc = new Document("custom/input/docx/" + fileName);

// Print the name of every font referenced by the document
for (int i = 0; i < doc.getFontInfos().getCount(); i++) {
    System.out.println(doc.getFontInfos().get(i).getName());
}

I then installed the following fonts in the Linux environment:

Wingdings
Times New Roman
Calibri
新細明體
Cambria

The result still shows the same problem.
result.zip (4.2 KB)

Craig

@craig.w.su,

Thanks for your inquiry. Please implement the IWarningCallback interface as shown below to get notifications about missing fonts. We tested the same scenario again on Ubuntu and could not reproduce the issue. Please make sure that all fonts are installed on your machine and that you are using the latest version, Aspose.Words for Java 17.8.

FontSettings.getDefaultInstance().setFontsFolder("/home/Fonts", true);

Document doc = new Document(MyDir + "menu only.docx");

HtmlSaveOptions saveOp = new HtmlSaveOptions();
saveOp.setExportImagesAsBase64(true);
saveOp.setExportTextInputFormFieldAsText(false);
saveOp.setExportTocPageNumbers(true);
saveOp.setExportPageSetup(true);
saveOp.setExportDocumentProperties(true);
saveOp.setExportRelativeFontSize(false);

doc.setWarningCallback(new com.aspose.words.IWarningCallback() {
    @Override
    public void warning(com.aspose.words.WarningInfo warningInfo) {
        // Print only font substitution warnings
        if (warningInfo.getWarningType() == WarningType.FONT_SUBSTITUTION)
            System.out.println(warningInfo.getDescription());
    }
});

doc.save(MyDir + "out/output.html", saveOp);
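
Before pointing the font settings at a folder such as `/home/Fonts`, it can help to see which font families the files in that folder actually provide. The following is a minimal sketch using only the JDK. `FontFolderLister` and its methods are hypothetical names, and it assumes that `java.awt.Font.createFont` can parse the files. Anything it cannot read is skipped, so an absent family here does not prove that Aspose.Words cannot use the file.

```java
import java.awt.Font;
import java.awt.FontFormatException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class FontFolderLister {

    // True for file names with a common font extension (case-insensitive).
    static boolean isFontFile(Path path) {
        String name = path.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".ttc");
    }

    // Walks the folder recursively and returns the sorted family names of readable font files.
    static Set<String> listFamilies(Path folder) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile)
                    .filter(FontFolderLister::isFontFile)
                    .collect(Collectors.toList());
        }
        Set<String> families = new TreeSet<>();
        for (Path file : files) {
            try {
                families.add(Font.createFont(Font.TRUETYPE_FONT, file.toFile()).getFamily());
            } catch (FontFormatException | IOException e) {
                // The JDK could not parse this file; report it and continue.
                System.err.println("Skipping unreadable font file: " + file);
            }
        }
        return families;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the folder used in the snippet above; pass another path as the first argument.
        Path folder = Paths.get(args.length > 0 ? args[0] : "/home/Fonts");
        for (String family : listFamilies(folder)) {
            System.out.println(family);
        }
    }
}
```

Comparing this list with the names printed by the font substitution warnings shows which fonts still need to be copied into the folder.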

Hi

@tahir.manzoor

I used this code snippet and installed the missing fonts.
The result looks much better.
result.zip (4.1 KB)

In the result, some of the page numbers at the edge are not aligned.
Can this be improved?
comparison.JPG (318.9 KB)

Craig

@craig.w.su,

Thanks for your inquiry. We have logged this problem in our issue tracking system as WORDSNET-15800. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.