Poor numbered paragraph formatting when converting from Word to HTML

Hi,

When converting a Word document to HTML, numbered paragraphs in the Word document come out with a poorly formatted first line.

The first line is padded with non-breaking spaces, so the alignment is not clean.
Is there a way for us to improve the formatting? Perhaps an option to format using CSS instead of non-breaking spaces?

Here are my files:
mondoc.docx is my input Word document
mondoc.html is my output HTML document
mondoc.zip (25.3 KB)

Here is the code used to perform the conversion:

	public static void main(final String... strings) {
		try {
			final License license = new License();
			license.setLicense(LICENSE);
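		} catch (final Exception e) {
			// Do not swallow the failure silently: without a valid license Aspose.Words runs in evaluation mode.
			e.printStackTrace();
		}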
		final String html;

		try {
			final LoadOptions lo = new LoadOptions();
			lo.setLoadFormat(LoadFormat.AUTO);
			lo.setEncoding(StandardCharsets.UTF_8);
			final Document doc = new Document(DOCUMENT, lo);

			doc.removeMacros();
			doc.removeSmartTags();
			doc.getChildNodes(NodeType.COMMENT, true).clear();
			doc.joinRunsWithSameFormatting();

			try (final NoBomByteArrayOutputStream bos = new NoBomByteArrayOutputStream()) {
				final HtmlSaveOptions saveOptions = new HtmlSaveOptions(SaveFormat.HTML);

				saveOptions.setExportListLabels(ExportListLabels.AS_INLINE_TEXT);
				saveOptions.setExportTocPageNumbers(false);
				saveOptions.setEncoding(StandardCharsets.UTF_8);
				saveOptions.setExportImagesAsBase64(true);
				doc.save(bos, saveOptions);
				html = bos.toUtf8String();
			}
		} catch (final Exception e) {
			// Keep the original exception as the cause so the real failure is not lost.
			throw new RuntimeException("invalid.corrupted", e);
		}

		try {
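			// Encode explicitly as UTF-8 to match the save options instead of relying on the platform default charset.
			Files.write(Paths.get(HTML), html.getBytes(StandardCharsets.UTF_8));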
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

@PS-CL,

Please note that Aspose.Words writes &#xa0; instead of &nbsp; because &nbsp; is not defined in XML, and by default Aspose.Words generates XHTML documents (i.e. HTML documents that comply with XML rules).
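
To illustrate the point about XML rules, here is a minimal sketch using only the JDK's built-in XML parser (it does not involve Aspose.Words at all). The class name NbspEntityDemo and the sample markup are made up for this illustration. It checks whether a fragment is well-formed XML when the non-breaking space is written as the numeric character reference &#xa0; versus the HTML named entity &nbsp;.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Aspose.Words.
public class NbspEntityDemo {

    /** Returns true if the given markup parses as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // Any parse failure (e.g. an undeclared entity) means the markup is not well-formed.
            return false;
        }
    }

    public static void main(String[] args) {
        // Numeric character reference: valid in any XML document.
        System.out.println(isWellFormedXml("<p>1.&#xa0;&#xa0;MY FIRST TITLE</p>")); // prints true
        // Named HTML entity: not declared in plain XML, so parsing fails.
        System.out.println(isWellFormedXml("<p>1.&nbsp;&nbsp;MY FIRST TITLE</p>")); // prints false
    }
}
```

Both spellings denote the same character (U+00A0) once a browser renders the page, so the choice affects XML validity only, not the alignment problem itself.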

When using the following code with Aspose.Words for Java 19.1, the output is improved (see awjava-19.1.zip (1.4 KB)). However, there is no vertical space between the paragraphs after “MY SECOND TITLE” and “MY THIRD TITLE”. We have logged this problem in our issue tracking system as WORDSNET-18132. We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

Thank you for your quick answer.
Indeed, the output is improved using Aspose 19.1 (we were using 18.11).

@PS-CL,

We will also keep you posted on any further updates on the linked issue.

@PS-CL,

Regarding WORDSNET-18132, we have completed the analysis of your issue. It is a bug in the export of lists to native HTML lists.

In this case, the “main” list has nested lists. The text inside the li tag of the main list acts as a header for the nested lists. We use the li tag as the “paragraph” that receives all CSS styles, but in this case only the “header” should receive the styles with padding.

Currently, the padding is placed after the li tag rather than after the span containing the “heading”. We can try to handle this case.

While you are waiting for a fix (there is no ETA available at the moment), please use “HtmlSaveOptions.ExportListLabels = ExportListLabels.AsInlineText” as a workaround. Hope this helps.