Remove Space between Characters using Java | DOCX to HTML conversion

Hello,

When converting a DOCX file to HTML paragraph by paragraph, we get output in which individual letters and short fragments are each wrapped in a separate <span> tag.

e.g.: <span style=\"font-family:Arial; font-size:11pt\">The</span><span style=\"font-family:Arial; font-size:11pt; letter-spacing:0.05pt\"> </span><span style=\"font-family:Arial; font-size:11pt\">ove</span><span style=\"font-family:Arial; font-size:11pt; letter-spacing:0.05pt\">r</span>...
As a result, the HTML output ends up at 2 MB when converting a 50-page Word document.

By analyzing the output, I’ve observed that the only difference between adjacent spans is the letter-spacing property.

Is there any way to remove the letter-spacing from the HTML and then combine the <span> tags that have the same formatting?

I’ve tried paragraph.joinRunsWithSameFormatting(); but it doesn’t help much in my case.

The code: tc-aspose-evaluation.zip (74.3 KB)

The name of the Word document is CharacterSpacingIssue.docx.

Library version: Aspose.Words for Java 21.1.

I’ve figured out how to remove the letter spacing and join the spans for this specific use case:

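// Reset character spacing so that runs no longer differ by letter-spacing.
for (Run run : paragraph.getRuns()) {
	run.getFont().setSpacing(0.0);
}

// Adjacent runs with identical formatting can now be merged.
paragraph.joinRunsWithSameFormatting();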
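
To illustrate why resetting the spacing first makes the join effective, here is a minimal, library-free sketch. It is only a simplified model and not how Aspose.Words is implemented: the class RunMergeSketch, the TextRun type, and the join and resetSpacingAndJoin methods are all hypothetical names made up for this example. It assumes two runs count as "same formatting" only when font family, size, and spacing all match, which is why a 0.05pt spacing difference keeps them apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not part of Aspose.Words.
class RunMergeSketch {

    /** Stand-in for a text run: text plus the formatting relevant here. */
    static final class TextRun {
        final String text;
        final String fontFamily;
        final double fontSizePt;
        final double spacingPt;

        TextRun(String text, String fontFamily, double fontSizePt, double spacingPt) {
            this.text = text;
            this.fontFamily = fontFamily;
            this.fontSizePt = fontSizePt;
            this.spacingPt = spacingPt;
        }

        boolean sameFormatting(TextRun other) {
            return fontFamily.equals(other.fontFamily)
                    && fontSizePt == other.fontSizePt
                    && spacingPt == other.spacingPt;
        }
    }

    /** Merges adjacent runs whose formatting is identical. */
    static List<TextRun> join(List<TextRun> runs) {
        List<TextRun> out = new ArrayList<>();
        for (TextRun run : runs) {
            int last = out.size() - 1;
            if (last >= 0 && out.get(last).sameFormatting(run)) {
                TextRun prev = out.get(last);
                out.set(last, new TextRun(prev.text + run.text,
                        prev.fontFamily, prev.fontSizePt, prev.spacingPt));
            } else {
                out.add(run);
            }
        }
        return out;
    }

    /** Sets spacing to 0 on a copy of every run, then merges. */
    static List<TextRun> resetSpacingAndJoin(List<TextRun> runs) {
        List<TextRun> normalized = new ArrayList<>();
        for (TextRun r : runs) {
            normalized.add(new TextRun(r.text, r.fontFamily, r.fontSizePt, 0.0));
        }
        return join(normalized);
    }

    public static void main(String[] args) {
        // Mirrors the fragment from the question: "The", " ", "ove", "r".
        List<TextRun> runs = Arrays.asList(
                new TextRun("The", "Arial", 11, 0.0),
                new TextRun(" ", "Arial", 11, 0.05),
                new TextRun("ove", "Arial", 11, 0.0),
                new TextRun("r", "Arial", 11, 0.05));

        // Joining alone changes nothing, because the spacing alternates.
        System.out.println(join(runs).size()); // prints 4

        List<TextRun> merged = resetSpacingAndJoin(runs);
        System.out.println(merged.size() + ": " + merged.get(0).text); // prints 1: The over
    }
}
```

In this model, joining alone leaves all four runs in place, which matches what I saw when calling joinRunsWithSameFormatting() on its own. Note that resetting the spacing discards any intentional character spacing in the document, so it only fits cases like mine where the spacing is negligible.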

@mihail.manoli

It is nice to hear that your problem has been solved. Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.