Remove font-family style when extracting HTML from Word Document

Hi,

Is there any way to remove the font-family inline style when converting a Paragraph to HTML?

Here is the Java code:

private String getParagraph(Paragraph paragraph) {
    String content;
    try {
        content = paragraph.toString(SaveFormat.HTML);
    } catch (Exception ex) {
        // The {0} placeholder is required for java.util.logging to print the parameter.
        fLogger.log(Level.WARNING, "Unable to parse paragraph. Message: {0}", ex.getMessage());
        content = "Import error placeholder";
    }
    return content;
}

The response is:
<p style="margin-top:0pt; margin-bottom:8pt; line-height:108%; font-size:11pt"><span style="font-family:Calibri">If you find the text hard to read, you can increase the font size by tapping the symbol in the top right corner. By tapping it a second time, you can switch back to the default size. Try it out!</span></p>

I’m using Aspose.Words for Java 21.1.

Thanks,
Mihail

@mihail.manoli,

You can try specifying the CSS style sheet type and then exporting the Word paragraph to an HTML string:

Document doc = new Document("C:\\Temp\\Hello world.docx");

Paragraph para = doc.getFirstSection().getBody().getFirstParagraph();

HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions(SaveFormat.HTML);
htmlSaveOptions.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);

System.out.println(para.toString(htmlSaveOptions));

Hi,

What I need is to remove just the font-family: font-name part from the HTML element and keep the other styles inline.

Is it possible to do such a thing?

Thank you!

@mihail.manoli,

Paragraph properties, such as those of the Paragraph.ParagraphFormat class, are translated into CSS when a Paragraph is exported to an HTML string. This preserves the layout and formatting of paragraphs in HTML. If you remove the styling information, the output becomes similar to plain text. You may need to post-process the HTML that Aspose.Words generates with your own code to get the desired HTML string.

Could you please elaborate on your inquiry by providing complete details of your use case? That will help us understand your scenario and address your concerns. Please also provide your source Word document, the HTML file generated by Aspose.Words, and an expected file showing the desired output for our reference. You can create the expected file manually.

@awais.hafeez
I’ve ended up post-processing the Aspose.Words generated HTML by removing the font-family inline style.

Thank you,
Mihail
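
The thread does not show the post-processing code that was actually used. As a rough sketch of one way to do it, the snippet below strips only the font-family declaration from inline style attributes with plain java.util.regex and leaves the other declarations untouched. The class name FontFamilyStripper and the method stripFontFamily are hypothetical, and the sketch assumes the generated HTML uses double-quoted style attributes and font-family values that contain no semicolons, as in the sample output above. For arbitrary HTML, a real HTML parser would be a safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class FontFamilyStripper {

    // A double-quoted style attribute together with its leading whitespace.
    private static final Pattern STYLE_ATTR =
            Pattern.compile("\\s+style=\"([^\"]*)\"");

    // One font-family declaration and its trailing semicolon, if any.
    // The lookbehind skips prefixed properties such as -aw-font-family.
    private static final Pattern FONT_FAMILY =
            Pattern.compile("(?i)\\s*(?<![\\w-])font-family\\s*:[^;]*;?");

    private FontFamilyStripper() {
    }

    // Assumption: styles look like the sample in this thread (double quotes, no ';' in values).
    static String stripFontFamily(String html) {
        Matcher m = STYLE_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String remaining = FONT_FAMILY.matcher(m.group(1)).replaceAll("").trim();
            // Drop the whole attribute if font-family was its only declaration.
            String replacement = remaining.isEmpty() ? "" : " style=\"" + remaining + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this in place, the getParagraph method from the first post could return FontFamilyStripper.stripFontFamily(content) instead of content.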

@mihail.manoli,

It is great that you were able to find what you were looking for. Please let us know if you have any further queries in the future.