Convert DOCX to HTML using Java | Remove Hyperlink Fields from Word Document | Improve Performance (2482)

Hello,

I have a DOCX file that I'm trying to save as HTML using the Aspose.Words for Java 19.9 library. The save operation never finishes.
Code:

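HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions(SaveFormat.HTML);
Document document = new Document(new FileInputStream(new File("htmlPreviewFile.docx")));

File targetHtmlFile = null;
File imagesDir = null;

try {
	// Extracted images go to a temporary folder; the HTML goes to a temporary file.
	imagesDir = Files.createTempDirectory("createHtmlPreviewImages").toFile();
	targetHtmlFile = File.createTempFile("createHtmlPreview", ".html");

	// Pass a plain file-system path rather than a percent-encoded URI path.
	htmlSaveOptions.setImagesFolder(imagesDir.getAbsolutePath());

	try (OutputStream outputStream = new FileOutputStream(targetHtmlFile)) {
		// This call never returns for the attached document.
		document.save(outputStream, htmlSaveOptions);
	} catch (Exception e) {
		// Do not swallow the failure silently.
		e.printStackTrace();
	}
} finally {
	// Clean up both temporary artifacts.
	if (targetHtmlFile != null && targetHtmlFile.exists()) {
		FileUtils.forceDelete(targetHtmlFile);
	}
	if (imagesDir != null && imagesDir.exists()) {
		FileUtils.forceDelete(imagesDir);
	}
}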

htmlPreviewFile.zip (508.3 KB)

Can you please check this one?

Kind regards,
Zeljko

@zpredojevic,

A couple of hyperlink fields in the DOCX document seem to be causing this problem. As a workaround, you can remove them before saving:

Document doc = new Document("C:\\temp\\htmlPreviewFile\\htmlPreviewFile.docx");

for (Field field : doc.getRange().getFields())
    if (field.getType() == FieldType.FIELD_HYPERLINK)
        field.remove();

doc.save("C:\\temp\\htmlPreviewFile\\20.8.html");
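Separately from the workaround above, a save that may hang can be guarded with a timeout so that it does not block the caller indefinitely. The following is only a minimal sketch using the standard java.util.concurrent classes. It is not an Aspose.Words feature, and the class name SaveWithTimeout and the method runWithTimeout are hypothetical names chosen for illustration:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Aspose.Words.
final class SaveWithTimeout {

    /**
     * Runs the task on a worker thread and waits at most timeoutMillis.
     * Returns true if the task finished in time, false if it timed out.
     * An exception thrown by the task is rethrown as ExecutionException.
     */
    static boolean runWithTimeout(Callable<Void> task, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "save-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<Void> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; code that never
            // checks for interruption may keep running in the background.
            future.cancel(true);
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the code from this thread, the task would be something like () -> { doc.save(outputStream, htmlSaveOptions); return null; }. Note that a timeout only releases the calling thread. If the save is stuck in a busy loop, the worker thread can continue to consume CPU until the process exits, so removing the hyperlink fields is still needed for this document.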

We have logged this problem in our issue tracking system with ID WORDSNET-21051. We will investigate it further and keep you updated on the status of the fix. We apologize for the inconvenience.

The issue you reported earlier (filed as WORDSNET-21051) has been fixed in the Aspose.Words for .NET 20.10 and Aspose.Words for Java 20.10 updates.