Hi Team,
When I export a Word document to HTML, some of the content is converted to images and is therefore no longer editable. Please check.
Code:
import com.aspose.words.*;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public static void main(String[] args) throws Exception {
    // Apply the license before loading the document.
    com.aspose.words.License license = new com.aspose.words.License();
    license.setLicense("/home/saurabharora/Downloads/Aspose.Total.Product.Family.lic");

    Document document = new Document("/home/saurabharora/Downloads/document_test_image.docx");

    // Configure the HTML export.
    HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.HTML);
    opts.setExportPageSetup(true);
    opts.setExportDocumentProperties(true);
    opts.setExportListLabels(ExportListLabels.BY_HTML_TAGS);
    opts.setExportImagesAsBase64(true);
    opts.setExportFontsAsBase64(true);
    opts.setExportHeadersFootersMode(ExportHeadersFootersMode.FIRST_PAGE_HEADER_FOOTER_PER_SECTION);
    opts.setCssStyleSheetType(CssStyleSheetType.EMBEDDED);
    opts.setExportTocPageNumbers(true);
    opts.setExportShapesAsSvg(false);
    opts.setExportRelativeFontSize(true);
    // opts.setExportPageMargins(true);

    // Save to memory and print the resulting HTML.
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    document.save(byteArrayOutputStream, opts);
    String html = byteArrayOutputStream.toString(StandardCharsets.UTF_8); // toString(Charset) requires Java 10+
    System.out.println(html);
}
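To show how much of the output is affected, the generated HTML can be scanned for embedded images. The following is only a minimal sketch using the Java standard library; the class and method names (EmbeddedImageCounter, countEmbeddedImages) are made up for illustration and are not part of Aspose.Words. Since setExportImagesAsBase64(true) is set in the code above, the sketch assumes that content turned into an image shows up as an <img> tag with a data: URI:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an Aspose.Words API.
final class EmbeddedImageCounter {
    // Matches <img ...> tags whose src attribute is an image data: URI.
    private static final Pattern DATA_URI_IMG = Pattern.compile(
            "<img\\b[^>]*\\bsrc\\s*=\\s*[\"']data:image/[^\"']*[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns how many <img> tags in the HTML carry an embedded image.
    static int countEmbeddedImages(String html) {
        Matcher matcher = DATA_URI_IMG.matcher(html);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sample = "<p>text</p>"
                + "<img src=\"data:image/png;base64,AAAA\" alt=\"\"/>"
                + "<img src=\"logo.png\"/>";
        // Only the first <img> uses a data: URI.
        System.out.println(countEmbeddedImages(sample)); // prints 1
    }
}
```

Calling countEmbeddedImages(html) on the string printed above would give the number of places where the export produced an image, which can then be compared against the number of real pictures in the .docx.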
Document:
document_test_image.zip (55.2 KB)
Thanks