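1. Convert an HTML file to a Word file; 2. Add a cover page and footer to the Word file. After step 2 adds the cover and footer, some images are laid out incorrectly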

ZhonghaoSun · October 11, 2024, 9:10am

Version: 23.8
Programming language: Java

Screenshot of the problem:

Original HTML file:
example.zip (1.4 KB)

Test code:

byte[] htmlBytes = Files.readAllBytes(Paths.get("C:\\Users\\admin\\Desktop\\example.html"));
byte[] wordBytes = convert2Word(htmlBytes);

InputStream cover = new FileInputStream("E:\\cloud-10-08\\cloud-ofs\\src\\main\\resources\\file\\cover-template.docx");
InputStream footer = new FileInputStream("E:\\cloud-10-08\\cloud-ofs\\src\\main\\resources\\file\\footer-template.docx");
InputStream inputStream = addCoverAndFooter(new ByteArrayInputStream(wordBytes), cover, footer, VERSION_PARAM, "12.34");
byte[] coverWord = IOUtils.readStreamAsByteArray(inputStream);
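The helper `IOUtils.readStreamAsByteArray` is not shown in the thread. Assuming it simply drains the stream into a byte array, a minimal standard-library stand-in could look like the sketch below. The names `StreamBytes` and `readAll` are hypothetical, and the try-with-resources block is only a suggestion for closing streams such as `cover` and `footer` once they have been read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamBytes {
    /** Reads the stream to its end and returns everything as one byte array. */
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() returns -1 once the stream is exhausted
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the stream even if reading fails
        try (InputStream in = new ByteArrayInputStream("cover".getBytes(StandardCharsets.UTF_8))) {
            byte[] bytes = readAll(in);
            System.out.println(bytes.length);
        }
    }
}
```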

Step 1 conversion code:


public static byte[] convert2Word(byte[] htmlBytes) throws Exception {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ByteArrayInputStream in = new ByteArrayInputStream(htmlBytes);
    convert2Word(in, out);
    return out.toByteArray();
}

public static void convert2Word(InputStream in, OutputStream out) throws Exception {
	Document doc = new Document(in);
	try {
		NodeCollection childNodes = doc.getChildNodes(NodeType.PARAGRAPH, true);
		if (childNodes != null) {
			for (Paragraph para : (Iterable<Paragraph>) childNodes) {
				if (para.getListFormat().isListItem()) {
					ListLevel listLevel = para.getListFormat().getListLevel();
					listLevel.getFont().setColor(Color.BLACK);
					if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2) {
						listLevel.setNumberStyle(NumberStyle.ARABIC);
						listLevel.setNumberFormat("\u0001.\u0001、");
					}
					if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3) {
						listLevel.setNumberStyle(NumberStyle.ARABIC);
						listLevel.setNumberFormat("\u0001.\u0001.\u0002、");
					}
					if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_4) {
						listLevel.setNumberStyle(NumberStyle.ARABIC);
						listLevel.setNumberFormat("\u0001.\u0001.\u0002.\u0003、");
						listLevel.getFont().setSize(10.5);
						listLevel.getFont().setBold(true);
						listLevel.getFont().getShading().clearFormatting();
					}
				}
			}
		}
	} catch (Exception e) {
		logger.warn("HTML to Word conversion: failed to adjust list styles", e);
	}
	doc.updateListLabels();
	// Set every section to A4 paper size
	SectionCollection sections = doc.getSections();
	for (Section section : sections) {
		section.getPageSetup().setPaperSize(PaperSize.A4);
	}

	// Keep oversized images from being cut off: clamp them to the usable page width
	PageSetup pageSetup = doc.getFirstSection().getPageSetup();
	double width = pageSetup.getPageWidth() - pageSetup.getRightMargin() - pageSetup.getLeftMargin();
	for (Shape shape : (Iterable<Shape>) doc.getChildNodes(NodeType.SHAPE, true)) {
		if (shape.getWidth() > width) {
			shape.setWrapType(WrapType.INLINE);
			shape.setWidth(width);
			//shape.getImageData().fitImageToShape();
		}
	}
	doc.save(out, SaveFormat.DOCX);
}
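The image handling in step 1 changes only the width of an oversized shape. Whether the height follows depends on how the library treats the aspect ratio, which this thread does not state. As a rough sketch of the arithmetic only, with no Aspose calls, the usable width and a proportional resize could be computed as below. The class `ImageFit`, its method names, and the sample page and margin values are all hypothetical.

```java
public class ImageFit {
    /** Usable text width: page width minus the left and right margins (all in points). */
    public static double usableWidth(double pageWidth, double leftMargin, double rightMargin) {
        return pageWidth - leftMargin - rightMargin;
    }

    /**
     * Returns {width, height} scaled down so that width equals maxWidth,
     * keeping the aspect ratio. Sizes that already fit are returned unchanged.
     */
    public static double[] fitToWidth(double width, double height, double maxWidth) {
        if (width <= maxWidth || width <= 0) {
            return new double[] {width, height};
        }
        double scale = maxWidth / width;
        return new double[] {maxWidth, height * scale};
    }

    public static void main(String[] args) {
        // Hypothetical page: 595 pt wide with 90 pt margins on each side
        double max = usableWidth(595.0, 90.0, 90.0);
        // Hypothetical image that is twice as wide as the usable area
        double[] fitted = fitToWidth(830.0, 415.0, max);
        System.out.println(fitted[0] + " x " + fitted[1]);
    }
}
```

If the height turns out not to follow the width automatically, the second value could be applied to the shape alongside `shape.setWidth(width)`; that is an assumption to check against the rendered output.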

Step 2 conversion code:

public static InputStream addCoverAndFooter(InputStream destDoc, InputStream cover, InputStream footer, 
											 String paramKey, String paramValue) throws Exception {
	Document document = new Document(destDoc);
	ByteArrayOutputStream out = new ByteArrayOutputStream();
	Document coverDoc = new Document(cover);
	// Fill in the version number
	FindReplaceOptions options = new FindReplaceOptions();
	coverDoc.getRange().replace(paramKey, paramValue, options);
	// Append the converted document after the cover
	coverDoc.appendDocument(document, ImportFormatMode.KEEP_SOURCE_FORMATTING);
	coverDoc.updateFields();
	coverDoc.save(out, SaveFormat.DOCX);
	// Add the footer
	Document srcDoc = new Document(footer);
	Document finalDoc = new Document(new ByteArrayInputStream(out.toByteArray()));
	ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
	addFooter(srcDoc, finalDoc, outputStream);

	return new ByteArrayInputStream(outputStream.toByteArray());
}


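/**
 * Adds a footer to the last section of the target document and saves it as DOCX.
 * @param srcDoc	sample Word file that contains the footer
 * @param dstDoc	target document
 * @param output	stream the resulting DOCX is written to
 * @throws Exception if importing the footer or saving the document fails
 */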
private static void addFooter(Document srcDoc, Document dstDoc, OutputStream output) throws Exception {
	HeaderFooter srcFooter = srcDoc.getFirstSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);
	HeaderFooter srcFooterCopy = (HeaderFooter) dstDoc.importNode(srcFooter, true, ImportFormatMode.KEEP_SOURCE_FORMATTING);

	HeaderFooter dstFooter = dstDoc.getLastSection().getHeadersFooters().getByHeaderFooterType(HeaderFooterType.FOOTER_PRIMARY);
	if (dstFooter != null)
		dstFooter.remove();

	dstDoc.getLastSection().getHeadersFooters().add(srcFooterCopy);

	dstDoc.save(output, SaveFormat.DOCX);
}

vyacheslav.deryushev · October 11, 2024, 11:52am

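@ZhonghaoSun This happens because the DOCX converted from HTML and "cover-template.docx" target different MS Word versions. The DOCX that Aspose.Words creates from HTML is compatible with the latest version, whereas "cover-template.docx" is compatible with Word 2010. Some compatibility settings cause problems with shapes. In this case you can use either of two approaches: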

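Optimize "cover-template.docx" for Word 2013 and later (here coverDoc is the Document loaded from the cover template in addCoverAndFooter):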

coverDoc.getCompatibilityOptions().optimizeFor(MsWordVersion.WORD_2013);

Use the DOCX converted from HTML as the base document and insert "cover-template.docx" into it with DocumentBuilder:

public static InputStream addCoverAndFooter(InputStream destDoc, InputStream cover, InputStream footer,
                                            String paramKey, String paramValue) throws Exception {
    Document document = new Document(destDoc);
    DocumentBuilder builder = new DocumentBuilder(document);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Document coverDoc = new Document(cover);
    // Fill in the version number
    FindReplaceOptions options = new FindReplaceOptions();
    coverDoc.getRange().replace(paramKey, paramValue, options);
    // Insert the cover at the start of the converted document

    builder.moveToDocumentStart();
    builder.insertDocument(coverDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
    builder.insertBreak(BreakType.SECTION_BREAK_NEW_PAGE);
    document.save(out, SaveFormat.DOCX);

    // Add the footer
    Document srcDoc = new Document(footer);
    Document finalDoc = new Document(new ByteArrayInputStream(out.toByteArray()));
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    addFooter(srcDoc, finalDoc, outputStream);

    return new ByteArrayInputStream(outputStream.toByteArray());
}

ZhonghaoSun · October 12, 2024, 3:30am

OK, the problem is solved. Thank you.