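Version: 23.8
Programming language: Java
Style issues:
1. Some headings turn blue
2. Headings get an extra 【一】 prefix
3. Black rectangles appear on some headings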
Screenshot of the issue:
eteams_2024-05-21_18-53-12.jpg (75.0 KB)
Original HTML file:
文件转换样式测试.zip (3.1 KB)
Converted Word file:
文件转换样式测试-convert.zip (12.0 KB)
Conversion code:
import com.aspose.words.Document;
import com.aspose.words.SaveFormat;

public void htmlToWord(String in, String out) throws Exception {
    Document html = new Document(in);
    html.save(out, SaveFormat.DOCX);
}
Can this be solved by changing the conversion settings?
@ZhonghaoSun The file you provided matches the parameters set in the HTML file. For the list, you have:
-aw-list-number-styles:'chineseCountingThousand decimal'
This is what produces 【一】.
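For context on 【一】: chineseCountingThousand renders list numbers as Chinese numerals, so the first item at that level is labeled 一. The stdlib-only sketch below approximates that style for the range 1 to 99. The class name is hypothetical, and this is only an illustration, not how Aspose.Words or MS Word actually implement the style.

```java
// Hypothetical helper, not part of Aspose.Words: approximates the
// "chineseCountingThousand" list number style for values 1..99.
final class ChineseCounting {
    // Chinese digits 0..9 (ling, yi, er, san, si, wu, liu, qi, ba, jiu).
    private static final String DIGITS =
            "\u3007\u4e00\u4e8c\u4e09\u56db\u4e94\u516d\u4e03\u516b\u4e5d";
    // The character for "ten" (shi).
    private static final char TEN = '\u5341';

    static String format(int n) {
        if (n < 1 || n > 99) {
            throw new IllegalArgumentException("sketch only covers 1..99: " + n);
        }
        int tens = n / 10;
        int ones = n % 10;
        StringBuilder sb = new StringBuilder();
        if (tens > 1) {
            sb.append(DIGITS.charAt(tens)); // multiplier before "ten", e.g. 20 -> er shi
        }
        if (tens > 0) {
            sb.append(TEN);
        }
        if (ones > 0) {
            sb.append(DIGITS.charAt(ones));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first item of a chineseCountingThousand level is labeled with "one".
        System.out.println(format(1));
    }
}
```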
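<span style="font-family:'Times New Roman'; font-size:0pt; font-weight:normal; background-color:#000000">1.1.1.3</span>
This is what causes the black rectangles on some headings: the label has a font size of 0pt but a black (#000000) background.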
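The black rectangle is a label whose text is 0pt (invisible) while its background is still painted black. As a stdlib-only illustration (the class name and the detection rule are assumptions for this example, not an Aspose.Words feature), the sketch below parses an inline style attribute like the one above and reports whether it describes zero-size text with a visible background, which could help spot such spans in exported HTML.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: parses a CSS inline style such as
// "font-size:0pt; background-color:#000000" into property -> value pairs.
final class InlineStyle {
    static Map<String, String> parse(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String decl : style.split(";")) {
            int colon = decl.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed declarations
            }
            String name = decl.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = decl.substring(colon + 1).trim();
            props.put(name, value);
        }
        return props;
    }

    // True when the text itself is 0pt/0px but a non-transparent background
    // is still set, i.e. the combination that renders as a solid rectangle.
    static boolean isShadedZeroSizeText(String style) {
        Map<String, String> props = parse(style);
        String size = props.getOrDefault("font-size", "").toLowerCase(Locale.ROOT);
        String background = props.getOrDefault("background-color", "transparent");
        boolean zeroSize = size.matches("0+(\\.0+)?(pt|px)?");
        return zeroSize && !background.equalsIgnoreCase("transparent");
    }

    public static void main(String[] args) {
        String style = "font-family:'Times New Roman'; font-size:0pt; "
                + "font-weight:normal; background-color:#000000";
        System.out.println(isShadedZeroSizeText(style));
    }
}
```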
This is not an Aspose.Words issue; everything works as expected. You need to either change the HTML document or use the following code:
import com.aspose.words.*;
import java.awt.Color;

Document doc = new Document("input.html");
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getListFormat().isListItem()) {
        ListLevel listLevel = para.getListFormat().getListLevel();
        // Make the list label black instead of blue.
        listLevel.getFont().setColor(Color.BLACK);
        // Replace the Chinese counting style with Arabic numbers per heading level.
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2) {
            listLevel.setNumberStyle(NumberStyle.ARABIC);
            listLevel.setNumberFormat("\u0001.\u0001、");
        }
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3) {
            listLevel.setNumberStyle(NumberStyle.ARABIC);
            listLevel.setNumberFormat("\u0001.\u0001.\u0002、");
        }
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_4) {
            listLevel.setNumberStyle(NumberStyle.ARABIC);
            listLevel.setNumberFormat("\u0001.\u0001.\u0002.\u0003、");
            // Give the 0pt label a visible size and remove its black shading.
            listLevel.getFont().setSize(10.5);
            listLevel.getFont().setBold(true);
            listLevel.getFont().getShading().clearFormatting();
        }
    }
}
// Update the list labels after changing the list levels.
doc.updateListLabels();
doc.save("output.docx");
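In ListLevel.setNumberFormat, the characters \u0000 to \u0008 are placeholders for the current numbers of list levels 0 to 8; every other character is literal text. The stdlib-only sketch below (a hypothetical helper, not an Aspose.Words API; it prints every level as an Arabic number and ignores number styles) shows how such a format string expands, which may help when checking formats like the ones in the code above.

```java
// Hypothetical helper, not an Aspose.Words API: expands a list number format
// in which the characters U+0000..U+0008 stand for the numbers of levels 0..8.
// Number styles are ignored; every level is printed as an Arabic number.
final class NumberFormatPreview {
    static String expand(String format, int[] levelNumbers) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c <= 8) {
                // Placeholder: substitute the current number of that level.
                if (c >= levelNumbers.length) {
                    throw new IllegalArgumentException("no number for level " + (int) c);
                }
                sb.append(levelNumbers[c]);
            } else {
                sb.append(c); // literal text such as '.' or the ideographic comma
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Level 0 is at item 2, level 1 at item 1, level 2 at item 3.
        System.out.println(expand("\u0000.\u0001.\u0002\u3001", new int[] {2, 1, 3}));
    }
}
```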
@vyacheslav.deryushev
We found that the HTML file used here was itself converted from Word with Aspose.
Here are the complete steps:
Original Word file:
文件转换样式测试-word原文件.zip (30.6 KB)
1. Convert the Word file to HTML
Code used:
String wordPath = "D:\\XXXXX\\文件转换样式测试.docx";
LoadOptions loadOptions = new LoadOptions();
Document doc = new Document(wordPath, loadOptions);
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setExportImagesAsBase64(true);
saveOptions.setScaleImageToShapeSize(false);
doc.save("文件转换样式测试.html", saveOptions);
Converted HTML file:
文件转换样式测试.zip (3.1 KB)
2. Convert the HTML back to a Word file
Code used:
Document html = new Document(in);
html.save(out, SaveFormat.DOCX);
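Converted Word file:
文件转换样式测试-convert.zip (12.0 KB)
After these two conversion steps, the 【一】 prefix, the black rectangles, and the blue headings appear.
I tested the code you provided, and it does fix all three issues.
However, our conversion flow is: 1. Word to HTML, then 2. HTML back to Word.
The original Word file in step 1 does not have the headings 1.1.1.1~1.1.1.3:
image.png (69.9 KB)
Could you take another look with the complete conversion flow in mind?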
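@ZhonghaoSun In your original DOCX file, the list numbers of headings 1.1.1.1~1.1.1.3 currently have a font size of 0 and a background color of 0 (black), as shown in the screenshot:
I have created an issue about preserving the list numbering.
We have opened the following new ticket in our internal issue tracking system and will deliver its fix according to the terms mentioned in the Free Support Policy.
Issue ID(s): WORDSNET-26996
If you need priority support, along with direct access to our Paid Support management team, you can obtain Paid Support Services.
Please note that a DOCX -> HTML -> DOCX round trip cannot always provide 100% fidelity, because the HTML and MS Word document object models differ significantly.
To get the correct output, please use the following code: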
import com.aspose.words.*;
import java.awt.Color;

// Step 1: DOCX -> HTML
Document doc = new Document("orig.docx");
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setExportImagesAsBase64(true);
saveOptions.setScaleImageToShapeSize(false);
doc.save("output.html", saveOptions);

// Step 2: HTML -> DOCX, then fix up the list labels
doc = new Document("output.html");
for (Paragraph para : (Iterable<Paragraph>) doc.getChildNodes(NodeType.PARAGRAPH, true)) {
    if (para.getListFormat().isListItem()) {
        ListLevel listLevel = para.getListFormat().getListLevel();
        listLevel.getFont().setColor(Color.BLACK);
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2) {
            listLevel.setNumberStyle(NumberStyle.ARABIC);
            listLevel.setNumberFormat("\u0001.\u0001、");
        }
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3) {
            listLevel.setNumberStyle(NumberStyle.ARABIC);
            listLevel.setNumberFormat("\u0001.\u0001.\u0002、");
        }
        if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_4) {
            // Insert no tab or space after the list number.
            listLevel.setTrailingCharacter(ListTrailingCharacter.NOTHING);
        }
    }
}
doc.updateListLabels();
doc.save("output.docx");
Hello, I'm getting some errors when converting HTML to Word. Could you take a look?
Issue 1:
Error screenshot:
image.png (109.2 KB)
Exception:
com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
at com.aspose.words.FileFormatUtil.zzV3(Unknown Source)
at com.aspose.words.Document.zzWsW(Unknown Source)
at com.aspose.words.Document.zzVSm(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at net.qiyuesuo.common.word.Html2WordUtils.convert2Word(Html2WordUtils.java:53)
at net.qiyuesuo.common.word.Html2WordUtils.convert2Word(Html2WordUtils.java:41)
at net.qiyuesuo.common.word.Html2WordUtils.main(Html2WordUtils.java:99)
Caused by: java.lang.IllegalStateException: XMLStreamException: Illegal to have multiple roots (start tag in epilog?).
at [row,col {unknown-source}]: [1,68]
at com.aspose.words.internal.zzYD9.zzVSm(Unknown Source)
at com.aspose.words.internal.zzYD9.read(Unknown Source)
at com.aspose.words.zzZoj.zzXoK(Unknown Source)
at com.aspose.words.Document.zzWsW(Unknown Source)
... 7 more
Caused by: com.aspose.words.internal.zzYob: Illegal to have multiple roots (start tag in epilog?).
at [row,col {unknown-source}]: [1,68]
at com.aspose.words.internal.zzWYp.zzZc5(Unknown Source)
at com.aspose.words.internal.zzWYp.zzZ0Y(Unknown Source)
at com.aspose.words.internal.zzVZG.zzZSH(Unknown Source)
at com.aspose.words.internal.zzVZG.zzXQ4(Unknown Source)
at com.aspose.words.internal.zzVZG.zzWS7(Unknown Source)
at com.aspose.words.internal.zzVZG.zzZ52(Unknown Source)
at com.aspose.words.internal.zzVZG.next(Unknown Source)
at com.aspose.words.internal.zzYD9.read(Unknown Source)
... 9 more
Process finished with exit code 1
HTML source file:
test.zip (478 bytes)
Issue 2:
Exception:
Exception in thread "main" com.aspose.words.FileCorruptedException: The document appears to be corrupted and cannot be loaded.
at com.aspose.words.FileFormatUtil.zzV3(Unknown Source)
at com.aspose.words.Document.zzWsW(Unknown Source)
at com.aspose.words.Document.zzVSm(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at net.qiyuesuo.common.word.Html2WordUtils.convert2Word(Html2WordUtils.java:53)
at net.qiyuesuo.common.word.Html2WordUtils.convert2Word(Html2WordUtils.java:41)
at net.qiyuesuo.common.word.Html2WordUtils.main(Html2WordUtils.java:99)
Caused by: java.lang.IllegalStateException: XMLStreamException: Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'version'
at [row,col {unknown-source}]: [1,178]
at com.aspose.words.internal.zzYD9.zzVSm(Unknown Source)
at com.aspose.words.internal.zzYD9.read(Unknown Source)
at com.aspose.words.zzZoj.zzXoK(Unknown Source)
at com.aspose.words.Document.zzWsW(Unknown Source)
... 7 more
Caused by: com.aspose.words.internal.zzXPO: Unexpected character '=' (code 61); expected a semi-colon after the reference for entity 'version'
at [row,col {unknown-source}]: [1,178]
at com.aspose.words.internal.zzWYp.zzYk1(Unknown Source)
at com.aspose.words.internal.zzWYp.zzW3k(Unknown Source)
at com.aspose.words.internal.zzWYp.zzY6h(Unknown Source)
at com.aspose.words.internal.zzVZG.zzVSm(Unknown Source)
at com.aspose.words.internal.zzVZG.zzZdW(Unknown Source)
at com.aspose.words.internal.zzVZG.zzK4(Unknown Source)
at com.aspose.words.internal.zzVZG.zzZ52(Unknown Source)
at com.aspose.words.internal.zzVZG.next(Unknown Source)
at com.aspose.words.internal.zzYD9.read(Unknown Source)
... 9 more
Process finished with exit code 1
HTML source file:
test2.zip (437 bytes)
Both cases use the same code:
Document doc = new Document(in);
try {
    NodeCollection childNodes = doc.getChildNodes(NodeType.PARAGRAPH, true);
    if (childNodes != null) {
        for (Paragraph para : (Iterable<Paragraph>) childNodes) {
            if (para.getListFormat().isListItem()) {
                ListLevel listLevel = para.getListFormat().getListLevel();
                listLevel.getFont().setColor(Color.BLACK);
                if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2) {
                    listLevel.setNumberStyle(NumberStyle.ARABIC);
                    listLevel.setNumberFormat("\u0001.\u0001、");
                }
                if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3) {
                    listLevel.setNumberStyle(NumberStyle.ARABIC);
                    listLevel.setNumberFormat("\u0001.\u0001.\u0002、");
                }
                if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_4) {
                    listLevel.setNumberStyle(NumberStyle.ARABIC);
                    listLevel.setNumberFormat("\u0001.\u0001.\u0002.\u0003、");
                    listLevel.getFont().setSize(10.5);
                    listLevel.getFont().setBold(true);
                    listLevel.getFont().getShading().clearFormatting();
                }
            }
        }
    }
} catch (Exception e) {
    logger.warn("HTML to Word conversion: failed to fix styles", e);
}
doc.updateListLabels();
doc.save(out, SaveFormat.DOCX);
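@ZhonghaoSun You need to wrap the file content in <html>...content...</html> tags. Without these tags, the Aspose.Words file format detector cannot recognize the file as HTML. To Aspose.Words it is then just a text file with a different extension, and such files are rejected.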
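A minimal stdlib-only sketch of that workaround, assuming the content is available as a String: wrap it in <html><body>...</body></html> when it contains no <html> tag, then pass the bytes to new Document(InputStream). The helper name is hypothetical, and the check is a simple case-insensitive substring search, not real HTML parsing.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper: makes sure an HTML fragment has an <html> root so that
// format detection can recognize it as HTML before it is loaded.
final class HtmlWrapper {
    static String ensureHtmlRoot(String content) {
        String lower = content.toLowerCase(Locale.ROOT);
        if (lower.contains("<html")) {
            return content; // already a full document; leave it untouched
        }
        return "<html><body>" + content + "</body></html>";
    }

    // Turns the (possibly wrapped) markup into a UTF-8 stream that can be
    // passed to a stream-based loader such as new Document(InputStream).
    static InputStream toStream(String content) {
        byte[] bytes = ensureHtmlRoot(content).getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) {
        System.out.println(ensureHtmlRoot("<p>a</p><p>b</p>"));
    }
}
```

In a method like convert2Word, new Document(HtmlWrapper.toStream(html)) could then stand in for new Document(in), assuming the input has first been read into a UTF-8 String.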
@ZhonghaoSun In that case, if you don't want to add <html> tags to the document, only builder.insertHtml (DocumentBuilder.insertHtml) can help.
OK. I'd also like to ask whether this code can specify the page size of the converted DOCX.
public static void convert2Word(InputStream in, OutputStream out) throws Exception {
    Document doc = new Document(in);
    try {
        NodeCollection childNodes = doc.getChildNodes(NodeType.PARAGRAPH, true);
        if (childNodes != null) {
            for (Paragraph para : (Iterable<Paragraph>) childNodes) {
                if (para.getListFormat().isListItem()) {
                    ListLevel listLevel = para.getListFormat().getListLevel();
                    listLevel.getFont().setColor(Color.BLACK);
                    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_2) {
                        listLevel.setNumberStyle(NumberStyle.ARABIC);
                        listLevel.setNumberFormat("\u0001.\u0001、");
                    }
                    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_3) {
                        listLevel.setNumberStyle(NumberStyle.ARABIC);
                        listLevel.setNumberFormat("\u0001.\u0001.\u0002、");
                    }
                    if (para.getParagraphFormat().getStyleIdentifier() == StyleIdentifier.HEADING_4) {
                        listLevel.setNumberStyle(NumberStyle.ARABIC);
                        listLevel.setNumberFormat("\u0001.\u0001.\u0002.\u0003、");
                        listLevel.getFont().setSize(10.5);
                        listLevel.getFont().setBold(true);
                        listLevel.getFont().getShading().clearFormatting();
                    }
                }
            }
        }
    } catch (Exception e) {
        logger.warn("HTML to Word conversion: failed to fix styles", e);
    }
    doc.updateListLabels();
    doc.save(out, SaveFormat.DOCX);
}
With this code, the Word file converted from HTML currently has a page size of 8 1/2 x 11 (US Letter).
image.png (49.5 KB)
Can it be set to A4 instead?
@ZhonghaoSun Please use:
doc.getFirstSection().getPageSetup().setPaperSize(PaperSize.A4);
If the document has multiple sections, you need to do this for every section.
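For reference, the 8 1/2 x 11 size reported above is US Letter, while A4 is 210 mm x 297 mm. Aspose.Words PageSetup reports page width and height in points (1/72 inch), so the stdlib-only sketch below converts both paper sizes to points, which makes it easy to check the result after setPaperSize(PaperSize.A4). The class name is hypothetical. For multiple sections, the same call would be made on getPageSetup() of each Section in doc.getSections().

```java
// Hypothetical helper: paper sizes in points (1 pt = 1/72 inch).
final class PaperSizes {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    static double mmToPoints(double mm) {
        return mm / MM_PER_INCH * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // US Letter: the 8 1/2 x 11 inch size reported after conversion.
        System.out.printf("Letter: %.1f x %.1f pt%n", inchesToPoints(8.5), inchesToPoints(11));
        // A4: 210 x 297 mm.
        System.out.printf("A4:     %.1f x %.1f pt%n", mmToPoints(210), mmToPoints(297));
    }
}
```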