Java: merging RTF files with a TOC using Aspose.Words shifts the pagination of the source RTF and adds an extra page

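Hello,
Problem: when I merge RTF files with com.aspose.words, the merged RTF sometimes paginates incorrectly. For example, with two source files, file-one.rtf (2 pages) and file-t.rtf (1 page), merging them and adding a TOC should give TOC + 2 + 1 = 4 pages in total. My merged result has 5 pages, which means one of the files spilled over onto an extra page.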

Environment and configuration: JDK 11 + Spring Boot 2.3.3 + aspose-words-22.9-jdk17.jar

Code:

package org.imzdong;

import com.aspose.words.*;

import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class Main {


    private static final String LICENSE_RESOURCE_NAME = "license.xml";
    private static License ASPOSE_LICENSE = new License();

    static {
        try {
            ASPOSE_LICENSE.setLicense(Main.class.getClassLoader().getResourceAsStream(LICENSE_RESOURCE_NAME));
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) throws Exception{
        LinkedHashMap<String, String> map = new LinkedHashMap<>();
        map.put("table 1", "D:\\Downloads\\chrome\\test-data\\mm\\Table 14.1.1.1.rtf");
        map.put("table 2", "D:\\Downloads\\chrome\\test-data\\mm\\Table 14.2.4.1.rtf");
        String savePath = "D:\\Downloads\\chrome\\test-data\\mm\\merge-rtf.rtf";
        mergeRtfWithToc(map, savePath);
    }

    public static void mergeRtfWithToc(LinkedHashMap<String, String> pathAndTitles, String savePath) throws Exception {

        Document newDoc = new Document();
        DocumentBuilder builder = new DocumentBuilder(newDoc);
        //create the title 'Table of Contents'
        createTocTitle(builder);
        //prepare to make toc
        prepareToc(builder);
        //mark the bookmark, and then insert toc
        Set<String> tocSet = markTheBookmark(builder,pathAndTitles);
        //insert toc
        insertToc(builder, tocSet);
        //normalize the page orientation of all sections
        proofreadingPageOrientation(newDoc);

        newDoc.getStyles().getByStyleIdentifier(StyleIdentifier.TOC_1).getFont().setSize(12);
        newDoc.getStyles().getByStyleIdentifier(StyleIdentifier.TOC_1).getParagraphFormat().setLineSpacing(12);
        newDoc.updateFields();
        newDoc.save(savePath);
    }

    private static void proofreadingPageOrientation(Document newDoc) {
        //force every section to landscape orientation
        SectionCollection sections = newDoc.getSections();
        for (Section section : sections) {
            if (section.getPageSetup().getOrientation() != Orientation.LANDSCAPE) {
                section.getPageSetup().setOrientation(Orientation.LANDSCAPE);
            }
        }
        }
        if (sections.getCount() > 0) {
            //remove the trailing paragraph left over from the manually added page break
            ParagraphCollection paras = newDoc.getLastSection().getBody().getParagraphs();
            if (paras.getCount() > 0) {
                paras.removeAt(paras.getCount() - 1);
            }
        }
    }

    private static void insertToc(DocumentBuilder builder, Set<String> tocSet) throws Exception {
        for (String tocName : tocSet) {
            builder.moveToBookmark(tocName);
            insertTocEntry(builder, tocName, "1");
        }
    }

    private static Set<String> markTheBookmark(DocumentBuilder builder, LinkedHashMap<String, String> pathAndTitles) throws Exception {
        Set<String> tocSet = new HashSet<>();
        Set<Map.Entry<String, String>> entries = pathAndTitles.entrySet();
        for (Map.Entry<String, String> entry : entries) {
            String tocTitle = entry.getKey();
            String path = entry.getValue();
            Document temp = new Document(path);
            if (temp.getSections().getCount() == 0) {
                //an empty RTF may cause a NullPointerException in 'builder.insertDocument(temp, ImportFormatMode.KEEP_SOURCE_FORMATTING);'
                DocumentBuilder tempBuilder = new DocumentBuilder(temp);
                tempBuilder.writeln(" ");
            }
            builder.moveToDocumentEnd();
            builder.insertBreak(BreakType.PAGE_BREAK);
            builder.startBookmark(tocTitle);
            builder.insertDocument(temp, ImportFormatMode.KEEP_SOURCE_FORMATTING);
            builder.endBookmark(tocTitle);
            tocSet.add(tocTitle);
        }
        return tocSet;
    }

    private static void createTocTitle(DocumentBuilder builder) {
        Font font = builder.getFont();
        double oldSize = font.getSize();
        font.setSize(18);
        font.setNameAscii("Times New Roman");
        ParagraphFormat paragraphFormat = builder.getParagraphFormat();
        paragraphFormat.setLineSpacing(12);
        paragraphFormat.setAlignment(ParagraphAlignment.CENTER);
        builder.writeln("Table of Contents");
        font.setSize(oldSize);
        builder.writeln("");
    }

    private static void prepareToc(DocumentBuilder builder) throws Exception {
        FieldToc fieldToc = (FieldToc) builder.insertField(FieldType.FIELD_TOC, true);
        fieldToc.setEntryLevelRange("1-3");
        fieldToc.setInsertHyperlinks(true);
        builder.getParagraphFormat().setStyleIdentifier(StyleIdentifier.HEADING_3);
    }

    private static void insertTocEntry(final DocumentBuilder builder, final String text, final String entryLevel) throws Exception {
        builder.getFont().setNameAscii("Times New Roman");
        FieldTC fieldTc = (FieldTC) builder.insertField(FieldType.FIELD_TOC_ENTRY, true);
        fieldTc.setText(text);
        fieldTc.setEntryLevel(entryLevel);
    }

}

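The test files and the resulting output file are attached.

The requirement is that merging must not alter the content of the source RTF files. How should I handle this case? test.zip (4.0 KB)

Thanks!

Merged result file: result.zip (10.7 KB)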

@imzdong To get the expected result, you should use the Document.appendDocument method. Please see the code below:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert TOC Title
builder.pushFont();
builder.getFont().setSize(18);
builder.getFont().setNameAscii("Times New Roman");
ParagraphFormat paragraphFormat = builder.getParagraphFormat();
paragraphFormat.setLineSpacing(12);
paragraphFormat.setAlignment(ParagraphAlignment.CENTER);
builder.writeln("Table of Contents");
builder.popFont();
// Insert TOC
FieldToc fieldToc = (FieldToc) builder.insertField(FieldType.FIELD_TOC, true);
fieldToc.setEntryLevelRange("1-3");
fieldToc.setInsertHyperlinks(true);

// Append documents.
String[] srcDocs = new String[] {"C:\\Temp\\Table 14.1.1.1.rtf", "C:\\Temp\\Table 14.2.4.1.rtf"};
for (String srcDocPath : srcDocs) {
    Document src = new Document(srcDocPath);
    DocumentBuilder srcBuilder = new DocumentBuilder(src);
    // Add TOC entry field at the beginning of the source document.
    FieldTC fieldTc = (FieldTC) srcBuilder.insertField(FieldType.FIELD_TOC_ENTRY, true);
    fieldTc.setText(srcDocPath);
    fieldTc.setEntryLevel("1");

    // Append document to the destination document.
    doc.appendDocument(src, ImportFormatMode.KEEP_SOURCE_FORMATTING);
}

doc.updateFields();
doc.save("C:\\Temp\\out.docx");

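Thanks.
I used your code and the page counts are now correct, but the TOC page has a different width from the other pages. How can I fix that?

One more question: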

builder.moveToBookmark(tocName);
insertTocEntry(builder, tocName, "1");

Why does moving to the bookmark like this and then adding the TOC entry break the pagination?

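@imzdong In an MS Word document, each section has its own page setup. Please see our documentation to learn more about sections:
https://docs.aspose.com/words/java/working-with-sections/
When you create a document from scratch, it contains one section with the default page setup. You can change the page setup through the Section.PageSetup property (Section.getPageSetup() in the Java API).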
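
One possible way to give the TOC page the same width as the appended pages is to copy the page geometry of the first appended section onto the first (TOC) section once the documents have been appended. The following is only a sketch under assumptions: it assumes the TOC lives in the first section and the first source document starts at the second section, the class and method names are hypothetical, and only page size, orientation, and margins are copied.

```java
import com.aspose.words.Document;
import com.aspose.words.PageSetup;
import com.aspose.words.Section;

public class TocPageSetupSketch {

    // Copies orientation, page size, and margins from one page setup to another.
    static void copyPageGeometry(PageSetup source, PageSetup target) {
        // Orientation first: changing it swaps width and height,
        // so the explicit size is applied afterwards.
        target.setOrientation(source.getOrientation());
        target.setPageWidth(source.getPageWidth());
        target.setPageHeight(source.getPageHeight());
        target.setLeftMargin(source.getLeftMargin());
        target.setRightMargin(source.getRightMargin());
        target.setTopMargin(source.getTopMargin());
        target.setBottomMargin(source.getBottomMargin());
    }

    // Assumption: section 0 holds the TOC, section 1 is the first appended source section.
    static void alignTocSection(Document doc) {
        if (doc.getSections().getCount() < 2) {
            return; // nothing was appended, so there is nothing to align with
        }
        Section tocSection = doc.getFirstSection();
        Section firstAppended = doc.getSections().get(1);
        copyPageGeometry(firstAppended.getPageSetup(), tocSection.getPageSetup());
    }
}
```

In the appendDocument example above, alignTocSection(doc) would be called after the loop and before doc.updateFields(), so that the TOC page numbers are computed against the final layout.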

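Please make sure you are using the correct DocumentBuilder instance. Most likely you are passing the main document's DocumentBuilder rather than the source document's, as in my example.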

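@alexey.noskov Hello, I used your code to merge 237 files with a total size of 338 MB, but memory usage during the merge climbed to over 8 GB, which is far too much. I am using the latest version: implementation(group: 'com.aspose', name: 'aspose-words', version: '23.7', classifier: 'jdk17')

Is that much memory really required, or are there other parameters that control memory consumption? As it stands, we cannot deploy this to production.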

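@imzdong Aspose.Words requires several times more memory than the size of the original document. Please see our documentation to learn more about Aspose.Words memory requirements: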
https://docs.aspose.com/words/net/memory-requirements/
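
To put "several times" into numbers for the case reported in this thread, the small sketch below computes the observed ratio from the figures given above (about 8 GB of memory for 338 MB of input) and shows how a heap estimate could be derived for another input size. The class name, the method names, and the multiplier of 10 passed to estimateHeapMb are illustrative assumptions, not values taken from the Aspose documentation.

```java
import java.util.Locale;

public class MemoryRatioSketch {

    // Ratio between peak memory and total input size, both in MB.
    static double observedRatio(double peakMemoryMb, double inputMb) {
        return peakMemoryMb / inputMb;
    }

    // Rough heap estimate in MB for a given input size and an assumed multiplier.
    static long estimateHeapMb(double inputMb, double assumedMultiplier) {
        return (long) Math.ceil(inputMb * assumedMultiplier);
    }

    public static void main(String[] args) {
        // Figures from the thread: a little over 8 GB used for 338 MB of RTF input.
        double ratio = observedRatio(8 * 1024, 338);
        System.out.println(String.format(Locale.ROOT, "Observed ratio: %.1fx", ratio));
        // prints "Observed ratio: 24.2x"

        // Hypothetical multiplier of 10, only to show the calculation.
        System.out.println("Estimated heap (MB): " + estimateHeapMb(338, 10));
    }
}
```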

To reduce memory usage when processing very large documents, you can try the LoadOptions.TempFolder, SaveOptions.TempFolder, and SaveOptions.MemoryOptimization properties.
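
A minimal sketch of how these three properties could be set from Java, using their setter forms (setTempFolder, setMemoryOptimization). The class and helper names are hypothetical, the temp folder is assumed to already exist and be writable, and how much memory this actually saves for the 237-file merge described above is not known; memory optimization generally trades slower saving for lower memory use.

```java
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;
import com.aspose.words.RtfSaveOptions;

public class LowMemoryMergeSketch {

    // Loads a document while letting Aspose.Words use a temp folder on disk.
    static Document loadWithTempFolder(String path, String tempFolder) throws Exception {
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setTempFolder(tempFolder); // assumed to exist and be writable
        return new Document(path, loadOptions);
    }

    // Saves as RTF with a temp folder and memory optimization enabled.
    static void saveRtfLowMemory(Document doc, String outPath, String tempFolder) throws Exception {
        RtfSaveOptions saveOptions = new RtfSaveOptions();
        saveOptions.setTempFolder(tempFolder);
        saveOptions.setMemoryOptimization(true); // lower memory use at the cost of speed
        doc.save(outPath, saveOptions);
    }
}
```

In the appendDocument loop, each source file would be opened with loadWithTempFolder(srcDocPath, tempFolder) instead of new Document(srcDocPath), and the final document written with saveRtfLowMemory.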