Document.ExtractPages

soaringai · April 21, 2024, 4:26pm

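After extracting pages with the Document.extractPages method, the page count returned by nodes.getPageCount() is incorrect. Calling extractPages again on a document that was itself produced by extractPages extracts all of the pages. I am using version 21.6.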

alexey.noskov · April 21, 2024, 6:24pm

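@soaringai Could you please attach your input document here for testing? We will check the issue and provide you with more information.
Also, please try the latest 24.4 version of Aspose.Words and let us know if the problem persists.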

soaringai · April 22, 2024, 5:23am

My Word document contains a table. After I add a row at the top of the table in code, calling extractPages(0, 1) goes wrong: the extracted content is not a single page.

alexey.noskov · April 22, 2024, 5:35am

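@soaringai Could you please attach your input document and code so that we can reproduce the problem? We will check the issue and provide you with more information.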

soaringai · April 22, 2024, 5:43am

test.docx (13.6 KB)

Code:

@Test
public void testEx() throws Exception {
    String wordPath = "C:\\Users\\Administrator\\Desktop\\111\\test.docx";
    Document doc = new Document(wordPath);
    int pageCount = doc.getPageCount();
    Document nodes = doc.extractPages(1, pageCount - 1);
    addRow(nodes);
    Document nodes1 = nodes.extractPages(0, 1);
    nodes1.save("C:\\Users\\Administrator\\Desktop\\111\\123.docx");
}

private void addRow(Document doc) throws Exception {

    NodeCollection tables = doc.getChildNodes(NodeType.TABLE, true);
    for (Table table : (Iterable<Table>) tables) {
        Row row = new Row(doc);
        for (int i = 0; i < 7; i++) {
            addCell(doc, row);
        }
        table.insertBefore(row, table.getFirstRow());
    }
    doc.updatePageLayout();
}

private void addCell(Document doc, Row row) {
    Cell cell = new Cell(doc);
    cell.appendChild(new Paragraph(doc));
    cell.getFirstParagraph().appendChild(new Run(doc, "Title"));
    row.appendChild(cell);
}

alexey.noskov · April 22, 2024, 7:54am

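@soaringai You edit the document after calling Document.getPageCount and then use the Document.extractPages method. When Document.getPageCount is called, Aspose.Words builds and caches the document layout, so changes made after that call are not reflected in the cached layout. Please try calling Document.updatePageLayout before using the Document.extractPages method.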
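
The caching behavior described above can be illustrated with a small toy model. This is a sketch in plain Java, not the Aspose.Words API: the class `CachedLayoutDoc`, its `addRow` method, and the `ROWS_PER_PAGE` capacity are hypothetical names invented for the illustration. It only shows why a page count computed before an edit stays stale until the layout is rebuilt.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT the Aspose.Words API. It imitates a document
// whose page count is computed once and then served from a cache.
class CachedLayoutDoc {
    private static final int ROWS_PER_PAGE = 10; // hypothetical page capacity
    private final List<String> rows = new ArrayList<>();
    private Integer cachedPageCount; // null until the layout is first built

    void addRow(String text) {
        rows.add(text); // editing does not touch the cached layout
    }

    int getPageCount() {
        if (cachedPageCount == null) {
            updatePageLayout(); // the first call builds and caches the layout
        }
        return cachedPageCount;
    }

    void updatePageLayout() {
        // Rebuild the layout from the current content (ceiling division).
        cachedPageCount = Math.max(1, (rows.size() + ROWS_PER_PAGE - 1) / ROWS_PER_PAGE);
    }
}

class CachedLayoutDemo {
    public static void main(String[] args) {
        CachedLayoutDoc doc = new CachedLayoutDoc();
        for (int i = 0; i < 10; i++) {
            doc.addRow("row " + i);
        }
        System.out.println(doc.getPageCount()); // layout is built and cached here
        doc.addRow("header");                   // edit made after the layout was cached
        System.out.println(doc.getPageCount()); // still served from the stale cache
        doc.updatePageLayout();                 // rebuild the layout after the edit
        System.out.println(doc.getPageCount()); // now reflects the edit
    }
}
```

In the toy model, the second call returns the same value as the first even though a row was added; only the explicit rebuild picks up the change.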

soaringai · April 22, 2024, 8:02am

My code already calls Document.updatePageLayout(), but the result of Document nodes1 = nodes.extractPages(0, 1) is not the single page I expected; it is two pages.

soaringai · April 22, 2024, 8:05am

You can run my code to test it.

soaringai · April 22, 2024, 8:13am

After running Document nodes1 = nodes.extractPages(0, 1); I printed nodes1.getPageCount(), which reported one page, but when I open the extracted document in Word it actually has two pages.

alexey.noskov · April 22, 2024, 1:33pm

@soaringai Unfortunately, I cannot reproduce the problem using the latest 24.4 version of Aspose.Words. Please try the latest version and let us know if the problem persists.

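Also, the problem might occur because the fonts used in the document are not available in the environment where the document is processed. If Aspose.Words cannot find a font used in the document, the font is substituted. Because of differences in font metrics, this can lead to layout differences and incorrect page detection. You can implement IWarningCallback to be notified when font substitution is performed.
The following articles might be useful to you: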
https://docs.aspose.com/words/java/specify-truetype-fonts-location/
https://docs.aspose.com/words/java/install-truetype-fonts-on-linux/

soaringai · April 22, 2024, 3:57pm

1center.docx (11.6 KB)

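Could you try again with this document? I tried the latest version and the same problem does occur. I ran the following without editing the document in Office: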

@Test
    void cutPage() throws Exception {
        Document nodes = new Document("C:\\Users\\Administrator\\Desktop\\test\\1center.docx");
        nodes.updatePageLayout();
        int pageCount = nodes.getPageCount();
        System.out.println(pageCount);
        Document nodes1 = nodes.extractPages(0, 1);
        nodes1.save("C:\\Users\\Administrator\\Desktop\\test\\cut.docx");
    }

My document actually has two pages.
cut.docx (11.6 KB)

This is the result I extracted with this code.

soaringai · April 22, 2024, 4:00pm

Also, I used IWarningCallback and received no notifications, which indicates that my fonts were not substituted.

soaringai · April 22, 2024, 4:05pm

Document nodes1 = nodes.extractPages(0, 1); should extract the first page, right? But my cut.docx contains two pages, and System.out.println(pageCount) printed 1.

alexey.noskov · April 22, 2024, 6:12pm

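@soaringai As I can see, you are still using the old 21.6 version of Aspose.Words for Java. Please try the latest 24.4 version. Here is the output produced on my side:
out.docx (11.3 KB)
There is an empty page at the end, but this is expected: the extracted content ends with a table, and in MS Word a table cannot be the last node in a document; there must always be a paragraph at the end. Aspose.Words therefore adds an empty paragraph, which produces the empty page.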

soaringai · April 25, 2024, 3:00am

I tried further; after upgrading, the same problem still occurs.

soaringai · April 25, 2024, 3:54am

Sorry, my actual table is somewhat more complex. I have prepared the actual table for you:
input.docx (22.6 KB)

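This is my actual data table.
Below is my processing code; please run the testAsposeWord method. My intent is to add a header row as the first row of the table.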

private static final List titles = Arrays.asList("序号\u0007");

    @Test
    public void testAsposeWord() throws Exception {
        // 2-last 3-last 4-last 5-last
        Document nodes = new Document("C:\\Users\\Administrator\\Desktop\\111\\input.docx");
        int pageCount = nodes.getPageCount();
        System.out.println("pageCount:" + pageCount);
        Document nDoc = null;
        Document nDoc2 = nodes;
        boolean flag = true;
        boolean tag = true;
        ArrayList<Row> titleRows = new ArrayList<>();
        int i = 0;
        while (flag) {
            if (tag) {
                Document firstPage = nDoc2.extractPages(0, 3);
                firstPage.save("C:\\Users\\Administrator\\Desktop\\111\\" + i + "first.docx");
                // Store the last title row of each page
                Row pageLastTitleRow = getPageLastTitleRow(firstPage);
                if (pageLastTitleRow != null) {
                    titleRows.add(pageLastTitleRow);
                }
                nDoc = firstPage;
                tag = false;
                int pageCount1 = nDoc2.getPageCount();
                if (pageCount1 == 3) {
                    break;
                }
                Document sPageToEnd = nDoc2.extractPages(3, pageCount1 - 3);
                addNewRow(sPageToEnd, titleRows);
                setBordLine(sPageToEnd);
                nDoc2 = sPageToEnd;
                nDoc2.save("C:\\Users\\Administrator\\Desktop\\111\\nDoc.docx");
            } else {
                nDoc2.updatePageLayout();
                Document firstPage = nDoc2.extractPages(0, 1);
                System.out.println("pageSize:" + firstPage.getPageCount() + "i:" + i);
                firstPage.save("C:\\Users\\Administrator\\Desktop\\111\\" + i + "first.docx");
                // Store the last title row of each page
                Row pageLastTitleRow = getPageLastTitleRow(firstPage);
                if (pageLastTitleRow != null) {
                    titleRows.add(pageLastTitleRow);
                }
                nDoc.appendDocument(firstPage, ImportFormatMode.KEEP_SOURCE_FORMATTING);
                int pageCount1 = nDoc2.getPageCount();
                if (pageCount1 == 1) {
                    break;
                }
                Document sPageToEnd = nDoc2.extractPages(1, pageCount1 - 1);
                addNewRow(sPageToEnd, titleRows);
                setBordLine(sPageToEnd);
                nDoc2 = sPageToEnd;
            }
            i++;
        }
        nDoc.save("C:\\Users\\Administrator\\Desktop\\111\\b.docx");
    }


     private void addNewRow(Document doc, ArrayList<Row> titleRows) throws Exception {
        // Collect every table in the document
        NodeCollection tables = doc.getChildNodes(NodeType.TABLE, true);
        // Process each table
        for (Table table : (Iterable<Table>) tables) {
            // Build a new row from the most recently stored title row
            if (!CollectionUtils.isEmpty(titleRows)) {
                Row row = titleRows.get(titleRows.size() - 1);
                Row newRow = new Row(doc);
                for (Cell cell : row.getCells()) {
                    newRow.getCells().add(CreateCell(doc, cell));
                }
                // Insert as row 0 (the top row) of the table
                table.getRows().insert(0, newRow);
            }
        }
    }

    private Row getPageLastTitleRow(Document doc) {
        NodeCollection tables = doc.getChildNodes(NodeType.TABLE, true);
        Row lastRow = null;
        // Process each table
        for (Table table : (Iterable<Table>) tables) {
            RowCollection rows = table.getRows();
            for (Row row : rows) {
                CellCollection cells = row.getCells();
                for (Cell cell : cells) {
                    String text = cell.getText();
                    System.out.println("cellText:" + text);
                    if (titles.contains(text)) {
                        lastRow = row;
                    }
                }
            }
        }
        return lastRow;
    }

    @Test
    public void cutPage() throws Exception {
        Document nodes = new Document("C:\\Users\\Administrator\\Desktop\\111\\nDoc.docx");
        FontSubstitutionWarningCollector callback = new FontSubstitutionWarningCollector();
        nodes.setWarningCallback(callback);
//        nodes.updatePageLayout();
        int pageCount = nodes.getPageCount();
        System.out.println(pageCount);
        Document nodes1 = nodes.extractPages(0, 1);
        nodes1.save("C:\\Users\\Administrator\\Desktop\\111\\cut.docx");
    }

    private static class FontSubstitutionWarningCollector implements IWarningCallback {
        ///
        /// Called every time a warning occurs during loading/saving.
        ///
        public void warning(WarningInfo info) {
            if (info.getWarningType() == WarningType.FONT_SUBSTITUTION)
                FontSubstitutionWarnings.warning(info);
        }

        public WarningInfoCollection FontSubstitutionWarnings = new WarningInfoCollection();
    }

    /**
     * Creates a new cell that copies the border settings and text of the given cell.
     *
     * @param doc the Document object
     * @return
     */
    public static Cell CreateCell(Document doc, Cell cell) throws Exception {
        Cell newCell = new Cell(doc);
        BorderCollection borders = cell.getCellFormat().getBorders();
        newCell.getCellFormat().getBorders().setLineStyle(borders.getLineStyle());
        newCell.getCellFormat().getBorders().setLineWidth(borders.getLineWidth());
        newCell.getCellFormat().getBorders().setDistanceFromText(borders.getDistanceFromText());
        newCell.getCellFormat().getBorders().setColor(borders.getColor());
        newCell.getCellFormat().getBorders().setShadow(borders.getShadow());
        Paragraph p = new Paragraph(doc);
        p.appendChild(new Run(doc, cell.getText()));
        newCell.appendChild(p);
        return newCell;
    }

    public static void setBordLine(Document doc) throws Exception {
        NodeCollection tables = doc.getChildNodes(NodeType.TABLE, true);
        for (Table table : (Iterable<Table>) tables) {
            // Make the table borders visible
            for (Row row : table.getRows()) {
                for (Cell cell : row.getCells()) {
                    cell.getCellFormat().getBorders().setLineStyle(LineStyle.SINGLE);
                    cell.getCellFormat().getBorders().setLineWidth(0.5);
                }
            }
        }
    }

alexey.noskov · April 25, 2024, 5:06am

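@soaringai In your code you still edit the document after updating the page layout and then call "extractPages", so the method works with the cached version of the document layout, which does not include the changes made after the page layout was updated. Please try editing the document first and only then using the document layout, i.e. calling the "extractPages" and "getPageCount" methods. Both of these methods require the document layout.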
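
One possible way to follow this advice is to finish all layout-dependent calls against the unedited document before modifying anything. The following is a minimal sketch in plain Java, not Aspose.Words code: `PageRangePlanner` and `plan` are hypothetical names. It plans the (index, count) pairs up front, one leading chunk followed by single pages, mirroring the 3-pages-then-1-page pattern in the code above. Each pair would then be passed to extractPages on the original document, and rows would be added to the extracted pieces afterwards. It does not address whether rows added later push content onto an extra page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.Words: plans the arguments for a
// series of extractPages(index, count) calls from a single page count.
class PageRangePlanner {

    /** Returns {startIndex, count} pairs: one leading chunk, then single pages. */
    static List<int[]> plan(int pageCount, int firstChunk) {
        if (pageCount < 0 || firstChunk < 1) {
            throw new IllegalArgumentException("pageCount must be >= 0 and firstChunk >= 1");
        }
        List<int[]> ranges = new ArrayList<>();
        if (pageCount == 0) {
            return ranges; // nothing to extract
        }
        // The leading chunk cannot be longer than the document itself.
        int first = Math.min(firstChunk, pageCount);
        ranges.add(new int[] {0, first});
        // Every remaining page becomes its own single-page range.
        for (int page = first; page < pageCount; page++) {
            ranges.add(new int[] {page, 1});
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: a 6-page document with a 3-page leading chunk.
        for (int[] r : plan(6, 3)) {
            System.out.println("extractPages(" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

Because every range is computed from one page count taken before any edit, none of the planned calls depends on a layout that was cached before a later modification.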

soaringai · April 25, 2024, 5:07am

My code already updates the page layout.

soaringai · April 26, 2024, 9:34am

Hello, I recently printed the warnings again, and this warning was printed: Fonts are not embedded, despite either 'EmbedSystemFonts' or 'SaveSubsetFonts' is set to 'true' due to 'EmbedTrueTypeFonts' option is set to 'false'

alexey.noskov · April 27, 2024, 4:58am

@soaringai

That question is answered here:
https://forum.aspose.com/t/fonts-are-not-embedded-despite-either-embedsystemfonts-or-savesubsetfonts-is-set-to-true-due-to-embedtruetypefonts-option-is-set-to-false/282946/2

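Yes, but you update the layout only once and then edit the document. By then the layout is already cached, so your changes do not affect the cached document layout.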