Unexpected page breaks in PDF after Mail Merge

kerner · October 9, 2024, 9:08am

I am running Mail Merge on a docx template.

The resulting docx is formatted as expected. When saving as PDF instead, there is an unexpected page break (extra space) after the first table.
The following warning is logged:

MINOR_FORMATTING_LOSS: A table not supported by the new table layout logic is encountered. Older logic that has known issues is applied. At Table 1, Section 1

How can I fix that? And how do I find “Table 1” and “Section 1”, given that the document may contain more than one table?
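For locating the table named in the warning, a minimal sketch follows. It only extracts the two numbers from the logged text with a regular expression. The class name `WarningLocation` is hypothetical, and the mapping from those numbers to document nodes, shown in a comment, rests on the unconfirmed assumption that both numbers are 1-based and count top-level tables within the section body.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls the table and section numbers out of a logged warning.
final class WarningLocation {
    // Matches the "At Table <n>, Section <m>" part of the warning description.
    private static final Pattern LOCATION =
            Pattern.compile("At Table (\\d+), Section (\\d+)");

    final int tableNumber;
    final int sectionNumber;

    private WarningLocation(int tableNumber, int sectionNumber) {
        this.tableNumber = tableNumber;
        this.sectionNumber = sectionNumber;
    }

    // Returns an empty Optional when the description carries no location.
    static Optional<WarningLocation> parse(String description) {
        Matcher m = LOCATION.matcher(description);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new WarningLocation(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2))));
    }

    public static void main(String[] args) {
        String logged = "MINOR_FORMATTING_LOSS: A table not supported by the new table "
                + "layout logic is encountered. Older logic that has known issues is "
                + "applied. At Table 1, Section 1";
        parse(logged).ifPresent(loc ->
                System.out.println("table " + loc.tableNumber + ", section " + loc.sectionNumber));
        // ASSUMPTION (not confirmed in this thread): if the numbers are 1-based and
        // count top-level tables in the section body, the node would be
        //   doc.getSections().get(loc.sectionNumber - 1)
        //      .getBody().getTables().get(loc.tableNumber - 1)
    }
}
```

If the assumption holds, the looked-up table can be inspected or highlighted before saving to confirm that it is the one followed by the extra space.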

Attached is the template.

basisvertrag einzelpersonen template 2.docx (61.7 KB)

alexey.noskov · October 9, 2024, 11:52am

@kerner Could you please save your output as DOCX and as PDF and attach the documents here? Is the problem also observed in DOCX output? If so, please provide sample data and code that will allow us to generate the problematic output.

kerner · October 9, 2024, 12:46pm

I use the following code to generate the document:

private void insertMultiTables(Document doc, Map<String, List<Map<String, String>>> tables) throws Exception {

        doc.getMailMerge().setCleanupOptions(MailMergeCleanupOptions.REMOVE_UNUSED_REGIONS);
        long overload = getTableOverload(doc);
        long tableCount = getTableCount(doc);
        long mailMergeRunCount = 0;

        for (Map.Entry<String, List<Map<String, String>>> tableEntry : tables.entrySet()) {

            String tableName = tableEntry.getKey();
            List<Map<String, String>> tableValues = tableEntry.getValue();

            Collection<String> columns = getTableColumns(tableValues);
            DataTable dataTable = new DataTable(tableName);
            columns.forEach(e -> dataTable.getColumns().add(e));

            for (Map<String, String> listEntry : tableValues) {
                dataTable.getRows().add(listEntry.values().toArray());
            }

            for (int i = 0; i < overload; i++) {
                log.debug("Running Mail Merge for table '{}'", tableName);
                doc.getMailMerge().executeWithRegions(dataTable);
                mailMergeRunCount++;
            }

        }

        // trigger cleanup
        // TODO: could be optimized to run only as often as 'totalTableCount' - 'totalMailMergeRunCount'.
        for (int i = 0; i < tableCount; i++) {
            doc.getMailMerge().executeWithRegions(new DataTable());
            mailMergeRunCount++;
        }

        doc.getMailMerge().deleteFields();
    }

    private long getTableCount(Document doc) throws Exception {
        String[] fieldNames = doc.getMailMerge().getFieldNames();
        return Arrays.stream(fieldNames)
                .filter(e -> e.toUpperCase().startsWith("TABLESTART"))
                .count();
    }

    /**
     * Returns how often the most frequent table name occurs in the document.
     * If a table is present multiple times (multiple tables with the same name),
     * MailMerge needs to run multiple times with the same data table.
     *
     * @param doc the document to take a look at
     * @return the highest number of occurrences of any single table name
     * @throws Exception in case of error
     */
    private long getTableOverload(Document doc) throws Exception {
        String[] fieldNames = doc.getMailMerge().getFieldNames();
        Map<String, Long> freqMap = Arrays.stream(fieldNames)
                .filter(e -> e.toUpperCase().startsWith("TABLE"))
                .collect(
                        Collectors.groupingBy( Function.identity(), Collectors.counting()
                        )
                );
        long max = freqMap.values().stream().mapToLong(e -> e).max().orElse(1);
        return max;
    }

    private Collection<String> getTableColumns(List<Map<String, String>> tableValues) {
        Collection<String> result = new LinkedHashSet<>();
        for(Map<String, String> entry : tableValues){
            result.addAll(entry.keySet());
        }
        return result;
    }

and

public byte[] writeFile(Document document, String fileExtension) throws IOException {

        try (ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.fromName(fileExtension.toUpperCase()));
            if(saveOptions instanceof PdfSaveOptions pdfSaveOptions){
                pdfSaveOptions.setPreserveFormFields(true);
                pdfSaveOptions.setUpdateFields(true);
            }
            document.getFieldOptions().setFieldUpdateCultureSource(FieldUpdateCultureSource.FIELD_CODE);
            document.save(outputStream, saveOptions);

            return outputStream.toByteArray();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

The resulting PDF file, generated from the template:

file 2.pdf (106.1 KB)

The docx generated from the template:

file 2.docx (71.8 KB)

And the PDF generated via the export from MS Word:

file 2 from word.pdf (63.2 KB)

alexey.noskov · October 9, 2024, 1:01pm

@kerner
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27460

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

kerner · October 9, 2024, 1:41pm

Thank you.
I'm logged in to the Paid Support Helpdesk, but cannot find the ticket. Can you help me get priority support on this?

https://helpdesk.aspose.com/tickets.php?a=search&keywords=WORDSNET-27460&topic_id=

alexey.noskov · October 9, 2024, 1:46pm

@kerner You should post the issue in the paid support helpdesk. My colleagues from paid support will then raise the priority of the defect in our defect tracking system.

aspose.notifier · April 16, 2025, 4:06pm

The issues you have found earlier (filed as WORDSNET-27460) have been fixed in this Aspose.Words for Java 25.4 update.