Free Support Forum - aspose.com

DocumentBuilder insert document with KEEP_SOURCE_STYLES not keeping styles

Hello.

We are currently evaluating Aspose Total in order to convert Excel and PowerPoint files directly to Word and then merge/append the resulting documents with other Word files, so that we end up with a single Word document for further modification. Our goal is to read a given list of files of different types (Word, Excel and PowerPoint) and produce a single Word document with all their content.

To do this operation we create a Document and then a DocumentBuilder:

Document wordDocument = new Document();
DocumentBuilder wordBuilder = new DocumentBuilder(wordDocument);

Then for each file we call the respective method to load, convert to Word if necessary and save the result to a temporary ByteArrayOutputStream. We then use the method insertDocument of the wordBuilder to “insert” the content in the stream:

// In case of excel file
var workbook = new Workbook(new ByteArrayInputStream(file.getBytes()));
// For each sheet set up the page orientation and the cells size
for (int i = 0; i < workbook.getWorksheets().getCount(); i++)
{
    workbook.getWorksheets().get(i).getPageSetup().setOrientation(PageOrientationType.LANDSCAPE);
    workbook.getWorksheets().get(i).getPageSetup().setFitToPagesWide(1);
    workbook.getWorksheets().get(i).getPageSetup().setFitToPagesTall(0);
}
// Create the output stream to save the Excel conversion to word
ByteArrayOutputStream conversionStream = new ByteArrayOutputStream();
workbook.save(conversionStream, com.aspose.cells.SaveFormat.DOCX);

var docAttachment = new Document(new ByteArrayInputStream(conversionStream.toByteArray()));
List<Table> tables = Arrays.stream(docAttachment.getChildNodes(NodeType.TABLE, true).toArray())
        .filter(Table.class::isInstance)
        .map(Table.class::cast)
        .collect(Collectors.toList());

for (Table table : tables) {
    table.autoFit(AutoFitBehavior.AUTO_FIT_TO_WINDOW);
}

wordBuilder.insertDocument(docAttachment, ImportFormatMode.KEEP_SOURCE_FORMATTING);
conversionStream.close();

For PowerPoint:

Presentation pres = new Presentation(new ByteArrayInputStream(file.getBytes()));
wordBuilder.getPageSetup().setOrientation(Orientation.LANDSCAPE);

try
{
    for (ISlide slide : pres.getSlides())
    {
        // generates and inserts slide image
        BufferedImage bitmap = slide.getThumbnail(1, 1);
        wordBuilder.insertImage(bitmap);

        // Slide numbers start at 1 by default, so the last slide's number equals the slide count.
        if (slide.getSlideNumber() != pres.getSlides().size())
        {
            wordBuilder.insertBreak(BreakType.PAGE_BREAK);
        }
    }
}
finally
{
    if (pres != null) pres.dispose();
}
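
The page-break condition in the loop above is easy to get wrong by one. As a minimal sketch in plain Java (no Aspose calls; the class SlideBreaks and the method plan are hypothetical names used only for illustration), and assuming slide numbers start at 1 as they do by default, a break is added after every slide except the last:

```java
import java.util.ArrayList;
import java.util.List;

class SlideBreaks {
    // Builds the list of insert operations for a deck with the given number of slides:
    // one image per slide and a page break between slides, but none after the last one.
    static List<String> plan(int slideCount) {
        List<String> ops = new ArrayList<>();
        for (int slideNumber = 1; slideNumber <= slideCount; slideNumber++) {
            ops.add("IMAGE " + slideNumber);
            // With 1-based numbers the last slide has slideNumber == slideCount.
            if (slideNumber != slideCount) {
                ops.add("PAGE_BREAK");
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        System.out.println(plan(3));
    }
}
```

Comparing against slideCount - 1 instead would skip the break after the second-to-last slide and add one after the last slide.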

For Word:

var docAttachment = new Document(new ByteArrayInputStream(file.getBytes()));
wordBuilder.insertDocument(docAttachment, ImportFormatMode.KEEP_SOURCE_FORMATTING);

Once all the files are processed we save the Document to a ByteArrayOutputStream and return it to be then downloaded:

ByteArrayOutputStream wordDocumentOutStream = new ByteArrayOutputStream();
wordDocument.save(wordDocumentOutStream, SaveFormat.DOCX);
wordDocumentOutStream.close();
 
return wordDocumentOutStream;

As the code above demonstrates, for Excel and PowerPoint we want the corresponding pages to end up in a landscape layout. We also expect the styles of each document to be preserved independently, without affecting the others.
Unfortunately the results are not consistent and vary with the order in which the file types are processed. Here are the sample documents we used, one file for each type: SampleFiles.zip (831.5 KB)
These are the results we got:

  • Word first, then PPT, then XLS: The content of the Word file is broken, with empty pages between the original pages and images overflowing. Also, the orientation should be PORTRAIT throughout, but after the second page it switches to LANDSCAPE. For the PPT and XLS the orientation is correct, but they now also display the footer from the merged Word file, which is not what we want. Here is the final Word document: _MergedDocuments_WordPPTXLS.docx (1.1 MB)
  • XLS first, then PPT, then Word: In this case all pages display the footer from the merged Word file (although the Word content is inserted correctly this time), and neither the XLS nor the PPT pages are in LANDSCAPE orientation. Here is the result: _MergedDocuments_XLSPPTWord.docx (1.1 MB)
  • Other orders produce results similar to the previous points above.

Could you please explain why this is happening?

Alternatively, we tried creating a document for each conversion and then using wordDocument.append(convertedDoc) instead of wordBuilder.insertDocument. With this approach the results were more consistent, although the footer from the original Word file is still applied to all pages and there is always an empty page between documents.

Thank you for your help.

@m1tnick In your scenario you should definitely use the Document.appendDocument method instead of DocumentBuilder.insertDocument. In MS Word, page setup (page size, orientation, headers/footers) is defined per section. When you use DocumentBuilder.insertDocument, the content of the source document is inserted into the current section of the destination document, so the source page setup is lost.
When you use the Document.appendDocument method, the sections of the source document are added to the destination document and keep their original page setup. By default, if a following section in MS Word has no headers/footers of its own, they are inherited from the previous section. That is why you see the headers/footers from your MS Word document in the appended documents. You can disable this using the HeaderFooterCollection.linkToPrevious method. Your code should look like this:

// Open destination document
Document dst = new Document("dst.docx");
// Load source document. 
Document src = new Document("src.docx");
        
// Disable headers/footers inheriting.
src.getFirstSection().getHeadersFooters().linkToPrevious(false);
        
dst.appendDocument(src, ImportFormatMode.KEEP_SOURCE_FORMATTING);
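
To make the inheritance rule concrete, here is a toy model in plain Java. It does not use the Aspose API: the class FooterInheritanceModel, its nested Section type and the method names are hypothetical and exist only to illustrate the rule described above. A section without its own footer shows the nearest earlier footer, and "unlinking" is modelled as giving the section an empty footer of its own.

```java
import java.util.ArrayList;
import java.util.List;

class FooterInheritanceModel {
    // Toy stand-in for a Word section: footer == null means "no footer of its own".
    static final class Section {
        String footer;

        Section(String footer) {
            this.footer = footer;
        }

        // Models unlinking: the section gets its own (empty) footer, so nothing is inherited.
        void unlinkFromPrevious() {
            if (footer == null) {
                footer = "";
            }
        }
    }

    // Footer shown for section i: its own footer, otherwise the nearest earlier one.
    static String effectiveFooter(List<Section> sections, int i) {
        for (int k = i; k >= 0; k--) {
            if (sections.get(k).footer != null) {
                return sections.get(k).footer;
            }
        }
        return "";
    }

    public static void main(String[] args) {
        List<Section> merged = new ArrayList<>();
        merged.add(new Section("Word footer")); // section from the Word file
        Section sheet = new Section(null);      // section from the converted workbook
        merged.add(sheet);

        System.out.println("linked:   '" + effectiveFooter(merged, 1) + "'");
        sheet.unlinkFromPrevious();
        System.out.println("unlinked: '" + effectiveFooter(merged, 1) + "'");
    }
}
```

In this model the workbook section shows the Word footer while it is linked and an empty footer once it is unlinked, which matches the behaviour seen in the merged files.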

Hello Alexey, thank you again for your quick reply. Your solution worked and I can now control the footers.

Additionally, I replaced DocumentBuilder.insertDocument with Document.appendDocument, but I still have one issue: unwanted empty pages at the beginning of the result or between the documents.

Using the source samples from the original post I get this word: _MergedDocuments.docx (1.0 MB)
which has an empty page in the beginning and also after the excel.

Also, in another very important scenario, I have to programmatically insert some text before each merged document that is appended. To do this I programmatically create a Word document (let's name it mainDoc) that will be the main one to which I append all other documents.
Then, for each file in the list, I create another document, use its DocumentBuilder to insert the desired text, and append this document to mainDoc. Then I load the file itself and append it to mainDoc. This process is repeated until all the files in the list are appended to mainDoc.

However, as you can see in the result sample.docx (1.0 MB), there is an empty page between each appended file.

Can you please give guidance on how we can avoid these empty pages?

Thank you again for the help.

@m1tnick Thank you for the additional information.

  1. An empty page at the beginning of _MergedDocuments.docx appears because it looks like you append documents to a newly created Document. A newly created document is not actually empty: it contains one section with a body and an empty paragraph. So when you simply append another document to the newly created document, there will be an empty page at the beginning. You can remove all children from the document to avoid this:
    Document dstDoc = new Document();
    dstDoc.removeAllChildren();
    // Here append your sub documents
    // ......
    
  2. An empty page at the end of _MergedDocuments.docx is produced by an explicit page break. It looks like this break comes from the source document you appended, most likely produced by Aspose.Cells during the conversion from Excel to Word. You can remove the explicit page break at the end of the document using code like this:
    Document doc = new Document("C:\\Temp\\_MergedDocuments.docx");
    
    // Get the last paragraph in the document.
    Paragraph lastPara = doc.getLastSection().getBody().getLastParagraph();
    // Remove last run in the paragraph if it contains page break.
    if (lastPara.getRuns().getCount() > 0)
    {
        Run lastRun = lastPara.getRuns().get(lastPara.getRuns().getCount() - 1);
        if (lastRun.getText().equals(ControlChar.PAGE_BREAK))
            lastRun.remove();
    }
    
    doc.save("C:\\Temp\\out.docx");
    
  3. It looks like you insert the title in a separate section, and as a result it ends up on a separate page. I think in your case you can simply insert the title at the very beginning of each document:
    Document doc = new Document("C:\\Temp\\in.docx");
    
    // Add title as the first paragraph.
    Paragraph title = new Paragraph(doc);
    title.appendChild(new Run(doc, "This is a paragraph at the very beginning of the document."));
    
    doc.getFirstSection().getBody().prependChild(title);
    
    Note, however, that MS Word documents are flow documents, so after you insert content at the beginning, the document content will be reflowed.
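
The first and third points can be illustrated with a toy model in plain Java. This is not Aspose code: the class EmptyPageModel and its helpers are hypothetical, and the model assumes that every section starts on a new page and fits on a single page. A document is a list of sections and a section is a list of paragraph strings.

```java
import java.util.ArrayList;
import java.util.List;

class EmptyPageModel {
    // A newly created document already holds one section with one empty paragraph.
    static List<List<String>> newDocument() {
        List<List<String>> doc = new ArrayList<>();
        List<String> section = new ArrayList<>();
        section.add("");
        doc.add(section);
        return doc;
    }

    // A single-section document with the given paragraphs.
    static List<List<String>> docOf(String... paragraphs) {
        List<List<String>> doc = new ArrayList<>();
        doc.add(new ArrayList<>(List.of(paragraphs)));
        return doc;
    }

    // Appending copies whole sections, so each source section stays a section (and page) of its own.
    static void append(List<List<String>> dst, List<List<String>> src) {
        for (List<String> section : src) {
            dst.add(new ArrayList<>(section));
        }
    }

    // Counts sections (pages in this model) that contain only empty paragraphs.
    static long emptyPages(List<List<String>> doc) {
        return doc.stream().filter(s -> s.stream().allMatch(String::isEmpty)).count();
    }

    public static void main(String[] args) {
        // Appending to an untouched new document keeps its empty first section.
        List<List<String>> kept = newDocument();
        append(kept, docOf("report body"));
        System.out.println(kept.size() + " pages, " + emptyPages(kept) + " empty");

        // Clearing first (stands in for removeAllChildren) and prepending the title
        // into the first section avoids both the empty page and a separate title page.
        List<List<String>> cleared = newDocument();
        cleared.clear();
        List<List<String>> report = docOf("report body");
        report.get(0).add(0, "Title");
        append(cleared, report);
        System.out.println(cleared.size() + " pages, " + emptyPages(cleared) + " empty");
    }
}
```

In the model, the first merge ends up with two pages, one of them empty, while the second ends up with a single page that starts with the title.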