We are implementing functionality to take specific pages from Word documents and turn them into new Word documents, using the Aspose library, version 18.11.
We found the PageSplitter code in the Aspose GitHub repository and have adapted it for our own code. https://reference.aspose.com/words/java/com.aspose.words/Document#extractPages(int,int)
Unfortunately, for some documents, manipulating them with Aspose shifts parts of paragraphs onto subsequent pages.
As an example, please see the attached original file, a 2-page document called “example.doc” (it is inside Archive.zip).
We run the code below against “example.doc”. To summarise, it is supposed to create a new document made up of the pages of the original document that are listed in the List<Integer> pages argument. (I can provide a compilable version of this code on request.)
String extractPages(ClippingRequestMessage requestMessage, File originalFile, List<Integer> pages) throws Exception {
    Document document = new Document(originalFile.getAbsolutePath());
    DocumentPageSplitter splitter = new DocumentPageSplitter(document);
    Document tempDocument = new Document();
    int counter = 0;
    sort(pages);
    for (int pageNum : pages) {
        counter++;
        if (counter == 1) {
            tempDocument = splitter.getDocumentOfPage(pageNum);
            cleanUpDocument(tempDocument);
        } else {
            Document tempNewDocument = splitter.getDocumentOfPage(pageNum);
            tempDocument.appendDocument(tempNewDocument, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        }
    }
    String tempClippedDocument = TempFileHelper.getTempClippedFile(requestMessage);
    tempDocument.save(tempClippedDocument);
    return tempClippedDocument;
}
If we call the above function asking for pages 1 and 2 of the original document, the result is “example-clipped.doc”, also attached inside Archive.zip. Comparing “example.doc” with “example-clipped.doc”, you can see that the last paragraph on page 1 of “example.doc” is split in “example-clipped.doc”, with parts of the paragraph appearing on both page 1 and page 2.
This is a problem for us because we expect the new document to retain its original format and paragraph structure, even if it is only a subset of pages of the original document.
We would greatly appreciate some help with retaining the original structure of the Word document and its paragraphs, even when splitting the document.
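As a side note on the page list passed to extractPages: the function sorts the caller's list in place and does not check the page numbers. Below is a minimal sketch, in plain Java with no Aspose calls, of how the list could be sorted, de-duplicated and range-checked before splitting. The class and method names (PageSelection, normalize) are hypothetical, and the pageCount argument is assumed to come from the loaded document (for example its page count).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper, not part of Aspose or of our production code.
class PageSelection {

    // Returns the requested 1-based page numbers sorted ascending and without
    // duplicates. The caller's list is left unmodified.
    static List<Integer> normalize(List<Integer> requested, int pageCount) {
        TreeSet<Integer> unique = new TreeSet<>();
        for (Integer page : requested) {
            // Reject anything outside 1..pageCount (assumed to be the document's page count).
            if (page == null || page < 1 || page > pageCount) {
                throw new IllegalArgumentException(
                        "Page out of range: " + page + " (document has " + pageCount + " pages)");
            }
            unique.add(page);
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Pages requested out of order and with a repeat, for a 2-page document.
        System.out.println(normalize(Arrays.asList(2, 1, 2), 2)); // prints [1, 2]
    }
}
```

This only affects which pages are requested and in what order; it is not expected to change the paragraph splitting described above.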
Additionally, we notice that the same splitting of paragraphs occurs when converting a Word document to PDF using Aspose.
We ran the code below against “example.doc”:
String originalFilePath = originalFile.getAbsolutePath();
Document document = new Document(originalFilePath);
String directory = Paths.get(originalFilePath).getParent().toString();
PdfSaveOptions options = new PdfSaveOptions();
options.setPreserveFormFields(true);
options.setUseCoreFonts(true);
options.setJpegQuality(10);
options.setExportDocumentStructure(true);
String tempPdfFilePath = directory + File.separator + TEMP_PDF;
options.setCompliance(PdfCompliance.PDF_15);
document.save(tempPdfFilePath, options);
The resulting file, “example-clipped.pdf” (also in Archive.zip), has the same split-paragraph problem, as can be seen at the end of page 1 of the document.
We would also appreciate assistance with this issue, as it may be linked to the original problem with splitting the document file.
Kind regards,
Sunny
Archive.zip (34.5 KB)