ExtractPages API does not seem to work correctly

Hi,

I am using Aspose.Words 21.8.0.

I have a document:
123.docx (29.0 KB)

I used the code below to extract pages from the document:

var fileFullname = @"O:\123.docx";
var doc = new Aspose.Words.Document(fileFullname);
var pageCount = doc.PageCount;

for (int page = 0; page < doc.PageCount; page++)
{
    // Save each page as a separate document.
    var extractedPage2 = doc.ExtractPages(page, 1);
    var toFileName2 = System.IO.Path.Combine(@"O:\", $"SplitDocument.PageByPage_{page + 1}.docx");
    extractedPage2.Save(toFileName2);
}

The extracted page 2 does not look right:

SplitDocument.PageByPage_2.docx (16.3 KB)

I have attached all the extracted pages:
SplitDocument.PageByPage_1.docx (15.7 KB)

SplitDocument.PageByPage_2.docx (16.3 KB)

SplitDocument.PageByPage_3.docx (17.7 KB)

Please have a look at this issue.

Thank you very much.

@vs6060_qq_com The problem is caused by the first paragraph on the second page. It behaves differently when it is in the middle of the document than when it is the first paragraph of the document. You can check this by removing the content of the first page in MS Word.

You can work around the problem by removing empty paragraphs at the beginning of the extracted page:

Document doc = new Document(@"C:\Temp\in.docx");
// WarningCallback is a user-defined class that implements IWarningCallback.
doc.WarningCallback = new WarningCallback();

for (int i = 0; i < doc.PageCount; i++)
{
    // Save each page as a separate document.
    Document page = doc.ExtractPages(i, 1);

    // Remove empty paragraphs at the beginning of the page.
    while (page.FirstSection.Body.FirstParagraph != null && !page.FirstSection.Body.FirstParagraph.HasChildNodes)
        page.FirstSection.Body.FirstParagraph.Remove();

    page.Save($@"C:\Temp\page_{i}.docx");
}
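
The snippet above assigns a WarningCallback class that is not defined in the post. A minimal sketch of such a class, assuming it only needs to log each warning to the console (the class body here is an assumption for illustration, not necessarily the implementation used in the reply):

```csharp
using System;
using Aspose.Words;

// Hypothetical implementation of the WarningCallback class referenced above.
// Aspose.Words calls Warning() whenever it reports an issue while loading,
// laying out, or saving a document (for example, a font substitution).
public class WarningCallback : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        // Print the warning type, the component that raised it, and its description.
        Console.WriteLine($"{info.WarningType} ({info.Source}): {info.Description}");
    }
}
```

Logging warnings this way can help show whether layout differences are related to something reported during processing, since the page count depends on the computed layout.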

Hi,

Thank you for your help. The code looks fine to me, and we will apply it in our production environment.

Sorry, but I found another issue:

1234.docx (831.6 KB)

My test code is below:

var fileFullname = @"O:\1234.docx";
var doc = new Aspose.Words.Document(fileFullname);
doc.WarningCallback = new WarningCallback();
var pageCount = doc.PageCount;
for (int page = 0; page < doc.PageCount; page++)
{
    // Save each page as a separate document.
    var extractedPage2 = doc.ExtractPages(page, 1);
    var toFileName2 = System.IO.Path.Combine(@"O:\", $"SplitDocument.PageByPage_{page + 1}.docx");
    extractedPage2.Save(toFileName2);
}

The doc.PageCount API does not seem to work correctly. There are actually 3 pages, but the API returns 4.

Also, SplitDocument.PageByPage_3.docx (792.9 KB) does not look right.

I have also uploaded all the extracted pages:

SplitDocument.PageByPage_1.docx (808.4 KB)

SplitDocument.PageByPage_2.docx (793.3 KB)

SplitDocument.PageByPage_3.docx (792.9 KB)

SplitDocument.PageByPage_4.docx (791.7 KB)

Please have a look. Thank you very much.

Please also remove the attached screenshot because it contains some private information, or hide the private information in it.
This is important.
Thank you very much.

@vs6060_qq_com
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26551

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi,

Is there any news on the above issue?

@vs6060_qq_com The issue is already resolved in the current codebase. The fix will be included in the next version of Aspose.Words, 24.3 (March 2024). We will let you know once it is published.

The issues you found earlier (filed as WORDSNET-26551) have been fixed in the Aspose.Words for .NET 24.3 update, which is also available on NuGet.
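
To confirm at runtime that an application is running a build that contains the fix, something like the following sketch could be used. It assumes that BuildVersionInfo.Version returns a dotted version string such as "24.3.0"; the class and method names are hypothetical.

```csharp
using System;
using Aspose.Words;

public static class AsposeVersionCheck
{
    // Hypothetical helper: returns true when the loaded Aspose.Words build
    // is 24.3 or later, the release that includes the WORDSNET-26551 fix.
    public static bool HasPageExtractionFix()
    {
        // Assumption: the version string is parseable by System.Version.
        if (!Version.TryParse(BuildVersionInfo.Version, out Version installed))
            return false; // Unknown format, so treat the fix as not confirmed.

        return installed >= new Version(24, 3);
    }
}
```

A caller could log the result at startup, for example `Console.WriteLine(AsposeVersionCheck.HasPageExtractionFix());`, before relying on PageCount and ExtractPages for documents like the ones in this thread.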