ExtractPages - when I use ExtractPages it breaks my numbering

Good day,

When I split the document using the following code, the numbering in the document breaks. Could you take a look at it for me?

Document docPR = new Document($"{path}4P003q.docx");

Document docClone = docPR.Clone();
docClone.Sections.Clear();
docPR.UpdatePageLayout();

ImportFormatOptions opt = new ImportFormatOptions() { KeepSourceNumbering = true };
for (int i = 0; i < docPR.PageCount; i++)
{
    Document page = docPR.ExtractPages(i, 1);
    docClone.AppendDocument(page, ImportFormatMode.UseDestinationStyles, opt);
}
docClone.Save($"{path}4P003qOut.docx");

Document docX = new Document(@"C:\Temp\Html_dif\3P002f.docx");

Thank you in advance

4P003q.docx (7.5 MB)

@benestom
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26386

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

It looks like the problem is caused by KeepSourceNumbering. If you disable this option, the numbering in the output document is correct.

Document doc = new Document(@"C:\Temp\in.docx");

Document docClone = (Document)doc.Clone(false);
ImportFormatOptions opt = new ImportFormatOptions() { KeepSourceNumbering = false };
for (int i = 0; i < doc.PageCount; i++)
{
    Document page = doc.ExtractPages(i, 1);
    docClone.AppendDocument(page, ImportFormatMode.UseDestinationStyles, opt);
}

docClone.Save(@"C:\Temp\out.docx");

@benestom With Document.EnsureMinimum(), Merger.Merge() works as expected and the numbering is correct:

Document docPR = new Document("in.docx");

Document docClone = (Document)docPR.Clone(false);

// Guard against a NullReferenceException: make sure the empty clone has at least one section with a paragraph.
docClone.EnsureMinimum();

for (int i = 0; i < docPR.PageCount; i++)
{
    Document page = docPR.ExtractPages(i, 1);
    docClone = Merger.Merge(new Document[] { docClone, page }, MergeFormatMode.KeepSourceLayout);
}
docClone.Save("out.docx");

Please try using the suggested code and let us know if the problem still persists.

The issue you found earlier (filed as WORDSNET-26386) has been fixed in the Aspose.Words for .NET 25.1 update, which is also available on NuGet.

Hello

I encountered a similar problem where the numbering is not preserved when I call ExtractPages and then AppendDocument. Could you please take a look at it for me?
I am sending a screenshot and a document as attachments.

Thank you very much in advance
test01.docx (120.6 KB)

@benestom I checked this document with the code from post #3 in this thread (by alexey.noskov) and the problem is not reproduced. Here is my output file:

out.docx (138.9 KB)

Hello,

It solved the chapter continuity, but in another document the heading styles are changing from “Nadpis 1” to “Heading 1”, and I need this to stop happening. They probably don’t match well. I’m sending a photo of the fonts from the document before and after.

Thanks for the reply.

@benestom Do you use the same input document as you shared in your initial post? If so, could you please specify which pages you are extracting from it? Also, as I can see, the document you attached to this post was generated by Aspose.Words and its heading styles already have English names:

<w:style w:type="paragraph" w:styleId="Heading1">
	<w:name w:val="heading 1" />
	<w:basedOn w:val="Normal" />
	<w:next w:val="Normal" />
	<w:link w:val="Nadpis1Char" />
	<w:uiPriority w:val="9" />

If you are using another input document, please attach it here for testing along with simple code that will allow us to reproduce the problem.
Probably you are using a localized version of MS Word on your side and that is why style names are shown in Czech.
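
To check which built-in styles a document contains independently of the language of the MS Word user interface, a minimal sketch along the following lines may help. It only lists the built-in paragraph styles together with their identifiers; the path C:\Temp\in.docx and the class name are assumptions for illustration, not taken from your project.

```csharp
using System;
using Aspose.Words;

class ListBuiltInParagraphStyles
{
    static void Main()
    {
        // Hypothetical path, replace it with your own input document.
        Document doc = new Document(@"C:\Temp\in.docx");

        foreach (Style style in doc.Styles)
        {
            // Built-in styles carry a StyleIdentifier that does not depend on the UI language.
            if (style.BuiltIn && style.Type == StyleType.Paragraph)
                Console.WriteLine($"{style.StyleIdentifier}: \"{style.Name}\"");
        }
    }
}
```

If the identifier printed for your heading style is Heading1 in both the input and the output, the style itself is most likely unchanged and only the displayed name differs.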

The document where this is happening is too big, so I am sending only part of it as input and output. But the behavior is different: all formatting is lost.

code:

var docNad = new Document(@"C://temp/input.docx");


Document docClone = (Document)docNad.Clone(false);

// Guard against a NullReferenceException: make sure the empty clone has at least one section with a paragraph.
docClone.EnsureMinimum();

for (int i = 0; i < docNad.PageCount; i++)
{
    Document page = docNad.ExtractPages(i, 1);
    docClone = Merger.Merge(new Document[] { docClone, page }, MergeFormatMode.KeepSourceLayout);
}
docClone.Save(@"C://temp/output.docx");

output.docx (65.0 KB)

input.docx (58.5 KB)

Please test this file again and check the chapters
1P189j.docx (1.9 MB)

@benestom Here is what I see in the English version of MS Word:

And for the file 1P189j.docx, when you execute the script (split into pages), doesn’t your output have the same number of new fonts as mine (see the picture)?

Could it be caused by a language pack?

@benestom It looks like I misunderstood your problem. Do you mean that after splitting and rejoining the documents there are several copies of Heading 1 and other styles, like Heading 1_1, Heading 1_2, ..., Heading 1_N? If so, this is expected behavior when you use the KeepSourceFormatting or KeepSourceLayout merge mode. To avoid creating copies of the styles, you should use the UseDestinationStyles mode.

Yes, if I use KeepSourceFormatting or KeepSourceLayout in Aspose.Words.LowCode.Merger.Merge(…), I either lose the headings or they multiply. To use this function with UseDestinationStyles, I can pass MergeFormatMode.MergeFormatting, and then the headings are fine, but what other side effects can this have?

@benestom ImportFormatMode.UseDestinationStyles is used by Aspose.Words.LowCode.Merger when MergeFormatMode.MergeFormatting is specified.
Regarding side effects of this merge mode, they depend on your document and should be tested with your real scenarios. When using the ImportFormatMode.UseDestinationStyles or MergeFormatMode.MergeFormatting option, if a matching style already exists in the destination document, the style is not copied and the imported nodes are updated to reference the existing style.
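
One way to see the difference on your own files is to append the same source document in both import modes and compare how many styles the destination ends up with. The following is only a rough sketch: the paths C:\Temp\dst.docx and C:\Temp\src.docx, the class name, and the helper StyleCountAfterAppend are hypothetical, and the actual counts depend entirely on your documents.

```csharp
using System;
using Aspose.Words;

class CompareImportModes
{
    // Hypothetical helper: appends the source to a fresh copy of the destination
    // and returns the number of styles in the result.
    static int StyleCountAfterAppend(ImportFormatMode mode)
    {
        // Hypothetical paths, replace them with your own documents.
        Document dst = new Document(@"C:\Temp\dst.docx");
        Document src = new Document(@"C:\Temp\src.docx");

        dst.AppendDocument(src, mode);
        return dst.Styles.Count;
    }

    static void Main()
    {
        int useDestination = StyleCountAfterAppend(ImportFormatMode.UseDestinationStyles);
        int keepSource = StyleCountAfterAppend(ImportFormatMode.KeepSourceFormatting);

        Console.WriteLine($"UseDestinationStyles: {useDestination} styles");
        Console.WriteLine($"KeepSourceFormatting: {keepSource} styles");
    }
}
```

A noticeably higher count in the second line would point to copied styles such as the Heading 1_N variants discussed above.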

I’m confused now. If I use:

docClone = Merger.Merge(new Document[] { docClone, page }, MergeFormatMode.KeepSourceLayout);

Then the chapters and the connections are correct, but the styles multiply (normal 1 … normal 7).

If I use:

docClone = Merger.Merge(new Document[] { docClone, page }, MergeFormatMode.MergeFormatting);

Then I do not get the extra fonts, but the connection to individual chapters doesn’t work.

I’m just looking for a way that doesn’t multiply the styles and at the same time keeps the connection to individual chapters working.

For this document:
1P189j.docx (1.9 MB)

@benestom Please try using the following approach to split and rejoin document:

Document doc = new Document(@"C:\Temp\in.docx");

// Split the document into pages.
Document[] pages = new Document[doc.PageCount];
for (int i = 0; i < pages.Length; i++)
    pages[i] = doc.ExtractPages(i, 1);

// Merge the pages back.
Document result = null;
ImportFormatOptions opt = new ImportFormatOptions() { KeepSourceNumbering = true };
foreach (Document page in pages)
{
    if (result == null)
        result = page;
    else
        result.AppendDocument(page, ImportFormatMode.UseDestinationStyles, opt);
}
                
result.Save(@"C:\Temp\out.docx");

Thank you, I tried this method but again I came across another document with incorrect chapter numbering. I can send you the document if you would like to take a look at it.

@benestom Yes, please provide the problematic input and output documents here for testing.