ParagraphFormat Lost When Splitting Word Documents in C#

Hi,

I’m trying to split some Word documents by heading, but the output files seem to lose their ParagraphFormat settings.
How can I keep the same formatting as the source document?

Thanks.

Test Info

SDK: Aspose.Words for .NET 24.10.
Tested on Windows 7 and Windows 10.
.NET target platform: .NET 6.0.

Test Code

ConsoleApp.Words.7z (8.6 KB)

Test File

RTBasic手册.7z (127.9 KB)

@kngstr

  1. When splitting Word documents by heading using Aspose.Words for .NET, paragraph formatting can be lost in the output files.
  2. To preserve paragraph formatting when splitting documents, you can use HtmlSaveOptions with settings such as DocumentSplitCriteria and DocumentSplitHeadingLevel [2, 3].
  3. Specifically, setting the DocumentSplitCriteria property of HtmlSaveOptions to HeadingParagraph splits the document by headings while preserving formatting [2, 3]. Note that this produces HTML output rather than Word documents.

Here is a minimal example based on the provided information:

var doc = new Document("input.docx");
HtmlSaveOptions options = new HtmlSaveOptions
{
    HtmlVersion = Aspose.Words.Saving.HtmlVersion.Html5,
    DocumentSplitCriteria = DocumentSplitCriteria.HeadingParagraph,
    DocumentSplitHeadingLevel = 6
};
doc.Save("output.html", options);

This approach splits the document by headings and keeps the original paragraph formatting in the resulting HTML files [2, 3].

Actually, I don’t want to save to HTML.

@kngstr

  1. When splitting Word documents by heading using Aspose.Words for .NET, paragraph formatting can be lost in the output files.
  2. The user wants to avoid saving in HTML format and instead needs to maintain paragraph formatting when splitting documents.
  3. While the standard approach involves using HtmlSaveOptions with DocumentSplitCriteria set to HeadingParagraph to preserve formatting, this method inherently produces HTML output.
  4. To split while preserving formatting without saving as HTML, you would need to iterate through the document’s nodes manually and create a new document for each part, ensuring that paragraph formatting is copied over during the process. This requires more custom code than the HTML-based method but keeps the original document format.

@kngstr Please try the following code to split the document by heading styles:

// Requires: using System.Collections.Generic; using System.Linq; using Aspose.Words;
Document doc = new Document(@"C:\Temp\in.docx");
Document[] parts = SplitByStyle(doc, new StyleIdentifier[] { StyleIdentifier.Heading1, StyleIdentifier.Heading2 });
for (int i = 0; i < parts.Length; i++)
{
    parts[i].Save($@"C:\Temp\out_{i}.docx");
}
/// <summary>
/// Splits the document by styles.
/// </summary>
private static Document[] SplitByStyle(Document doc, StyleIdentifier[] styles)
{
    ImportFormatOptions options = new ImportFormatOptions();
    options.KeepSourceNumbering = true;

    List<Document> docParts = new List<Document>();
    // Create the first part and node importer for it.
    Document currentPart = (Document)doc.Clone(false);
    NodeImporter currentImporter = new NodeImporter(doc, currentPart, ImportFormatMode.UseDestinationStyles, options);
    docParts.Add(currentPart);

    foreach (Section sect in doc.Sections)
    {
        // Create section in the target document, keeping the original section's header/footer.
        Section currentSection = (Section)currentImporter.ImportNode(sect, true);
        currentSection.Body.RemoveAllChildren();
        currentPart.AppendChild(currentSection);
        foreach (Node child in sect.Body.GetChildNodes(NodeType.Any, false))
        {
            if (child != sect.Body.FirstChild && child.NodeType == NodeType.Paragraph)
            {
                Paragraph p = (Paragraph)child;
                if (styles.Contains(p.ParagraphFormat.StyleIdentifier))
                {
                    // Create another part.
                    currentPart = (Document)doc.Clone(false);
                    currentImporter = new NodeImporter(doc, currentPart, ImportFormatMode.UseDestinationStyles, options);
                    docParts.Add(currentPart);

                    // Create the next section.
                    currentSection = (Section)currentImporter.ImportNode(sect, true);
                    currentSection.Body.RemoveAllChildren();
                    currentPart.AppendChild(currentSection);
                }
            }
            // Put content into the target document.
            currentSection.Body.AppendChild(currentImporter.ImportNode(child, true));
        }
    }

    foreach (Document part in docParts)
    {
        // Trim trailing page breaks from the last paragraph.
        // LastParagraph is null when the body contains no paragraphs, so guard against that.
        Paragraph lastParagraph = part.LastSection.Body.LastParagraph;
        while (lastParagraph != null &&
            lastParagraph.LastChild != null &&
            lastParagraph.LastChild.NodeType == NodeType.Run &&
            ((Run)lastParagraph.LastChild).Text == ControlChar.PageBreak)
            lastParagraph.LastChild.Remove();
    }

    return docParts.ToArray();
}

Alternatively, you can use the low-code Splitter class to split the document by style:

SplitOptions opt = new SplitOptions();
opt.SplitCriteria = SplitCriteria.Style;
opt.SplitStyle = "Heading 1";
Aspose.Words.LowCode.Splitter.Split(@"C:\Temp\in.docx", @"C:\Temp\part.docx", opt);

@alexey.noskov
Thanks for your answer.

But the problem is not fixed.
The paragraph settings in the output still differ from the source.

Source settings

Output settings
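
To narrow down which paragraph settings differ between the source and a split part, the values can be dumped side by side. The following is only a diagnostic sketch, not a fix: the file paths and the choice of the first paragraph are assumptions for illustration, and the list of properties is a guess at the ones most likely to differ.

```csharp
using System;
using Aspose.Words;

class CompareParagraphFormat
{
    static void Main()
    {
        // Hypothetical paths: the source document and one of the split parts.
        Document source = new Document(@"C:\Temp\in.docx");
        Document part = new Document(@"C:\Temp\out_0.docx");

        // Assumption: the first body paragraph of each file is a paragraph that
        // shows the difference. Pick whichever paragraphs differ in your files.
        Dump("source", source.FirstSection.Body.FirstParagraph);
        Dump("output", part.FirstSection.Body.FirstParagraph);
    }

    static void Dump(string label, Paragraph p)
    {
        if (p == null)
        {
            Console.WriteLine($"{label}: no paragraph found");
            return;
        }

        // Print the resolved paragraph format values for comparison.
        ParagraphFormat f = p.ParagraphFormat;
        Console.WriteLine(
            $"{label}: style={f.StyleName}, spaceBefore={f.SpaceBefore}, spaceAfter={f.SpaceAfter}, " +
            $"lineSpacing={f.LineSpacing} ({f.LineSpacingRule}), leftIndent={f.LeftIndent}, " +
            $"firstLineIndent={f.FirstLineIndent}, snapToGrid={f.SnapToGrid}");
    }
}
```

Comparing the two output lines shows which properties changed during the split, which is useful detail to attach to a support ticket.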

@kngstr
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-28993

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.