How do I convert heading numbering to plain text?

benestom · February 16, 2023, 3:25pm

Good day,

I am turning to you for advice. I have a document that contains several numbered headings. I need to export, for example, page 7, where heading number 7 is located. If I convert it to a new document, it will become heading number 1.
How do I convert heading numbering to plain text?

eduardo.canal · February 16, 2023, 5:55pm

@benestom Unfortunately, there is no easy way to convert calculated page numbers into plain text. A HeaderFooter is shared across the whole Section, so the only way to achieve this is to create a separate Section for every page in the document.
A workaround could be to set a starting page number for the new document; see the following example:

var indx = 6;
Document doc = new Document(@"C:\Temp\input.docx");
Document subDoc = doc.ExtractPages(indx, doc.PageCount - indx);

subDoc.FirstSection.PageSetup.PageStartingNumber = indx + 1;
subDoc.FirstSection.PageSetup.RestartPageNumbering = true;

subDoc.Save(@"C:\Temp\output.docx");

input.docx (3.7 MB)
output.docx (1.4 MB)

alexey.noskov · February 17, 2023, 6:36am

@benestom I suppose you are talking about list numbering, not page numbering. The number a list starts from is controlled by the ListLevel.StartAt property. If you use the Document.ExtractPages method to split your document into pages, Aspose.Words takes care of the list item numbering, so it stays the same as in the source document. For example, see the following simple code:

Document doc = new Document(@"C:\Temp\in.docx");
// Extract 7th page
Document subDoc = doc.ExtractPages(6, 1);
subDoc.Save(@"C:\Temp\out.docx");

in.docx (13.9 KB)
out.docx (11.1 KB)

As you can see, in the output document the heading list numbering starts from seven.
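
If the new document is built some other way and the numbering restarts at 1, the ListLevel.StartAt property mentioned above can be set by hand. The following is only a minimal sketch: it assumes the first list item in the document is the numbered heading, and the output file name and the value 7 are purely illustrative.

```csharp
using System.Linq;
using Aspose.Words;

Document doc = new Document(@"C:\Temp\out.docx");

// Find the first paragraph that is a list item
// (assumed here to be the numbered heading).
Paragraph firstItem = doc.GetChildNodes(NodeType.Paragraph, true)
    .Cast<Paragraph>()
    .FirstOrDefault(p => p.IsListItem);

if (firstItem != null)
{
    // Make this list level start counting from 7 instead of 1.
    firstItem.ListFormat.ListLevel.StartAt = 7;
}

// Hypothetical output name, chosen so the original file is not overwritten.
doc.Save(@"C:\Temp\out_renumbered.docx");
```

Note that StartAt belongs to the list level, so every paragraph that uses the same list and level is affected.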

If you actually need to convert list numbering into regular text, you can achieve this using code like the following:

// Required namespaces:
// using System.Collections.Generic;
// using System.Linq;
// using Aspose.Words;

Document doc = new Document(@"C:\Temp\in.docx");

// Update list labels so that ListLabel.LabelString holds the calculated numbers.
doc.UpdateListLabels();

// Get all paragraphs that are list items.
List<Paragraph> listItems = doc.GetChildNodes(NodeType.Paragraph, true).Cast<Paragraph>()
    .Where(p => p.IsListItem).ToList();

// Convert list items into regular paragraphs with leading text that imitates numbering.
foreach (Paragraph item in listItems)
{
    // Read the label before the numbering is removed.
    string label = item.ListLabel.LabelString + "\t";
    Run fakeListLabelRun = new Run(doc, label);
    item.ListFormat.RemoveNumbers();
    item.PrependChild(fakeListLabelRun);
}

doc.Save(@"C:\Temp\out.docx");

FYI @eduardo.canal

benestom · February 17, 2023, 2:29pm

Thank you for the answer.
The heading number is converted correctly, but there is a small difference:
the offset of the text from the number.

If I want to convert a page 1:1, would I have to buy a PDF license, save the document as a PDF, and then extract the entire page with its heading numbering? Is that how it would work?

alexey.noskov · February 17, 2023, 2:45pm

@benestom If your goal is to save a particular page of the document to PDF, you can do this by specifying a page set in PdfSaveOptions:

Document doc = new Document("in.docx");
PdfSaveOptions opt = new PdfSaveOptions();
opt.PageSet = new PageSet(6);
doc.Save("seventh_page.pdf", opt);

The differences after converting list numbering to plain text appear because a list item and a regular paragraph have different indents applied. But this is a last-resort option. I am sure you can get the expected output either by using the Document.ExtractPages method or by saving a particular page of the document directly to PDF, without additional document pre-processing.
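
If the plain-text conversion is needed anyway, the indent difference might be reduced by copying the list level's positions onto the paragraph before the numbering is removed. This is only a sketch and not a verified fix: it assumes that a hanging indent plus a tab stop at the level's text position approximates the original layout, which may not hold for every list definition.

```csharp
using System.Collections.Generic;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Lists;

Document doc = new Document(@"C:\Temp\in.docx");
doc.UpdateListLabels();

List<Paragraph> listItems = doc.GetChildNodes(NodeType.Paragraph, true)
    .Cast<Paragraph>()
    .Where(p => p.IsListItem)
    .ToList();

foreach (Paragraph item in listItems)
{
    // Read everything that depends on the list before the numbering is removed.
    string label = item.ListLabel.LabelString + "\t";
    ListLevel level = item.ListFormat.ListLevel;
    double numberPosition = level.NumberPosition;
    double textPosition = level.TextPosition;

    item.ListFormat.RemoveNumbers();
    item.PrependChild(new Run(doc, label));

    // Assumption: the fake label sits at the number position and the text
    // starts at the text position, as it did for the real list item.
    item.ParagraphFormat.LeftIndent = textPosition;
    item.ParagraphFormat.FirstLineIndent = numberPosition - textPosition;
    item.ParagraphFormat.TabStops.Add(textPosition, TabAlignment.Left, TabLeader.None);
}

doc.Save(@"C:\Temp\out.docx");
```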

benestom · May 15, 2023, 9:03am

Good day,
I need some more advice from you. I’ll explain with a specific example. If I want the pages of only one section, how can I use PageSet when I don’t know the page numbers?

Ex:
The document has 10 sections, and each section has a different number of pages. For example, the document has 108 pages. I need all the pages of section 7.

Thank you for your answer.
Regards, TB

alexey.noskov · May 15, 2023, 12:14pm

@benestom You can use LayoutCollector to detect the page indexes where a section starts and ends:

Document doc = new Document("in.docx");
LayoutCollector collector = new LayoutCollector(doc);

// Get section.
Section sect = doc.Sections[6];

// Get page indexes where section starts and ends.
Console.WriteLine("Section start page: " + collector.GetStartPageIndex(sect));
Console.WriteLine("Section end page: " + collector.GetEndPageIndex(sect));
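
The start and end pages can then be fed into ExtractPages or PageSet. The sketch below is illustrative only: it relies on LayoutCollector returning 1-based page numbers while ExtractPages and PageRange take zero-based indexes, and the output file names are hypothetical.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Saving;

Document doc = new Document("in.docx");
LayoutCollector collector = new LayoutCollector(doc);

// Seventh section (the Sections collection is zero-based).
Section sect = doc.Sections[6];

// LayoutCollector reports 1-based page numbers; 0 means the node
// could not be mapped to a page.
int startPage = collector.GetStartPageIndex(sect);
int endPage = collector.GetEndPageIndex(sect);

if (startPage > 0 && endPage >= startPage)
{
    // ExtractPages takes a zero-based page index and a page count.
    Document sectionDoc = doc.ExtractPages(startPage - 1, endPage - startPage + 1);
    sectionDoc.Save("section7.docx");

    // PageRange takes zero-based, inclusive page indexes.
    PdfSaveOptions opt = new PdfSaveOptions();
    opt.PageSet = new PageSet(new PageRange(startPage - 1, endPage - 1));
    doc.Save("section7.pdf", opt);
}
else
{
    Console.WriteLine("Could not determine the page range of the section.");
}
```

If a section begins in the middle of a page (a continuous section break), its first page also carries the end of the previous section, so the extracted range can include some neighbouring content.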

benestom · June 19, 2023, 1:35pm

LayoutCollector layoutCollector = new LayoutCollector(modDoc);
layoutCollector.Clear();
modDoc.UpdatePageLayout();

Section sect5 = modDoc.Sections[5];
Section sect6 = modDoc.Sections[6];
int numberPage5 = layoutCollector.GetStartPageIndex(sect5);
int numberPage6 = layoutCollector.GetStartPageIndex(sect6);

Result: numberPage5 = 8 and numberPage6 = 8.
I ran into a problem; could you look into it for me? Both sections return the same page…

modDoc.docx (1.6 MB)

alexey.noskov · June 19, 2023, 2:08pm

@benestom This occurs because sect6 starts with a bookmark, which has no visual representation, so Aspose.Words locates it where the section break is placed. Please try using the following code:

int numberPage5 = layoutCollector.GetStartPageIndex(sect5.Body.FirstParagraph);
int numberPage6 = layoutCollector.GetStartPageIndex(sect6.Body.FirstParagraph);

PS: It is not required to call

layoutCollector.Clear();
modDoc.UpdatePageLayout();

after creating a new instance of LayoutCollector, since its constructor internally calls UpdatePageLayout.

benestom · July 25, 2023, 10:44am

Good day,
I would like to consult with you regarding obtaining the number of the first page in the section.

I have a problem with Sections[32], where the section starts with a table. The section is on page 43, but when I call the GetStartPageIndex method on the table, it returns the page number of the previous section. Is there a way around this, or why does it do that?

My code:

LayoutCollector layoutCollector = new LayoutCollector(modDoc);
Section sect = modDoc.Sections[32];
layoutCollector.GetStartPageIndex(ReturnFirstParagraphOrTable(sect));

private Node ReturnFirstParagraphOrTable(Section sec)
{
    foreach (Node node in sec.Body.ChildNodes)
    {
        if (node.NodeType == NodeType.Paragraph || node.NodeType == NodeType.Table)
        {
            return node;
        }
    }
    return sec.Body.FirstChild;
}

testcompareStart.zip.001.7z (5 MB)
testcompareStart.zip.002.7z (5 MB)
testcompareStart.zip.003.7z (3.7 MB)

The .7z suffix must be removed from the individual files before unpacking.

alexey.noskov · July 25, 2023, 12:32pm

@benestom You should use the first paragraph of the table to determine where the table content actually starts. Please modify your code like this:

private static Node ReturnFirstParagraph(Section sec)
{
    foreach (Node node in sec.Body.ChildNodes)
    {
        switch (node.NodeType)
        {
            case NodeType.Paragraph:
                return node;
            case NodeType.Table:
                return ((Table)node).FirstRow.FirstCell.FirstParagraph;
        }
    }
    return sec.Body.FirstChild;
}

benestom · July 26, 2023, 11:31am

Thank you, good job.