Hi. I have a question about a Word file. The file contains headings at levels 1, 2, and 3. I want to use Aspose.Words for Python to extract all headings from the file in reading order.
Is there any sample code for reference? Thanks.
@ZZZ21321 You can use code like the following to get all heading paragraphs from the document in their logical order:
import aspose.words as aw

doc = aw.Document("C:\\Temp\\in.docx")
# Get all paragraphs in the document, including nested ones.
paragraphs = doc.get_child_nodes(aw.NodeType.PARAGRAPH, True)
# Loop through all paragraphs and print only headings.
for p in paragraphs:
    para = p.as_paragraph()
    if para.paragraph_format.is_heading:
        print(para.to_string(aw.SaveFormat.TEXT).strip())
Thanks. I get some output, but the first-level headings are missing.
Word
image.png (53.6 KB)
Code
image.png (17.9 KB)
@ZZZ21321 Could you please attach your input document here for testing? It is difficult to say what is going wrong without the real document. Unfortunately, screenshots do not give the information required for analysis.
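@ZZZ21321 It looks like in your case you need to identify paragraphs by their outline levels. As far as I can tell, is_heading reports only paragraphs that use the built-in Heading styles, so paragraphs that get their level from an outline level alone are skipped. Please try using the following code: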
import aspose.words as aw

doc = aw.Document("C:\\Temp\\in.docx")
# Get all paragraphs in the document, including nested ones.
paragraphs = doc.get_child_nodes(aw.NodeType.PARAGRAPH, True)
# Loop through all paragraphs and print only those with outline level 1, 2, or 3.
for p in paragraphs:
    para = p.as_paragraph()
    if (para.paragraph_format.outline_level == aw.OutlineLevel.LEVEL1
            or para.paragraph_format.outline_level == aw.OutlineLevel.LEVEL2
            or para.paragraph_format.outline_level == aw.OutlineLevel.LEVEL3):
        print(para.to_string(aw.SaveFormat.TEXT).strip())
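Once the headings are collected in reading order, it can help to print them as an indented outline so that a missing level is easy to spot. The following is only a minimal sketch in plain Python. It assumes you have already gathered (level, text) pairs with 1-based levels in the loop above. The function name format_outline and the sample data are hypothetical and are not part of Aspose.Words.

```python
def format_outline(headings, indent="  "):
    """Render (level, text) pairs as indented lines, keeping the input order."""
    lines = []
    for level, text in headings:
        # Level 1 gets no indent, level 2 one indent step, and so on.
        lines.append(indent * (level - 1) + text)
    return lines


# Hypothetical sample data standing in for headings read from a document.
sample = [(1, "Introduction"), (2, "Scope"), (3, "Details"), (1, "Summary")]
for line in format_outline(sample):
    print(line)
```

Because the function never reorders its input, the outline stays in reading order as long as the pairs were collected in document order.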