Creating footnotes from text between 2 markers

I am trying to convert the text between two markers ("#stfoot#" and "#enfoot#") into a footnote using the Python script below. It works, but there is a small issue (see screenshots): the superscript that is created appears at the end of the paragraph instead of where the markers were.
Original document:

Processed document:

The python code is:

import sys
import aspose.words as aw
import aspose.pydrawing as drawing

def process_document(doc_path):
    # Load the document
    doc = aw.Document(doc_path)
    
    # Initialize a DocumentBuilder
    builder = aw.DocumentBuilder(doc)

    # Iterate through paragraphs
    paragraphs = doc.get_child_nodes(aw.NodeType.PARAGRAPH, True)
    for paragraph in paragraphs:
        # Get the text of the paragraph
        paragraph_text = paragraph.get_text()
        
        # Check if the paragraph contains #stfoot# and #enfoot#
        if "#stfoot#" in paragraph_text and "#enfoot#" in paragraph_text:

            # Extract the text between #stfoot# and #enfoot#
            footnote_text = paragraph_text.split("#stfoot#")[1].split("#enfoot#")[0].strip()

            # Remove the #stfoot# and #enfoot# markers and the extracted text from the document
            modified_paragraph_text = paragraph_text.replace("#stfoot#", "").replace("#enfoot#", "")
            doc.range.replace("#stfoot#", "", aw.replacing.FindReplaceOptions(aw.replacing.FindReplaceDirection.FORWARD))
            doc.range.replace("#enfoot#", "", aw.replacing.FindReplaceOptions(aw.replacing.FindReplaceDirection.FORWARD))
            doc.range.replace(footnote_text, "", aw.replacing.FindReplaceOptions(aw.replacing.FindReplaceDirection.FORWARD))
            
            # Insert footnote
            builder.move_to(paragraph)
            builder.write("")
            builder.insert_footnote(aw.notes.FootnoteType.FOOTNOTE, footnote_text)


    # Save the modified document
    output_path = "output.docx"
    doc.save(output_path)
    print(f"Footnotes inserted successfully! Saved to {output_path}")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python script.py <input_docx>")
        sys.exit(1)
    
    input_docx = sys.argv[1]
    process_document(input_docx)

Any help is appreciated.

@ansar2024 You can achieve this using the following code:

doc = aw.Document("C:\\Temp\\in.docx")
builder = aw.DocumentBuilder(doc)

# Replace each marked span with itself so that it is represented as a single run.
opt = aw.replacing.FindReplaceOptions()
opt.use_substitutions = True
doc.range.replace_regex("#stfoot#[\\w\\s]+?#enfoot#", "$0", opt)

# Loop through the runs in the document and insert footnotes at the placeholders.
for r in doc.get_child_nodes(aw.NodeType.RUN, True):
    run = r.as_run()
    if run.text.startswith("#stfoot#"):
        footnote_text = run.text.replace("#stfoot#", "").replace("#enfoot#", "")
        builder.move_to(run)
        builder.insert_footnote(aw.notes.FootnoteType.FOOTNOTE, footnote_text)
        run.remove()

doc.save("C:\\Temp\\out.docx")

The code replaces each marker pair, together with the text it encloses, with itself; as a result the marked text is represented as a single Run. The footnote is then inserted at that Run, and the Run is removed.
in.docx (12.8 KB)
out.docx (11.0 KB)
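
To make the marker logic easier to follow, here is a minimal plain-string sketch of the same idea: find each marked span, keep its position, and strip the markers to get the footnote text. It uses only Python's `re` module and does not touch Aspose.Words; `split_footnotes` and `MARKER_RE` are hypothetical names introduced for illustration.

```python
import re

# Hypothetical helper, not part of Aspose.Words: it only illustrates
# the marker handling on plain strings.
MARKER_RE = re.compile(r"#stfoot#(.+?)#enfoot#")

def split_footnotes(text):
    """Split text into ("text", ...) and ("footnote", ...) segments in reading order."""
    segments = []
    pos = 0
    for match in MARKER_RE.finditer(text):
        # Ordinary text that precedes the marked span.
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        # The marked span itself, with markers and surrounding spaces removed.
        segments.append(("footnote", match.group(1).strip()))
        pos = match.end()
    # Any text left after the last marked span.
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments

sample = "A film#stfoot# made in 1944#enfoot#. More text#stfoot# second note#enfoot# end."
segments = split_footnotes(sample)
for kind, value in segments:
    print(kind, repr(value))
```

Each "footnote" segment sits exactly where its markers were, which is the position the Run-based code above targets with `builder.move_to(run)`.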

Thanks a million!!! It works

I ran into another issue when a document page contains more than one pair of footnote markers, for example:
#stfoot# La storia è diventata un film nel 1944#enfoot#.some text… #stfoot# here is another footnote#enfoot# some other text.
Could you please advise? Thank you

@ansar2024 The above code works for two footnotes:
in.docx (12.7 KB)
out.docx (11.0 KB)

Could you please attach your problematic input and output documents here for testing?

Here is an example of input doc that I tried:
capitolo2.docx (19.4 KB)
I have highlighted the relevant text.

@ansar2024 The regular expression does not match the dot in the third case. Please modify the regular expression like this:

doc.range.replace_regex("#stfoot#.+?#enfoot#", "$0", opt)
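
The difference between the two patterns can be checked on plain strings. The sketch below uses Python's `re` module, which is an assumption for illustration only: the regex engine behind `replace_regex` may differ in details. The second and third sample strings are made up.

```python
import re

# The two patterns from this thread, tested with Python's re module
# (an assumption: Aspose.Words may use a different regex engine).
NARROW = re.compile(r"#stfoot#[\w\s]+?#enfoot#")  # letters, digits, underscore, whitespace only
BROAD = re.compile(r"#stfoot#.+?#enfoot#")        # any character except a line break

samples = [
    "#stfoot# La storia è diventata un film nel 1944#enfoot#",  # from the thread
    "#stfoot# See p. 12, ed. 1950#enfoot#",                     # made-up text with punctuation
]
# For each sample: (matched by NARROW, matched by BROAD)
results = [(bool(NARROW.search(s)), bool(BROAD.search(s))) for s in samples]

# The lazy quantifier +? stops at the first #enfoot#, so two footnotes
# in one paragraph are matched separately instead of as one long span.
two = "#stfoot# a.#enfoot# x #stfoot# b#enfoot#"
separate = BROAD.findall(two)

print(results)
print(separate)
```

With these inputs, the narrow pattern misses the footnote that contains punctuation, while the broad pattern matches both and still keeps adjacent footnotes apart.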

Actually, I made a mistake while testing the code you provided. The regular expression you provided initially works fine: doc.range.replace_regex("#stfoot#[\\w\\s]+?#enfoot#", "$0", opt)
I apologize for the confusion. Thank you very much for your help. The issue is completely solved.

@ansar2024 It is great that you managed to achieve what you need. Please feel free to ask if you run into any issues. We are always glad to help you.