Hi, I am trying to perform a few tasks on a Word document (.docx) using Python:
- Find a particular text, highlight it, and add a comment to it, but only if the text is a heading/title.
- Find the hyperlinks along with their display names, highlight them, and add comments to them.
- Find the table contents, highlight them, and add a comment to them.
Please help me perform the above tasks using Aspose.Words for Python.
Thanks in advance.
@harshitha112000, could you please attach a sample document? We will analyze it and provide you with code examples.
Sure, here is the document
Document.docx (6.3 KB)
@harshitha112000, as you might know, Aspose.Words for Python via .NET is a Python wrapper for Aspose.Words for .NET. It was introduced not too long ago, and some of the functionality of Aspose.Words for .NET is not yet available in the Python version. Most of your tasks require searching for text in documents. The usual pattern for this in .NET is as follows:
```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Replacing;

string pattern = "<searchpattern>";
Document document = new Document("in.docx");
SearchOnlyCallback searchCallback = new SearchOnlyCallback();
FindReplaceOptions searchOptions = new FindReplaceOptions
{
    ReplacingCallback = searchCallback
};
// The callback records every match and skips the replacement, so the document stays unchanged
document.Range.Replace(pattern, "", searchOptions);
foreach (Node node in searchCallback.Occurrences)
    Console.WriteLine(node.GetText());

internal class SearchOnlyCallback : IReplacingCallback
{
    public ReplaceAction Replacing(ReplacingArgs args)
    {
        mOccurrences.Add(args.MatchNode);
        return ReplaceAction.Skip;
    }

    public List<Node> Occurrences
    {
        get { return mOccurrences; }
    }

    private readonly List<Node> mOccurrences = new List<Node>();
}
```
Unfortunately, the callback (IReplacingCallback) is not yet supported in Aspose.Words for Python via .NET.
So there is no elegant solution for finding text in a document at the moment.
If you provide a more detailed explanation of what you need to find in your documents, we could try to find a solution for you.
OK, is there a way to list all the hyperlinks in the document along with their names and actual URLs?
@harshitha112000, could you attach a sample document with such hyperlinks?
The same document above has some hyperlinks at the end. If possible, please help me highlight all of those hyperlinks.
@harshitha112000, could you try the following example and check whether it produces the expected results:
```python
from aspose.pydrawing import Color
from aspose.words import Document
from aspose.words.fields import FieldType
from aspose.words.replacing import FindReplaceOptions

doc = Document("Document.docx")

# Collect all HYPERLINK fields and print their display text and URL
links = []
for field in doc.range.fields:
    if field.type == FieldType.FIELD_HYPERLINK:
        hyperlink = field.as_field_hyperlink()
        links.append(hyperlink)
        print(hyperlink.display_result, hyperlink.address)

# Highlight each link's display text in yellow by replacing it with itself.
# Note: every occurrence of that text in the document is highlighted, not only the link.
options = FindReplaceOptions()
options.apply_font.highlight_color = Color.yellow
for link in links:
    doc.range.replace(link.display_result, link.display_result, options)

doc.save("out.docx")
```
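If you want to cross-check the hyperlink list without Aspose.Words, note that a .docx file is a ZIP package: external links in word/document.xml are typically stored as w:hyperlink elements whose r:id attribute points to a relationship in word/_rels/document.xml.rels that holds the URL. The standard-library sketch below relies on that assumption; the function name list_docx_hyperlinks is hypothetical, and links stored as HYPERLINK field codes, or located in headers, footers, and footnotes (separate XML parts), are not covered.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside a .docx package
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_docx_hyperlinks(source):
    """Return (display_text, url) pairs for external <w:hyperlink> elements.

    `source` is a file path or a binary file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(source) as package:
        document = ET.fromstring(package.read("word/document.xml"))
        rels = ET.fromstring(package.read("word/_rels/document.xml.rels"))

    # Map relationship ids (e.g. "rId5") to their targets, i.e. the URLs
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{REL_NS}}}Relationship")
    }

    links = []
    for link in document.iter(f"{{{W_NS}}}hyperlink"):
        rel_id = link.get(f"{{{R_NS}}}id")
        if rel_id is None:
            continue  # internal link to a bookmark (w:anchor), no URL
        # The display text may be split across several runs, so join all w:t texts
        text = "".join(t.text or "" for t in link.iter(f"{{{W_NS}}}t"))
        links.append((text, targets.get(rel_id)))
    return links
```

For example, `for name, url in list_docx_hyperlinks("Document.docx"): print(name, url)` prints each link's name and URL, which you can compare against the output of the Aspose.Words code above.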
It’s working as expected, thanks for the code. Is there a similar way to do this for table contents as well? It would be very helpful if you could provide code for highlighting and adding comments to the table content.
@harshitha112000, could you manually create a sample document with the highlighting and comments? It would help us understand your requirements.
out1.docx (4.0 KB)
Here is the sample document.
@harshitha112000, I don’t see any highlighting or comments in out1.docx. Is this the correct file?
Yes. I have attached screenshots as well.
MicrosoftTeams-image (1).png (49.9 KB)
MicrosoftTeams-image.png (53.5 KB)
@harshitha112000, the code example below highlights the specified text patterns in red and then adds comments:
```python
from datetime import datetime
from aspose.pydrawing import Color
from aspose.words import Comment, Document, DocumentBuilder, NodeType, Paragraph, Run
from aspose.words.replacing import FindReplaceOptions

doc = Document("out1.docx")
patterns = [
    "The BOT is fit for intended use",
    "No. of critical errors",
    "The purpose of the project(s)",
    "New document"]

# Highlight the text patterns
options = FindReplaceOptions()
options.apply_font.highlight_color = Color.red
for pattern in patterns:
    doc.range.replace(pattern, pattern, options)

builder = DocumentBuilder(doc)
# Find the highlighted runs
for node in doc.get_child_nodes(NodeType.RUN, True):
    run = node.as_run()
    if run.font.highlight_color == Color.red:
        # Create comment and add a text
        comment = Comment(doc, "Harshitha", "H", datetime.today())
        comment.paragraphs.add(Paragraph(doc))
        comment.first_paragraph.runs.add(Run(doc, "Comment text"))
        # Insert the comment before the found run
        builder.move_to(run)
        builder.current_paragraph.insert_before(comment, run)

doc.save("out_python_comments.docx")
```
The approach to adding comments shown in this example might not work in all cases. A robust solution would be based on a search callback (IReplacingCallback), but as I mentioned earlier, this functionality is not yet supported in the Python version of Aspose.Words.
It’s working fine. Thanks.
I have one issue: when I save the document, it does not save the entire document, only the first few pages. Why is that happening?
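@harshitha112000, have you applied the license as shown in our documentation? Without a license, Aspose.Words runs in evaluation mode, which adds an evaluation watermark and limits the maximum size of processed documents; that would explain why only the first few pages are saved.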
No. Maybe that's the reason. Thanks.
I have one more question. How can I identify Heading 1 and Heading 2, highlight them, and add comments using this library in the Document.docx that I shared earlier?
@harshitha112000, I don’t see Heading 1 or Heading 2 in Document.docx. If you meant “Objective” and “How to read this document”, you can find them by adding the &p metacharacter, which stands for a paragraph break, to the end of each pattern:
```python
from datetime import datetime
from aspose.pydrawing import Color
from aspose.words import Comment, Document, DocumentBuilder, NodeType, Paragraph, Run
from aspose.words.replacing import FindReplaceOptions

doc = Document("Document.docx")
# "&p" matches a paragraph break, so only text at the end of a paragraph is found
patterns = ["Objective&p", "How to read this document&p"]

# Highlight the matches in red
options = FindReplaceOptions()
options.apply_font.highlight_color = Color.red
for pattern in patterns:
    doc.range.replace(pattern, pattern, options)

# Insert a comment before each highlighted run
builder = DocumentBuilder(doc)
for node in doc.get_child_nodes(NodeType.RUN, True):
    run = node.as_run()
    if run.font.highlight_color == Color.red:
        comment = Comment(doc, "Harshitha", "H", datetime.today())
        comment.paragraphs.add(Paragraph(doc))
        comment.first_paragraph.runs.add(Run(doc, "Comment text"))
        builder.move_to(run)
        builder.current_paragraph.insert_before(comment, run)

doc.save("out_python_comments.docx")
```
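To illustrate what appending &p changes, here is a plain-Python analogy (not Aspose.Words code): the paragraphs are joined with "\r", the character Aspose.Words uses for a paragraph break in range text, and the pattern is counted with and without a trailing break. The helper count_matches and the sample paragraphs are hypothetical. Note that "Main Objective" still matches, so &p is a heuristic for short, heading-like paragraphs, not a style check.

```python
PARAGRAPH_BREAK = "\r"  # the character Aspose.Words uses for a paragraph break


def count_matches(paragraphs, pattern):
    """Count occurrences of `pattern` in the paragraphs joined into one text (hypothetical helper)."""
    text = PARAGRAPH_BREAK.join(paragraphs) + PARAGRAPH_BREAK
    return text.count(pattern)


paragraphs = [
    "Objective",
    "The Objective section describes the goals of the project.",
    "Main Objective",
]

plain = count_matches(paragraphs, "Objective")                     # every occurrence
at_end = count_matches(paragraphs, "Objective" + PARAGRAPH_BREAK)  # only at a paragraph end
print(plain, at_end)
```

Here the plain pattern is found three times, while the pattern with the trailing break is found only in "Objective" and "Main Objective".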
In Document.docx, “Description” is Heading 1 and “Project description” is Heading 2; that is what I was asking about. I used the istitle() function, but it identifies only Heading 1.
@harshitha112000, in my understanding, Heading 1 and Heading 2 are styles with those names. In the Document.docx attached to this thread, “Description” and “Project description” have the Normal style applied, so they are not considered headings.
You can verify this with the is_heading property of ParagraphFormat, which returns True for paragraphs that have a heading style applied, for instance:
```python
from aspose.words import Document, NodeType

doc = Document("Document.docx")
for node in doc.get_child_nodes(NodeType.PARAGRAPH, True):
    para = node.as_paragraph()
    if para.paragraph_format.is_heading:
        print(para.get_text())
```
For Document.docx, this code prints nothing, which means that no paragraphs in the document have a heading style applied.
Could you please share a code example showing how you identify “Description” as Heading 1 and “Project description” as Heading 2 using the istitle() function?
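A likely explanation for the istitle() result, assuming it was applied to the paragraph text: Python's str.istitle() only checks letter case (each word starts with an uppercase letter followed by lowercase letters) and knows nothing about Word paragraph styles. "Description" passes, while "Project description" fails because "description" is lowercase. A quick plain-Python check with sample strings:

```python
# str.istitle() inspects only letter case, not Word paragraph styles
samples = [
    "Description",
    "Project description",
    "Project Description",
    "How to read this document",
]
results = {text: text.istitle() for text in samples}
for text, is_title in results.items():
    print(f"{text!r}: {is_title}")
```

So istitle() cannot distinguish Heading 1 from Heading 2; it only reflects how the text is capitalized.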