How to find highlighted text

MS Word can search for highlighted text (see, for example, ‘Quickly find highlighted text’ or ‘Word: Find highlighted text’). This function finds the longest contiguous sequence of highlighted words.

Does Aspose.Words have such a feature?

Thank you.

Hi Alexander,


Thanks for your inquiry. Please try the following code. It finds a particular piece of text and, where that text is highlighted, increases its font size and colors it red in the output document.
import com.aspose.words.*;
import java.awt.Color;
import java.util.ArrayList;
import java.util.regex.Pattern;

Document doc = new Document("D:\\temp\\input.docx");

FindReplaceOptions opts = new FindReplaceOptions();
opts.setDirection(FindReplaceDirection.BACKWARD);
// Java uses a setter here; the property-style assignment is the .NET syntax.
opts.setReplacingCallback(new ReplaceEvaluator());

Pattern regex = Pattern.compile("text", Pattern.CASE_INSENSITIVE);
doc.getRange().replace(regex, "text", opts);

doc.save("D:\\temp\\awjavaout-17.4.docx");

static class ReplaceEvaluator implements IReplacingCallback
{
    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     * It enlarges and recolors the highlighted runs of the match, even if the match spans multiple runs.
     */
    public int replacing(ReplacingArgs e) throws Exception
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        // This list stores all runs of the match for further processing.
        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.getText().length() <= remainingLength))
        {
            runs.add(currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.getNextSibling();
            }
            while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            splitRun((Run) currentNode, remainingLength);
            runs.add(currentNode);
        }

        // Only runs that carry a highlight color are changed.
        for (Run run : (Iterable<Run>) runs)
            if (run.getFont().getHighlightColor().getRGB() != 0) {
                run.getFont().setSize(16);
                run.getFont().setColor(Color.red);
            }

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits text of the specified run into two runs.
     * Inserts the new run just after the specified run.
     */
    private Run splitRun(Run run, int position) throws Exception
    {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}
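
To make the splitting logic easier to follow, here is a minimal sketch that models a paragraph as a plain list of strings instead of Aspose.Words Run nodes. The class name RunSplitSketch and the method isolateMatch are hypothetical and exist only for this illustration; the walk over the runs mirrors the callback above but is not the library's own implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: each String stands in for the text of one Run node.
final class RunSplitSketch {

    /** Splits the run at `index` at character `position`; returns the index of the new second half. */
    static int splitRun(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
        return index + 1;
    }

    /**
     * Isolates a match that starts in run `index` at `offset` and is `length` characters long.
     * Returns the indices of the runs that together hold exactly the matched text.
     */
    static List<Integer> isolateMatch(List<String> runs, int index, int offset, int length) {
        int current = index;
        // Cut off any text that precedes the match in the first run.
        if (offset > 0) {
            current = splitRun(runs, current, offset);
        }

        List<Integer> matched = new ArrayList<>();
        int remaining = length;
        // Collect whole runs while they fit entirely inside the match.
        while (remaining > 0 && current < runs.size() && runs.get(current).length() <= remaining) {
            matched.add(current);
            remaining -= runs.get(current).length();
            current++;
        }

        // The last run may extend past the match; keep only its first part.
        if (remaining > 0 && current < runs.size()) {
            splitRun(runs, current, remaining);
            matched.add(current);
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("say he", "llo wor", "ld now"));
        // "hello world" starts at offset 4 of run 0 and is 11 characters long.
        List<Integer> matched = isolateMatch(runs, 0, 4, 11);
        System.out.println(runs);    // [say , he, llo wor, ld,  now]
        System.out.println(matched); // [1, 2, 3]
    }
}
```

After the call, the matched runs contain only matched text, so formatting can be applied to them without touching the surrounding characters.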

Hope this helps.

Best regards,

Hi Awais,


Thank you for your answer, but I need to find all highlighted text, not one particular text.
My task is to count the highlighted blocks of text. In MS Word I can find such text by selecting the ‘Highlight’ checkbox in the Find dialog and leaving the search box empty.

Best regards,
Alexander Dyuzhev

Hi,


Thanks for your inquiry. Please try using the following code:
Document doc = new Document("D:\\temp\\input.docx");

// Count every run that carries a highlight color.
int count = 0;
for (Run run : (Iterable<Run>) doc.getChildNodes(NodeType.RUN, true)) {
    if (run.getFont().getHighlightColor().getRGB() != 0) {
        count++;
    }
}

System.out.println("Count:" + count);
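
A note on the getRGB() != 0 test: java.awt.Color.getRGB() returns the packed ARGB value, so the comparison is only false for fully transparent black. The sketch below assumes, as the check above implies, that a run without a highlight reports such a color; the helper name isHighlighted is hypothetical.

```java
import java.awt.Color;

final class HighlightCheck {

    /** True for any color other than fully transparent black (packed ARGB value 0). */
    static boolean isHighlighted(Color highlight) {
        return highlight.getRGB() != 0;
    }

    public static void main(String[] args) {
        // Assumption: "no highlight" is represented as a color with all four components at 0.
        Color none = new Color(0, 0, 0, 0);

        System.out.println(isHighlighted(none));         // false
        System.out.println(isHighlighted(Color.BLACK));  // true, because alpha is 255
        System.out.println(isHighlighted(Color.YELLOW)); // true
    }
}
```

Because the alpha channel is part of the value, an opaque black highlight is still detected.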

Hope this helps.

Best regards,

Thank you. I’ve tried your code on my sample document (attached).

The program found 7 items:
Text (lime highlight)
Text (lime highlight)
Text (lime highlight)
Text (lime highlight)
Text (red highlight)
TextTextText (lime highlight)

But MS Word found 5 items (as I need):
Text Text (lime highlight)
Text (lime highlight)
Text (lime highlight)
Text (lime highlight)
TextTextText (lime highlight)

The difference is that the Aspose API reports the ‘Text Text’ sequence as three strings, whereas Word treats it as one whole string.
It looks like the Aspose API works on the internal document markup rather than on the visual representation.

Hi Alexander,


Thanks for your inquiry. In this case, you can fix this problem by using the following code:
Document doc = new Document("D:\\temp\\input_hl.docx");
// Merge adjacent runs that share identical formatting, so a highlighted block becomes a single run.
doc.joinRunsWithSameFormatting();

int count = 0;
for (Run run : (Iterable<Run>) doc.getChildNodes(NodeType.RUN, true)) {
    if (run.getFont().getHighlightColor().getRGB() != 0) {
        count++;
    }
}

System.out.println("count:" + count);

Hope this helps.

Best regards,
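
If adjacent highlighted runs differ in some other formatting (bold, font size and so on), joining runs with the same formatting may still leave them separate. A possible alternative is to count transitions into highlighted text yourself. The sketch below shows the idea on a hypothetical RunInfo type that stands in for a run's text and highlight state; it is not part of Aspose.Words, and it assumes the runs come from one paragraph in document order and that highlight color is ignored when grouping.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a run: its text plus whether it is highlighted. */
final class RunInfo {
    final String text;
    final boolean highlighted;

    RunInfo(String text, boolean highlighted) {
        this.text = text;
        this.highlighted = highlighted;
    }
}

final class HighlightBlocks {

    /** Merges consecutive highlighted runs and returns the text of each resulting block. */
    static List<String> blocks(List<RunInfo> runs) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null while we are inside a highlighted block

        for (RunInfo run : runs) {
            if (run.highlighted) {
                if (current == null) {
                    current = new StringBuilder(); // a new block starts here
                }
                current.append(run.text);
            } else if (current != null) {
                result.add(current.toString());    // a non-highlighted run closes the block
                current = null;
            }
        }
        if (current != null) {
            result.add(current.toString());        // block that runs to the end of the list
        }
        return result;
    }

    public static void main(String[] args) {
        List<RunInfo> paragraph = Arrays.asList(
            new RunInfo("Text", true),
            new RunInfo(" ", true),
            new RunInfo("Text", true),
            new RunInfo(" and some plain text ", false),
            new RunInfo("Text", true));

        List<String> found = blocks(paragraph);
        System.out.println(found);        // [Text Text, Text]
        System.out.println(found.size()); // 2
    }
}
```

With real documents the same loop would be applied paragraph by paragraph, so that highlighted text at the end of one paragraph is not merged with highlighted text at the start of the next.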