Hello,
Our company is working on a project where we need to read a Word document and detect the following:
- Find all text fragments in paragraphs that contain a strikethrough
- Find all text fragments in paragraphs that contain an underline
When the text within a paragraph contains a strikethrough, we need to remove the text completely.
When the text within a paragraph contains an underline, we just need to remove the underline and keep the text.
This will then be saved in another Word document. Here is an example of what we are trying to achieve. The original document will look something like this. Note: this editor has no option for strikethrough or underline, so I used the tags [Strikethrough] and [Underline] to mark where each effect starts and ends.
(1) [Strikethrough] Applications for licensure must meet the prerequisites [Strikethrough] for and pass the Foundations of Oriental Medicine, Acupuncture with Point Location, and Biomedicine examinations required for certification in acupuncture
(2) [Underline] Applicants for licensure must pass the examination in clean needle technique administered by the Council of Colleges for Acupuncture and Oriental Herbal Medicine, or its successor. [Underline]
The end result would look something like this:
(1) for and pass the Foundations of Oriental Medicine, Acupuncture with Point Location, and Biomedicine examinations required for certification in acupuncture
(2) Applicants for licensure must pass the examination in clean needle technique administered by the Council of Colleges for Acupuncture and Oriental Herbal Medicine, or its successor.
I tried doing this with the following code, but I am not sure whether there is a more sophisticated way to detect both font effects and apply the changes as needed. Any help would be appreciated:
// Load the source document and keep a clone that receives the changes.
Document doc = new Document("Strikethrough.docx");
Document cloneDoc = (Document) doc.deepClone(true);
// Merge adjacent runs with identical formatting so each fragment is reported once.
doc.joinRunsWithSameFormatting();
for (Run run : (Iterable<Run>) doc.getChildNodes(NodeType.RUN, true))
{
    // Underlined text: keep the text, clear the underline in the clone.
    if (run.getFont().getUnderline() == Underline.SINGLE)
    {
        System.out.println("UNDERLINE =>" + run.getText());
        FindReplaceOptions options = new FindReplaceOptions();
        options.getApplyFont().setUnderline(Underline.NONE);
        cloneDoc.getRange().replace(run.getText(), run.getText(), options);
    }
    // Struck-through text: remove the text from the clone.
    if (run.getFont().getStrikeThrough())
    {
        System.out.println("STRIKETHROUGH =>" + run.getText());
        cloneDoc.getRange().replace(run.getText(), "");
    }
}
cloneDoc.save("output.docx");
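One concern with the attempt above is that replace() matches by text, so an unformatted occurrence of the same words elsewhere in the document would be changed as well. Applying the rule to each run directly would avoid that. The sketch below only models that rule without any library: TextRun and RunCleaner are hypothetical names invented for illustration, not classes from the document API. Assuming the library is Aspose.Words for Java (which the class names suggest), the same loop would probably iterate over a snapshot of the runs, for example doc.getChildNodes(NodeType.RUN, true).toArray(), and call run.remove() for struck-through runs and run.getFont().setUnderline(Underline.NONE) for underlined ones.

```java
import java.util.ArrayList;
import java.util.List;

// Library-free model of the cleanup rule. TextRun is a hypothetical stand-in
// for a formatted run of text; it is NOT a class from the document library.
final class RunCleaner {

    static final class TextRun {
        final String text;
        final boolean strikethrough;
        final boolean underline;

        TextRun(String text, boolean strikethrough, boolean underline) {
            this.text = text;
            this.strikethrough = strikethrough;
            this.underline = underline;
        }
    }

    // Drops struck-through runs entirely; keeps every other run and clears its underline.
    static List<TextRun> clean(List<TextRun> runs) {
        List<TextRun> result = new ArrayList<>();
        for (TextRun run : runs) {
            if (run.strikethrough) {
                continue; // struck-through text is removed completely
            }
            result.add(new TextRun(run.text, false, false));
        }
        return result;
    }

    // Concatenates the text of all runs, ignoring formatting.
    static String plainText(List<TextRun> runs) {
        StringBuilder sb = new StringBuilder();
        for (TextRun run : runs) {
            sb.append(run.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<TextRun> paragraph = new ArrayList<>();
        paragraph.add(new TextRun("(1) ", false, false));
        paragraph.add(new TextRun("Applications for licensure must meet the prerequisites ", true, false));
        paragraph.add(new TextRun("for and pass the examinations", false, false));
        System.out.println(plainText(clean(paragraph)));
    }
}
```

Because the decision is made per run rather than per text match, identical wording in an unformatted part of the document is left alone. Checking for any underline style other than none, instead of only the single underline, would also catch double or dotted underlines if those can occur in the input.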