How to specify token-based formatting

Good day!

I’m using the Aspose.Words 15.11.0 library, and I have an unusual task: I need to apply formatting to individual tokens (words) in a document. The first problem is finding the right word, so I wrote the following code to deal with it:

import com.aspose.words.Document;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Run;

// Token, StyleProps and StyleReservedPropsCollection are our own classes, not part of Aspose.Words.
public class LicenseMarker {

    private class LicenseMarkerMetaData {
        public Run markerRun;   // run that holds the token after splitting
        public Run afterRun;    // run that holds the text following the token
        int shift = 0;          // internal shift when run objects are changed
        int lastRunId = -1;     // last run id, used to reset the shift
        int jump = 0;           // shift of run indices caused by inserted runs

        public LicenseMarkerMetaData(Run markerRun, Run afterRun, int shift, int lastRunId, int jump) {
            this.markerRun = markerRun;
            this.afterRun = afterRun;
            this.shift = shift;
            this.lastRunId = lastRunId;
            this.jump = jump;
        }
    }

    public void markODFs(Document doc, Token[] licenseTokens, String editorialPath) throws Exception {
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        LicenseMarkerMetaData lmmt = new LicenseMarkerMetaData(null, null, 0, -1, 0);
        for (Token t : licenseTokens) {
            markToken(t, runs, StyleProps.ADDING, lmmt);
        }
    }

    private void markToken(Token t, NodeCollection runs, StyleProps props, LicenseMarkerMetaData lmmt) {
        // shift by run collection
        lmmt.jump += (t.getIndex().getRunId() != lmmt.lastRunId) ? 2 : 0;
        // pick the run object that contains the token
        Run runNode = (t.getIndex().getRunId() != lmmt.lastRunId - lmmt.jump)
                ? (Run) runs.get(t.getIndex().getRunId() + lmmt.jump)
                : lmmt.afterRun;
        System.out.println(runNode.getText());
        // shift inside the run to correct the token position
        lmmt.shift = (lmmt.lastRunId == t.getIndex().getRunId()) ? lmmt.shift : 0;
        markRun(runNode, t.getIndex().getPos(), t.text().length(), props, lmmt);
        lmmt.shift = t.getIndex().getPos() + t.text().length();
        lmmt.lastRunId = t.getIndex().getRunId(); // remember the last run index for further processing
    }

    private void markRun(Run node, int tokenStart, int tokenLength, StyleProps props, LicenseMarkerMetaData lmmt) {
        splitRun(node, tokenStart, tokenLength, lmmt);
        StyleReservedPropsCollection.applyToRun(lmmt.markerRun, props);
    }

    /**
     * Splits the text of the specified run into three runs: the text before the token,
     * the token itself, and the text after it. Inserts the new runs just after the specified one.
     */
    private void splitRun(Run run, int position, int length, LicenseMarkerMetaData lmmt) {
        try {
            int pos = position - lmmt.shift;
            Run result = (Run) run.deepClone(true);
            try {
                result.setText(run.getText().substring(pos, pos + length));
            } catch (StringIndexOutOfBoundsException e) {
                System.out.println(run.getText().substring(pos));
                System.out.println(pos + ", " + (pos + length));
                e.printStackTrace();
            }
            Run after = (Run) run.deepClone(true);
            after.setText(run.getText().substring(pos + length));
            run.setText(run.getText().substring(0, pos));
            run.getParentNode().insertAfter(result, run);
            run.getParentNode().insertAfter(after, result);
            lmmt.markerRun = result;
            lmmt.afterRun = after;
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here, Token is an inner class that represents a word in the document and contains the index of its com.aspose.words.Run object, and StyleProps is another inner class that represents colors, font settings, etc. However, this code fails on some documents because Runs are unreliable: it is not rare for a paragraph to contain several pieces of text with exactly the same formatting but in separate Run instances. (Note: I used the Paragraph.joinRunsWithSameFormatting() method, but it didn’t help.) And that is not even counting other problems.
So my question is: is there any way to do this more safely, more reliably and more easily, ideally independently of Runs? Perhaps based on character formatting, or on paragraphs…
Unfortunately, it is difficult to provide real data, because the document alone is not enough: tokenization data is needed in addition. But if it is critical, I’ll try to prepare it.
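A note on the index bookkeeping in markToken: as far as I understand, the NodeCollection returned by doc.getChildNodes(NodeType.RUN, true) is live, so every splitRun call inserts two runs and shifts the indices of all later runs. The jump and shift fields try to compensate for that by hand. One common way to avoid this bookkeeping entirely is to process tokens in reverse document order (last run first, and right to left inside a run), so a split never moves anything that has not been processed yet. The sketch below shows the idea on plain Java lists; SimpleRun and TokenPos are hypothetical stand-ins for the Aspose Run and the question's Token, not real API, and it assumes the token positions were computed on the unmodified runs (Java 16+ for the record).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ReverseOrderSplit {
    /** Hypothetical stand-in for a formatted run (not the Aspose Run class). */
    static final class SimpleRun {
        String text;
        boolean marked;
        SimpleRun(String text, boolean marked) { this.text = text; this.marked = marked; }
    }

    /** Hypothetical token position: run index, offset inside that run, and length. */
    record TokenPos(int runIndex, int pos, int length) {}

    /**
     * Splits every token out of its run. Processing from the last run to the first,
     * and right to left inside a run, means earlier insertions never shift the
     * run index or in-run offset of a token that is still waiting to be processed.
     */
    static void markTokens(List<SimpleRun> runs, List<TokenPos> tokens) {
        List<TokenPos> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparingInt(TokenPos::runIndex)
                .thenComparingInt(TokenPos::pos)
                .reversed());
        for (TokenPos t : sorted) {
            SimpleRun run = runs.get(t.runIndex());
            String text = run.text;
            List<SimpleRun> pieces = new ArrayList<>();
            pieces.add(new SimpleRun(text.substring(0, t.pos()), run.marked));              // before
            pieces.add(new SimpleRun(text.substring(t.pos(), t.pos() + t.length()), true)); // token
            pieces.add(new SimpleRun(text.substring(t.pos() + t.length()), run.marked));    // after
            pieces.removeIf(p -> p.text.isEmpty()); // do not keep empty runs
            runs.remove(t.runIndex());
            runs.addAll(t.runIndex(), pieces);
        }
    }

    public static void main(String[] args) {
        List<SimpleRun> runs = new ArrayList<>(List.of(
                new SimpleRun("Hello world", false),
                new SimpleRun("foo bar", false)));
        markTokens(runs, List.of(new TokenPos(0, 6, 5), new TokenPos(1, 0, 3), new TokenPos(0, 0, 5)));
        for (SimpleRun r : runs) {
            System.out.println((r.marked ? "[*] " : "[ ] ") + "\"" + r.text + "\"");
        }
    }
}
```

Running main leaves five runs, with "Hello", "world" and "foo" marked and " " and " bar" unmarked, even though the tokens were given in arbitrary order and refer to the original run indices.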
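One way to make the marking independent of how the text happens to be fragmented into Runs is to address tokens by character offsets within a paragraph: first force run boundaries at the token's start and end offsets (splitting a run only where an offset falls strictly inside it), then format every run that lies entirely within the range. A word split across two identically formatted runs is then handled the same way as a word inside a single run. The sketch below assumes a paragraph can be modelled as an ordered list of runs whose texts concatenate to the paragraph text; SimpleRun is a hypothetical stand-in, and in Aspose.Words the split itself would be done with deepClone(true), setText and insertAfter as in the splitRun method above. Non-run nodes such as fields or bookmarks are ignored here, which is an assumption that may not hold for every document.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetRangeMarker {
    /** Hypothetical stand-in for a run: text plus one formatting flag. */
    static final class SimpleRun {
        String text;
        boolean highlighted;
        SimpleRun(String text, boolean highlighted) { this.text = text; this.highlighted = highlighted; }
    }

    /** Makes sure a run boundary exists at the given paragraph offset, splitting a run if needed. */
    static void splitAt(List<SimpleRun> runs, int offset) {
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            SimpleRun run = runs.get(i);
            int runEnd = runStart + run.text.length();
            if (offset > runStart && offset < runEnd) {
                int cut = offset - runStart;
                SimpleRun tail = new SimpleRun(run.text.substring(cut), run.highlighted); // same formatting
                run.text = run.text.substring(0, cut);
                runs.add(i + 1, tail);
                return;
            }
            if (offset <= runEnd) {
                return; // offset already falls on a run boundary
            }
            runStart = runEnd;
        }
    }

    /** Highlights the characters in [start, end) regardless of how the text is split into runs. */
    static void highlightRange(List<SimpleRun> runs, int start, int end) {
        splitAt(runs, start);
        splitAt(runs, end);
        int runStart = 0;
        for (SimpleRun run : runs) {
            int runEnd = runStart + run.text.length();
            if (runStart >= start && runEnd <= end && runEnd > runStart) {
                run.highlighted = true;
            }
            runStart = runEnd;
        }
    }

    static String paragraphText(List<SimpleRun> runs) {
        StringBuilder sb = new StringBuilder();
        for (SimpleRun r : runs) {
            sb.append(r.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "lic" + "ense" is one word split across two identically formatted runs.
        List<SimpleRun> runs = new ArrayList<>(List.of(
                new SimpleRun("The lic", false),
                new SimpleRun("ense applies.", false)));
        int start = paragraphText(runs).indexOf("license");
        highlightRange(runs, start, start + "license".length());
        for (SimpleRun r : runs) {
            System.out.println((r.highlighted ? "[*] " : "[ ] ") + "\"" + r.text + "\"");
        }
    }
}
```

Here the paragraph ends up as "The ", "lic", "ense", " applies.", with only the two middle runs highlighted. Because offsets are recomputed from the run texts on every call, earlier splits never invalidate later tokens in the same paragraph: splitting changes the run structure but not the text.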

Thanks!

P.S. It would be great to base the formatting on absolute positions in the text, so I looked at the AbsolutePositionTab, SpecialChar and Inline classes, but I didn’t understand what they are for.
P.P.S. I’m working mainly with the .odt format, but it’s not difficult to switch to .docx or .doc if that helps.

Hi Dmitriy,

Thanks for your inquiry. You can achieve this by implementing the IReplacingCallback interface. Please read the following documentation articles for reference. Hope this helps.


If you still face problems, please share your input document and the text to which you want to apply formatting. We will then provide more information along with code.

Hello! Thanks for your reply!

I have already looked at the IReplacingCallback interface. The problem is that I don’t want to replace occurrences of some text: I have a set of words with known positions in the plain text, and the task is to highlight the words based on these positions first, and only then on the text. In any case, I’ll try to provide an example document soon to illustrate the problem.
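If the token positions refer to the plain text of the whole document, they first have to be mapped to a paragraph and an offset inside that paragraph before any run can be split. A small sketch of that mapping, assuming the plain text was produced by joining the paragraph texts with a separator of known length (for example 1 for "\n"), keeps the start offset of every paragraph in a sorted array and uses a binary search. PlainTextOffsetMapper is a hypothetical helper, not an Aspose class; if the plain text came from Aspose’s getText(), check how paragraph marks appear in it and adjust separatorLength to match what the tokenizer actually saw.

```java
import java.util.Arrays;

public class PlainTextOffsetMapper {
    private final int[] paragraphStarts; // plain-text offset at which each paragraph begins

    /** Assumes plain text = paragraph texts joined with a separator of separatorLength chars. */
    PlainTextOffsetMapper(String[] paragraphTexts, int separatorLength) {
        paragraphStarts = new int[paragraphTexts.length];
        int offset = 0;
        for (int i = 0; i < paragraphTexts.length; i++) {
            paragraphStarts[i] = offset;
            offset += paragraphTexts[i].length() + separatorLength;
        }
    }

    /** Returns {paragraphIndex, offsetInsideParagraph} for a non-negative plain-text offset. */
    int[] locate(int globalOffset) {
        int idx = Arrays.binarySearch(paragraphStarts, globalOffset);
        // When not found, binarySearch returns -(insertionPoint) - 1;
        // the containing paragraph is the one just before the insertion point.
        int paragraph = idx >= 0 ? idx : -idx - 2;
        return new int[] {paragraph, globalOffset - paragraphStarts[paragraph]};
    }

    public static void main(String[] args) {
        String[] paragraphs = {"First paragraph.", "Second one with license terms."};
        String plain = String.join("\n", paragraphs);
        PlainTextOffsetMapper mapper = new PlainTextOffsetMapper(paragraphs, 1);
        int[] loc = mapper.locate(plain.indexOf("license"));
        System.out.println("paragraph " + loc[0] + ", offset " + loc[1]);
    }
}
```

This prints "paragraph 1, offset 16". The resulting paragraph-local offset is exactly what a per-paragraph, offset-based splitter needs, so token positions never have to be tied to Run indices.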

Hi Dmitriy,

Thanks for your feedback. Please share the following details for investigation purposes.

  • Please share your input document and text to which you want to apply formatting.
  • Please create a standalone Java application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.
  • Please attach the output Word file that shows the undesired behavior.
  • Please attach your target Word document showing the desired behavior. You can use Microsoft Word to create your target Word document. We will investigate how you expect your final document to look.

As soon as you get these pieces of information to us, we'll start our investigation into your issue.

Good day! Thank you for your reply, but we have found a solution to the problem ourselves. Sorry to disturb you!