Good day!
I'm using the Aspose.Words 15.11.0 library, and I have an unusual task: I need to apply formatting to individual tokens (words) in a document. The first problem is locating the required word, so I used the following code:
import com.aspose.words.Document;
import com.aspose.words.NodeCollection;
import com.aspose.words.NodeType;
import com.aspose.words.Run;

public class LicenseMarker {
    private class LicenseMarkerMetaData {
        public Run markerRun;
        public Run afterRun;
        int shift = 0;      // internal shifting when run objects are changed
        int lastRunId = -1; // last runId for resetting shifting
        int jump = 0;

        public LicenseMarkerMetaData(Run markerRun, Run afterRun, int shift, int lastRunId, int jump) {
            this.markerRun = markerRun;
            this.afterRun = afterRun;
            this.shift = shift;
            this.lastRunId = lastRunId;
            this.jump = jump;
        }
    }

    public void markODFs(Document doc, Token[] licenseTokens, String editorialPath) throws Exception
    {
        NodeCollection runs = doc.getChildNodes(NodeType.RUN, true);
        LicenseMarkerMetaData lmmt = new LicenseMarkerMetaData(null, null, 0, -1, 0);
        for (Token t : licenseTokens) {
            // …
            markToken(t, runs, StyleProps.ADDING, lmmt);
            // …
        }
    }

    private void markToken(Token t, NodeCollection runs, StyleProps props, LicenseMarkerMetaData lmmt) {
        lmmt.jump += (t.getIndex().getRunId() != lmmt.lastRunId) ? 2 : 0; // shifting by run collection
        Run runNode = (t.getIndex().getRunId() != lmmt.lastRunId - lmmt.jump)
                ? (Run) runs.get(t.getIndex().getRunId() + lmmt.jump)
                : lmmt.afterRun; // specifying run object
        System.out.println(runNode.getText());
        // shifting inside run for correcting token position
        lmmt.shift = (lmmt.lastRunId == t.getIndex().getRunId()) ? lmmt.shift : 0;
        markRun(runNode, t.getIndex().getPos(), t.text().length(), props, lmmt);
        lmmt.shift = t.getIndex().getPos() + t.text().length();
        lmmt.lastRunId = t.getIndex().getRunId(); // memorize last run index for further processing
    }

    private void markRun(Run node, int tokenStart, int tokenLength, StyleProps props, LicenseMarkerMetaData lmmt) {
        splitRun(node, tokenStart, tokenLength, lmmt);
        StyleReservedPropsCollection.applyToRun(lmmt.markerRun, props);
    }

    /**
     * Splits the text of the specified run into three runs: the text before the token,
     * the token itself and the text after it.
     * Inserts the new runs just after the specified one.
     */
    private void splitRun(Run run, int position, int length, LicenseMarkerMetaData lmmt) {
        try {
            int pos = position - lmmt.shift;
            Run result = (Run) run.deepClone(true);
            try {
                result.setText(run.getText().substring(pos, pos + length));
            } catch (StringIndexOutOfBoundsException e) {
                System.out.println(run.getText().substring(pos));
                System.out.println(pos + ", " + (pos + length));
                e.printStackTrace();
            }
            Run after = (Run) run.deepClone(true);
            after.setText(run.getText().substring(pos + length));
            run.setText(run.getText().substring(0, pos));
            run.getParentNode().insertAfter(result, run);
            run.getParentNode().insertAfter(after, result);
            lmmt.markerRun = result;
            lmmt.afterRun = after;
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Here, Token is an inner class representing a word in the document; it contains the index of the com.aspose.words.Run object the word belongs to. StyleProps is also an inner class, representing colors, font settings, etc. However, this code fails on some documents because runs are unreliable: quite often a paragraph contains several pieces of text with exactly the same formatting that are stored in separate Run instances. (Note: I called the Paragraph.joinRunsWithSameFormatting() method, but it did not solve the problem.) And that is apart from other problems.
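The merging behaviour can be reproduced in a tiny model. The sketch below does not use Aspose.Words: SimpleRun and its string formatting key are hypothetical stand-ins for a run and its complete set of character attributes. It shows that neighbouring runs are merged only when their keys are exactly equal, so runs that look identical but differ in some attribute that is not visible in the editor (assumed here to be a language setting) stay separate. That is one plausible reason, not confirmed by the post, why joinRunsWithSameFormatting() did not help.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class RunMergeSketch {
    /** Hypothetical stand-in for a run: text plus an opaque formatting key (NOT com.aspose.words.Run). */
    static final class SimpleRun {
        final String text;
        final String format; // any difference in this key keeps two runs apart

        SimpleRun(String text, String format) {
            this.text = text;
            this.format = format;
        }
    }

    /** Merges neighbouring runs only when their formatting keys are exactly equal. */
    static List<SimpleRun> joinSameFormatting(List<SimpleRun> runs) {
        List<SimpleRun> merged = new ArrayList<>();
        for (SimpleRun r : runs) {
            int last = merged.size() - 1;
            if (last >= 0 && Objects.equals(merged.get(last).format, r.format)) {
                // Same key: append the text to the previous run instead of keeping a new one.
                merged.set(last, new SimpleRun(merged.get(last).text + r.text, r.format));
            } else {
                merged.add(r);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<SimpleRun> runs = List.of(
                new SimpleRun("Lic", "bold"),
                new SimpleRun("ense", "bold"),
                new SimpleRun(" text", "bold;lang=de-DE")); // looks bold too, but a hidden attribute differs
        for (SimpleRun r : joinSameFormatting(runs)) {
            System.out.println("[" + r.text + "] " + r.format);
        }
    }
}
```

Running it prints two runs, `[License] bold` and `[ text] bold;lang=de-DE`: the first two pieces are merged, the third is not.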
So my question is: is there a safer, more reliable and simpler way to do this, ideally one that does not depend on runs? Perhaps based on character formatting, or on paragraphs…
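One run-independent strategy is to search the text of the whole paragraph instead of individual runs, and to translate a match back to runs only at the very end. The sketch below models a paragraph as a plain list of run texts; in Aspose.Words those texts would come from the paragraph's runs, but no Aspose calls are used here, and findAll, locate and RunPos are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphOffsetSketch {
    /** Position of a character inside the run list: which run, and where inside that run. */
    static final class RunPos {
        final int runIndex;
        final int offsetInRun;

        RunPos(int runIndex, int offsetInRun) {
            this.runIndex = runIndex;
            this.offsetInRun = offsetInRun;
        }

        @Override
        public String toString() {
            return "run " + runIndex + ", offset " + offsetInRun;
        }
    }

    /** Maps an absolute offset in the concatenated paragraph text back to a run and an offset in it. */
    static RunPos locate(List<String> runTexts, int absoluteOffset) {
        int start = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            int end = start + runTexts.get(i).length(); // empty runs are skipped automatically
            if (absoluteOffset < end) {
                return new RunPos(i, absoluteOffset - start);
            }
            start = end;
        }
        throw new IndexOutOfBoundsException("offset " + absoluteOffset + " is past the paragraph text");
    }

    /** Finds every occurrence of word in the paragraph text, regardless of how it is split into runs. */
    static List<Integer> findAll(List<String> runTexts, String word) {
        List<Integer> hits = new ArrayList<>();
        if (word.isEmpty()) return hits; // an empty pattern would match everywhere
        String paragraphText = String.join("", runTexts);
        for (int i = paragraphText.indexOf(word); i >= 0; i = paragraphText.indexOf(word, i + 1)) {
            hits.add(i); // overlapping occurrences are reported too
        }
        return hits;
    }

    public static void main(String[] args) {
        // The same sentence, split into runs at arbitrary places.
        List<String> runs = List.of("This lic", "ense is free; the li", "cense applies.");
        for (int hit : findAll(runs, "license")) {
            System.out.println(hit + " -> starts in " + locate(runs, hit)
                    + ", ends in " + locate(runs, hit + "license".length() - 1));
        }
    }
}
```

Both occurrences of "license" are found (at offsets 5 and 26) even though each one is cut across two runs; the run split only matters when locate() is called.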
Unfortunately, it is difficult to provide real data, because the document alone is not enough: the tokenization data would be needed as well. But if it is critical, I'll try to put it together.
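If the tokenization data were produced from the paragraph text rather than from individual runs, each token could carry an absolute offset, which would make the runId, shift and jump bookkeeping unnecessary. A minimal sketch, assuming a token is simply a maximal sequence of letters or digits (the real tokenizer behind Token is not shown in the post, so OffsetToken and tokenize are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetTokenizerSketch {
    /** Hypothetical token: the word plus its absolute start offset in the paragraph text. */
    static final class OffsetToken {
        final String text;
        final int start;

        OffsetToken(String text, int start) {
            this.text = text;
            this.start = start;
        }

        int end() { // exclusive end offset
            return start + text.length();
        }
    }

    // Assumption: a "word" is a run of Unicode letters or digits; the real tokenizer may differ.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    static List<OffsetToken> tokenize(String paragraphText) {
        List<OffsetToken> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(paragraphText);
        while (m.find()) {
            tokens.add(new OffsetToken(m.group(), m.start()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (OffsetToken t : tokenize("Copyright (c) 2015, all rights reserved.")) {
            System.out.println(t.text + " [" + t.start + ", " + t.end() + ")");
        }
    }
}
```

Because the offsets refer to the paragraph text, they stay valid no matter how many times the runs are later split.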
Thanks!
P.S. It would be great to base the formatting on absolute positions in the text, so I looked at the AbsolutePositionTab, SpecialChar and Inline classes, but I didn't understand what they are for.
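Absolute positions can be handled without any shift/jump bookkeeping: split the runs at the start and end offsets of the token, then format every run between the two boundaries. Splitting never changes the concatenated paragraph text, so earlier splits cannot invalidate later offsets, and a token that crosses a run boundary is handled naturally. This is a pure-Java model (SimpleRun, splitAt and highlight are hypothetical names); with Aspose.Words the split step would presumably map onto cloning a run and inserting the clone after it, as the original splitRun already does.

```java
import java.util.ArrayList;
import java.util.List;

public class IsolateRangeSketch {
    /** Hypothetical mutable run model: text plus a formatting flag (NOT com.aspose.words.Run). */
    static final class SimpleRun {
        String text;
        boolean highlighted;

        SimpleRun(String text, boolean highlighted) {
            this.text = text;
            this.highlighted = highlighted;
        }
    }

    /** Ensures a run boundary at the absolute offset; returns the index of the run starting there. */
    static int splitAt(List<SimpleRun> runs, int offset) {
        if (offset < 0) throw new IndexOutOfBoundsException("negative offset " + offset);
        int start = 0;
        for (int i = 0; i < runs.size(); i++) {
            SimpleRun r = runs.get(i);
            if (offset == start) return i; // boundary already exists
            int end = start + r.text.length();
            if (offset < end) {
                // The second half copies the formatting, like deepClone(true) in the original code.
                SimpleRun tail = new SimpleRun(r.text.substring(offset - start), r.highlighted);
                r.text = r.text.substring(0, offset - start);
                runs.add(i + 1, tail);
                return i + 1;
            }
            start = end;
        }
        if (offset == start) return runs.size(); // offset at the very end of the paragraph
        throw new IndexOutOfBoundsException("offset " + offset + " is past the end (" + start + ")");
    }

    /** Formats the absolute range [start, start + length), however many runs it crosses. */
    static void highlight(List<SimpleRun> runs, int start, int length) {
        int first = splitAt(runs, start);
        int afterLast = splitAt(runs, start + length); // splits here never shift index 'first'
        for (int i = first; i < afterLast; i++) {
            runs.get(i).highlighted = true;
        }
    }

    public static void main(String[] args) {
        List<SimpleRun> runs = new ArrayList<>(List.of(
                new SimpleRun("This lic", false),
                new SimpleRun("ense is free.", false)));
        highlight(runs, 5, "license".length()); // the token spans both original runs
        for (SimpleRun r : runs) {
            System.out.println((r.highlighted ? "*" : " ") + "[" + r.text + "]");
        }
    }
}
```

After the call the list holds four runs, `This `, `lic`, `ense` and ` is free.`, with only the two middle ones marked; a later call such as highlight(runs, 0, 4) still uses the original paragraph offsets.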
P.P.S. I'm mainly working with the .odt format, but switching to .docx or .doc is not a problem if it helps.