Hi,
I want to check whether a Word document contains some text and get the coordinates of that text in the document, without replacing it. For example, a document contains "xx" many times. I want to get positions that uniquely identify the location of each "xx".
Can you give me an example?
@JohnKj You can use IReplacingCallback, LayoutCollector and LayoutEnumerator to achieve this. For example, the following code wraps the matched text with bookmarks, which are later used by LayoutCollector and LayoutEnumerator to calculate the coordinates of the matched text:
using System;
using System.Collections.Generic;
using System.Drawing;
using Aspose.Words;
using Aspose.Words.Layout;
using Aspose.Words.Replacing;

Document doc = new Document(@"C:\Temp\in.docx");

// Use a replacing callback to wrap the matched text with bookmarks.
FindReplaceOptions opt = new FindReplaceOptions();
opt.Direction = FindReplaceDirection.Backward;
ReplaceEvaluatorWrapWithBookmark callback = new ReplaceEvaluatorWrapWithBookmark();
opt.ReplacingCallback = callback;
doc.Range.Replace("xx", "", opt);

// Create LayoutCollector and LayoutEnumerator to calculate coordinates of the matched text.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);

foreach (string bkName in callback.Bookmarks)
{
    Bookmark bk = doc.Range.Bookmarks[bkName];

    // Rectangle at the start of the match.
    enumerator.Current = collector.GetEntity(bk.BookmarkStart);
    RectangleF startRect = enumerator.Rectangle;

    // Rectangle at the end of the match.
    enumerator.Current = collector.GetEntity(bk.BookmarkEnd);
    RectangleF endRect = enumerator.Rectangle;

    // The union of the two rectangles spans from the start to the end of the match.
    RectangleF matchedTextRect = RectangleF.Union(startRect, endRect);
    Console.WriteLine($"Page {enumerator.PageIndex}: {matchedTextRect}");
}
internal class ReplaceEvaluatorWrapWithBookmark : IReplacingCallback
{
    /// <summary>
    /// This method is called by the Aspose.Words find and replace engine for each match.
    /// </summary>
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        Document doc = (Document)e.MatchNode.Document;

        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and maybe the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        // This list stores all runs of the match so they can be wrapped with a bookmark.
        List<Run> runs = new List<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            remainingLength > 0 &&
            currentNode != null &&
            currentNode.GetText().Length <= remainingLength)
        {
            runs.Add((Run)currentNode);
            remainingLength -= currentNode.GetText().Length;

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            } while (currentNode != null && currentNode.NodeType != NodeType.Run);
        }

        // Split the last run that contains the match if there is any text left.
        if (currentNode != null && remainingLength > 0)
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add((Run)currentNode);
        }

        // Generate a unique bookmark name. Another approach can be used.
        // If a bookmark name starts with an underscore, it is hidden in MS Word.
        string bookmarkName = "_" + Guid.NewGuid().ToString();
        while (doc.Range.Bookmarks[bookmarkName] != null)
            bookmarkName += "_" + Guid.NewGuid().ToString();

        // Insert a bookmark around the matched text.
        BookmarkStart start = new BookmarkStart(doc, bookmarkName);
        BookmarkEnd end = new BookmarkEnd(doc, bookmarkName);
        runs[0].ParentNode.InsertBefore(start, runs[0]);
        runs[runs.Count - 1].ParentNode.InsertAfter(end, runs[runs.Count - 1]);
        Bookmarks.Add(bookmarkName);

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }
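
    /// <summary>
    /// Splits the run at the given position and returns the new run that holds the text after it.
    /// </summary>
    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        run.ParentNode.InsertAfter(afterRun, run);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        return afterRun;
    }

    /// <summary>
    /// Names of the bookmarks created around the matches.
    /// </summary>
    public List<string> Bookmarks
    {
        get { return mBookmarks; }
    }

    private List<string> mBookmarks = new List<string>();
}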
Can you give me a simpler example? I don't need to process bookmarks or anything else. I just want to get all the matches and their unique positions. I can't quite follow the code above. Is there a relatively simple method? I don't want to replace the matches.
The provided code does exactly this. As you may know, MS Word documents are flow documents by nature, so the document itself stores no information about the position of a node on the page. That is why the matched text is wrapped with bookmarks using IReplacingCallback, and the bookmarks are then used by LayoutCollector and LayoutEnumerator to calculate the exact position of the matched text. I am afraid there is no simpler way to achieve this.
The document already contains some bookmarks that are used as placeholders to be replaced with other text. If I add new bookmarks, it will be confusing. What I really want to do is check which revisions have been rejected in a document. For example, a document has many revisions; some are accepted and some are rejected. I want to know which ones were rejected, and list the rejected revisions and their authors. Can you give me an example to achieve this goal?
You can remove the added bookmarks after calculating the matched text coordinates in the document.
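A minimal sketch of that cleanup step, assuming the bookmark names were collected as in the callback above. The helper class and method names here are hypothetical; the only Aspose.Words calls used are the `Bookmarks` indexer by name and `Bookmark.Remove()`, which removes the bookmark but leaves the text inside it in place:

```csharp
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper, not part of Aspose.Words.
internal static class BookmarkCleanup
{
    // Removes the temporary bookmarks by name; the bookmarked text stays in the document.
    public static void RemoveBookmarks(Document doc, IEnumerable<string> bookmarkNames)
    {
        foreach (string name in bookmarkNames)
        {
            // The indexer returns null when no bookmark with this name exists.
            Bookmark bookmark = doc.Range.Bookmarks[name];
            if (bookmark != null)
                bookmark.Remove();
        }
    }
}
```

It could be called as `BookmarkCleanup.RemoveBookmarks(doc, callback.Bookmarks);` once all coordinates have been read. Because only the names recorded by the callback are removed, bookmarks that already existed in the document are left untouched. Runs that were split around a match stay split, which does not change the visible text.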
When a revision is accepted or rejected, it is no longer a revision; it becomes ordinary document content. So there is no way to find accepted and rejected revisions in the document.
If we compare the document that has fewer revisions with the document before the operation (accept/reject), we can clearly see the differences by eye. We can also infer how they changed (whether a revision was accepted or rejected). Can we get the differences with Aspose.Words? My idea is to review the revisions before the operation and the revisions after the operation. Some revisions will no longer exist. We could use the positions of the revisions in the document to infer this, but I can't get the revision positions. I can only get the authors and the content of the revisions. Do you have any other idea?
@JohnKj After comparing the documents, you can get the list of revisions or revision groups in the document. You can use code like this:
Document doc = new Document(@"C:\Temp\in.docx");
foreach (RevisionGroup group in doc.Revisions.Groups)
{
    if (group.RevisionType == RevisionType.FormatChange)
        Console.WriteLine(group.Text);
}
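One possible way to approach the before/after idea from the question is sketched below. It is an assumption, not a confirmed method: if the "after" document was produced from the "before" document only by accepting or rejecting revisions, then accepting all remaining revisions in both copies should leave them differing only where a revision was rejected. The file paths and the author name "Comparer" are hypothetical.

```csharp
using System;
using Aspose.Words;

// Hypothetical paths: the document before and after revisions were accepted or rejected.
Document before = new Document(@"C:\Temp\before.docx");
Document after = new Document(@"C:\Temp\after.docx");

// Compare expects documents without pending revisions,
// so accept everything that is still pending in both copies.
before.AcceptAllRevisions();
after.AcceptAllRevisions();

// After this call, "before" contains revisions describing how it differs from "after".
before.Compare(after, "Comparer", DateTime.Now);

// Under the assumption above, each group corresponds to content affected by a rejected revision.
foreach (RevisionGroup group in before.Revisions.Groups)
    Console.WriteLine($"{group.RevisionType}: {group.Text}");
```

Under that assumption, a rejected insertion would be expected to show up as a deletion and a rejected deletion as an insertion. The revisions produced by the comparison carry the author name passed to `Compare`, so the original authors would have to be recovered separately, for example by matching the text against `RevisionGroup.Author` and `RevisionGroup.Text` read from the original document before accepting its revisions.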