Search text in XPS document using C# with Aspose.Page for .NET

What would be the best way to use the Aspose.XPS API to search XPS files for text?

@Brian_THOMAS

Thanks for contacting support.

Text search is currently not supported by the API, and we are not sure whether this functionality can be added in the near future. Text in XPS documents is stored in Glyphs elements, and implementing search over Glyphs elements is a complex task. An investigation ticket, XPSNET-16, has already been logged in our issue tracking system, and we will continue to check the feasibility of the feature.

However, would you please share a bit more about your requirements, such as how and why you want to search text in XPS files? Do you want to perform a search-and-replace operation on the documents?

I was hoping you’d implement the nice Visitor pattern stuff like you did in the Aspose.Words API.

We might want to search text in order to do redaction: removing text and replacing it with a black rectangle or a family of rectangles.

Or to find a keyword and replace it with a document id number assigned at print time.
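For the document-ID use case, the string side of the job can be sketched in plain C#. This is a minimal sketch, assuming the run texts have already been read from each XpsGlyphs element's UnicodeString (as in the traversal code later in this thread). PlaceholderStamp is a hypothetical helper, not part of Aspose.Page. Writing the new text back into the document is not shown: whether assigning to UnicodeString also keeps glyph indices and advance widths consistent is an unverified assumption, and the embedded font may not contain glyphs for every character of the new text. Drawing redaction rectangles is not covered either.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: performs only the text substitution on in-memory strings.
// Pushing the results back into the XPS document is deliberately not shown.
public static class PlaceholderStamp
{
    // Replaces every ordinal (case-sensitive) occurrence of placeholder in each run
    // with documentId, and returns the indices of the runs that changed so that only
    // those runs need to be written back.
    public static List<int> Apply(IList<string> runs, string placeholder, string documentId)
    {
        if (string.IsNullOrEmpty(placeholder))
            throw new ArgumentException("Placeholder must not be empty.", nameof(placeholder));

        var changed = new List<int>();
        for (int i = 0; i < runs.Count; i++)
        {
            string text = runs[i] ?? string.Empty;
            if (text.IndexOf(placeholder, StringComparison.Ordinal) < 0)
                continue;

            // string.Replace(string, string) is an ordinal replacement of all occurrences.
            runs[i] = text.Replace(placeholder, documentId ?? string.Empty);
            changed.Add(i);
        }
        return changed;
    }
}
```

Returning the changed indices keeps the document-side work limited to the runs that actually contained the placeholder.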

Both the glcnd and xpsrchvw Windows apps can do text searching, but they're not at all fast. If you were to implement fast text search, it would be a good reason to write an Aspose-based app to view XPS content rather than relying on the Windows inbox apps.
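One common way to make repeated searches fast is to extract the text once and build a word index, so each lookup becomes a dictionary hit instead of a walk over every Glyphs element. This is a minimal sketch in plain C#, assuming the text of each page has already been collected (for example by concatenating XpsGlyphs.UnicodeString values). PageWordIndex is a hypothetical class, not an Aspose.Page API, and splitting words on non-alphanumeric characters is a simplifying assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical word index: maps each word (case-insensitively) to the pages containing it.
public sealed class PageWordIndex
{
    private readonly Dictionary<string, SortedSet<int>> _pagesByWord =
        new Dictionary<string, SortedSet<int>>(StringComparer.OrdinalIgnoreCase);

    // pageTexts[p] is the already-extracted text of page p + 1.
    public PageWordIndex(IReadOnlyList<string> pageTexts)
    {
        for (int p = 0; p < pageTexts.Count; p++)
        {
            foreach (string word in Tokenize(pageTexts[p] ?? string.Empty))
            {
                if (!_pagesByWord.TryGetValue(word, out var pages))
                {
                    pages = new SortedSet<int>();
                    _pagesByWord[word] = pages;
                }
                pages.Add(p + 1);
            }
        }
    }

    // Returns the 1-based page numbers that contain the word, in ascending order.
    public IReadOnlyCollection<int> PagesContaining(string word) =>
        _pagesByWord.TryGetValue(word, out var pages)
            ? pages
            : (IReadOnlyCollection<int>)Array.Empty<int>();

    // Splits text into maximal runs of letters or digits.
    private static IEnumerable<string> Tokenize(string text)
    {
        int start = -1;
        for (int i = 0; i <= text.Length; i++)
        {
            bool isWordChar = i < text.Length && char.IsLetterOrDigit(text[i]);
            if (isWordChar && start < 0)
            {
                start = i;
            }
            else if (!isWordChar && start >= 0)
            {
                yield return text.Substring(start, i - start);
                start = -1;
            }
        }
    }
}
```

The extraction cost is paid once per document; after that, each keyword lookup only reads the dictionary.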

@Brian_THOMAS

Thanks for providing more details about the requirements.

We have recorded your comments with the logged ticket and will definitely consider them during the investigation. As soon as there are updates on the investigation's progress, we will share them with you. Please spare us a little time.

We are sorry for the inconvenience.

I wondered whether I could iterate through all the XpsGlyphs in an XpsDocument. But it looks as though the XpsGlyphs object has no property or method that returns the text it actually encapsulates. So I have a question:

  • How can I visit and examine each piece of text on an Xps page?

@Brian_THOMAS

Thanks for your inquiry.

I am afraid that searching text on an XPS page is not yet implemented, and a feature request has already been logged in our system. We have also recorded your comments with the logged ticket and will definitely provide feedback on them once significant progress has been made towards resolving the ticket. Please spare us a little time.

We are sorry for the inconvenience.

@Brian_THOMAS

Please check the following code snippet, which iterates through all XpsGlyphs elements on every page of an XPS document and prints their text.

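```csharp
using System;
using Aspose.Page.XPS;
using Aspose.Page.XPS.XpsModel;

internal static class XpsTextDump
{
    // Visits every page of the document; XPS page numbers are 1-based.
    public static void ShowText(XpsDocument doc)
    {
        for (int i = 1; i <= doc.PageCount; i++)
            ShowText(doc.SelectActivePage(i));
    }

    // Walks the element tree recursively and prints the text of each XpsGlyphs element.
    private static void ShowText(XpsElement element)
    {
        for (int i = 0; i < element.Count; i++)
            ShowText(element[i]);
        if (element is XpsGlyphs glyphs)
            Console.WriteLine(glyphs.UnicodeString);
    }
}
```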
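The traversal above can feed a simple keyword search. This is a minimal sketch in plain C#, assuming the UnicodeString values have been collected per page in traversal order instead of printed. GlyphTextSearch and TextHit are hypothetical names, not Aspose.Page APIs. Because each Glyphs element is matched on its own, a keyword that the producing application split across two Glyphs elements will not be found, which is part of why full text search over Glyphs is harder than it looks.

```csharp
using System;
using System.Collections.Generic;

// One keyword hit: 1-based page number, index of the glyph run on that page,
// and the character offset of the keyword inside that run's text.
public sealed record TextHit(int Page, int RunIndex, int Offset);

public static class GlyphTextSearch
{
    // pages[p] holds the UnicodeString values of the XpsGlyphs elements found on
    // page p + 1, in the order the traversal visited them.
    public static List<TextHit> Find(IReadOnlyList<IReadOnlyList<string>> pages, string keyword)
    {
        var hits = new List<TextHit>();
        if (string.IsNullOrEmpty(keyword))
            return hits;

        for (int p = 0; p < pages.Count; p++)
        {
            for (int r = 0; r < pages[p].Count; r++)
            {
                string text = pages[p][r] ?? string.Empty;
                // Case-insensitive match; report every non-overlapping occurrence in the run.
                int at = text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase);
                while (at >= 0)
                {
                    hits.Add(new TextHit(p + 1, r, at));
                    at = text.IndexOf(keyword, at + keyword.Length, StringComparison.OrdinalIgnoreCase);
                }
            }
        }
        return hits;
    }
}
```

Keeping the page number and run index in each hit makes it possible to go back to the matching XpsGlyphs element, for example when planning a redaction.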

In case you have any further concerns, please feel free to let us know.

What is happening here is that I am getting a question mark (?) in place of "N" in the updated XPS file. What is the reason for that?

@acimcon

Unfortunately, we have lost data for certain posts in our forum due to the server downtime. Can you please share the attachments again?

Basic_Sample.zip (179.8 KB)

One more thing: if the XPS file is larger than 10 MB, it takes more than 15 minutes to load. Is there anything that would decrease the loading time?
I have attached XPS files of different sizes.
Test File 17 MB.zip (9.8 MB)

@acimcon

About replacing the hyperlinks, we have responded to you in the original thread that you created. For the issue related to the file load time, an investigation ticket, PAGENET-479, has been logged in our issue tracking system for further analysis. We will look into its details and keep you posted on its status. Please be patient and spare us some time.

We are sorry for the inconvenience.