Get total number of documents based on a particular phrase/word


#21

@Kushal.20,

Both Aspose.Words for Java and Aspose.PDF for Java APIs have Document classes. To avoid any conflicts, please create Document instances like this:

com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document();
com.aspose.words.Document wordDoc = new com.aspose.words.Document();

Hope this helps.
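
The same technique applies to any two classes that share a simple name. The sketch below is only an analogy using JDK classes (java.util.Date and java.sql.Date stand in for the two Aspose Document classes, and the class name QualifiedNames is made up for illustration), so it runs without either Aspose library:

```java
public class QualifiedNames {
    // Both java.util and java.sql declare a class named Date, so neither is
    // imported here; each is referred to by its fully qualified name instead.
    static java.util.Date utilDate(long millis) {
        return new java.util.Date(millis);
    }

    static java.sql.Date sqlDate(long millis) {
        return new java.sql.Date(millis);
    }

    public static void main(String[] args) {
        java.util.Date a = utilDate(0L);
        java.sql.Date b = sqlDate(0L);
        System.out.println(a.getClass().getName()); // prints java.util.Date
        System.out.println(b.getClass().getName()); // prints java.sql.Date
    }
}
```

Importing one of the two classes and fully qualifying only the other also compiles, but qualifying both keeps it obvious which library each instance comes from.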


#22

Yes, I was thinking of doing it this way too.
Thanks, it is working now.
Thanks, @awais.hafeez!

I am now working further on this feature and will try to add some more functionality. If I get stuck anywhere, I'll write to you!
Thanks once again for your support! :slight_smile:


#23

@awais.hafeez, hi!
I am working on the same feature that we discussed above. I can now get the list of documents containing the searched keyword.
The question that now comes up is: what if some of the files are password protected?

I know how to unprotect files, and I have even applied that in my search application code. But not all the files will have the same password, so we cannot supply the right password for every file while searching, right?
I used the following code (for Word, as an example):

    import java.io.File;
    import java.util.regex.Pattern;
    import com.aspose.words.FileFormatInfo;
    import com.aspose.words.FileFormatUtil;
    import com.aspose.words.FindReplaceOptions;
    import com.aspose.words.LoadOptions;

    String strFind = "Test";
    File[] files = new File("E:\\docs").listFiles();
    if (files == null) {
        files = new File[0]; // the folder does not exist or cannot be read
    }
    for (File file : files) {
        if (file.isFile()) {
            String fileName = file.getName();
            // Guard against file names that have no extension at all.
            int dot = fileName.lastIndexOf(".");
            String extensionName = dot >= 0 ? fileName.substring(dot) : "";

            if (extensionName.equals(".doc") || extensionName.equals(".docx")) {
                // The only password I know; it opens only files encrypted with it.
                String pass = "12345";
                FileFormatInfo fft = FileFormatUtil.detectFileFormat(file.getAbsolutePath());

                LoadOptions loadOps = new LoadOptions();
                loadOps.setPassword(pass);
                com.aspose.words.Document wordDoc =
                        new com.aspose.words.Document(file.getAbsolutePath(), loadOps);
                System.out.println("Opened successfully with the password: " + pass);

                // ReplaceEvaluator is my own replacing callback (not shown here);
                // it counts the matches in mMatchNumber.
                FindReplaceOptions options = new FindReplaceOptions();
                ReplaceEvaluator callback = new ReplaceEvaluator();
                options.setReplacingCallback(callback);

                // Find the search phrase, ignoring case, and let the callback count the matches.
                Pattern regex = Pattern.compile(strFind, Pattern.CASE_INSENSITIVE);
                wordDoc.getRange().replace(regex, strFind, options);
                int countWord = callback.mMatchNumber;
                if (countWord > 0) {
                    System.out.println(file.getAbsolutePath() + " || Count=" + countWord);
                } else {
                    System.out.println("No document containing '" + regex + "' exists");
                }
            }
        }
    }

But here only the password 12345 can be checked. Consider the scenario where I have hundreds or even thousands of files, and nobody knows how many of them are protected, because all I have is the option to search for a keyword and get the list of documents containing it. What should I do in that case?

Is there any way for protected files to be read and scanned for the word, and listed if they contain it, just like normal files, without supplying any password?

I hope you understand my concern!
Thank you!
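
The ReplaceEvaluator class used in the code above is not shown in this thread, so how it increments mMatchNumber is unknown. As a rough illustration of the counting idea only, here is a minimal sketch in plain java.util.regex, run on an ordinary string rather than an Aspose document. The class and method names (PhraseCounter, countMatches) are made up for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhraseCounter {
    // Counts case-insensitive, non-overlapping occurrences of a literal phrase.
    static int countMatches(String text, String phrase) {
        // Pattern.quote makes the phrase literal, so '.' or '*' in it are not regex operators.
        Pattern regex = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
        Matcher matcher = regex.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("Test one, test two, TESTING three", "Test")); // prints 3
    }
}
```

Unlike the code above, this sketch quotes the search phrase; without Pattern.quote, a phrase containing regex metacharacters would be interpreted as a pattern rather than as plain text.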


#24

@Kushal.20,

Please ZIP and upload your sample password-protected Word document (along with the password string) here for testing. We will then investigate the scenario on our end and provide you with more information.


#25

@awais.hafeez
I hope you got my point.
I am not talking about a single file. There could be any number of files in the directory, so this is not about one particular file.
What I am doing now is getting the list of all documents in my directory that contain the searched keyword. At present, protected files are not searched; I just get an exception (Invalid Password) for them. What I want is for the protected files to be scanned too, and returned in the result if they meet the specified criteria.
To achieve this, I tried the code that I already shared.

In that code, I supplied the one password that I knew, which belongs to just one of the files.
But the files won't all have the same password, right? Nor would I know the passwords for any of the files in the user's directory, so I cannot pass a password at all, as I did in the code above.
So the requirement is that all password-protected files can be scanned without supplying passwords, since we cannot supply the passwords for X number of files.

Anyway, as you requested, I am attaching the zip file; the password is 12345. I don't think it is really needed, though. Test Docx Protected.zip (14.4 KB)

I hope it is clearer now.
Waiting for a positive outcome.
Thanks for your co-operation! :slight_smile:


#26

@Kushal.20,

We are investigating this and will get back to you with feedback soon.


#27

@Kushal.20,

Your document has no protection, but it is 'encrypted' with a password, which is why Microsoft Word asks for a password before opening it. You can simply specify the password to open the encrypted document with Aspose.Words. If you do not know the password, I am afraid you will not be able to open or scan this document with Aspose.Words. Here is how you can load this document into the Aspose.Words DOM (document object model):

com.aspose.words.LoadOptions opts = new com.aspose.words.LoadOptions();
opts.setLoadFormat(LoadFormat.DOCX);
opts.setPassword("12345");
Document doc = new Document("E:\\temp\\Test Docx Protected\\Test Docx Protected.docx", opts);
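
Since an encrypted document cannot be read without its password, a folder-wide search can at best try the passwords it does know and report the remaining files as unreadable, rather than let one exception stop the whole run. The following is a minimal sketch of that control flow only. It does not call Aspose.Words: TextLoader is a hypothetical stand-in for whatever loads a document and throws on a wrong password, and all names here (EncryptedScanSketch, ScanResult, scan) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class EncryptedScanSketch {
    /** Hypothetical stand-in for a document loader; throws when the password is wrong or missing. */
    interface TextLoader {
        String load(String path, String password) throws Exception;
    }

    /** Files that contain the phrase, and files that no candidate password could open. */
    static final class ScanResult {
        final List<String> matching = new ArrayList<>();
        final List<String> unreadable = new ArrayList<>();
    }

    static ScanResult scan(List<String> paths, List<String> candidatePasswords,
                           String phrase, TextLoader loader) {
        ScanResult result = new ScanResult();
        // Try without a password first (null), then each known candidate.
        List<String> attempts = new ArrayList<>();
        attempts.add(null);
        attempts.addAll(candidatePasswords);

        String needle = phrase.toLowerCase(Locale.ROOT);
        for (String path : paths) {
            String text = null;
            for (String password : attempts) {
                try {
                    text = loader.load(path, password);
                    break; // opened successfully
                } catch (Exception failedAttempt) {
                    // This attempt failed; fall through to the next candidate.
                }
            }
            if (text == null) {
                result.unreadable.add(path); // report it instead of aborting the scan
            } else if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                result.matching.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Fake in-memory "documents": path -> {required password or null, text}.
        Map<String, String[]> docs = new LinkedHashMap<>();
        docs.put("a.docx", new String[] {null, "This is a test"});
        docs.put("b.docx", new String[] {"12345", "Another TEST file"});
        docs.put("c.docx", new String[] {"unknown", "test behind an unknown password"});

        TextLoader fake = (path, password) -> {
            String[] doc = docs.get(path);
            if (doc[0] != null && !doc[0].equals(password)) {
                throw new Exception("Invalid password for " + path);
            }
            return doc[1];
        };

        ScanResult r = scan(new ArrayList<>(docs.keySet()), Arrays.asList("12345"), "Test", fake);
        System.out.println("Matching: " + r.matching);     // prints Matching: [a.docx, b.docx]
        System.out.println("Unreadable: " + r.unreadable); // prints Unreadable: [c.docx]
    }
}
```

In a real application the loader would wrap the LoadOptions and Document calls shown above. The FileFormatInfo returned by FileFormatUtil.detectFileFormat (already called in the earlier code) should also be able to report whether a file is encrypted, via isEncrypted(), which would let such files be flagged before any load attempt; please verify this against the API reference for your Aspose.Words version.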