Why is the hash of a PDF file different on every run?

archanaagrawal68gmai · May 20, 2014, 12:54am

Hello,

I have split PDF documents by page and also by size. When I apply a hash generation method (MessageDigest with SHA-256) to one of the resulting PDF files, I get a different value for that PDF on every run. Why is that? Do PDF files need special techniques or methods for hash generation in Java? Please guide me. Thanks in advance.

codewarior · May 20, 2014, 5:29am

Hi Archana,

Thanks for contacting support.

I have tested the scenario using one of my sample PDF files and, as per my observations, the same hex code is generated on each attempt. I used the following code to test the scenario.

Can you please share the code snippet you are using, along with the resource files, so that we can replicate this issue on our end? We are sorry for this inconvenience.

[Java]

MessageDigest md = MessageDigest.getInstance("SHA-256");

java.io.FileInputStream fis = new java.io.FileInputStream("c:\\pdftest\\input.pdf");
byte[] dataBytes = new byte[1024];
int nread = 0;
// feed the file to the digest in 1 KB blocks
while ((nread = fis.read(dataBytes)) != -1) {
    md.update(dataBytes, 0, nread);
}
fis.close();
byte[] mdbytes = md.digest();

//convert the byte to hex format method 1
StringBuffer sb = new StringBuffer();
for (int i = 0; i < mdbytes.length; i++) {
    sb.append(Integer.toString((mdbytes[i] & 0xff) + 0x100, 16).substring(1));
}
System.out.println("Hex format : " + sb.toString());

//convert the byte to hex format method 2
StringBuffer hexString = new StringBuffer();
for (int i = 0; i < mdbytes.length; i++) {
    String hex = Integer.toHexString(0xFF & mdbytes[i]);
    if (hex.length() == 1) hexString.append('0'); // pad so every byte yields two digits
    hexString.append(hex);
}
System.out.println("Hex format : " + hexString.toString());
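One detail in byte-to-hex conversion is worth checking separately from the PDF question: `Integer.toHexString` returns a single character for byte values below 0x10, so a loop that appends its result directly can produce a string shorter than the 64 characters expected for a SHA-256 digest. The sketch below compares an unpadded and a padded conversion on three sample bytes. The class and method names are made up for illustration and are not part of any library.

```java
// Hypothetical demo class; the names are illustrative only.
class HexPadding {

    // Appends Integer.toHexString output directly, so values below 0x10 lose their leading zero.
    static String unpaddedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(Integer.toHexString(0xFF & b));
        }
        return sb.toString();
    }

    // Formats every byte as exactly two lowercase hex digits.
    static String paddedHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = { 0x0a, (byte) 0xff, 0x00 };
        System.out.println("unpadded: " + unpaddedHex(sample)); // prints "unpadded: aff0"
        System.out.println("padded  : " + paddedHex(sample));   // prints "padded  : 0aff00"
    }
}
```

The unpadded form is still deterministic for the same input, so it does not explain a hash that changes between runs, but it can make two different digests print as the same string.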

archanaagrawal68gmai · May 20, 2014, 5:42am

Hey,
Thanks for the answer. But my question is about splitting the PDF file by pages. If my PDF file has 11 pages, then splitting it creates 11 small single-page PDF files. When I apply the hash generation method to those split pages (referred to as chunks in my code), I get different hash values instead of the same hash value on every run for the same PDF. I have also attached the PDF file. Please use your method for splitting a PDF into individual pages first, and then apply the hash generation method.
Code:
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Digest
{
    public static void main (String args[]) throws IOException, NoSuchAlgorithmException
    {
        String fingerprint = null;
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        // String datafile = "E:\\DataDedoop\\17.Deduplication.Estimation.pdf";
        String datafile = "E:\\NEW_TEST\\chunks\\chunk1.pdf";
        // try-with-resources closes the stream once the file has been read
        try (FileInputStream fis = new FileInputStream(datafile)) {
            byte[] dataBytes = new byte[1024];
            int nread = 0;
            while ((nread = fis.read(dataBytes)) != -1) {
                md.update(dataBytes, 0, nread);
            }
        }
        byte[] mdbytes = md.digest();
        StringBuilder hexString = new StringBuilder();
        for (int i = 0; i < mdbytes.length; i++) {
            // %02x keeps the leading zero that Integer.toHexString would drop
            hexString.append(String.format("%02x", mdbytes[i]));
        }
        fingerprint = hexString.toString();
        System.out.println("Fingerprint : " + fingerprint);
    }
}

codewarior · May 21, 2014, 2:27am

Hi Archana,

Thanks for sharing the details.

I have tested the scenario and am able to reproduce the problem: a different hash code is generated for the same file when the code is executed multiple times. For the sake of correction, I have logged it in our issue tracking system as PDFNEWJAVA-34221. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for the inconvenience.
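While the issue is open, one way to narrow it down is to keep the same chunk from two separate split runs and compare the two files byte by byte. SHA-256 always gives the same digest for the same bytes, so differing hashes mean the split output itself differs between runs. PDF writers in general often store values such as creation timestamps or a file identifier that change on every save; whether that is the cause here is an assumption and has not been confirmed in this thread. The following is a minimal sketch with hypothetical class and method names, taking the two file paths as command-line arguments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical diagnostic; class and method names are illustrative only.
class ChunkDiff {

    // Returns the offset of the first differing byte, or -1 if the arrays are identical.
    // If one array is a prefix of the other, the length of the shorter one is returned.
    static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    // SHA-256 of the given bytes as a 64-character lowercase hex string.
    static String sha256Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ChunkDiff <chunk-from-run-1> <chunk-from-run-2>");
            return;
        }
        byte[] first = Files.readAllBytes(Paths.get(args[0]));
        byte[] second = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("Run 1: " + sha256Hex(first));
        System.out.println("Run 2: " + sha256Hex(second));
        int offset = firstDifference(first, second);
        System.out.println(offset < 0
                ? "Files are byte-identical"
                : "First difference at byte offset " + offset);
    }
}
```

Opening both files in a hex or text viewer near the reported offset shows which part of the PDF changed between the two runs.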