I searched the forums and the latest reply about corrupt PDF detection was back in 2015.
I’m using Aspose.PDF 18.2 for .NET 4.0 to parse arbitrary PDFs, and I’ve noticed that it loads most malformed or corrupt PDFs I’ve encountered or created (by overwriting bytes in a hex editor, etc.) without throwing any exceptions.
I have a requirement to detect and reject corrupt PDFs (PDFs that will not render in, say, Acrobat).
Is there a method or approach I could take using Aspose.PDF to enumerate the contents of the PDF and detect corruption, either in a try/catch looking for exceptions or through some property on the object model that indicates there were validation failures?
Thanks in advance for any advice / pointers you can give me here.
You can check whether the source input is a valid PDF file by using the IsPdfFile property, as in the code sample below:
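using System;
using Aspose.Pdf.Facades; // PdfFileInfo lives in this namespace

// dataDir is the folder that holds the input file.
PdfFileInfo info = new PdfFileInfo(dataDir + "Sample Response.txt");
// IsPdfFile reports whether the input is recognized as a PDF.
if (info.IsPdfFile)
{
    Console.WriteLine("Valid PDF file");
}
else
{
    Console.WriteLine("Invalid PDF file");
}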
I hope this will be helpful. If this does not satisfy your requirements, please share the corrupted PDF file with us so that we can investigate and help you out.
Thanks - it doesn’t really satisfy the requirements, but I don’t think that’s your fault.
I’m looking for a fairly foolproof method of detecting corrupt PDFs, but that’s a somewhat vague requirement.
For example, if an image inside a PDF is corrupt, the PDF will render but may display an error in Acrobat when you flip to the page with the image.
What I ended up doing was a multi-pronged approach:
I used your recommendation above (IsPdfFile) as my first check.
Then I tried to extract all text as my second check.
Then I tried converting the entire PDF to a TIFF as my third check.
I may have done some other miscellaneous things as well.
I don’t think there’s a single silver bullet here.
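The staged approach described above can be sketched as a small runner that executes each check in order and treats either a false result or an exception as a failure. This is only a minimal sketch: the `PdfCheckPipeline` class and the `FirstFailingStage` method are hypothetical names, not part of Aspose.PDF, and the poster's actual code is not shown in the thread.

```csharp
using System;
using System.Collections.Generic;

public static class PdfCheckPipeline
{
    // Runs each named stage in order. A stage fails if it returns false
    // or throws. Returns the name of the first failing stage, or null
    // when every stage passes.
    public static string FirstFailingStage(
        IEnumerable<KeyValuePair<string, Func<bool>>> stages)
    {
        foreach (KeyValuePair<string, Func<bool>> stage in stages)
        {
            try
            {
                if (!stage.Value())
                {
                    return stage.Key;
                }
            }
            catch (Exception)
            {
                // Any exception raised while reading the file counts
                // as a failed check for that stage.
                return stage.Key;
            }
        }
        return null;
    }
}
```

In the setup described above, the three stages would presumably wrap the IsPdfFile check, a text-extraction pass, and a TIFF conversion, ordered from cheapest to most expensive so that obviously broken files are rejected early. Passing all stages still does not prove that the file renders correctly in Acrobat.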
Would you please share the source PDF files you are referring to when you mention “detecting corrupt PDFs”? Please also describe the problems you want to detect in those specific files so that we can investigate further and help you out.
Could you please clarify what you meant by uploading? Are you trying to upload the PDF file to a server, or do you just want to determine whether it is valid?
Thank you for your response.
I am trying to determine whether the file is valid or not.
We also have one more issue. We are using the code below to check whether the PDF content has any format exceptions, and for some PDFs the tagged content check is timing out. Please let me know why this happens, or whether you have another way to check that the content of a PDF is valid.
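One way to keep a slow validation step from blocking the caller is to run it on a worker task and stop waiting after a fixed interval. The sketch below uses only the .NET Task Parallel Library; `TimedCheck` and `PassesWithin` are hypothetical names, and the check itself is passed in as a delegate because the code from the post above is not available. It does not explain why the tagged content check is slow, it only bounds how long the caller waits.

```csharp
using System;
using System.Threading.Tasks;

public static class TimedCheck
{
    // Runs the check on a worker task and waits at most 'timeout' for it.
    // Returns true only if the check finished in time and returned true.
    // On timeout the worker is NOT stopped; it keeps running in the
    // background until the check returns on its own.
    public static bool PassesWithin(Func<bool> check, TimeSpan timeout)
    {
        Task<bool> task = Task.Factory.StartNew(check, TaskCreationOptions.LongRunning);

        // Observe a late failure so that an exception thrown after the
        // timeout does not surface as an unobserved task exception.
        task.ContinueWith(
            t => { AggregateException ignored = t.Exception; },
            TaskContinuationOptions.OnlyOnFaulted);

        try
        {
            return task.Wait(timeout) && task.Result;
        }
        catch (AggregateException)
        {
            // The check threw before the timeout; treat the file as invalid.
            return false;
        }
    }
}
```

A call such as `TimedCheck.PassesWithin(() => RunContentCheck(path), TimeSpan.FromSeconds(30))` would treat a timeout the same way as a failed check, where `RunContentCheck` stands in for whatever validation is being performed. Because the abandoned work continues to use CPU and memory, running the check in a separate process that can be killed may be the safer design for untrusted files.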
We tested the case using version 22.9 of the API and did not notice any issue. The API returned false for the property pdfFileInfo.IsPdfFile. Can you please make sure you are using version 22.9 and let us know if you still face any issues?
Can you please share a sample PDF for this case as well?
You can share your file in a private message. Click the top left button in the post editor to convert your post into a private message, where you can attach your file. We will then proceed accordingly.
We were able to notice this issue in our environment. Therefore, it has been logged under the ticket ID PDFNET-52742 in our issue tracking system for further investigation. We will look into its details and keep you posted with the status of its correction. Please be patient and spare us some time.
The ticket has recently been logged in our issue tracking system and is pending initial analysis. We will investigate and resolve it on a first-come, first-served basis. We are afraid that we cannot comment further without determining the actual cause of the issue. As soon as we make definite progress towards resolving the ticket, we will share updates with you via this forum thread. Please spare us some time.
We are sorry for the inconvenience.