PDF header signature not found

markmz1 · May 28, 2014, 5:18pm

We are using another tool that creates an image from a scanner in our web application. That image is passed from the client to the server and into our Aspose logic as a stream. We then create an Aspose.Pdf document from the stream, add a FreeTextAnnotation, and save the file to disk. Out of the 4,000 or so documents we have created, around 10 have been corrupt and cannot be opened with Adobe Reader, for example, although we did find that Chrome was able to display them. After some debugging with iTextSharp, we see the error "PDF header signature not found". When we open the documents in Notepad, the first few bytes look off. A PDF file normally begins with something like “%PDF-1.6” (without the quotes). The corrupt files are missing the version number and begin with something like “%PDF “.

Just wondering if you have any ideas on why the files would be missing the needed header information. Aspose.Pdf was able to open the corrupt files; we then copied the pages to a new PDF document and saved it, and the new file is fine.
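For anyone who wants to spot such files without a PDF library, the header described above can be checked directly. This is only a minimal sketch: the class and method names (`PdfHeaderCheck`, `HasValidHeader`) are made up for illustration, and it assumes a strict header at byte 0 of the form `%PDF-` followed by digit, dot, digit. Some viewers are more lenient than that, which may be why Chrome still opened the files.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, not part of Aspose.Pdf or iTextSharp.
public static class PdfHeaderCheck
{
    // True when the first 8 bytes look like "%PDF-1.6" (any single-digit major.minor).
    public static bool HasValidHeader(byte[] head)
    {
        if (head == null || head.Length < 8) return false;
        string s = Encoding.ASCII.GetString(head, 0, 8);
        return s.StartsWith("%PDF-", StringComparison.Ordinal)
            && char.IsDigit(s[5]) && s[6] == '.' && char.IsDigit(s[7]);
    }

    // Reads the first 8 bytes of a file and applies the same check.
    public static bool HasValidHeader(string path)
    {
        var head = new byte[8];
        using (var fs = File.OpenRead(path))
        {
            int read = 0;
            while (read < head.Length)
            {
                int n = fs.Read(head, read, head.Length - read);
                if (n == 0) break; // file is shorter than 8 bytes
                read += n;
            }
            if (read < head.Length) return false;
        }
        return HasValidHeader(head);
    }
}
```

A call such as `PdfHeaderCheck.HasValidHeader(f)` inside a loop over `Directory.GetFiles(...)` would flag a file that starts with “%PDF “ and no version number.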

tilal.ahmad · May 29, 2014, 11:08am

Hi Mark,

We are sorry for the inconvenience caused. We would appreciate it if you could share your sample code and the source image causing the issue, along with your Aspose.Pdf API version, so that we can investigate the issue at our end and advise you accordingly.

Best Regards,

AsposeUser44 · August 21, 2014, 2:48pm

Did anything come of this? I have a similar issue with the same error message.

markmz1 · August 21, 2014, 4:52pm

We never did figure out exactly what was causing the issue. We rewrote the logic that processes the files and we are no longer receiving the error. We are scanning the images in using Dynamsoft for the browser. We changed our logic to save the file to disk when it is initially scanned, instead of saving it later after processing with OCR, etc.

As I was initially researching, I came across some logic that uses the iTextSharp PDF reader. When it opens a corrupt file, it throws an error. We then open those corrupt PDFs with Aspose and copy each page to a new document.

Here are the guts of the program if you want to try it. Note that it does use the iTextSharp DLL, which is available for free if you need it. I believe you can just use Aspose for everything (and call one of the validate methods to find the corrupt files).

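// Loop through all files in the directory and copy the bad files to badFileDir.
// The loop further below then creates a new PDF for each bad file.
// badFileDir is assumed to be the "Corrupt" subfolder that the copy targets.
var badFileDir = Path.Combine(dirName, "Corrupt");
Directory.CreateDirectory(badFileDir); // File.Copy fails if the folder is missing
foreach (var f in Directory.GetFiles(dirName))
{
    try
    {
        // iTextSharp throws on files it cannot parse, e.g. a broken header.
        using (var pdfFileReader = new iTextSharp.text.pdf.PdfReader(f))
        {
            pdfFileReader.Close();
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(f + ": " + ex.Message);
        var dest = Path.Combine(badFileDir, Path.GetFileName(f));
        File.Copy(f, dest, true); // overwrite copies left by an earlier run
    }
}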

// Re-save each bad file through Aspose.Pdf by copying its pages to a new document.
Aspose.Pdf.License license = new Aspose.Pdf.License();
license.SetLicense(@"Aspose.Total.lic");
foreach (var f in Directory.GetFiles(badFileDir))
{
    Console.WriteLine(f);
    var pdfDoc = new Aspose.Pdf.Document(f);
    var pdfDoc2 = new Aspose.Pdf.Document();
    foreach (Aspose.Pdf.Page page in pdfDoc.Pages)
    {
        pdfDoc2.Pages.Add(page);
    }
    var fixedDir = Path.Combine(Path.GetDirectoryName(f), "Fixed");
    Directory.CreateDirectory(fixedDir); // make sure the output folder exists
    pdfDoc2.Save(Path.Combine(fixedDir, Path.GetFileName(f)));
}

tilal.ahmad · August 22, 2014, 9:47am

Hi Mark,

Thank you very much for your feedback. Hopefully the workaround will work for Chris. However, to fix the issue in the Aspose.Pdf API, we would appreciate it if you or Chris could share sample code that generates the problematic PDF, so that we can reproduce the issue at our end, investigate it, and resolve it.

Looking forward to your sample code to replicate the issue.

Best Regards,