Pre-Sales Question: Aspose.PDF Functionality Question

Is it possible to do the following with the library?

  1. Detect whether a PDF file contains any scripts?
  2. Detect whether a PDF has images and, if so, extract those images for OCR processing?
  3. If a PDF is saved as an image, perform OCR on the text in the PDF?

@sulox45

Thanks for contacting support.

Aspose.PDF offers features to detect field actions and to remove JavaScript from PDF documents. Please consider the following code snippet:

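using Aspose.Pdf;
using Aspose.Pdf.Forms;

// dataDir is the folder that holds your PDF files.
// Inspect the actions attached to each form field.
Document doc = new Document(dataDir + "JSPopupCalendar.pdf");
foreach (Field f in doc.Form.Fields)
{
    var actions = f.Actions;
    if (actions != null)
    {
        // do some stuff
    }
}

// Strip JavaScript from the input file and write the result to the output file.
// The namespace is fully qualified so the snippet compiles outside the Aspose.Pdf namespace.
Aspose.Pdf.Facades.PdfJavaScriptStripper pjss = new Aspose.Pdf.Facades.PdfJavaScriptStripper();
bool val = pjss.Strip(dataDir + "JavaLink.pdf", dataDir + "JSPopupCalendar_out.pdf");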

Aspose.PDF can also extract images from PDF documents, and you can then perform OCR on those images using Aspose.OCR for .NET. Furthermore, you can convert PDF pages to images and perform OCR on them, which covers PDFs whose pages are saved as images.
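
As a rough sketch of both approaches (not a definitive implementation), the snippet below saves each embedded image and also renders every page to PNG, so that the resulting files can be handed to an OCR engine. The file name "input.pdf", the output folder "ocr_input", the class name and the 300 DPI resolution are assumptions for illustration only; please adjust them to your environment.

```csharp
using System.IO;
using Aspose.Pdf;
using Aspose.Pdf.Devices;

class PdfImagesForOcr
{
    static void Main()
    {
        // Hypothetical paths, used for illustration only.
        string inputPdf = "input.pdf";
        string outDir = "ocr_input";
        Directory.CreateDirectory(outDir);

        Document doc = new Document(inputPdf);
        int pageNo = 0;
        foreach (Page page in doc.Pages)
        {
            pageNo++;

            // 1) Save every image found in the page resources.
            //    XImage.Save(Stream) is expected to write JPEG data.
            int imgNo = 0;
            foreach (XImage image in page.Resources.Images)
            {
                imgNo++;
                string imgPath = Path.Combine(outDir, $"page{pageNo}_img{imgNo}.jpg");
                using (FileStream fs = new FileStream(imgPath, FileMode.Create))
                {
                    image.Save(fs);
                }
            }

            // 2) Render the whole page to PNG (300 DPI is an assumed value
            //    that usually suits OCR; tune it for your documents).
            PngDevice pngDevice = new PngDevice(new Resolution(300));
            string pagePath = Path.Combine(outDir, $"page{pageNo}.png");
            using (FileStream fs = new FileStream(pagePath, FileMode.Create))
            {
                pngDevice.Process(page, fs);
            }
        }
    }
}
```

A page for which the inner loop saves no files has no images in its resources, which answers the detection part of your second question. The saved files can then be passed to Aspose.OCR for .NET; the recognition call is left out here because it differs between Aspose.OCR versions.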

In case you need further assistance, please feel free to let us know.