Free Support Forum - aspose.com

Convert PDF from Tesseract with OCR Overlay

We are trying to convert a PDF with OCR overlay from Tesseract (v5.2.0) to PDF/A-3B.
After the conversion, the OCR layer is gone.

We are using Aspose.PDF for Java v22.12.

Our code:

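    public void convert(InputStream inputPdf, OutputStream outputPdf, Optional<String> hocr) {
        Document pdfDoc = new Document(inputPdf);
        // Supply the Tesseract hOCR text for each page image.
        pdfDoc.convert(bufferedImage -> hocr.get());
        //pdfDoc.validate(new PdfFormatConversionOptions(PdfFormat.PDF_A_3B));
        PdfFormatConversionOptions pdfConvertOptions = new PdfFormatConversionOptions(PdfFormat.PDF_A_3B);
        // The posted snippet ends here; the two calls below are assumed.
        pdfDoc.convert(pdfConvertOptions);
        pdfDoc.save(outputPdf);
    }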

Example Input:
Tesseract-Result.pdf (23.1 KB)

Example Output from Aspose:
Aspose-result.pdf (27.8 KB)
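
A rough way to compare the two files without opening a viewer is to scan the raw bytes for the font name that Tesseract's invisible text layer normally uses. The sketch below assumes that font is named `GlyphLessFont` and that the font dictionary is not stored inside a compressed object stream. A `false` result is therefore only a hint, not proof that the layer is missing. The class and method names (`OcrLayerCheck`, `containsMarker`) are made up for this illustration and are not part of Aspose.PDF or Tesseract.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class OcrLayerCheck {

    /** Returns true if the raw PDF bytes contain the given ASCII marker. */
    static boolean containsMarker(byte[] pdf, String marker) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // can be searched as a string without decoding errors.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        return raw.contains(marker);
    }

    public static void main(String[] args) throws Exception {
        // Pass one or more PDF paths, e.g. the Tesseract input and the converted output.
        for (String path : args) {
            byte[] pdf = Files.readAllBytes(Paths.get(path));
            // Assumption: Tesseract names its invisible OCR font "GlyphLessFont".
            System.out.println(path + ": " + containsMarker(pdf, "GlyphLessFont"));
        }
    }
}
```

Running it against both Tesseract-Result.pdf and Aspose-result.pdf shows whether the marker is present in each file. Extracting text with a PDF library remains the more reliable check.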



We have logged this issue as PDFJAVA-42365 in our issue tracking system for further investigation. We will look into the details and keep you posted on the status of the fix. Please be patient and allow us some time.

We are sorry for the inconvenience.