Hi,
We are trying to convert a PDF with an OCR overlay from Tesseract (v5.2.0) to PDF/A-3B.
After the conversion, the OCR layer is gone.
We are using Aspose.PDF for Java v22.12.
Our code:
public void convert(InputStream inputPdf, OutputStream outputPdf, Optional<String> hocr) {
    Document pdfDoc = new Document(inputPdf);
    if (hocr.isPresent()) {
        pdfDoc.convert(bufferedImage -> hocr.get());
    }
    //pdfDoc.validate(new PdfFormatConversionOptions(PdfFormat.PDF_A_3B));
    PdfFormatConversionOptions pdfConvertOptions = new PdfFormatConversionOptions(PdfFormat.PDF_A_3B);
    pdfDoc.convert(pdfConvertOptions);
    pdfDoc.save(outputPdf);
}
Example Input:
Tesseract-Result.pdf (23.1 KB)
Example Output from Aspose:
Aspose-result.pdf (27.8 KB)
Thanks
Didi