Convert PDF from Tesseract with OCR Overlay

We are trying to convert a PDF with an OCR overlay produced by Tesseract (v5.2.0) to PDF/A-3B.
After the conversion, the OCR layer is gone.

We are using Aspose.PDF for Java 22.12.

Our code:

    public void convert(InputStream inputPdf, OutputStream outputPdf, Optional<String> hocr) {
        Document pdfDoc = new Document(inputPdf);
        pdfDoc.convert(bufferedImage -> hocr.get());
        //pdfDoc.validate(new PdfFormatConversionOptions(PdfFormat.PDF_A_3B));
        PdfFormatConversionOptions pdfConvertOptions = new PdfFormatConversionOptions(PdfFormat.PDF_A_3B);

Example Input:
Tesseract-Result.pdf (23.1 KB)

Example Output from Aspose:
Aspose-result.pdf (27.8 KB)
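
To confirm whether the text layer is really missing from a file, a rough check like the one below may help. It is only a heuristic sketch, not part of Aspose.PDF: it assumes the OCR text is written in invisible rendering mode (the `3 Tr` operator, which Tesseract's PDF output uses) and that content streams are either uncompressed or Flate-compressed. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class TextLayerCheck {

    // "3 Tr" sets text rendering mode 3 (invisible text), typical for OCR overlays.
    private static final Pattern INVISIBLE_TEXT = Pattern.compile("(?<![\\d.])3\\s+Tr\\b");

    /** Returns true if any stream in the PDF appears to contain invisible text. */
    public static boolean hasInvisibleText(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string offsets equal byte offsets.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        int pos = 0;
        while (true) {
            int start = raw.indexOf("stream", pos);
            if (start < 0) {
                return false;
            }
            // Skip the "stream" that is part of an "endstream" keyword.
            if (start >= 3 && raw.startsWith("end", start - 3)) {
                pos = start + 6;
                continue;
            }
            int dataStart = start + 6;
            // The keyword is followed by CRLF or LF before the stream data begins.
            if (dataStart < raw.length() && raw.charAt(dataStart) == '\r') dataStart++;
            if (dataStart < raw.length() && raw.charAt(dataStart) == '\n') dataStart++;
            int end = raw.indexOf("endstream", dataStart);
            if (end < 0) {
                return false;
            }
            byte[] data = Arrays.copyOfRange(pdf, dataStart, end);
            String plain = new String(data, StandardCharsets.ISO_8859_1);
            // Check the stream both as-is and after attempting Flate decompression.
            if (INVISIBLE_TEXT.matcher(plain).find()
                    || INVISIBLE_TEXT.matcher(inflate(data)).find()) {
                return true;
            }
            pos = end + 9;
        }
    }

    /** Tries to Flate-decompress the data; returns "" if it is not Flate data. */
    private static String inflate(byte[] data) {
        Inflater inflater = new Inflater();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            inflater.setInput(data);
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0) {
                    break; // needs more input or a dictionary: give up
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            // Not Flate data; keep whatever was decoded so far.
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

Running `TextLayerCheck.hasInvisibleText(...)` on the bytes of both Tesseract-Result.pdf and Aspose-result.pdf would show whether invisible text operators are present before and after conversion. A negative result only means no `3 Tr` operator was found; it does not rule out text stored in some other form.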



We have logged this as PDFJAVA-42365 in our issue tracking system for further investigation. We will look into the details and keep you posted on the status of the fix. Please be patient and allow us some time.

We are sorry for the inconvenience.

Dear all,

Can you reproduce it, or do you need more information?
Do you have an update on this issue?


We are afraid that the investigation of the logged ticket has not been completed yet, because issues logged before it are still pending in the queue. Nevertheless, your concerns have been recorded and will be considered during the investigation. We will inform you as soon as we make progress towards a fix. Please allow us some time.

We are sorry for the inconvenience.