Hello, I think this is a bug.
What I am doing:
1.) Create one JPG for every page of a PDF (I checked that the result is OK)
2.) Convert each image to its own single-page PDF. Here I tried 2 different ways:
a)
private ArrayList<String> convertImagesToPdfs(ArrayList<String> imageFiles)
{
    ArrayList<String> pdfFiles = new ArrayList<>();
    AsposeOCR api;
    String imageFile;
    String outFile;
    RecognitionResult res;
    for (int i = 0; i < imageFiles.size(); i++)
    {
        try
        {
            imageFile = imageFiles.get(i);
            System.out.println("OCR of " + imageFile);
            // Output name: <image path><page number>.pdf
            outFile = imageFile + (i + 1) + ".pdf";
            // A new API instance and new settings are created for every image
            api = new AsposeOCR();
            RecognitionSettings set = new RecognitionSettings();
            set.setDetectAreas(false);
            set.setLanguage(Language.Deu);
            set.setAutoSkew(true);
            // Recognize this one image and save the result as a PDF
            res = api.RecognizePage(imageFile, set);
            res.save(outFile, Format.Pdf);
            System.out.println("Adding " + outFile);
            pdfFiles.add(outFile);
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    return pdfFiles;
}
b)
private ArrayList<String> convertImagesToPdfs(ArrayList<String> imageFiles)
{
    ArrayList<String> pdfFiles = new ArrayList<>();
    AsposeOCR api;
    String imageFileDir;
    String outFile;
    ArrayList<RecognitionResult> res;
    RecognitionResult resOne;
    if (imageFiles.size() > 0)
    {
        // All images are in the same directory, so take it from the first file
        File f = new File(imageFiles.get(0));
        imageFileDir = f.getParent();
        try
        {
            api = new AsposeOCR();
            RecognitionSettings set = new RecognitionSettings();
            set.setDetectAreas(false);
            set.setLanguage(Language.Deu);
            set.setAutoSkew(true);
            // Recognize every image in the directory in one call
            res = api.RecognizeMultiplePages(imageFileDir, set);
            for (int i = 0; i < res.size(); i++)
            {
                // Save each result as its own PDF: <dir>\<page number>.pdf
                resOne = res.get(i);
                outFile = imageFileDir + "\\" + (i + 1) + ".pdf";
                resOne.save(outFile, Format.Pdf);
                System.out.println("Adding " + outFile);
                pdfFiles.add(outFile);
            }
        } catch (Exception e)
        {
            e.printStackTrace();
        }
    }
    return pdfFiles;
}
Result: All single-page PDFs are created, BUT (!!!) every PDF is identical: each one contains the text from the first page. It seems to me that the RecognitionResult is always the same.
I am using aspose-ocr-21.5.
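To double-check the "every PDF is identical" observation without opening each file by hand, the generated files could be grouped by a hash of their bytes. This is only a rough sketch using plain JDK classes; the class name `DuplicateOutputCheck` and the helper `groupByContentHash` are made up for illustration and are not part of Aspose.OCR. Note that it only detects byte-identical files: PDFs that carry the same text but differ in metadata or in the embedded image would still land in different groups.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateOutputCheck {

    // Hypothetical helper: groups file paths by the SHA-256 hash of their content.
    // Files that end up in the same group are byte-for-byte identical.
    static Map<String, List<String>> groupByContentHash(List<String> files)
            throws IOException, NoSuchAlgorithmException {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String file : files) {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(Files.readAllBytes(Paths.get(file)));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            groups.computeIfAbsent(hex.toString(), k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    // Usage: java DuplicateOutputCheck 1.pdf 2.pdf 3.pdf
    public static void main(String[] args) throws Exception {
        Map<String, List<String>> groups = groupByContentHash(Arrays.asList(args));
        for (List<String> group : groups.values()) {
            System.out.println(group.size() + " file(s) with identical bytes: " + group);
        }
    }
}
```

Passing it the list returned by convertImagesToPdfs would show a single group if the outputs really are byte-identical, and one group per page if they are not.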