Convert TIFF to PDF with searchable text

Hello,

We are facing a problem with the Aspose.OCR package.

We need to convert TIFF files into PDF files in which the text is available for word searches.

We have found some posts on this topic (such as "Convert TIFF + OCR text to Searchable PDF"), but they do not include any code.

For the moment, the text can be retrieved into a string object, but it does not end up in the final PDF file (Recognition|Documentation).

Could you please give us a solution for converting TIFF to PDF with searchable text?

Thank you.

Regards.

Alexis.

Hi Alexis,

Thank you for your inquiry.

Currently, there is no direct way to convert TIFF files into a searchable PDF document. However, you can meet this requirement by using a combination of Aspose.OCR for .NET and Aspose.Pdf for .NET.

First, perform OCR on the TIFF files using Aspose.OCR for .NET to extract the text. Once the image contents have been recognized, you may place the recognized text inside a new PDF document, which will then be searchable. Before saving the final output, you may set the PDF compliance to PDF/A format. For further details, please visit:

Add Text in an Existing PDF File (here you can create a new document)

Convert Text File to PDF Format

Convert PDF File to PDF-A
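
As a rough sketch of the first two steps above: the OCR calls are the same ones that appear elsewhere in this thread, while the PDF part assumes the Document, Page and TextFragment classes of Aspose.Pdf for .NET. The class name TiffToTextPdf and the helper PdfPathFor are hypothetical names used only for illustration. The result is a PDF that contains the recognized text only, not the original image, and the optional PDF/A step is left out here.

```csharp
using System;
using System.IO;
using Aspose.OCR;   // OcrEngine, ImageStream (as used in this thread)

public static class TiffToTextPdf
{
    // Hypothetical helper: put the PDF next to the source image.
    public static string PdfPathFor(string tiffPath)
    {
        return Path.ChangeExtension(tiffPath, ".pdf");
    }

    public static void Run(string tiffPath)
    {
        // Step 1: recognize the text in the TIFF image.
        OcrEngine ocrEngine = new OcrEngine();
        ocrEngine.Image = ImageStream.FromFile(tiffPath);
        if (!ocrEngine.Process())
        {
            throw new InvalidOperationException("OCR failed for " + tiffPath);
        }
        string text = ocrEngine.Text.ToString();

        // Step 2: place the recognized text in a new PDF document.
        // Types are fully qualified to avoid clashes between the two libraries.
        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document();
        Aspose.Pdf.Page page = pdf.Pages.Add();
        page.Paragraphs.Add(new Aspose.Pdf.Text.TextFragment(text));

        // Step 3: save the text-only, searchable PDF.
        pdf.Save(PdfPathFor(tiffPath));
    }
}
```

Calling TiffToTextPdf.Run(@"C:\Temp\test.tif") would be expected to write C:\Temp\test.pdf, provided both libraries are referenced and licensed.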

Hope the above information helps. If you run into any issues or need further clarification, please let us know and we will be glad to assist you.

Hi Ikram,

The OCR function seems to be returning a wrong result.
Please find attached a TIFF file containing some text, and a text file containing the OCR result.
Here is the code snippet used to call the OCR function:

OcrEngine ocrEngine = new OcrEngine();
ocrEngine.Image = ImageStream.FromFile(@"C:\Temp\test.tif");
String text = "";
if (ocrEngine.Process())
{
    text = ocrEngine.Text.ToString();
    File.Create(@"C:\Temp\test.txt").Dispose();
    using (StreamWriter tw = new StreamWriter(@"C:\Temp\test.txt", true))
    {
        tw.Write(text);
        tw.Close();
    }
}

Thank you for your answer.

Best regards.

Alexis.

Hi Alexis,

Thank you for your inquiry.

We have evaluated the attached image on our end using the latest version, Aspose.OCR for .NET 2.7.0. During testing we found that the image you provided has a very low resolution of 96 DPI. Please note that the current implementation of the Aspose.OCR APIs performs well with images that have a resolution of at least 300 DPI, and the accuracy rate tends to decrease as the resolution decreases. Because your image has a resolution of 96 DPI, it will not be possible to get 100% accuracy if you scan the complete image. On the other hand, if you intend to extract specific contents from a portion of the image, you can use custom recognition blocks to get better accuracy.

Please note that custom recognition blocks are useful when your documents follow a similar structure, that is, when the contents to be scanned are always at the same location in each image.

Consider the following code snippet, which we used to extract information from the image you provided using custom recognition blocks:
string imageFile = @"C:\Ctrash\ocr_files\test.tif";
OcrEngine ocrEngine = new OcrEngine();
ocrEngine.ClearNotifies();
ocrEngine.Config.ClearRecognitionBlocks();

ocrEngine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(43, 67, 1355, 52));
ocrEngine.Config.AddRecognitionBlock(RecognitionBlock.CreateTextBlock(757, 561, 471, 123));

ocrEngine.Config.DetectTextRegions = false;
ocrEngine.Image = ImageStream.FromFile(imageFile);
if (ocrEngine.Process())
{
    Console.WriteLine(ocrEngine.Text);
    File.Create(@"C:\Ctrash\ocr_files\test.txt").Dispose();
    using (StreamWriter tw = new StreamWriter(@"C:\Ctrash\ocr_files\test.txt", true))
    {
        tw.Write(ocrEngine.Text);
        tw.Close();
    }
}

The output is as follows:

MLG DOOR HINGE AND ACTUATOR FITTINGS FAILURES
Mitigation l lnterim Action
lSB 53-1 195
lSB 53-1 196
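
On the resolution point above, a small check before running OCR can flag images that fall below the recommended 300 DPI. The following is only a sketch: it assumes System.Drawing is available to read the resolution stored in the image file, and the class name DpiCheck and its members are hypothetical names used for illustration.

```csharp
using System;
using System.Drawing;   // Image.HorizontalResolution, Image.VerticalResolution

public static class DpiCheck
{
    // Minimum resolution recommended in the reply above.
    public const float RecommendedDpi = 300f;

    // True when the given resolution is below the recommended minimum.
    public static bool IsBelowRecommended(float dpi)
    {
        return dpi < RecommendedDpi;
    }

    // Reads the resolution recorded in the image file and prints a warning
    // if either axis is below the recommended value.
    public static void Report(string imagePath)
    {
        using (Image image = Image.FromFile(imagePath))
        {
            float dpi = Math.Min(image.HorizontalResolution, image.VerticalResolution);
            if (IsBelowRecommended(dpi))
            {
                Console.WriteLine("{0}: {1} DPI is below {2} DPI, OCR accuracy may suffer.",
                    imagePath, dpi, RecommendedDpi);
            }
        }
    }
}
```

For a 96 DPI image such as the one attached, DpiCheck.Report would print the warning, which is a cue either to rescan at a higher resolution or to restrict recognition to custom blocks as shown above.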


Hope the above information helps. If you run into any issues or need further clarification, please let us know and we will be glad to assist you.