Image to PDF to Searchable PDF

Hello,

I’m looking at the solution posted here:

This might help me perform OCR on an image and ultimately create a searchable PDF. I have the code working fine. I’m trying to determine whether the searchable PDF will retain the same layout as the original image. When I look at the out.html produced by the code, it looks like carriage returns have been added, so the text is not in the same layout. Is out.html how the searchable PDF will look, or will it look more like the image?

I have downloaded the trial version of Aspose.PDF. The only reason our company would buy Aspose.PDF at this time is if it works for what I am trying to do. However, the trial version is limited to adding 4 text fragments, so I can’t answer my question above with this limitation.

Is there any way I can get past this limitation before buying? If it works, we will gladly buy it, as we use Aspose.Cells and love it.


Hi Cheryl,

Thanks for using our APIs.

Can you please share the source/sample PDF file so that we can test the scenario on our end? Also, please note that the approach above uses a third-party component to perform OCR on the image.

Meanwhile, you may consider first converting the PDF pages to images and then performing OCR on the image files using Aspose.OCR for .NET. For more information, please visit


You may consider requesting a 30-day temporary license to test our APIs without any limitations. For more information, please visit Get a temporary license.

Thanks for the reply. I got the temporary license for Aspose.PDF, and it works for what I need it to do. Our company will be ordering it.

I am going with the third-party Tesseract solution, since I already had it written and can now see that it works. I do have a question, though, about converting an image to a PDF.

I’m using this code:
Public Sub ConvertImageToPDF(ByVal imgFilename As String, ByVal newFilename As String)
    Dim pdf1 As Aspose.Pdf.Generator.Pdf = New Aspose.Pdf.Generator.Pdf()
    ' Create a section in the Pdf object
    Dim sec1 As Aspose.Pdf.Generator.Section = pdf1.Sections.Add()

    ' Create an image object in the section
    Dim image1 As Aspose.Pdf.Generator.Image = New Aspose.Pdf.Generator.Image(sec1)
    ' Add the image object to the Paragraphs collection of the section
    sec1.Paragraphs.Add(image1)
    ' Set the path of the image file
    image1.ImageInfo.File = imgFilename
    ' Set the image type using the ImageFileType enumeration.
    ' Note: the type is hard-coded to TIFF, so this sub assumes TIFF input.
    image1.ImageInfo.ImageFileType = Aspose.Pdf.Generator.ImageFileType.Tiff

    ' Save the PDF
    pdf1.Save(newFilename)
End Sub

It’s working, but the image in the resulting PDF does not fill the whole page. Is there a way to make the image fit the page? I tried image1.ImageScale = 100 (for 100%), but it made no difference, and I’m not sure what the value is supposed to be.

I tried different ImageInfo properties as well.

Suggestions?

Thanks,
Cheryl


I got it working using the code from this forum post:

Fit Image to page
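The linked post is not reproduced here. As a hedged illustration of the arithmetic behind fitting an image to a page, the VB.NET sketch below computes a uniform, aspect-preserving scale factor, scale = Min(availableWidth / imageWidth, availableHeight / imageHeight). It uses only System types. The page size (US Letter, 612 x 792 points), the 36-point margins, the 300 DPI scan, and the helper names PixelsToPoints and ScaleToFit are assumptions for illustration, not part of Aspose.PDF. How the resulting width and height are applied to the image depends on the Aspose.PDF API version in use.

```vbnet
Imports System

Module FitImageSketch
    ' Hypothetical helper (not an Aspose.PDF API): converts a pixel dimension
    ' to PDF points, where 1 point = 1/72 inch.
    Public Function PixelsToPoints(pixels As Double, dpi As Double) As Double
        Return pixels * 72.0 / dpi
    End Function

    ' Hypothetical helper (not an Aspose.PDF API): returns the largest uniform
    ' scale at which an image fits inside the available box without distortion.
    Public Function ScaleToFit(imgWidth As Double, imgHeight As Double,
                               boxWidth As Double, boxHeight As Double) As Double
        If imgWidth <= 0 OrElse imgHeight <= 0 Then
            Throw New ArgumentException("Image dimensions must be positive.")
        End If
        ' Whichever dimension runs out of room first limits the scale.
        Return Math.Min(boxWidth / imgWidth, boxHeight / imgHeight)
    End Function

    Sub Main()
        ' Assumed page: US Letter (612 x 792 pt) with 36 pt (0.5 in) margins.
        Dim boxWidth As Double = 612.0 - 2 * 36.0
        Dim boxHeight As Double = 792.0 - 2 * 36.0

        ' Assumed input: a 2550 x 3300 pixel scan at 300 DPI (8.5 x 11 in).
        Dim imgWidth As Double = PixelsToPoints(2550, 300)
        Dim imgHeight As Double = PixelsToPoints(3300, 300)

        Dim scale As Double = ScaleToFit(imgWidth, imgHeight, boxWidth, boxHeight)
        Console.WriteLine("Scale factor: {0:F4}", scale)
        Console.WriteLine("Fitted size: {0:F2} x {1:F2} pt", imgWidth * scale, imgHeight * scale)
    End Sub
End Module
```

With zero margins the available box equals the full page, so the same 8.5 x 11 in scan gets a scale of exactly 1.0; margins are therefore one reason an image can appear not to fill the page even at 100%.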


Hi Cheryl,


We are glad to hear that your requirement has been met. Please continue using our API, and if you have any further questions, please feel free to contact us.