Hi, Support:
Here is a sample scanned PDF document that I need to convert to an editable Word document. How can I use Aspose.PDF.dll to convert it?
Thanks for your help.
009.pdf (1001.0 KB)
We are afraid that Aspose.PDF does not provide a feature to convert a scanned PDF into a searchable and editable Word document. However, you can generate a searchable PDF document using Aspose.OCR and then convert it into a Word document:
// Assumed namespaces for the unqualified types below:
// using System.Collections.Generic; using Aspose.Pdf; using Aspose.Pdf.Text;

// Step 1: run OCR on the scanned PDF and save the result as searchable PDFs.
Aspose.OCR.AsposeOcr api = new Aspose.OCR.AsposeOcr();
Aspose.OCR.OcrInput ocrInputPdf = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.PDF);
ocrInputPdf.Add(dataDir + "TestObject_Page1.pdf");
List<Aspose.OCR.RecognitionResult> resultPdf = api.Recognize(ocrInputPdf, new Aspose.OCR.RecognitionSettings { DetectAreasMode = Aspose.OCR.DetectAreasMode.DOCUMENT });
Aspose.OCR.AsposeOcr.SaveMultipageDocument(dataDir + "searchablePdf.pdf", Aspose.OCR.SaveFormat.Pdf, resultPdf);
Aspose.OCR.AsposeOcr.SaveMultipageDocument(dataDir + "searchablePdfNoImg.pdf", Aspose.OCR.SaveFormat.PdfNoImg, resultPdf);

// Step 2: load a searchable PDF with Aspose.PDF, make its text layer visible,
// and drop the page images so that only the text is converted.
Document pdfDocument = new Document(dataDir + @"ImageTest.pdf");
foreach (var page in pdfDocument.Pages)
{
    TextFragmentAbsorber absorber = new TextFragmentAbsorber();
    absorber.Visit(page);
    foreach (TextFragment fragment in absorber.TextFragments)
    {
        fragment.TextState.RenderingMode = TextRenderingMode.FillText;
        fragment.TextState.Font = FontRepository.FindFont("Arial");
    }
    page.Resources.Images.Clear();
}

// Step 3: save the document as DOCX in Flow recognition mode.
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
saveOptions.RelativeHorizontalProximity = 2.5f;
//saveOptions.RecognizeBullets = true;
pdfDocument.Save(dataDir + "ImageTest.docx", saveOptions);
Thanks!
However, it doesn’t work properly in my VB.NET project. The OCR.dll API I am using is version 23.4 and targets .NET 4.6.1.
Please refer to the attached image. The errors are:
Aspose.OCR.RecognitionResult is undefined.
Aspose.OCR.SaveFormat.PdfNoImg is not a member of Aspose.OCR.SaveFormat.
How can I fix it?
111.jpg (143.9 KB)
Please try using versions 24.3/24.4 of the APIs, as these are the latest ones and contain these classes. If you notice any issues, please let us know.
Is it possible that you could share a sample console application project for our reference so that we could test in our environment and address the issue accordingly?
The API version is v24.4, which is the same as yours.
It looks like there is some difference in the project configuration on your end, as we are not noticing this issue. That is why we requested a sample console application.