How to convert this type of scanned PDF to an editable Word document?

Hi Support,

Here is a sample scanned PDF document that I would like to convert to an editable Word document. How can I use Aspose.PDF.dll to convert it?
Thanks for your help.
009.pdf (1001.0 KB)

@ducaisoft
We are looking into it and will be sharing our feedback with you shortly.

@ducaisoft

We are afraid that Aspose.PDF cannot by itself convert a scanned PDF into a searchable and editable Word document. You can, however, generate a searchable PDF document using Aspose.OCR and then convert that PDF into a Word document with Aspose.PDF:

Scanned PDF to Searchable PDF Conversion

using System.Collections.Generic;

Aspose.OCR.AsposeOcr api = new Aspose.OCR.AsposeOcr();

// Load the scanned PDF as OCR input.
Aspose.OCR.OcrInput ocrInputPdf = new Aspose.OCR.OcrInput(Aspose.OCR.InputType.PDF);
ocrInputPdf.Add(dataDir + "TestObject_Page1.pdf");
// Recognize the text, using the DOCUMENT areas detection mode.
List<Aspose.OCR.RecognitionResult> resultPdf = api.Recognize(ocrInputPdf, new Aspose.OCR.RecognitionSettings { DetectAreasMode = Aspose.OCR.DetectAreasMode.DOCUMENT });
// Save the recognition results in the Pdf format and in the PdfNoImg format.
Aspose.OCR.AsposeOcr.SaveMultipageDocument(dataDir + "searchablePdf.pdf", Aspose.OCR.SaveFormat.Pdf, resultPdf);
Aspose.OCR.AsposeOcr.SaveMultipageDocument(dataDir + "searchablePdfNoImg.pdf", Aspose.OCR.SaveFormat.PdfNoImg, resultPdf);

OCR’d PDF to Word Conversion

using Aspose.Pdf;
using Aspose.Pdf.Text;

// Load the OCR'd (searchable) PDF, e.g. the output of the previous step.
Document pdfDocument = new Document(dataDir + @"ImageTest.pdf");

foreach (Page page in pdfDocument.Pages)
{
    // Collect every text fragment on the page.
    TextFragmentAbsorber absorber = new TextFragmentAbsorber();
    absorber.Visit(page);
    foreach (TextFragment fragment in absorber.TextFragments)
    {
        // Render the text as filled glyphs and set its font to Arial.
        fragment.TextState.RenderingMode = TextRenderingMode.FillText;
        fragment.TextState.Font = FontRepository.FindFont("Arial");
    }
    // Remove the scanned page images so that only the text remains.
    page.Resources.Images.Clear();
}

// Save as DOCX in Flow recognition mode.
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
saveOptions.RelativeHorizontalProximity = 2.5f;
//saveOptions.RecognizeBullets = true;
pdfDocument.Save(dataDir + "ImageTest.docx", saveOptions);

Thanks!
However, it doesn't work properly in my VB.NET project. The Aspose.OCR DLL I am using is version 23.4, targeting .NET Framework 4.6.1.
Please refer to the attached images. The errors are:
Aspose.OCR.RecognitionResult is undefined
Aspose.OCR.SaveFormat.PdfNoImg is not a member of Aspose.OCR.SaveFormat.

How can I fix this?

111.jpg (143.9 KB)
2222.jpg (124.7 KB)
The exception in the second image above says that the System.Memory file could not be loaded.

@ducaisoft

Please try using version 24.3 or 24.4 of the APIs, as these are the latest releases and contain these classes. If you notice any issues, please let us know.

Please see the attachment.
An error is still thrown!

123.jpg (84.2 KB)

@ducaisoft

Could you please share a sample console application project for our reference, so that we can test it in our environment and address the issue accordingly?

The API version is v24.4, which is the same as in your development environment.

@ducaisoft

It looks like there is some difference in the project configuration on your end, as we are not noticing this issue. That is why we requested a sample console application.
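
One configuration difference that may be worth ruling out is a missing assembly binding redirect, assuming the exception reported earlier in this thread really is a failure to load the System.Memory assembly. On .NET Framework projects this kind of load error is often caused by such a missing redirect. The following is a minimal sketch of a redirect in app.config, not a confirmed fix. The version numbers are placeholders and should be replaced with the assembly version of the System.Memory.dll actually present in the project's output folder.

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Assumption: System.Memory is the assembly that fails to load. -->
        <assemblyIdentity name="System.Memory" publicKeyToken="cc7b13ffcd2ddd51" culture="neutral" />
        <!-- Placeholder versions: set newVersion to the assembly version of the
             System.Memory.dll that is deployed next to the application. -->
        <bindingRedirect oldVersion="0.0.0.0-4.0.1.2" newVersion="4.0.1.2" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```

Whether this applies here can only be confirmed with the sample project requested above.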