Unable to extract text from word document


#1

test3.pdf (280.4 KB)

Hi,
I want to extract text from a PDF line by line. To do this, I convert the PDF to a Word document (DOCX) and then read its text. The code works for all my PDFs except the attached one. Can you please help me with this? Below is the code:

        Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(fileName);
        // Build the output path portably instead of concatenating "\\"
        string docxFilePath = System.IO.Path.Combine(filePath, fileID + ".docx");
        // Instantiate DocSaveOptions object
        Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();
        // Specify the output format as DOCX
        saveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;
        // Save document in DOCX format
        pdfDocument.Save(docxFilePath, saveOptions);

        // Load the generated DOCX with Aspose.Words
        Aspose.Words.Document doc = new Aspose.Words.Document(docxFilePath);

        // AsposeWordDocToTxtWriter is a custom class that inherits from DocumentVisitor
        // and collects the document text.
        AsposeWordDocToTxtWriter myConverter = new AsposeWordDocToTxtWriter();
        doc.Accept(myConverter);
        string pdfText = myConverter.GetText();

        string[] lines = pdfText.Split('\n');
        for (int j = 0; j < lines.Length; j++)
        {
            // process lines[j] here
        }
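Separately from the image problem, splitting on `'\n'` alone can leave stray `'\r'` characters at the end of each line, or fail to split at all, if the collected text uses `"\r\n"` or `"\r"` line endings. Aspose.Words marks paragraph ends with `'\r'` internally, but what `AsposeWordDocToTxtWriter` actually emits depends on its implementation, which is not shown in this thread. Below is a minimal sketch of a more tolerant splitter. The `LineSplitter` class and its `SplitLines` method are hypothetical names used only for illustration.

```csharp
using System;

static class LineSplitter
{
    // Hypothetical helper: splits text on "\r\n", "\r" or "\n".
    // "\r\n" is listed first so that, at a '\r' position, String.Split matches
    // the two-character Windows line ending as a single break instead of two.
    // StringSplitOptions.None keeps empty lines, so line indices stay aligned
    // with the original text.
    public static string[] SplitLines(string text)
    {
        if (text == null)
        {
            return new string[0];
        }
        return text.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.None);
    }
}
```

With this helper, `pdfText.Split('\n')` in the loop could be replaced with `LineSplitter.SplitLines(pdfText)`. Pass `StringSplitOptions.RemoveEmptyEntries` instead if blank lines should be dropped.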

#2

@manasiak

Could you please attach your Word document here for testing? We will investigate the issue on our side and provide you with more information.


#3

wordNotWorking.zip (437.1 KB)

Thanks for your reply. Please find the attached ZIP file.


#4

@manasiak

Thanks for sharing the document. Your issue is related to Aspose.PDF rather than Aspose.Words: the text of the PDF is exported to the DOCX as images instead of as text, so there is no text in the Word document to extract. We are investigating this issue and will get back to you soon.


#5

Thanks for your reply. Please let me know once it is resolved.


#6

@manasiak

Thank you for contacting support.

We have been able to reproduce the problem when the PDF is converted to DOCX with the Aspose.PDF for .NET API. A ticket with ID PDFNET-46587 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.