Hi Team,
We need to extract the complete text from Word and PDF documents. Is there a direct API for this, or do I have to split the document into pages and then extract the contents of each page?
Please help.
Thanks.
To get all the text of an MS Word document as a single string, please use the following C# code with the Aspose.Words for .NET API:
// Requires: using Aspose.Words;
Document doc = new Document("input.docx");
// Serialize the whole document to plain text in a single call.
string text = doc.ToString(SaveFormat.Text);
Alternatively, you can convert the Word document to TXT format in memory and then decode the contents of the memory stream into a string:
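// Requires: using System.IO; using System.Text; using Aspose.Words;
Document doc = new Document("input.docx");
MemoryStream stream = new MemoryStream();
// Export the document as plain text into the in-memory stream.
doc.Save(stream, SaveFormat.Text);
// ToArray() copies the whole buffer regardless of the stream position.
// Decoding as UTF-8 assumes the text export was written in UTF-8.
string text = Encoding.UTF8.GetString(stream.ToArray());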
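As a variation on the in-memory approach, here is a minimal sketch that decodes the stream with a StreamReader rather than Encoding.UTF8.GetString. If the exported bytes begin with a byte order mark, StreamReader skips it, whereas GetString would keep it as a leading character in the result. The ReadAllText helper and the ExtractWordText class are hypothetical names used for illustration only and are not part of Aspose.Words. The sketch assumes the Aspose.Words package is referenced and that the export is UTF-8 encoded.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;

class ExtractWordText
{
    // Hypothetical helper (not an Aspose.Words API): rewinds the stream and
    // decodes it as UTF-8, consuming a byte order mark if one is present.
    static string ReadAllText(Stream stream)
    {
        stream.Position = 0;
        using (var reader = new StreamReader(stream, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // "input.docx" is the same placeholder file name used above.
        Document doc = new Document("input.docx");

        using (var stream = new MemoryStream())
        {
            // Export the document as plain text into the in-memory stream.
            doc.Save(stream, SaveFormat.Text);

            string text = ReadAllText(stream);
            Console.WriteLine(text);
        }
    }
}
```

Either way the whole document is processed in one pass, so there is no need to split it into pages first.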
Regarding extracting complete text from PDF documents, please refer to the following article:
Hope this helps.