Free Support Forum - aspose.com

Extracting text from an HTML file


I would like to know how to extract text from an HTML file.

This is what I am doing right now:

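// Load the HTML file; the path must be a quoted string (@ keeps the backslash literal).
Aspose.Words.Document doc = new Aspose.Words.Document(@"c:\myfile.html");
// Get the document content as plain text.
string text = doc.ToTxt();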

My problem is that the extracted text contains duplicates, i.e. some of the lines in the page are returned multiple times.

Please help.

Thank you,

Adam Porat
Senior developer
Niloos Software ltd.


Thanks for your request. Could you please attach your HTML document here for testing? I will check it and provide you with more information.

Best regards.