Free Support Forum - aspose.com

How to convert html to text?


I am using the following code, but it is very slow. It takes 2+ minutes to load the HTML file.

// Read the whole HTML file into a string; the using block closes the file afterwards.
string strHtml;
using (StreamReader sr = new StreamReader(strHtmlFilePath))
{
    strHtml = sr.ReadToEnd();
}
Aspose.Words.Document doc = new Aspose.Words.Document();
Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(doc);
builder.InsertHtml(strHtml);
string text = doc.ToTxt();

Dear Customer,

Thanks for the inquiry. Please try the latest Aspose.Words v11.3.0 at your end; hopefully your issue will be resolved. If it still persists, please share the source HTML document for further investigation. We'll take a closer look and guide you accordingly.

Best Regards,

Tilal

Hi,


Thanks for your inquiry. You just need to load the HTML file into an Aspose.Words.Document object and then get the text as described here:
http://www.aspose.com/docs/display/wordsnet/Retrieving+Plain+Text

Please let me know if I can be of any further assistance.

Best Regards,

Thanks for help. Issue still exists.

Please use attached html file.

Dear Customer,

Thanks for sharing your sample file. Unfortunately, I’m unable to reproduce any issue while converting your HTML file to text; the output was generated in seconds. Please have a look at the following code snippet, which I used for the conversion. Please feel free to contact us for any further assistance.

// Load the HTML file into the Aspose.Words DOM; the using block closes the stream.
Document doc;
using (Stream stream = File.OpenRead(docpath))
{
    doc = new Document(stream, new LoadOptions() { LoadFormat = LoadFormat.Html });
}

// Save the loaded document as plain text.
doc.Save("...\\test out.txt", SaveFormat.Text);

Best Regards,

Tilal Ahmad

The conversion itself is fast. The actual problem is with the following statement:


Document doc = new Document(stream, new LoadOptions() { LoadFormat = LoadFormat.Html });

It is taking 2+ minutes.

I am using Aspose.Words.dll for .NET 1.1 file version 10.7.0.0

Dear Customer,

Thanks for your query. As requested earlier, please upgrade your Aspose.Words version to the latest one, i.e. 11.3.0. It will definitely solve the issue. Please feel free to contact us for any further assistance.

Regards,

Tilal

Thanks for the help. It is working fine for most HTML files, but the issue with the HTML file I attached in my previous post still occurs.

Hi,


Thanks for the additional information.

We always encourage our customers to use the latest release of Aspose.Words, as it contains new features, enhancements, and fixes for previously reported issues. Regarding the issue you are facing: with the latest version of Aspose.Words, i.e. 11.3.0, the average load time for your HTML document over three test runs was around 25 seconds on my side. I have also noticed that your document makes heavy use of web URLs in the SRC attribute of IMG tags. Please note that Aspose.Words automatically downloads these images while loading the document into the DOM, and this download may take some time. So the delay is caused by the image downloads, not by a problem in Aspose.Words.
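
If only the plain text is needed, one possible workaround is to remove the remote IMG tags from the HTML string before it is loaded, so that there is nothing to download. The following is only a minimal sketch under that assumption, not an Aspose.Words feature: the class name HtmlImageStripper and the method StripRemoteImages are hypothetical helpers made up for illustration, and the simple regular expression only covers IMG tags whose SRC starts with http:// or https:// (it is not a full HTML parser and may miss unusual markup).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.Words): drops <img> tags that point to web URLs.
public static class HtmlImageStripper
{
    // Matches <img ...> tags whose src attribute value starts with http:// or https://.
    private static readonly Regex RemoteImg = new Regex(
        @"<img\b[^>]*\bsrc\s*=\s*[""']?https?://[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the HTML with remote <img> tags removed; images with local paths are kept.
    public static string StripRemoteImages(string html)
    {
        if (html == null) throw new ArgumentNullException("html");
        return RemoteImg.Replace(html, string.Empty);
    }
}
```

The cleaned string can then be written to a MemoryStream and passed to the Document constructor with LoadFormat.Html, in the same way as the file stream used earlier in this thread. Whether this shortens the load time for a particular document would need to be measured.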

Please let me know if I can be of any further assistance.

Best Regards,

Hello,
Could you tell me where I can find an example for the Visual Basic language?

Thanks

@Lion91,

I am afraid we have removed the Visual Basic examples from GitHub. We currently maintain examples only in C#. However, you can easily convert the C# examples to VB.NET yourself using a converter. For example:
https://www.developerfusion.com/tools/convert/csharp-to-vb/

Ok,
Thank you very much, from now on I will follow the advice.