Converting HTML pages to formatted text

I have HTML pages that I would like to convert to text, with the layout kept close to how it appears in HTML. Is this possible? I used the code below, which came from a previous post:

// Needs: using System.IO; using Aspose.Words;
// (depending on the Aspose.Words version, LoadOptions may live in Aspose.Words.Loading)

// Load the HTML page given as the first argument.
Document doc;
using (Stream stream = File.OpenRead(args[0]))
{
    doc = new Document(stream, new LoadOptions() { LoadFormat = LoadFormat.Html });
}

// Save as plain text to the second argument.
// FileMode.Create truncates an existing file; OpenOrCreate could leave stale bytes at the end.
using (FileStream writeStream = new FileStream(args[1], FileMode.Create, FileAccess.Write, FileShare.None))
{
    doc.Save(writeStream, SaveFormat.Text);
}

This extracts the text without regard to the page layout. For example, the text from a table is output one cell at a time, with the cells laid out down the page rather than across. Is there another way to do this?

Thank you,

Jeff

Hi Jeff,


Thanks for your inquiry. You can specify whether Aspose.Words should attempt to preserve the layout of tables when saving in plain text format by setting the TxtSaveOptions.PreserveTableLayout property. The default value is false, so set it to true and pass the options object to Document.Save. I hope this helps.
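
As a rough illustration of the property mentioned above, here is a minimal sketch. It assumes the same command-line convention as the code in the question (args[0] is the input HTML file, args[1] is the output text file) and lets Aspose.Words detect the input format from the file itself. The class name HtmlToText is only a placeholder. How closely the output resembles the original page will depend on the HTML.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical class name, used only for this sketch.
class HtmlToText
{
    static void Main(string[] args)
    {
        // Assumption: args[0] is the input HTML path, args[1] the output text path.
        // The load format is auto-detected here rather than forced to HTML.
        Document doc = new Document(args[0]);

        // Ask the text exporter to try to keep table cells side by side.
        TxtSaveOptions options = new TxtSaveOptions();
        options.PreserveTableLayout = true; // default is false

        // Saving with a TxtSaveOptions instance produces plain text output.
        doc.Save(args[1], options);
    }
}
```

Saving straight to a file path also avoids the intermediate MemoryStream and the manual byte copy used in the original snippet.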

Best regards,

Thank you, that gives me the information I was looking for.

Jeff