I have HTML pages that I would like to convert to plain text while keeping the layout close to how the page appears when rendered as HTML. Is this possible? I used the code below, which came from a previous post:
// Load the HTML file into a document; the using block closes the input stream.
Document doc;
using (Stream stream = File.OpenRead(args[0]))
{
    doc = new Document(stream, new LoadOptions() { LoadFormat = LoadFormat.Html });
}
// Save as plain text. FileMode.Create truncates an existing output file,
// so no stale bytes are left behind (FileMode.OpenOrCreate would keep them).
using (FileStream writeStream = new FileStream(args[1], FileMode.Create, FileAccess.Write, FileShare.None))
{
    doc.Save(writeStream, SaveFormat.Text);
}
This extracts the text without regard to the page layout. For example, the text from a table is output one cell at a time, with the cells stacked down the page rather than laid out across it in rows. Is there another way to do this?
Thank you,
Jeff