Extracting HTML from a Word document section by section


#1

I need to open a Word document and copy each section, one at a time, saving each section to a memory stream as HTML.
I then want to read the HTML back and manipulate it.

I tried this code:

Document doc = new Document(@"C:\output.doc");
Document doc2 = new Document();

foreach (Section sec in doc.Sections)
{
    Section section2 = sec.Clone();
    doc2.Sections.Add(section2);
    MemoryStream sw = new MemoryStream();
    doc2.Save(sw, SaveFormat.FormatHtml);

    string str = Encoding.ASCII.GetString(sw.GetBuffer());
    MessageBox.Show(str);
}

But this doesn't work at all.
Aspose throws an ArgumentOutOfRangeException.

Any ideas?


#2

If the exception is thrown inside Aspose.Word, please send the document to word@aspose.com and we will have a look.


#3

I managed to get this working by cloning the document and clearing the sections from the clone.
I could then clone sections from the original document into the new (cloned) document and save it out as HTML.
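
For anyone reading later, the approach described above could be sketched roughly as follows. This is only a minimal sketch, not the code actually used in this thread. It assumes the current Aspose.Words for .NET API (the Aspose.Words namespace, Document.Clone, Document.ImportNode, SaveFormat.Html) rather than the older Aspose.Word names from post #1, and it assumes the HTML is written as UTF-8. The class name SectionHtmlExtractor and the method name ExtractSections are hypothetical, chosen for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using Aspose.Words;

// Hypothetical helper class, for illustration only.
public static class SectionHtmlExtractor
{
    // Returns one HTML string per section of the source document.
    public static List<string> ExtractSections(string path)
    {
        Document source = new Document(path);

        // Clone the whole document so that styles and document-level
        // settings carry over, then empty the clone of its sections.
        Document shell = source.Clone();
        shell.Sections.Clear();

        List<string> result = new List<string>();

        foreach (Section section in source.Sections)
        {
            // Keep only one section in the shell at a time.
            shell.Sections.Clear();

            // A node must belong to the target document before it is added
            // (assumption: current API, where ImportNode does this).
            Section imported = (Section)shell.ImportNode(section, true);
            shell.Sections.Add(imported);

            using (MemoryStream stream = new MemoryStream())
            {
                shell.Save(stream, SaveFormat.Html);

                // ToArray returns only the bytes written; GetBuffer returns
                // the whole underlying buffer, including unused capacity.
                result.Add(Encoding.UTF8.GetString(stream.ToArray()));
            }
        }

        return result;
    }
}
```

Calling SectionHtmlExtractor.ExtractSections(@"C:\output.doc") would then give one HTML string per section, ready to manipulate. Note the use of ToArray instead of GetBuffer when reading the stream back.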