Extracting Header and Footer Text as HTML

medtrans · October 26, 2007, 5:54am

Hi,
I want to extract the header and footer as HTML. So far I am only able to extract them as text, using the Range.Text property.
I know that the Aspose.Words Document.Save method can write the document, converted to HTML, to a memory stream, which we can then use as a string in our applications.
Is there a way to extract a specific document section as HTML?
Thanks

alexey.noskov · October 26, 2007, 6:34am

Hi
Thank you for your interest in Aspose products. Unfortunately, you can’t extract a specific document section as HTML. However, you can convert the whole document to HTML and then parse the result.
Best regards.

medtrans · October 26, 2007, 8:03am

Hi,
I solved it with a workaround: I am now able to extract the header, footer, and main content as HTML strings.
Here’s the code for that:

// Requires: using System.IO; using System.Text; using Aspose.Words;
public void asposeReturnHtmlFromDoc(string docPath, ref string mainContent, ref string header, ref string footer)
{
    // Load three independent copies of the document: one each for the
    // main content, the headers, and the footers.
    Document myDoc = new Document(docPath);
    Document myDocHeader = new Document(docPath);
    Document myDocFooter = new Document(docPath);

    // Main content copy: clear the headers and footers of the first section.
    myDoc.FirstSection.ClearHeadersFooters();

    // Header copy: remove all body content...
    foreach (Section sec in myDocHeader.Sections)
    {
        sec.Body.RemoveAllChildren();
    }
    // ...and empty every footer of the first section.
    foreach (HeaderFooter myFooter in myDocHeader.FirstSection.HeadersFooters)
    {
        if (!myFooter.IsHeader)
        {
            myFooter.RemoveAllChildren();
        }
    }

    // Footer copy: remove all body content...
    foreach (Section sec in myDocFooter.Sections)
    {
        sec.Body.RemoveAllChildren();
    }
    // ...and empty every header of the first section.
    foreach (HeaderFooter myHeader in myDocFooter.FirstSection.HeadersFooters)
    {
        if (myHeader.IsHeader)
        {
            myHeader.RemoveAllChildren();
        }
    }

    // Save each copy as HTML into its own memory stream. Exceptions are
    // left to propagate instead of being swallowed by an empty catch block.
    using (MemoryStream msOut = new MemoryStream())
    using (MemoryStream msHeadOut = new MemoryStream())
    using (MemoryStream msFooterOut = new MemoryStream())
    {
        myDoc.Save(msOut, SaveFormat.Html);
        myDocHeader.Save(msHeadOut, SaveFormat.Html);
        myDocFooter.Save(msFooterOut, SaveFormat.Html);

        // Decode the HTML. ToArray() returns only the bytes written, while
        // GetBuffer() would also return the unused tail of the internal
        // buffer, which decodes to trailing '\0' characters.
        Encoding enc = Encoding.UTF8;
        mainContent = enc.GetString(msOut.ToArray());
        header = enc.GetString(msHeadOut.ToArray());
        footer = enc.GetString(msFooterOut.ToArray());
    }
}
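
A side note on turning a MemoryStream into a string, since the code above depends on it: GetBuffer() hands back the stream's whole internal buffer, including capacity that was never written to, whereas ToArray() returns only the written bytes. The sketch below uses plain .NET only (no Aspose calls) to show the difference. The class and method names (StreamText, Write, ReadWritten, ReadWholeBuffer) are hypothetical helpers made up for this illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class StreamText
{
    // Writes the UTF-8 bytes of 'text' into a new, growable MemoryStream.
    public static MemoryStream Write(string text)
    {
        MemoryStream ms = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        ms.Write(bytes, 0, bytes.Length);
        return ms;
    }

    // Decodes only the bytes that were actually written to the stream.
    public static string ReadWritten(MemoryStream ms)
    {
        return Encoding.UTF8.GetString(ms.ToArray());
    }

    // Decodes the entire internal buffer, including unused capacity,
    // which shows up as trailing '\0' characters in the result.
    public static string ReadWholeBuffer(MemoryStream ms)
    {
        return Encoding.UTF8.GetString(ms.GetBuffer());
    }
}
```

For a short string such as "<p>Header</p>", ReadWritten should return the text unchanged, while ReadWholeBuffer should return the same text followed by padding characters, because the internal buffer is normally larger than the data written to it. If GetBuffer() has to be used, passing the stream's Length to GetString(buffer, 0, count) avoids the padding.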

medtrans · October 26, 2007, 8:08am

I think this will only work for a single header or footer.
Thanks

alexey.noskov · October 26, 2007, 10:40am

Hi
You can also try using NodeImporter to import the headers and footers into a new document. See the following code:

// open source document
Document doc = new Document(@"306_100188_medtrans\in.doc");
// create new document
Document header = new Document();
// create node importer
NodeImporter importer = new NodeImporter(doc, header, ImportFormatMode.KeepSourceFormatting);
// import the primary header from the source document
header.FirstSection.AppendChild(importer.ImportNode(doc.FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary], true));
// save header as HTML
header.Save(@"306_100188_medtrans\out.html", SaveFormat.Html);

I hope this helps.
Best regards.