Read Data From a Word File in C#.NET

Hi,

How can I read data from the attached Word file?

10092409.zip (9.0 KB)

@pravinghadge,

For example, you can get/read all text from this Word document by using the following code:

using Aspose.Words;

Document doc = new Document("E:\\Temp\\10092409\\10092409.docx");
string allText = doc.ToString(SaveFormat.Text);

Alternatively, you can parse the content paragraph by paragraph by using the following code:

Document doc = new Document("E:\\Temp\\10092409\\10092409.docx");
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    Console.WriteLine(para.ToString(SaveFormat.Text));
}

Thank you, Hafeez, for your reply.

I tried your code and it is working fine.

But I need to separate the paragraphs based on the boxes in the document,

as the data from each box will be loaded into a different textbox.

There are no bookmarks in this document, so I am not able to separate the paragraphs.

Kindly suggest a solution.

10092409.zip (26.1 KB)

Please find the attached document with some sample data.

@pravinghadge,

The boxes in your document are actually Content Controls (represented by the StructuredDocumentTag class in Aspose.Words). You can get the paragraphs contained inside these boxes by using the following code. Hope this helps.

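using System;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Tables;

Document doc = new Document("E:\\Temp\\10092409\\10092409.docx");
Table tab = doc.FirstSection.Body.Tables[0];

int i = 1;
foreach (StructuredDocumentTag sdt in tab.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    // Only block-level content controls correspond to the boxes.
    if (sdt.Level == MarkupLevel.Block)
    {
        Console.WriteLine("Box=" + i + " -------------");
        i++;
        // Print every paragraph inside this box, including paragraphs in nested tables.
        foreach (Paragraph para in sdt.GetChildNodes(NodeType.Paragraph, true))
        {
            Console.WriteLine(para.ToString(SaveFormat.Text));
        }
    }
}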

Thanks, Hafeez, for your reply.

I am now getting the data block by block.

But one of the boxes contains a table, and its data is read one paragraph at a time.

Can I get the full data of a box in one go,

so that I can set it directly on a textbox?

While using bookmarks I used the following code, which worked perfectly for me:

Document htmlDoc = AsposeLicense.GenerateDocument(doc, nodes);
try
{
    htmlDoc.FirstSection.Body.FirstParagraph.Remove();
}
catch { }
String sb = htmlDoc.ToString(SaveFormat.Html);

Can I get the same syntax for boxes?

Thanks

@pravinghadge,

You can get the HTML representation of a complete content control by using the following code:

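using System;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

Document doc = new Document("E:\\Temp\\10092409\\10092409.docx");
Table tab = doc.FirstSection.Body.Tables[0];

// Configure the HTML export once and reuse it for every box.
HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
opts.PrettyFormat = true;
opts.ExportImagesAsBase64 = true;
// Set any further HtmlSaveOptions here.

int i = 1;
foreach (StructuredDocumentTag sdt in tab.GetChildNodes(NodeType.StructuredDocumentTag, true))
{
    // Only block-level content controls correspond to the boxes.
    if (sdt.Level == MarkupLevel.Block)
    {
        Console.WriteLine("Box=" + i + " -------------");
        i++;

        // Export the whole content control, including any table inside it, as one HTML string.
        Console.WriteLine(sdt.ToString(opts));
    }
}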
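
If plain text is enough for the textboxes, the same loop can collect one string per box instead of HTML. The following is only a minimal sketch under that assumption: the `BoxTextReader` class and the `ReadBoxTexts` helper are hypothetical names introduced for illustration, and the file path is the same sample path used above. It relies on `Node.ToString(SaveFormat.Text)`, which exports a node together with all of its children, so a table inside a box ends up in the same string as the rest of the box.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Markup;
using Aspose.Words.Tables;

class BoxTextReader
{
    // Hypothetical helper: returns one plain-text string per block-level content control.
    static List<string> ReadBoxTexts(Table tab)
    {
        List<string> boxes = new List<string>();
        foreach (StructuredDocumentTag sdt in tab.GetChildNodes(NodeType.StructuredDocumentTag, true))
        {
            // Skip inline, row-level, and cell-level content controls.
            if (sdt.Level != MarkupLevel.Block)
                continue;

            // Export the whole box, including any nested table, as a single string.
            boxes.Add(sdt.ToString(SaveFormat.Text).Trim());
        }
        return boxes;
    }

    static void Main()
    {
        Document doc = new Document("E:\\Temp\\10092409\\10092409.docx");
        Table tab = doc.FirstSection.Body.Tables[0];

        List<string> boxes = ReadBoxTexts(tab);
        for (int i = 0; i < boxes.Count; i++)
        {
            // Each entry could be assigned to its own textbox instead of being printed.
            Console.WriteLine("Box=" + (i + 1) + " -------------");
            Console.WriteLine(boxes[i]);
        }
    }
}
```

Note that if one block-level content control is nested inside another, its text would appear both in its own entry and in its parent's entry.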

Hope this helps.

Thanks, Hafeez, it's working fine.

Thanks for your support.

@pravinghadge,

Thanks for your feedback. In case you have further inquiries or need any help in the future, please let us know.