How to Load Multiple Documents and Extract Text using .NET

I am passing in documents one at a time and extracting the title from each one. The code keeps extracting the title from the first document only: the titles returned for the second and third documents are still the title of the first document. I have been working on this code for 2 days. Any help would be greatly appreciated. My code is below:
Document doc = new Document(strDocFilePath + strFileName);
doc.UpdateListLabels();
Paragraph para = (Paragraph)doc.FirstSection.Body.GetChild(NodeType.Paragraph, 1, true);
foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (paragraph.IsListItem && paragraph.ListLabel.LabelString.Contains("WORK ITEM 1:"))
    {
        strTitle = paragraph.ToString(SaveFormat.Text);
        strTitle = strTitle.Substring(12).TrimStart();
    }
}

The title for the first document gets extracted, but every other document returns the first document’s title as well.

@nsanoir

The code example gets the text of a paragraph that is a list item and has the list label ‘WORK ITEM 1:’. If there are multiple list items with the same list label, you will get multiple results.

You need to load each of the other documents into its own Aspose.Words.Document object and use the same code to get the paragraph’s text.

Could you please clarify what you consider the title of the document to be? Please also share your other documents along with the expected output. We will then provide a code example that fits your requirement.

Workitem.doc expected output “Buoy Crane, Inspect And Service”

Workitem2.doc expected output “Anchors Chains And Stoppers, Inspect And Preserve”

Workitem3.doc expected output “Incinerator, Exhaust Piping, Commercial Clean”

I am only getting one result. The files are attached.

Thanks,

Nenna Anya-Sanoir

non library work items.zip (682 KB)

@nsanoir

Please use the following code example to get the desired output. You need to load each document, e.g. workitem2.doc and workitem3.doc, into a separate Document object.

Document doc = new Document(MyDir + "workitem.doc");
doc.UpdateListLabels();

foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (paragraph.IsListItem && paragraph.ListLabel.LabelString.Contains("WORK ITEM"))
    {
        Console.WriteLine(paragraph.ToString(SaveFormat.Text));
        break;
    }
}

Please let us know if you still face any issue.
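
The snippet above prints the whole paragraph text, while the expected outputs listed earlier contain only the title. The following is a minimal sketch of removing the label, assuming the exported text starts with the list label (as the Substring(12) call in the original question suggests). TitleText and StripLabel are hypothetical names used only for illustration; they are not part of Aspose.Words.

```csharp
using System;

public static class TitleText
{
    // Hypothetical helper (not an Aspose.Words API): removes a leading list label
    // such as "WORK ITEM 1:" from the paragraph text and trims surrounding whitespace.
    public static string StripLabel(string paragraphText, string label)
    {
        if (paragraphText == null) throw new ArgumentNullException(nameof(paragraphText));

        // Exported paragraph text may carry a trailing line break, so trim first.
        string text = paragraphText.Trim();

        // Cut by the label's actual length instead of a hard-coded 12 characters,
        // so that labels such as "WORK ITEM 10:" are handled as well.
        if (!string.IsNullOrEmpty(label) && text.StartsWith(label, StringComparison.Ordinal))
        {
            text = text.Substring(label.Length);
        }

        return text.Trim();
    }
}
```

Inside the loop it could be called as TitleText.StripLabel(paragraph.ToString(SaveFormat.Text), paragraph.ListLabel.LabelString). If the text does not start with the label, it is returned unchanged apart from trimming.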

I don’t have just 3 documents to load. There could be hundreds of documents, determined dynamically. How would I load each document into a separate Document object when I have hundreds of documents?

Thanks for the help,

Nenna

@nsanoir

Aspose.Words does support multi-threading. The only thing you need to ensure is that each thread always uses its own Document instance: one thread should use one Document object. You can use Parallel.ForEach as shown below to achieve your requirement.

string testFilesDirectory = Directory.GetCurrentDirectory() + "\\Test_Files";
string[] files = Directory.GetFiles(testFilesDirectory);

try
{
    Parallel.ForEach(files, file =>
    {
        Aspose.Words.LoadOptions options = new Aspose.Words.LoadOptions();
        // Auto lets Aspose.Words detect the format; the files in this thread are .doc, not .docx.
        options.LoadFormat = LoadFormat.Auto;
        Document doc = new Document(file, options);
        Console.WriteLine("Successfully opened " + file);
        //Your code ....
        //Your code ....
    });

}
catch (Exception ex)
{
    Console.WriteLine(ex.Message);
}
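
To show how the title extraction from the earlier reply could fill in the "Your code" placeholder, here is a minimal sketch. It assumes the files are .doc documents in a single folder and that the title is the first list item whose label contains "WORK ITEM". WorkItemTitles, ExtractTitle and ExtractAll are hypothetical names chosen for illustration; only the Aspose.Words calls already used in this thread are relied on.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;
using Aspose.Words;

public static class WorkItemTitles
{
    // Hypothetical helper: loads one file into its own Document and returns the text
    // of the first list item whose label contains "WORK ITEM", or null if none is found.
    public static string ExtractTitle(string filePath)
    {
        Document doc = new Document(filePath); // a fresh Document for every file
        doc.UpdateListLabels();

        foreach (Paragraph paragraph in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            if (paragraph.IsListItem && paragraph.ListLabel.LabelString.Contains("WORK ITEM"))
            {
                return paragraph.ToString(SaveFormat.Text).Trim();
            }
        }

        return null;
    }

    // Hypothetical helper: processes every .doc file in a folder and maps path to title.
    public static ConcurrentDictionary<string, string> ExtractAll(string directory)
    {
        var titles = new ConcurrentDictionary<string, string>();

        Parallel.ForEach(Directory.GetFiles(directory, "*.doc"), file =>
        {
            try
            {
                // No Document or title variable is shared between iterations.
                titles[file] = ExtractTitle(file) ?? string.Empty;
            }
            catch (Exception ex)
            {
                // Report the failing file and continue with the others.
                Console.WriteLine(file + ": " + ex.Message);
            }
        });

        return titles;
    }
}
```

Because ExtractTitle takes the path as a parameter and returns its result, nothing from a previous document can leak into the next one. If parallelism is not needed, a plain foreach over the same file list calling ExtractTitle should behave the same way.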

Thank you for all of your help…