Resume Parsing from uploaded document

raju.net · March 20, 2018, 9:51am

Hi team,
I am using the licensed Aspose.Words and Aspose.PDF DLLs in C#.NET.
I want to read resume data such as Name, Gender, DOB, Email, Mobile, Summary, Technical Skills, etc. from an uploaded document (resumes may use different templates and different file types such as doc, docx, pdf).

I am using this code (refer to this link),

but I am not getting Gender, DOB, or Nationality from the personal details.
Please provide full details for resume parsing with different templates.

Thanks,
Raju.

tahir.manzoor · March 20, 2018, 4:14pm

@raju.net

Thanks for your inquiry. Please ZIP and attach your input Word and PDF documents here for our reference. We will then provide a code example according to your requirements. Thanks for your cooperation.

raju.net · March 21, 2018, 5:59am

Thank you for your reply, Tahir.
I have uploaded the different documents: Documents.zip (168.9 KB)

tahir.manzoor · March 21, 2018, 4:44pm

@raju.net

Thanks for sharing the documents. We suggest you read about the Aspose.Words document object model.
Aspose.Words Document Object Model

Please refer to the following article about extracting contents from DOC/DOCX.
How to Extract Selected Content Between Nodes in a Document

E.g. if you want to extract the content between PROFESSIONAL SUMMARY and EDUCATIONAL QUALIFICATIONS, find these two heading paragraphs and extract the content between them. Please check the following code example. You can also save the final document in TXT format. Hope this helps you.

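using System.Collections;          // ArrayList
using System.Collections.Generic;  // List<T>
using System.Linq;                 // Cast, Where, ToList
using Aspose.Words;

// MyDir is the folder that holds the input document.
Document doc = new Document(MyDir + @"1007290_Resume_RaviTeja_4 Yrs Exp_Dot Net Developer.docx");

// All paragraphs in the document, including nested ones.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

// Paragraphs whose text contains the start and end headings.
List<Node> startPara = paragraphs.Cast<Node>().Where(node => node.ToString(SaveFormat.Text).Trim().Contains("PROFESSIONAL SUMMARY")).ToList<Node>();
List<Node> endPara = paragraphs.Cast<Node>().Where(node => node.ToString(SaveFormat.Text).Trim().Contains("EDUCATIONAL QUALIFICATIONS")).ToList<Node>();
if (startPara.Count > 0 && endPara.Count > 0)
{
    // Common.ExtractContent and Common.GenerateDocument are helper methods from the
    // sample code of the article linked above, not part of the Aspose.Words API itself.
    ArrayList extractedNodes = Common.ExtractContent(startPara[0].NextSibling, endPara[0].PreviousSibling, true);

    Document dstDoc = Common.GenerateDocument(doc, extractedNodes);

    // The output format is chosen from the file extension.
    dstDoc.Save(MyDir + "output.docx");
    dstDoc.Save(MyDir + "output.txt");
}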

Farhan.Raza · March 21, 2018, 9:46pm

@raju.net

Thank you for your interest in our APIs.

We have analyzed the Resume Parser application by downloading it from Dev Center link, which has been shared by you. It simply converts a PDF document to a Word document with Aspose.PDF for .NET and then parses the text with Aspose.Words for .NET, as per your requirements. Kindly visit Convert PDF to DOC or DOCX format for your kind reference. Please feel free to let us know if you need any further assistance.

raju.net · March 22, 2018, 6:12am

Thanks for the reply, Tahir.

You mentioned extracting the content between PROFESSIONAL SUMMARY and EDUCATIONAL QUALIFICATIONS, but we do not know the exact template. Resumes may use different templates, different keywords, and a different order.

For example, the keywords for PROFESSIONAL SUMMARY may be SUMMARY, CAREER OBJECTIVE, CAREER SYNOPSIS, etc.

My question is: I have a list of keywords, and I want to fetch content using these keywords. In short, I am asking how to do resume parsing using Aspose.

example:

Personal Information:
DOB : 13-01-1990
Gender : Male
Nationality : Indian

Based on the above information, I want to read DOB, Gender, and Nationality.

Please provide a solution ASAP.

Thanks in advance

tahir.manzoor · March 22, 2018, 8:20am

@raju.net

Thanks for your inquiry. Please note that Aspose.Words does not provide a resume parser API. It is a class library that enables your applications to perform a wide range of document processing tasks. You can extract content from a Word document according to your requirements.

The code of Resume Parser by Aspose for .NET uses the same ExtractContent method that was shared in my previous post. Please read the articles shared in my previous post.

You can use the same approach to extract the content from a Word document. In your case, we suggest the following solution.

1. Create a list of your keywords, e.g. List1.
2. Create another list, e.g. List2, to store each keyword paragraph node and its index.
3. Iterate over List1.
4. For each keyword, get the paragraph that contains it and that paragraph's index. You can get the paragraph using the following line of code.

List<Node> startPara = paragraphs.Cast<Node>().Where(node => node.ToString(SaveFormat.Text).Trim().Contains("... keyword ...")).ToList<Node>();

Please use the NodeCollection.IndexOf method to get the index of the paragraph node. The following code snippet shows how to get the index of a paragraph.

NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
// para is the Paragraph that contains the keyword
int index = paragraphs.IndexOf(para);

5. Once the iteration is complete, sort List2 by paragraph index in ascending order.
6. Iterate over List2 and extract the contents between consecutive paragraphs. The following line of code shows how to extract the content between two paragraphs.

ArrayList extractedNodes = Common.ExtractContent(startPara[0].NextSibling, endPara[0].PreviousSibling, true);

Alternatively, you may use the following approach to get the content between keywords.

1. Create a list of keywords.
2. Iterate over all paragraph nodes of the document.
3. Get each paragraph's text using the Node.ToString(SaveFormat.Text) method.
4. If the paragraph's text contains a keyword from the list, e.g. WORKING ORGANIZATION:, keep collecting the text of the following paragraph nodes until you reach a paragraph that contains any keyword from the list.
5. Repeat steps 2 to 4 for the next keyword that you find.

The following code snippet shows how to get the paragraph collection of a document and iterate over it.

Document doc = new Document(MyDir + @"Resume.docx");
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

foreach (Paragraph para in paragraphs)
{
    //Your code.
    Console.WriteLine(para.ToString(SaveFormat.Text));
}

Once you have extracted the contents you can use them in your application according to your requirement. Hope this helps you.
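The keyword approach described above can be sketched on plain text lines, independent of Aspose. This is only a rough illustration: the class name ResumeSections, the alias table, and the canonical section names are hypothetical, and the input lines are assumed to come from Paragraph.ToString(SaveFormat.Text) as shown earlier. Unlike the Contains check used in this thread, the sketch treats a line as a heading only when the whole line equals a keyword (ignoring case and a trailing colon), so that a word like SUMMARY inside body text does not start a new section.

```csharp
using System;
using System.Collections.Generic;

public static class ResumeSections
{
    // Hypothetical alias table: heading variants mapped to one canonical section name.
    private static readonly Dictionary<string, string> Aliases =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "PROFESSIONAL SUMMARY", "Summary" },
            { "SUMMARY", "Summary" },
            { "CAREER OBJECTIVE", "Summary" },
            { "CAREER SYNOPSIS", "Summary" },
            { "EDUCATIONAL QUALIFICATIONS", "Education" },
            { "TECHNICAL SUMMARY", "Skills" },
        };

    // Returns the canonical section name if the line is a known heading, otherwise null.
    private static string MatchHeading(string line)
    {
        string text = line.Trim().TrimEnd(':').Trim();
        string name;
        return Aliases.TryGetValue(text, out name) ? name : null;
    }

    // Groups the lines that follow each heading under that heading's canonical name.
    // Lines before the first heading and empty lines are ignored.
    public static Dictionary<string, List<string>> Split(IEnumerable<string> lines)
    {
        var sections = new Dictionary<string, List<string>>();
        string current = null;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;

            string heading = MatchHeading(line);
            if (heading != null)
            {
                current = heading;
                if (!sections.ContainsKey(current))
                    sections[current] = new List<string>();
            }
            else if (current != null)
            {
                sections[current].Add(line);
            }
        }
        return sections;
    }
}
```

Because the order of headings does not matter here, the same call works for templates that place Education before Summary or use CAREER OBJECTIVE instead of PROFESSIONAL SUMMARY. New heading variants only need another entry in the alias table.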

raju.net · March 22, 2018, 9:40am

Thanks for your reply, Tahir.

Can you provide the full code for the approach in your reply above?

This is an urgent requirement, so please send the full code.

Thanks in advance,
Raju.

tahir.manzoor · March 22, 2018, 5:55pm

@raju.net

Thanks for your inquiry. Please give us some time to write the code for your case. We will get back to you soon.

tahir.manzoor · March 23, 2018, 6:45am

@raju.net

Thanks for your patience. As you do not know the order of the keywords, we suggest you add bookmarks at the keywords and extract the content between them. Please check the following code example.

Regarding mobile number, email, and name information, please use the Paragraph.ToString(SaveFormat.Text) method to get the paragraph text that contains this information.

var keywords = new List<string>();

keywords.Add("EDUCATIONAL QUALIFICATIONS");
keywords.Add("WORKING ORGANIZATION");
keywords.Add("TECHNICAL SUMMARY");
keywords.Add("Brief Description of the project");
keywords.Add("Roles and Responsibilities");
keywords.Add("PROFESSIONAL EXPERIENCE");
keywords.Add("PROJECT");
keywords.Add("PROFESSIONAL SUMMARY");

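Document maindoc = new Document(MyDir + "1007290_Resume_RaviTeja_4 Yrs Exp_Dot Net Developer.docx");
// Work on a clone so that maindoc itself is not modified.
Document doc = (Document)maindoc.Clone(true);
// Remove existing bookmarks so that only the keyword bookmarks remain.
doc.Range.Bookmarks.Clear();
DocumentBuilder builder = new DocumentBuilder(doc);
int i = 1; // running suffix for the bookmark names
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);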

foreach (string key in keywords)
{
    List<Node> paras = paragraphs.Cast<Node>().Where(node => node.ToString(SaveFormat.Text).Trim().Contains(key)).ToList<Node>();
    if (paras.Count > 0)
    {
        foreach (var item in paras)
        {
            builder.MoveTo(item);
            builder.StartBookmark("keyword_" + i);
            builder.EndBookmark("keyword_" + i);
            i++;
        }
    }
}
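// Add a final bookmark at the end of the document so the last keyword has an end marker.
builder.MoveToDocumentEnd();
builder.StartBookmark("keyword_" + i);
builder.EndBookmark("keyword_" + i);

// Extract the content between each pair of consecutive bookmarks into its own text file.
for (int j = 0; j < doc.Range.Bookmarks.Count - 1; j++)
{
    ArrayList extractedNodes = Extract_Contents.Common.ExtractContent(doc.Range.Bookmarks[j].BookmarkStart.ParentNode, doc.Range.Bookmarks[j + 1].BookmarkStart.ParentNode, true);

    Document dstDoc = Extract_Contents.Common.GenerateDocument(doc, extractedNodes);
    // The extraction is inclusive, so the last paragraph is the end marker's paragraph; remove it.
    dstDoc.LastSection.Body.LastParagraph.Remove();
    dstDoc.Save(MyDir + "output_" + j + ".txt");
}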
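
For fields such as DOB, Gender, Nationality, email, and mobile number, one possible way to process the paragraph text is with regular expressions. The following is a minimal sketch that does not use Aspose: the class name ResumeFields and all three patterns are assumptions made for illustration. In particular, the mobile pattern assumes a 10-digit number with an optional country code, and the label pattern assumes one "Label : Value" pair per line as in the Personal Information example earlier in this thread. The input lines are assumed to be paragraph texts obtained with Paragraph.ToString(SaveFormat.Text).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ResumeFields
{
    // "Label : Value" on a single line, e.g. "DOB : 13-01-1990".
    private static readonly Regex LabelValue =
        new Regex(@"^\s*([A-Za-z][A-Za-z .]*?)\s*:\s*(\S.*?)\s*$");

    // Simplified email pattern; real addresses can be more varied.
    private static readonly Regex Email =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

    // Assumed format: 10 consecutive digits with an optional country code prefix.
    private static readonly Regex Mobile =
        new Regex(@"(?<![\d-])(?:\+?\d{1,3}[\s-]?)?\d{10}(?![\d-])");

    // Collects the first value found for each label, plus the first email and
    // mobile number found anywhere in the text. Keys are case-insensitive.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in lines)
        {
            Match pair = LabelValue.Match(line);
            if (pair.Success && !fields.ContainsKey(pair.Groups[1].Value))
                fields[pair.Groups[1].Value] = pair.Groups[2].Value;

            Match email = Email.Match(line);
            if (email.Success && !fields.ContainsKey("Email"))
                fields["Email"] = email.Value;

            Match mobile = Mobile.Match(line);
            if (mobile.Success && !fields.ContainsKey("Mobile"))
                fields["Mobile"] = mobile.Value;
        }
        return fields;
    }
}
```

A line such as "Personal Information:" has no value after the colon, so it is skipped rather than stored as a field. Labels vary between templates (for example "Date of Birth" instead of "DOB"), so in practice the returned keys would still need to be mapped to your own field names.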