Free Support Forum - aspose.com

Extra item found in document content

Hi Support,

I am using a regex to find document numbers, and the total number of matches is 30 instead of 29. The additional match is MGR.002.002.3404.

Here is the code:

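using System.Collections.Generic;
using System.Text.RegularExpressions;
using Aspose.Words;

public List<string> FindValidDoc(string filePath, string regexFormat)
{
    var foundDocList = new List<string>();

    Aspose.Words.Document doc = new Aspose.Words.Document(filePath);

    // Visit every paragraph in the document, including nested ones (deep = true).
    foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
    {
        MatchCollection matchCollection = Regex.Matches(p.GetText(), regexFormat);

        // Collect every regex match found in this paragraph's text.
        foreach (Match matchedWord in matchCollection)
        {
            foundDocList.Add(matchedWord.Value);
        }
    }

    return foundDocList;
}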

Result list:

No.;Document
001;MAN.001.0001.0009
002;MAN.001.0001.0009
003;MAN.001.0001.0010
004;MAN.001.0001.0011
005;MGR.002.001.2246
006;MGR.002.001.2246
007;MGR.002.001.2246
008;MGR.002.001.2246
009;MGR.002.002.3404
010;MGR.002.002.3404
011;MGR.002.002.3404
012;MGR.002.002.3404
013;MGR.002.002.3404
014;MGR.002.002.3404
015;MGR.002.002.3404
016;MGR.002.002.3404
017;MGR.002.002.3478
018;MGR.002.002.3478
019;MGR.002.002.3478
020;MGR.002.002.3478
021;MGR.002.002.3478
022;MGR.002.002.3478
023;MGR.002.002.3478
024;MGR.002.002.3478
025;MGR.002.002.3382
026;MGR.002.002.3382
027;MGR.002.002.3382
028;ABC.0001.002.0001
029;ABC.0001.002.0001
030;ABC.001.001.0001

Please find the attached Word document for reference. I hope to hear from you soon.
TestResult.zip (27.8 KB)

Cheers,
Angie

@angieng
Thanks for your inquiry. Please ZIP and attach your input Word document and the regexFormat you are using.
We will investigate the issue and provide you with more information.

Here you go. Test1.zip (16.3 KB)
Thank you.

@angieng

Thank you for sharing the document. Kindly also share the regexFormat you are using to find the document numbers, so that we can test it.

Hi,

This is the regex format I am using: ([a-zA-Z]{1,}.){1}([a-zA-Z0-9]{1,}.*)+\d{2,}(_\d{2,})?

Cheers!~

@angieng
Thank you for your patience. We have tested your input document with the shared regexFormat and found that the footnote text is repeated, which is why there is an extra occurrence of the MGR.002.002.3404 document. Please check Test1Result.zip (9.8 KB)
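A minimal sketch of one way to avoid the duplicate, assuming (as the reply above suggests) that the extra match comes from footnote text being read twice: once through the body paragraph's GetText(), which also returns the text of child nodes such as a Footnote, and once more when the deep GetChildNodes(NodeType.Paragraph, true) call reaches the paragraph inside the footnote itself. The class name DocNumberFinder and the Document-based overload are hypothetical, added only for illustration; Node.GetAncestor(NodeType) and NodeType.Footnote are existing Aspose.Words members, but please verify the behavior against your Aspose.Words version.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Aspose.Words;

// Hypothetical helper class for illustration; not part of Aspose.Words.
public static class DocNumberFinder
{
    // Same loop as FindValidDoc, but paragraphs nested inside a footnote or
    // endnote are skipped. Assumption: their text is already included in the
    // GetText() of the body paragraph that contains the note.
    public static List<string> FindValidDoc(Document doc, string regexFormat)
    {
        var foundDocList = new List<string>();

        foreach (Paragraph p in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // NodeType.Footnote covers both footnotes and endnotes.
            if (p.GetAncestor(NodeType.Footnote) != null)
                continue;

            foreach (Match m in Regex.Matches(p.GetText(), regexFormat))
                foundDocList.Add(m.Value);
        }

        return foundDocList;
    }

    // Overload with the original signature (file path as input).
    public static List<string> FindValidDoc(string filePath, string regexFormat)
        => FindValidDoc(new Document(filePath), regexFormat);
}
```

The choice of skipping footnote paragraphs, rather than the body paragraphs that contain them, keeps each match tied to the position where the footnote is referenced in the main text.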