Comparing documents that contain randomly placed watermarks

Hi Team,

I have documents that I would like to compare.
In some cases they are Word documents, in others PDF documents.

The documents may contain watermarks that the generating system adds at random locations.
Because the watermark position differs between otherwise identical documents, Aspose.Words for .NET reports them as NOT the same.

Question: Is it possible to ignore watermarks when comparing documents with Aspose.Words, perhaps by setting some additional parameters?

Please feel free to advise any solution you have in mind for this scenario.

@usernamename Could you please attach your sample documents here for our reference? Unfortunately, there is no option to ignore shapes (watermarks) when comparing documents. In theory, though, you can remove the watermarks from the documents before comparing them and then put them back if required.

Thank you @alexey.noskov for the quick response. I am attaching a simple example of what our documents look like with an actual watermark in them.

Please let me know whether removing the watermark is possible for similar documents and what the best way to do it would be, as this solution could work for us.

Sample.docx (55.3 KB)
Sample.pdf (56.6 KB)

@usernamename In the case of a DOCX document, the watermarks can be identified by their names. In your case the names start with “Freeform”, so you can remove such shapes using code like this:

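using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"C:\Temp\Sample.docx");

// Collect all shapes in the document, including those nested in other nodes.
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Iterate over a snapshot, because the collection is live and changes as shapes are removed.
foreach (Shape s in shapes.ToArray())
{
    // Watermark shapes in Sample.docx have names starting with "Freeform".
    if (s.Name.StartsWith("Freeform"))
        s.Remove();
}

doc.Save(@"C:\Temp\out.docx");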

Or the following code, if you prefer this syntax:

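using System.Linq;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"C:\Temp\Sample.docx");

// ToList() takes a snapshot before any shape is removed.
doc.GetChildNodes(NodeType.Shape, true).Cast<Shape>()
    .Where(s => s.Name.StartsWith("Freeform"))
    .ToList().ForEach(s => s.Remove());

doc.Save(@"C:\Temp\out.docx");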
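
For the comparison itself, the removal step can be combined with Document.Compare. The following is only a rough sketch, not a tested solution: the file paths, the "Freeform" prefix rule, the author string, and the helper name LoadWithoutWatermarks are assumptions for illustration. It also assumes that accepting any existing tracked changes before comparing is acceptable, since Compare expects documents without revisions.

```csharp
using System;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Drawing;

// Hypothetical input paths; replace with the two documents to compare.
Document docA = LoadWithoutWatermarks(@"C:\Temp\A.docx", "Freeform");
Document docB = LoadWithoutWatermarks(@"C:\Temp\B.docx", "Freeform");

// Compare requires both documents to be free of revisions.
// Assumption: accepting existing tracked changes is fine for this scenario.
docA.AcceptAllRevisions();
docB.AcceptAllRevisions();

// After the call, docA holds the differences as revisions.
docA.Compare(docB, "comparer", DateTime.Now);

Console.WriteLine(docA.Revisions.Count == 0
    ? "Documents are equal (watermarks ignored)"
    : "Documents differ");

// Hypothetical helper: loads a DOCX and removes every shape whose name
// starts with the given prefix. The prefix rule matches Sample.docx only;
// other documents may need a different way to recognize watermarks.
static Document LoadWithoutWatermarks(string path, string namePrefix)
{
    Document doc = new Document(path);

    // Snapshot the shapes first so removal does not disturb enumeration.
    foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true).OfType<Shape>().ToList())
    {
        if (shape.Name.StartsWith(namePrefix, StringComparison.Ordinal))
            shape.Remove();
    }

    return doc;
}
```

The watermarks are removed only from the in-memory copies, so the original files stay untouched and nothing has to be put back afterwards.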

Unfortunately, watermarks from a PDF document are imported as simple image shapes, and there is no way to identify them.

Thanks for the code samples and explanation! I will run some tests, but I believe it should work for .docx documents. As for PDF, yes, I was expecting it might not be that simple.
Thank you!

@alexey.noskov I have run some tests and found that in some cases I cannot see a value in the ‘Name’ property for some shapes (watermarks), even though I can see a name for them in the Word selection pane (in the example below, the name starts with AutoShape):

I was wondering whether there is any reason why I cannot see the actual name (and remove the shape) when I run the code described above.

I am also attaching an example document that shows this behaviour.
Proof_Copy.docx (38.7 KB)

@usernamename Unfortunately, the shape name is not a required field and can be empty, as in your example. The watermarks in your document are simple custom shapes with nothing special that could be used to identify them. Internally they are represented like this:

<w:pict w14:anchorId="7BA47C87">
	<v:shape id="_x0000_s2052" style="position:absolute;margin-left:42.65pt;margin-top:58.7pt;width:89.25pt;height:57.75pt;z-index:251654144;mso-position-horizontal-relative:page;mso-position-vertical-relative:page" coordorigin="1505,2071" coordsize="3149,2038" o:spt="100" adj="0,,0" path="m2138,3286v16,45,24,87,22,129c2158,3456,2149,3494,2130,3529v-18,35,-44,67,-79,95c2017,3652,1972,3676,1917,3696r-102,37l1932,4054v1,4,2,7,1,11c1932,4068,1929,4072,1925,4075v-3,3,-9,7,-16,10c1902,4089,1892,4093,1881,4098v-11,4,-21,7,-29,8c1844,4108,1837,4109,1832,4109v-5,,-9,-1,-12,-3c1817,4104,1815,4101,1814,4097l1515,3275v-7,-18,-7,-33,,-44c1522,3220,1531,3212,1543,3207r191,-69c1753,3131,1772,3125,1790,3120v18,-5,40,-9,66,-12c1882,3104,1910,3105,1941,3111v30,5,58,15,83,30c2049,3155,2071,3175,2091,3199v19,25,35,54,47,87m2017,3341v-13,-36,-31,-64,-53,-83c1941,3238,1918,3226,1894,3222v-25,-5,-48,-5,-70,-1c1801,3226,1780,3232,1760,3239r-110,40l1780,3636r107,-39c1923,3584,1951,3569,1971,3551v20,-18,35,-38,46,-60c2027,3469,2032,3445,2032,3420v,-26,-5,-52,-15,-79m2681,3127v4,10,7,18,9,25c2692,3159,2693,3165,2693,3170v,4,-1,8,-2,10c2690,3183,2687,3185,2683,3187v-3,1,-8,2,-14,1c2663,3188,2656,3188,2649,3189v-8,,-16,1,-25,2c2615,3192,2606,3195,2596,3198v-11,4,-22,11,-31,19c2556,3226,2547,3238,2539,3253v-9,15,-16,33,-23,56c2509,3331,2502,3357,2495,3389r148,407c2645,3800,2645,3803,2644,3806v-1,3,-3,7,-7,10c2634,3819,2629,3823,2622,3826v-7,4,-17,8,-28,12c2583,3842,2574,3845,2566,3847v-8,2,-14,2,-19,2c2542,3849,2538,3848,2535,3846v-2,-2,-4,-5,-5,-9l2304,3218v-1,-4,-2,-7,-1,-10c2303,3205,2305,3202,2309,3198v3,-4,8,-7,14,-10c2329,3185,2338,3182,2348,3178v10,-4,19,-7,26,-8c2380,3168,2386,3167,2391,3168v4,1,7,2,10,4c2404,3174,2405,3177,2407,3181r33,90c2448,3240,2456,3214,2465,3193v10,-21,19,-39,29,-52c2504,3127,2515,3116,2527,3108v12,-9,25,-15,38,-20c2571,3086,2578,3084,2585,3082v8,-2,17,-4,26,-5c2620,3076,2628,3075,2635,3075v8,,13,,17,2c2655,3078,2658,3079,2660,3081v1,2,3,4,5,7c2667,3091,2669,3096,26
72,3102v2,6,5,14,9,25m3482,3132v19,50,29,99,31,147c3515,3326,3509,3370,3494,3410v-16,40,-40,76,-73,108c3387,3550,3344,3575,3292,3594v-51,19,-98,27,-141,26c3107,3619,3068,3608,3032,3588v-36,-19,-67,-48,-95,-84c2910,3467,2887,3422,2868,3370v-19,-51,-29,-99,-32,-147c2834,3176,2840,3132,2856,3092v15,-40,39,-76,72,-108c2962,2953,3005,2928,3057,2909v51,-19,99,-27,142,-26c3242,2884,3282,2895,3318,2915v35,19,67,47,95,84c3440,3036,3464,3080,3482,3132t-117,51c3353,3150,3339,3119,3322,3092v-17,-28,-37,-50,-60,-68c3240,3007,3214,2996,3185,2992v-29,-4,-62,,-98,13c3054,3018,3027,3034,3007,3055v-19,20,-33,45,-41,72c2959,3154,2956,3184,2960,3217v3,33,11,67,24,102c2996,3353,3011,3384,3028,3411v17,28,37,50,59,67c3110,3495,3136,3506,3165,3510v29,4,62,,98,-14c3296,3484,3322,3468,3342,3447v20,-21,34,-45,42,-72c3392,3348,3394,3318,3390,3286v-3,-33,-12,-67,-25,-103m4225,2862v19,50,29,99,31,146c4258,3056,4252,3099,4236,3140v-15,40,-39,76,-72,108c4130,3279,4087,3305,4035,3324v-51,18,-98,27,-141,26c3850,3348,3811,3338,3775,3318v-36,-20,-68,-48,-95,-85c3653,3196,3629,3152,3610,3099v-18,-50,-28,-99,-31,-146c3577,2906,3583,2862,3598,2822v16,-41,40,-77,73,-108c3704,2683,3747,2658,3800,2638v51,-18,98,-27,142,-25c3985,2614,4025,2624,4060,2644v36,20,68,48,95,85c4183,2766,4206,2810,4225,2862t-117,51c4096,2879,4081,2849,4064,2821v-17,-27,-36,-50,-59,-67c3983,2737,3957,2726,3928,2722v-29,-5,-62,,-98,13c3797,2747,3770,2764,3750,2784v-20,21,-33,45,-41,73c3701,2884,3699,2914,3702,2947v4,32,12,66,25,102c3739,3083,3754,3113,3771,3141v17,27,37,50,59,67c3853,3225,3879,3235,3908,3240v29,4,61,-1,98,-14c4039,3214,4065,3198,4085,3177v20,-21,34,-45,42,-72c4134,3078,4137,3048,4133,3015v-3,-32,-12,-66,-25,-102m4528,2120v4,9,6,16,7,21c4536,2147,4536,2152,4536,2155v,4,,7,-2,8c4533,2165,4531,2167,4528,2168v-3,1,-8,1,-13,1c4509,2169,4502,2169,4495,2168v-8,,-17,1,-27,3c4458,2172,4446,2175,4434,2179v-17,6,-31,14,-40,24c4384,2213,4377,2225,4374,2240v-3,14,-3,31,1,51c4378,2311,4385,2334,4395,2360r24,68l4557,2378v4,-2,8,-2,
11,-1c4571,2377,4574,2379,4578,2382v3,3,6,8,10,13c4591,2401,4594,2408,4597,2417v6,16,9,29,7,38c4603,2464,4599,2469,4592,2472r-139,50l4651,3065v1,4,2,7,1,10c4651,3079,4649,3082,4645,3085v-4,4,-9,7,-16,11c4622,3100,4613,3103,4602,3107v-11,4,-21,7,-28,9c4566,3118,4560,3118,4555,3118v-5,,-9,-1,-12,-3c4540,3113,4538,3110,4537,3107l4339,2564r-87,32c4244,2598,4238,2597,4231,2591v-6,-6,-12,-18,-18,-34c4210,2548,4207,2541,4206,2534v-1,-6,-2,-11,-1,-16c4205,2513,4207,2510,4209,2507v2,-2,5,-4,8,-6l4305,2469r-23,-64c4266,2362,4256,2324,4253,2290v-3,-34,,-64,9,-90c4272,2175,4287,2153,4309,2134v21,-18,49,-34,83,-46c4408,2082,4424,2078,4440,2075v17,-2,30,-3,39,-3c4489,2073,4495,2074,4499,2075v4,2,7,5,11,9c4513,2087,4516,2092,4519,2098v3,6,6,13,9,22e" fillcolor="#b3b3b3">
		<v:stroke joinstyle="round"/>
		<v:formulas/>
		<v:path strokeok="f" o:connecttype="segments"/>
		<w10:wrap anchorx="page" anchory="page"/>
	</v:shape>
</w:pict>
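
To check which shapes in a particular file actually carry a name, a small read-only listing like the one below may help. It is only a sketch: the file path is a placeholder, and the printed properties are just a few that might be worth inspecting, not a reliable way to recognize watermarks.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Drawing;

// Placeholder path; point it at the document to inspect.
Document doc = new Document(@"C:\Temp\Proof_Copy.docx");

// List every shape with its name (if any), type, and size. Nothing is modified.
foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
{
    string name = string.IsNullOrEmpty(shape.Name) ? "(no name)" : shape.Name;
    Console.WriteLine($"{name}: type={shape.ShapeType}, size={shape.Width}x{shape.Height}");
}
```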

@alexey.noskov I suspected something similar but was not sure. Thank you for the explanation and the details on how it is represented.
