Duplicate Pages

Lillith · July 21, 2015, 2:14pm

Is there a way to recognize and remove duplicate pages not only within a single PDF but across a collection of them as well?

The business scenario is a marketing document pull of PDF sell sheets. The client would like to scan across the documents and strip out repeated pages, since the sell sheets have end sheets and title sheets that may be repeated.

Is there a way to accomplish this?

Thanks,
Lillith

codewarior · July 23, 2015, 6:00pm

Hi Lillith,

Thanks for your interest in our APIs.

To accomplish this, you could compare the contents of each PDF page with the remaining pages in the document. To keep the comparison simple, you could split the PDF file into single-page documents and then compare the contents of each file with the others. Please note that, currently, only the textual contents of PDF files can be compared. For more information, please visit

Besides this, you may also consider using the ComparisonApp from our sister company, GroupDocs.
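
As a rough illustration of the page-by-page text comparison suggested above, here is a minimal Python sketch. It is not the API of any particular PDF product: it assumes the text of every page has already been extracted by whichever PDF library you use, and the names `normalize` and `find_duplicate_pages` are made up for this example. Because only text is compared, pages that differ only in images will look identical, and pages with no extractable text are skipped rather than treated as duplicates of each other.

```python
import hashlib
import re
from typing import Dict, List, Tuple

# (document name, zero-based page index); hypothetical type for this sketch
PageRef = Tuple[str, int]


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so minor layout differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicate_pages(docs: Dict[str, List[str]]) -> List[Tuple[PageRef, PageRef]]:
    """Find pages whose normalized text matches an earlier page.

    `docs` maps a document name to the list of its page texts, which are
    assumed to have been extracted beforehand. Returns (duplicate, first_seen)
    pairs, covering repeats both within one document and across documents.
    """
    seen: Dict[str, PageRef] = {}
    duplicates: List[Tuple[PageRef, PageRef]] = []
    for name, pages in docs.items():
        for index, text in enumerate(pages):
            cleaned = normalize(text)
            if not cleaned:
                # No extractable text (e.g. an image-only page): cannot compare.
                continue
            digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
            ref = (name, index)
            if digest in seen:
                duplicates.append((ref, seen[digest]))
            else:
                seen[digest] = ref
    return duplicates


if __name__ == "__main__":
    sample = {
        "sheet_a.pdf": ["Title  Sheet", "Product A details"],
        "sheet_b.pdf": ["title sheet", "Product B details"],
    }
    for duplicate, original in find_duplicate_pages(sample):
        print(duplicate, "repeats", original)
```

The pages reported as duplicates could then be deleted from their documents with your PDF library before the collection is delivered. Hashing the normalized text keeps the comparison to a single pass over all pages instead of comparing every page against every other page.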

If you have any further queries, please feel free to contact us.