Free Support Forum - aspose.com

Compare PDF files

Does Aspose provide any API that compares two PDF files and returns the differences in the following?

(1) text content
(2) tables
(3) images

Many thanks!

@ruhongcai

Thank you for contacting support.

Please note that Aspose.PDF for Java mimics the behavior of Adobe Acrobat and can be used to create and manipulate PDF documents and some other file formats. We are afraid PDF files cannot be compared for differences in Adobe Acrobat, so this feature is not available in the Aspose.PDF API. If you are able to achieve the same with Adobe Acrobat, please share the steps with us so that we can investigate further and help you out.

Hi,

Thanks for the response.
Two more questions:

(1)
Given two PDF files, file1 and file2: if we extract elements (images, text, tables, etc.) from them and do our own comparison, when a difference is found, does Aspose.PDF provide a way to find its position in the original file1 and file2?

(2) When extracting elements (images, text, tables, etc.) from PDF files, what does Aspose return if a table or text spans two or more pages? Would it be better to extract from the whole file instead of page by page?

Thanks!

Ruhong

@ruhongcai

If you are extracting the contents of a PDF document, note that an extracted image does not have any property that holds its position, and text positions cannot be retrieved from TextAbsorber either. However, the page number of each table can be retrieved through the AbsorbedTable.PageNum property, as in the code snippet below:

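```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Placeholder: folder that contains the input PDF
string dataDir = @"C:\Data\";

Document doc = new Document(dataDir + "Doc2.pdf");
foreach (Page page in doc.Pages)
{
    // Detect the tables on the current page
    TableAbsorber absorber = new TableAbsorber();
    absorber.Visit(page);

    foreach (AbsorbedTable table in absorber.TableList)
    {
        // PageNum: number of the page that contains this table
        Console.WriteLine(table.PageNum);
    }
}
```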

Moreover, the absorber classes provide a Visit method that processes one page at a time, as in the code snippet above. As for a table that spans more than one page, the Aspose.PDF for .NET API returns a separate table for each page.
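Building on the page-number approach above, one way to do your own comparison is to tag every extracted item with the page it came from and then diff the two sequences, so that each reported difference carries its page number in file1 and file2. The following is a minimal sketch under that assumption: it works on plain strings (for example, text lines or table rows you have already extracted, each with the page number reported by your extraction step) and does not call Aspose.PDF at all. The `PageDiff` class, the `Item` type and the `diff` method are hypothetical names, and the longest-common-subsequence (LCS) diff is just one possible comparison strategy.

```java
import java.util.ArrayList;
import java.util.List;

public class PageDiff {

    /** Hypothetical type: one extracted element (e.g. a text line or table row) and its page number. */
    static final class Item {
        final int page;
        final String text;

        Item(int page, String text) {
            this.page = page;
            this.text = text;
        }
    }

    /**
     * LCS-based diff of two item sequences, compared by text only.
     * Items only in file1 are reported as "- file1 p<page>: text",
     * items only in file2 as "+ file2 p<page>: text".
     */
    static List<String> diff(List<Item> a, List<Item> b) {
        int n = a.size(), m = b.size();
        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).text.equals(b.get(j).text)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.get(i).text.equals(b.get(j).text)) {
                i++; // unchanged item: skip it in both files
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- file1 p" + a.get(i).page + ": " + a.get(i).text);
                i++;
            } else {
                out.add("+ file2 p" + b.get(j).page + ": " + b.get(j).text);
                j++;
            }
        }
        // Whatever remains exists in only one of the files
        for (; i < n; i++) out.add("- file1 p" + a.get(i).page + ": " + a.get(i).text);
        for (; j < m; j++) out.add("+ file2 p" + b.get(j).page + ": " + b.get(j).text);
        return out;
    }

    public static void main(String[] args) {
        List<Item> file1 = List.of(new Item(1, "Invoice 42"), new Item(1, "Total: 100"), new Item(2, "Thank you"));
        List<Item> file2 = List.of(new Item(1, "Invoice 42"), new Item(2, "Total: 120"), new Item(2, "Thank you"));
        // Prints:
        // - file1 p1: Total: 100
        // + file2 p2: Total: 120
        diff(file1, file2).forEach(System.out::println);
    }
}
```

Because items are compared by text only, the same content moving to a different page is not reported as a difference. If page moves matter to you, include the page number in the comparison key.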