Does Aspose provide an API that compares two PDF files and returns the differences in
(1) text content
(2) tables
(3) images
…
?
Many thanks!
Thank you for contacting support.
Please note that Aspose.PDF for Java mimics the behavior of Adobe Acrobat and can be used to create and manipulate PDF documents and some other file formats. We are afraid that PDF files cannot be compared for differences in Adobe Acrobat, so this feature is not available in the Aspose.PDF API either. If you are able to achieve this with Adobe Acrobat, please share the steps with us so that we can investigate further and help you out.
Hi,
Thanks for the response.
Two more questions:
(1)
Given two PDF files, file1 and file2,
if we extract the elements (images, text, tables, etc.) from the PDF files and do our own comparison,
does Aspose.PDF provide a way to locate a difference, once it is found, in the original file1 and file2?
(2) When we extract elements (images, text, tables, etc.) from PDF files and a table or text block spans two or more pages, what does Aspose return? Would it be better to extract
from the whole file instead of page by page?
Thanks!
Ruhong
If you are extracting the contents of a PDF document, an extracted image does not have any property that holds position information, and the text position cannot be retrieved from TextAbsorber either. However, the page number associated with a table can be retrieved from the AbsorbedTable.PageNum property, as in the code snippet below:
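using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// dataDir is the folder that contains the input PDF.
Document doc = new Aspose.Pdf.Document(dataDir + "Doc2.pdf");
TableAbsorber absorber = new Aspose.Pdf.Text.TableAbsorber();
// Run the absorber on each page of the document.
foreach (Aspose.Pdf.Page page in doc.Pages)
{
    absorber.Visit(page);
}
// Print the page number reported for each absorbed table.
foreach (AbsorbedTable table in absorber.TableList)
{
    Console.WriteLine(table.PageNum);
}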
Moreover, the Absorber classes include a Visit method that processes one page at a time, as in the code snippet above. As for a table that spans more than one page, the Aspose.PDF for .NET API returns a separate table for each page.
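Since extraction works page by page and the page number is the only location detail mentioned in this thread, one way to do your own comparison is to compare the extracted text one page at a time and report the page numbers that differ. The following is only a rough sketch under that assumption, not an Aspose feature: PageTextDiff and DifferingPages are hypothetical names, the code does not call the Aspose API at all, and it assumes you have already collected the text of each page into a list of strings, one entry per page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Aspose.PDF.
public static class PageTextDiff
{
    // Returns the 1-based page numbers whose text differs between the two documents.
    // A page that exists in only one of the documents is reported as different.
    public static List<int> DifferingPages(IReadOnlyList<string> pagesA, IReadOnlyList<string> pagesB)
    {
        var result = new List<int>();
        int max = Math.Max(pagesA.Count, pagesB.Count);
        for (int i = 0; i < max; i++)
        {
            // Use null for a page that is missing from the shorter document.
            string a = i < pagesA.Count ? pagesA[i] : null;
            string b = i < pagesB.Count ? pagesB[i] : null;
            if (!string.Equals(a, b, StringComparison.Ordinal))
            {
                result.Add(i + 1);
            }
        }
        return result;
    }
}
```

For example, calling PageTextDiff.DifferingPages(new[] { "a", "b" }, new[] { "a", "c", "d" }) returns the pages 2 and 3. Note that this exact string comparison treats any whitespace difference as a change, and content that shifts from one page to the next will mark both pages as different.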