PDF document comparison issues using Aspose.Words

Hello Team,
Below are the issues we are facing when comparing two PDFs and generating a new output PDF that shows the differences between them. I am attaching the two input PDFs so you can troubleshoot at your end.

We are using the latest version of Aspose.Words, 25.10, to compare the files.

Below are the four issues we encountered:

  1. A red line appears on the left side of the generated comparison PDF (issue on all pages).

  2. The footer shifts to the next page (issue on page 2).

  3. Text for the differences is missing or not shown (issue on page 3).

  4. The table header is repeated in the middle of the page (issue on page 4).

Attaching the input files:
Input2-AsposePDF.pdf (594.1 KB)

Input1-AsposePDF.pdf (499.0 KB)

Regards,
Anurag Mudrakol

@anurag.mudrakol

Hello Anurag,

It seems you are encountering several issues while comparing PDF documents using Aspose.Words. Here are some suggestions to address the problems you’ve listed:

  1. Red line on the left side: This line is most likely a revision bar, which Aspose.Words draws in the margin next to changed content when it renders a document that contains revisions. Its appearance is controlled by the RevisionOptions available through Document.LayoutOptions, not by the CompareOptions used during the comparison.

  2. Footer shifting: This issue may arise from layout differences between the two PDFs. Make sure the page setup, including margins and page dimensions, is consistent in both documents before performing the comparison. You may also want to review the CompareOptions settings for any that could affect the output.

  3. Text missing for differences: If certain text does not appear, it could be related to how the text is rendered in the original PDFs. Make sure the fonts used in both documents are properly embedded, and verify that the text is recognized correctly when the PDFs are loaded for comparison.

  4. Table header repeated: This could result from how the comparison algorithm interprets the table structure. Review the table formatting in both documents to make sure it is consistent. You may also want to check whether CompareOptions settings such as IgnoreTables affect how tables are compared.

For further troubleshooting, consider reviewing the Aspose.Words documentation on document comparison and rendering for more detailed guidance.

If you continue to face issues, please reach out to the Aspose support forum with specific details about your implementation and the settings you are using.

Best regards!

@anurag.mudrakol Please note that Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. When loading a PDF document, Aspose.Words converts the fixed-page document structure into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity.
Regarding PDF document comparison using Aspose.Words: although two PDF documents might look the same visually, their internal structure might differ. This leads to different DOMs being built by Aspose.Words and, as a result, to differences in the document comparison.
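As a minimal sketch, assuming the same C# API used in the snippet in this thread, the Compare overload that accepts a CompareOptions object can be used to suppress formatting-only differences, which the PDF-to-flow conversion may introduce even when two PDFs look identical. The class and method names (PdfCompareSketch, ComparePdfs) are hypothetical, and whether ignoring formatting reduces the differences for these particular files is an untested assumption.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Comparing;

// Hypothetical helper (not part of Aspose.Words): compares two PDFs loaded
// into Aspose.Words, ignoring formatting-only differences, and saves the result.
public static class PdfCompareSketch
{
    public static int ComparePdfs(string originalPath, string revisedPath, string outputPath)
    {
        // Each PDF is converted into the flow DOM when it is loaded.
        Document v1 = new Document(originalPath);
        Document v2 = new Document(revisedPath);

        CompareOptions options = new CompareOptions
        {
            // Do not report formatting differences; assumption: some of them
            // come from the PDF-to-flow conversion rather than real edits.
            IgnoreFormatting = true
        };

        // Differences are stored as revisions in v1.
        v1.Compare(v2, "AW", DateTime.Now, options);
        int revisionCount = v1.Revisions.Count;

        v1.Save(outputPath);
        return revisionCount;
    }
}
```

The returned revision count is only there to make it easy to see how many differences remain with the chosen options.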

You can disable revision bars (the red line in the left margin) using the RevisionOptions.ShowRevisionBars property:

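```csharp
using System;
using Aspose.Words;

Document v1 = new Document(@"C:\Temp\v1.pdf");
Document v2 = new Document(@"C:\Temp\v2.pdf");
v1.Compare(v2, "AW", DateTime.Now);
// Hide the revision bars drawn in the page margin next to changed lines.
v1.LayoutOptions.RevisionOptions.ShowRevisionBars = false;
v1.Save(@"C:\Temp\out.pdf");
```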

Regarding the footer shifting: as mentioned above, the PDF document is converted to a flow DOM when it is loaded into an Aspose.Words Document. During comparison, the revisions shift the content, which pushes it to the next page.

Regarding the missing text: it looks like the problem occurs because table row heights are fixed when the document is loaded from PDF, so some content pushed down by the produced revisions is not visible.
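One possible workaround, offered only as an untested sketch, is to relax the fixed row heights in the compared document before saving it, so that rows can grow to fit the content pushed down by revisions. The RowHeightFix class and RelaxFixedRowHeights method are hypothetical names, and the sketch assumes the fixed heights are stored with HeightRule.Exactly; it changes them to HeightRule.AtLeast, which keeps the original height as a minimum.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

// Hypothetical helper (not part of Aspose.Words): lets fixed-height rows
// expand so that content moved by comparison revisions stays visible.
public static class RowHeightFix
{
    public static int RelaxFixedRowHeights(Document doc)
    {
        int changed = 0;
        // Visit every row in the document, including rows of nested tables.
        foreach (Row row in doc.GetChildNodes(NodeType.Row, true))
        {
            if (row.RowFormat.HeightRule == HeightRule.Exactly)
            {
                // AtLeast keeps the original height as a minimum
                // but allows the row to grow with its content.
                row.RowFormat.HeightRule = HeightRule.AtLeast;
                changed++;
            }
        }
        return changed;
    }
}
```

It would be called on v1 after v1.Compare(...) and before v1.Save(...). AtLeast is chosen over Auto so that unchanged rows keep their original minimum height and should lay out roughly as in the source PDF.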

Regarding the repeated table header: it looks like the table in the PDF document is meant to be a single table spanning multiple pages. After the PDF document is loaded into the Aspose.Words DOM, however, there is a separate table on each page. This must be what causes the problem you mentioned.

I am afraid there is currently no way to compare PDF documents visually. This feature request has been logged as WORDSNET-24926.