Queries Regarding File Comparison Using Aspose (Missing Tags)

Hi Team,

I am comparing two files (DOCX and PDF) using the Aspose Library. The response generates a third document where changes are tracked and displayed with redlining when opened in Word.

I have a few questions:

  1. While comparing two PDFs, I am losing some important information where changes cannot be identified accurately.
     • For example, with color coding, it can identify that text has been deleted. However, if the text size, format, or font has been changed, it detects that there is a change but fails to retain details about what specifically has been modified.
  2. Is it possible to generate two separate files for additions and deletions in the document, so that they can be viewed side by side to identify what has been added and removed?
  3. Will you be able to provide annotation data (e.g., tags) similar to what is displayed in the DOCX file, where it identifies formatting changes? This data could be loaded into a side panel to enhance the user experience.

Thanks,
Munish Singla

@munish.singla

Can you please specify which Aspose library you are using for file comparison and provide more details about the specific issues you are encountering?

Hi Team,

I am using groupdocs-comparison-24.7.jar.

Regards,
Munish

@munish.singla

Please note that Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. When loading a PDF document, Aspose.Words converts the fixed-page document structure into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity.

Regarding PDF document comparison using Aspose.Words: although two PDF documents might look the same visually, their internal structure might differ. This leads to a different DOM being built by Aspose.Words and, as a result, to differences being reported in the document comparison.

In addition, I see you have specified the words-java tag. Aspose.Words for Java does not support loading PDF documents at all. This feature is supported only in the .NET and Python versions of Aspose.Words.

Regarding separate files for additions and deletions: no, there is no way to achieve this using Aspose.Words. The Aspose.Words comparison feature works similarly to MS Word document comparison, i.e. changes are marked with revisions in the output document. Please see our documentation for more information:
https://docs.aspose.com/words/net/compare-documents/

Regarding annotation data: the answer is the same as for the previous question.

Hi @alexey.noskov

Is there no way to obtain information about the differences between the source and target files, other than the third file (the compared file) that we are receiving? Do you not provide, or are you unable to provide, details about the number of changes or the specifics of those changes?

Regards,
Munish

@munish.singla As mentioned above, the differences are marked as revisions in the output document. Therefore, you can read the revisions in the document to detect the differences. Please see our documentation to learn how to work with revisions:
https://reference.aspose.com/words/java/com.aspose.words/range/#getRevisions
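
As a rough illustration of getting "the number of changes" out of revision data, here is a minimal, library-independent sketch. The `RevisionSummary` and `Change` names are hypothetical helpers, not part of Aspose.Words. The assumption is that each `Change` would be copied from one revision of the compared document (its type, author, and affected text); the sample values in `main` are made up purely for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Summarizes tracked changes by type. Hypothetical helper, not an Aspose class. */
final class RevisionSummary {

    /** Hypothetical plain-data copy of one revision: type name, author, affected text. */
    static final class Change {
        final String type;
        final String author;
        final String text;

        Change(String type, String author, String text) {
            this.type = type;
            this.author = author;
            this.text = text;
        }
    }

    /** Counts changes per type; a TreeMap keeps the keys in alphabetical order. */
    static Map<String, Integer> countByType(List<Change> changes) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Change c : changes) {
            counts.merge(c.type, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data. In real code these entries would be filled
        // from the revisions of the compared document (assumption).
        List<Change> changes = Arrays.asList(
            new Change("INSERTION", "reviewer", "new clause"),
            new Change("DELETION", "reviewer", "old clause"),
            new Change("FORMAT_CHANGE", "reviewer", "heading"),
            new Change("INSERTION", "reviewer", "footnote"));

        // Print one line per change type with its count.
        for (Map.Entry<String, Integer> e : countByType(changes).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The same list of `Change` objects could also feed a side panel, since each entry carries the type and the affected text; how formatting details are represented there would depend on what the revision objects actually expose.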

@alexey.noskov

Are revisions supported for both DOCX and PDF files, or only for DOCX files?

@munish.singla When you open a document using Aspose.Words, the document is loaded into the Aspose.Words DOM. So it makes no difference what the original document format was; either way, the document is in the DOM. After comparison, the changes will be marked with revisions.