Comparing documents and fetching change coordinates using Aspose.PDF

I am trying to compare two documents (DOCX, PDF) and get the coordinates of the changes in the original input documents. How can this be achieved?

  1. Can I get the coordinates of the changes from the PDF?
  2. Is DOCX to PDF conversion highly accurate?

@Likitha_T

Sure, I can help you with your questions regarding Aspose.PDF and Aspose.Words for Java.

Question 1: Can I get the coordinates of the changes from the PDF?

Aspose.PDF for Java lets you manipulate PDF documents and extract text along with its position (coordinates). However, getting coordinates for detected changes is not a direct operation: you need additional logic to map each change to the positions of the extracted text.

Here are the steps you can follow:

  1. Compare Documents: Use Aspose.Words to compare the two DOCX documents. Its Document.compare method records the differences as revisions (insertions, deletions, formatting changes) in the original document, which you can then read through Document.getRevisions().
  2. Convert DOCX to PDF: Save the compared document as PDF with Aspose.Words.
  3. Extract Text with Coordinates: Use Aspose.PDF to extract text along with its coordinates from the resulting PDF.

Example Code Snippet

Here is a basic example to guide you through the process:

import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;
import com.aspose.words.SaveFormat;

import java.util.Date;

public class DocumentComparison {
    public static void main(String[] args) throws Exception {
        // Both libraries define a Document class, so fully qualified names avoid the clash.
        com.aspose.words.Document docOriginal = new com.aspose.words.Document("original.docx");
        com.aspose.words.Document docRevised = new com.aspose.words.Document("revised.docx");

        // compare() returns void: the differences are added to docOriginal as revisions.
        // Neither document may contain revisions before the comparison.
        docOriginal.compare(docRevised, "John Doe", new Date());

        // Save the compared document (including its revisions) as PDF.
        docOriginal.save("document_result.pdf", SaveFormat.PDF);

        // Extract text fragments with their coordinates from the resulting PDF.
        com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("document_result.pdf");
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();
        pdfDocument.getPages().accept(absorber); // absorb text from all pages
        TextFragmentCollection textFragments = absorber.getTextFragments();

        for (TextFragment textFragment : textFragments) {
            com.aspose.pdf.Rectangle rect = textFragment.getRectangle();
            // Values are in PDF points, measured from the lower-left corner of the page.
            System.out.println("Page: " + textFragment.getPage().getNumber()
                    + ", Text: " + textFragment.getText()
                    + ", X: " + rect.getLLX() + ", Y: " + rect.getLLY()
                    + ", Width: " + rect.getWidth() + ", Height: " + rect.getHeight());
        }
        pdfDocument.close();
    }
}
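The snippet above prints every text fragment in the PDF. To keep only the changed text (the "additional logic" mentioned earlier), one option is to filter the fragments against a list of changed strings, for example the text of the revisions produced by the comparison. The sketch below is plain Java with no Aspose dependency: the Fragment class and the locate method are hypothetical stand-ins for values you would copy out of each TextFragment, and substring matching is only a rough heuristic (fragment granularity depends on how the PDF was written).

```java
import java.util.ArrayList;
import java.util.List;

public class ChangeLocator {

    // Hypothetical holder for one extracted text fragment: page number, text,
    // and its rectangle in PDF points (lower-left corner, width, height).
    static final class Fragment {
        final int page;
        final String text;
        final double x, y, width, height;

        Fragment(int page, String text, double x, double y, double width, double height) {
            this.page = page;
            this.text = text;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }

        @Override
        public String toString() {
            return "Page " + page + ", \"" + text + "\" at (" + x + ", " + y + "), "
                    + width + " x " + height;
        }
    }

    // Returns the fragments whose text contains at least one changed string.
    // Naive substring matching: a change split across fragments, or text that
    // occurs several times on a page, needs smarter handling than this.
    static List<Fragment> locate(List<Fragment> fragments, List<String> changedTexts) {
        List<Fragment> hits = new ArrayList<>();
        for (Fragment fragment : fragments) {
            for (String changed : changedTexts) {
                if (!changed.isEmpty() && fragment.text.contains(changed)) {
                    hits.add(fragment);
                    break; // record each fragment at most once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Fragment> fragments = List.of(
                new Fragment(1, "The quick brown fox", 72.0, 700.0, 110.5, 12.0),
                new Fragment(1, "jumps over the dog", 72.0, 686.0, 104.2, 12.0));
        // Only the first fragment contains "brown", so only it is printed.
        for (Fragment hit : locate(fragments, List.of("brown"))) {
            System.out.println(hit);
        }
    }
}
```

Keep in mind that the compared document contains both the inserted and the deleted text, so positions found in document_result.pdf may not match the layout of either input file.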

Question 2: Is DOCX to PDF conversion highly accurate?

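Aspose.Words generally converts DOCX to PDF with high fidelity, preserving the formatting and layout of the original document. The rendering can still differ from Microsoft Word in some cases, for example when fonts used by the document are not installed on the machine and have to be substituted.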

For more details and comprehensive examples, please refer to the official Aspose.Words and Aspose.PDF documentation.

If you have any specific requirements or run into issues with the code snippets provided, please let me know, and I can offer more detailed assistance.

Using Aspose.PDF, can we get the coordinates of the changes in the initial (input) file rather than in the result file?

@Likitha_T

Regarding Aspose.PDF, we need to investigate this requirement in detail. Would you kindly share your sample file(s) along with the generated outputs? Please also share your sample code snippet for our reference. We will log an investigation ticket and share the ID with you.

@asad.ali

The requirement is as follows: I have two input documents that I need to compare. After the comparison, I need to get the changes along with their coordinates, something like this:

{
    "x": 1.333299994468689,
    "width": 1.333299994468689,
    "y": 1.333299994468689,
    "text": "i",
    "type": "Deleted",
    "height": 1.333299994468689
},
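As a sketch of how such a list could be produced once each change has a type and a rectangle, the plain-Java code below serializes change objects into the shape shown above (same field names and order). The Change class, escape and toJson are hypothetical names for illustration, not part of Aspose.PDF or Aspose.Words; in practice a JSON library such as Jackson would normally replace the hand-written escaping.

```java
import java.util.List;

public class ChangeJson {

    // Hypothetical change record mirroring the fields in the requested output.
    static final class Change {
        final String text;
        final String type; // e.g. "Inserted" or "Deleted"
        final double x, y, width, height;

        Change(String text, String type, double x, double y, double width, double height) {
            this.text = text;
            this.type = type;
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Serializes the changes as a JSON array of objects in the requested shape.
    // Assumes finite coordinates (NaN or Infinity would not be valid JSON).
    static String toJson(List<Change> changes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < changes.size(); i++) {
            Change c = changes.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"x\":").append(c.x)
              .append(",\"width\":").append(c.width)
              .append(",\"y\":").append(c.y)
              .append(",\"text\":\"").append(escape(c.text)).append('"')
              .append(",\"type\":\"").append(escape(c.type)).append('"')
              .append(",\"height\":").append(c.height)
              .append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Change> changes = List.of(new Change("i", "Deleted", 72.0, 700.0, 3.3, 12.0));
        System.out.println(toJson(changes));
    }
}
```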

Does Aspose.PDF provide a way to do this comparison, and can it give me accurate coordinates of the changes in the original input files so that I can highlight the changes accurately in my frontend application?
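For frontend highlighting, note that PDF coordinates (such as the values from Aspose.PDF's Rectangle.getLLX() and getLLY()) are measured in points from the lower-left corner of the page, while browsers position elements in pixels from the top-left. A minimal conversion, assuming an unrotated page whose visible area starts at (0, 0) and a known rendered page width, could look like this; ScreenBox and toScreen are hypothetical names.

```java
public class PdfToScreen {

    // A box with a top-left origin in screen pixels, as a frontend overlay expects.
    static final class ScreenBox {
        final double left, top, width, height;

        ScreenBox(double left, double top, double width, double height) {
            this.left = left;
            this.top = top;
            this.width = width;
            this.height = height;
        }
    }

    // Converts a rectangle in PDF user space (origin at the lower-left corner,
    // units in points, 1 pt = 1/72 inch) into a top-left-origin pixel box.
    // scale = rendered page width in pixels / page width in points.
    // Assumes an unrotated page whose visible area starts at (0, 0); rotated
    // pages or an offset CropBox need additional handling.
    static ScreenBox toScreen(double llx, double lly, double width, double height,
                              double pageHeightPt, double scale) {
        double topPt = pageHeightPt - (lly + height); // flip the y axis
        return new ScreenBox(llx * scale, topPt * scale, width * scale, height * scale);
    }

    public static void main(String[] args) {
        // Page of roughly A4 size (595 x 842 pt) displayed 1190 px wide, so scale = 2.0.
        double scale = 1190.0 / 595.0;
        ScreenBox box = toScreen(72.0, 700.0, 110.0, 12.0, 842.0, scale);
        System.out.println(box.left + " " + box.top + " " + box.width + " " + box.height);
        // prints "144.0 260.0 220.0 24.0"
    }
}
```

Computing the scale from the rendered width keeps the overlay aligned when the page is zoomed, as long as the page is scaled uniformly in both directions.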

@Likitha_T

The file comparison feature in the Aspose.PDF API is still in its early stages, and we are actively working to enhance its functionality. To better understand and evaluate your requirements, we kindly request you to share the sample files and code snippet mentioned in our previous response. Once we receive this information, we will proceed with further investigation accordingly.