Hello,
I have two PDF files and I want to compare them. Please find attached first.pdf and second.pdf. When I perform the comparison, I get result.pdf as the result. If you look at it, it is not in good shape. I tried this with the latest version. Please help.
I am doing this as follows:
- Read the first PDF
- Read the second PDF
- Convert both PDFs to Word
- Accept all revisions in both Word documents
- Perform the comparison
- Convert the resulting Word document to PDF
first.pdf (220.6 KB)
result.pdf (249.0 KB)
second.pdf (125.1 KB)
@prashantm1989
Can you please share the complete code snippet or a sample console application for our reference, so that we can test the scenario in our environment and address it accordingly?
Here is overall logic:
public Task<string> GetTrackedChanges(string html1, string html2)
{
    // ConvertHTMLToPDF is a helper that is not shown in this post.
    // Create the first PDF from html1.
    using (Stream updatedMemStream = ConvertHTMLToPDF(html1))
    // Create the second PDF from html2 (the posted code passed html1 twice).
    using (Stream originalUpdatedMemStream = ConvertHTMLToPDF(html2))
    {
        updatedMemStream.Position = 0;
        originalUpdatedMemStream.Position = 0;
        // Keep both PDF documents (and their streams) alive until the comparison is done;
        // the posted code used pdfDoc after its using block had already ended.
        using (Aspose.Pdf.Document pdfDoc = new Aspose.Pdf.Document(updatedMemStream))
        using (Aspose.Pdf.Document originalPdfDoc = new Aspose.Pdf.Document(originalUpdatedMemStream))
        {
            string difference = ComparePDF(originalPdfDoc, pdfDoc);
            // Nothing is awaited here, so wrap the result instead of marking the method async.
            return Task.FromResult(difference);
        }
    }
}
public static string ComparePDF(Aspose.Pdf.Document originalPdf, Aspose.Pdf.Document updatedPdf)
{
    // ConvertPDFToWord is a helper that is not shown in this post.
    var docA = ConvertPDFToWord(originalPdf);
    var docB = ConvertPDFToWord(updatedPdf);
    // There should be no revisions before comparison.
    docA.AcceptAllRevisions();
    docB.AcceptAllRevisions();
    // After Compare, docA contains the differences as tracked revisions.
    docA.Compare(docB, "Author Name", DateTime.Now);
    // return ConvertWordToHtml(docA);
    // Save the comparison result as PDF and return it Base64-encoded.
    using (MemoryStream dstStream = new MemoryStream())
    {
        docA.Save(dstStream, Aspose.Words.SaveFormat.Pdf);
        return Convert.ToBase64String(dstStream.ToArray());
    }
}
private static Aspose.Pdf.Document ConvertWordToPdf(Aspose.Words.Document wordDocument)
{
    // Save the document to a stream
    MemoryStream outStream = new MemoryStream();
    wordDocument.Save(outStream, Aspose.Words.SaveFormat.Pdf);
    outStream.Seek(0, SeekOrigin.Begin);
    return new Aspose.Pdf.Document(outStream);
}
@prashantm1989
It looks like this case is related to Aspose.Words. Therefore, we are moving this thread to the respective forum category, where you will be assisted accordingly.
Thanks, can the Aspose.Words team help me here?
@prashantm1989
We apologize for the delay in responding. The thread was not moved to the respective category in time. It has been moved now, and you will receive a response shortly.
@prashantm1989 Please note, Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. While loading a PDF document, Aspose.Words converts the fixed-page document structure into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity. So I am afraid it is not possible to preserve the exact PDF document layout after loading it into the Aspose.Words DOM and saving it back to PDF.
We have a feature request to add functionality to compare PDFs without loading PDF documents into the DOM. I have linked your request to the WORDSNET-24926 issue to keep you updated regarding this feature request.
Also, I have checked your documents comparison using MS Word, and the result is quite similar to Aspose.Words: ms.pdf (148.0 KB)
Thanks for the update! Is there an expected date for WORDSNET-24926?
BTW, I changed my approach:
Earlier approach:
- Convert HTML 1 to PDF1
- Convert HTML 2 to PDF2
- Convert both PDFs into Word
- Compare the Word documents
- Convert result to PDF
- Display on UI
This resulted in the issue above.
Now changed to:
- Convert HTML1 to Word1
- Convert HTML2 to Word2
- Compare the Word documents
- Convert result into PDF
- Display result on UI
This is much better than the earlier approach.
@prashantm1989 I am afraid the WORDSNET-24926 issue is not yet scheduled for development, so at the moment we cannot provide any estimates.
Yes, the second approach will work much better and much faster.
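For reference, here is a minimal sketch of the second approach, which skips the PDF round trip and loads each HTML string directly into Aspose.Words. The class name HtmlComparer and the helper LoadHtml are hypothetical and not from the code posted above. The sketch also assumes the HTML is UTF-8 text and that HtmlLoadOptions lives in the Aspose.Words.Loading namespace, which may differ in older versions.

```csharp
using System;
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Loading;

public static class HtmlComparer
{
    // Hypothetical helper: loads an HTML string into the Aspose.Words DOM.
    // Assumes the HTML text can be encoded as UTF-8.
    private static Document LoadHtml(string html)
    {
        var options = new HtmlLoadOptions { Encoding = Encoding.UTF8 };
        using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html)))
        {
            return new Document(stream, options);
        }
    }

    // Compares two HTML fragments and returns the result as a Base64-encoded PDF.
    public static string CompareHtml(string originalHtml, string updatedHtml)
    {
        Document docA = LoadHtml(originalHtml);
        Document docB = LoadHtml(updatedHtml);

        // Compare requires both documents to be free of revisions.
        docA.AcceptAllRevisions();
        docB.AcceptAllRevisions();

        // After this call, docA holds the differences as tracked revisions.
        docA.Compare(docB, "Author Name", DateTime.Now);

        using (var dstStream = new MemoryStream())
        {
            docA.Save(dstStream, SaveFormat.Pdf);
            return Convert.ToBase64String(dstStream.ToArray());
        }
    }
}
```

Because the HTML is loaded as a flow document from the start, no fixed-page to flow conversion takes place, which is the step that caused the layout problems in the first approach.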