Attachments.zip (562.5 KB)
At present we are using the ASPOSE Java PDF 17.2.1 jar, and we are planning to move to the latest version, 17.9. While regression testing with the 17.9 PDF jar we found some differences that are causing functional issues in our product. Can you please look into these and provide your input?
Attachment Details:
- Input PDF Document: 2006 OH App Ct Briefs LEXIS 133.pdf
- Word Document converted from “PDF” using ASPOSE Java 17.2.1 : 2006 OH App Ct Briefs LEXIS 133-CP_PDF_17.2.1.docx
- Word Document converted from “PDF” using ASPOSE Java 17.9 : 2006 OH App Ct Briefs LEXIS 133-PDF_17.9.0.docx
Issue 1: The Word document converted using the latest 17.9 jar is different from the 17.2.1 output
Please refer to the attached “Issue1_ExampleOutput_WordDocDifference.docx” document, which shows an example difference. Can you please take a look and explain why this difference occurs with the latest 17.9 jar?
Issue 2: Run object differences between the Word documents converted with 17.2.1 and 17.9. The latest 17.9 jar splits the text into more runs.
For example, for the following text in the PDF document:
Telephone: (216) 621-1500 Facsimile: (216) 621-1551 E- mail: rdkehoe@kehoelaw.net E- mail: ibkenneyakehoelaw.net
we get different runs: the “rdkehoe@kehoelaw.net” email address is split into two runs in 17.9.
Sysout of the runs while processing the “2006 OH App Ct Briefs LEXIS 133-CP_PDF_17.2.1.docx” Word document converted using the 17.2.1 PDF jar:
run.getNodeType()=Run text=Telephone:
run.getNodeType()=Run text=
run.getNodeType()=Run text=(216)
run.getNodeType()=Run text=
run.getNodeType()=Run text=621-15
run.getNodeType()=Run text=00
run.getNodeType()=Run text=Facsim
run.getNodeType()=Run text=ile:
run.getNodeType()=Run text=
run.getNodeType()=Run text=(216)
run.getNodeType()=Run text=
run.getNodeType()=Run text=621-155
run.getNodeType()=Run text=1
run.getNodeType()=Run text=
run.getNodeType()=Run text=E-
run.getNodeType()=Run text=mail:
run.getNodeType()=Run text=
run.getNodeType()=Run text=rdkehoe@kehoelaw.net
run.getNodeType()=Run text=
run.getNodeType()=Run text=E-
run.getNodeType()=Run text=mail:
run.getNodeType()=Run text=
run.getNodeType()=Run text=ibkenneyakehoelaw.net
Sysout of the runs while processing the “2006 OH App Ct Briefs LEXIS 133-PDF_17.9.0.docx” Word document converted using the 17.9 PDF jar:
run.getNodeType()=Run text=Telephone:
run.getNodeType()=Run text=
run.getNodeType()=Run text=(216)
run.getNodeType()=Run text=
run.getNodeType()=Run text=621-15
run.getNodeType()=Run text=00
run.getNodeType()=Run text=Facsim
run.getNodeType()=Run text=ile:
run.getNodeType()=Run text=
run.getNodeType()=Run text=(216)
run.getNodeType()=Run text=
run.getNodeType()=Run text=621-155
run.getNodeType()=Run text=1
run.getNodeType()=Run text=
run.getNodeType()=Run text=E-
run.getNodeType()=Run text=mail:
run.getNodeType()=Run text=
run.getNodeType()=Run text=rdke
run.getNodeType()=Run text=hoe@kehoelaw.net
run.getNodeType()=Run text=
run.getNodeType()=Run text=E-
run.getNodeType()=Run text=mail:
run.getNodeType()=Run text=
run.getNodeType()=Run text=ibkenneyakehoelaw.net
Note: In the Word document converted using the 17.9 PDF jar, the “rdkehoe@kehoelaw.net” text is split into two runs (“rdke” and “hoe@kehoelaw.net”). Can you please explain the reason for this?
Issue 3: Line spacing differences between the Word documents converted using PDF 17.2.1 and 17.9
Please refer to the attached Issue3 document, which shows the exact differences. Please also refer to the following input and converted documents:
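To make this kind of regression easier to pinpoint, here is a minimal sketch of how the two run dumps can be compared. It assumes the run texts have already been collected into plain lists of strings. The class and method names (RunBoundaryDiff, extraSplits) are hypothetical helpers and not part of any ASPOSE API, and the sample data assumes the blank runs in the dumps are single spaces. It reports the character offsets where the newer output has a run boundary that the older output does not.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the ASPOSE API: compares how the same
// text is segmented into runs by two different converter versions.
public class RunBoundaryDiff {

    /** Returns the character offset at which each run ends in the joined text. */
    static List<Integer> boundaries(List<String> runs) {
        List<Integer> ends = new ArrayList<>();
        int offset = 0;
        for (String run : runs) {
            offset += run.length();
            ends.add(offset);
        }
        return ends;
    }

    /** Offsets that are run boundaries in the newer output but not in the older one. */
    static List<Integer> extraSplits(List<String> older, List<String> newer) {
        // Only meaningful when the text itself is identical and just the segmentation differs.
        if (!String.join("", older).equals(String.join("", newer))) {
            throw new IllegalArgumentException("Run texts differ, not only run boundaries");
        }
        List<Integer> extra = new ArrayList<>(boundaries(newer));
        extra.removeAll(boundaries(older));
        return extra;
    }

    public static void main(String[] args) {
        // Sample data modelled on the dumps above (whitespace runs assumed to be single spaces).
        List<String> from1721 = List.of("mail:", " ", "rdkehoe@kehoelaw.net", " ");
        List<String> from179 = List.of("mail:", " ", "rdke", "hoe@kehoelaw.net", " ");
        // Prints the offsets, within the joined text, of the splits present only in 17.9.
        System.out.println(extraSplits(from1721, from179));
    }
}
```

An offset returned by extraSplits falls inside the joined text of the older run list, so it can be mapped back to the run that was split.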
Attachment Details:
- Input PDF Document: CiteSpreadToNextPage.pdf
- Word Document converted from “PDF” using ASPOSE Java 17.2.1 : CiteSpreadToNextPage-CP_PDF_17.2.1.docx
- Word Document converted from “PDF” using ASPOSE Java 17.9 : CiteSpreadToNextPage-PDF_17.9.0.docx
The following code snippet is used to convert PDF to Word with both the 17.2.1 and 17.9 PDF Java jars.
// Requires: import com.aspose.pdf.DocSaveOptions;
// inStream and outStream are the already-open input and output streams
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(inStream);
// Instantiate a DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Use the Flow recognition mode
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Maximum distance between text lines
saveOptions.setMaxDistanceBetweenTextLines(3.5f);
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Do not add a return at the end of each line
saveOptions.setAddReturnToLineEnd(false);
// Save resultant DOCX file
pdfDocument.save(outStream, saveOptions);
// Release the resources held by the document
pdfDocument.close();