@Aspose Support,
We have obtained a temporary Aspose.PDF license, but document conversion is taking too long for a few of our files. We are extracting the text along with the images.
Could you please check?
1536002223858-EngagementContractInfosysV1.zip (511.5 KB)
@RChilli_Nidhi,
Can you please attach the code snippet you are using?
I cannot tell which conversion you mean until I see your code, because there are several kinds of conversion.
The code is below. Also, after converting a large number of documents, we have a concern about memory on the server: it got stuck!
try {
    String fileDataBase64 = "";
    // Apply the Aspose.PDF license
    com.aspose.pdf.License pdflicense = new com.aspose.pdf.License();
    // Set license from Stream
    pdflicense.setLicense(licenceStream());
    byte[] fileBytedatanew = com.Base64.decode(fileData);
    try (InputStream streamDatanew = new ByteArrayInputStream(fileBytedatanew);
         com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(streamDatanew)) {
        HtmlSaveOptions saveOptions = new HtmlSaveOptions();
        saveOptions.setPartsEmbeddingMode(HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml);
        saveOptions.setLettersPositioningMethod(LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss);
        saveOptions.setRasterImagesSavingMode(HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground);
        // Create TextAbsorber object to extract text
        TextAbsorber textAbsorber = new TextAbsorber();
        // Accept the absorber for all the pages
        pdfDocument.getPages().accept(textAbsorber);
        // Create an in-memory stream to receive the HTML output,
        // initially sized to the length of the extracted text
        try (ByteArrayOutputStream bOutput = new ByteArrayOutputStream(textAbsorber.getText().getBytes().length)) {
            pdfDocument.save(bOutput, saveOptions);
            // Release the memory held by the document object
            pdfDocument.freeMemory();
            // Encode the HTML output bytes as a Base64 string
            byte[] dataString = Base64.encodeBase64(bOutput.toByteArray());
            fileDataBase64 = new String(dataString);
        }
    }
}
catch (Exception e) {
    e.printStackTrace();
}
@RChilli_Nidhi,
The delay is caused by rasterization, which is a very costly process. Unfortunately, there is no way to speed it up.
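Since the conversion itself cannot be made faster, one possible mitigation on the server side is to bound how long you wait for each document, so that a single slow file does not hold up the rest. The following is only a minimal sketch using the standard java.util.concurrent classes. BoundedConversion and runWithTimeout are hypothetical names for illustration and are not part of Aspose.PDF. It also assumes the task responds to thread interruption, which has not been verified for the conversion call; if it does not, the worker thread keeps running in the background until it finishes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Aspose.PDF.
public class BoundedConversion {

    // Runs the task on a worker thread and waits at most timeoutMillis for its result.
    // Returns the fallback value if the task does not finish in time.
    public static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, T fallback)
            throws InterruptedException, ExecutionException {
        // Daemon thread, so a task that ignores interruption cannot block JVM shutdown.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "bounded-conversion");
            worker.setDaemon(true);
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Request interruption; this only helps if the task checks for it.
            future.cancel(true);
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task; in practice the lambda would wrap the conversion code.
        String html = runWithTimeout(() -> "<html></html>", 1000, null);
        System.out.println(html != null ? "converted" : "timed out");
    }
}
```

In your case the lambda would wrap the block that loads the document and calls save, and a null result would mean the file was skipped or queued for later processing. The timeout value is something you would need to choose for your own workload.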