PDF document taking too long for conversion

RChilli_Nidhi · March 31, 2023, 6:37am

We have taken an Aspose.PDF temporary license, but a few of the files take too long to convert. We are extracting the text along with the images.

Could you please check?
1536002223858-EngagementContractInfosysV1.zip (511.5 KB)

carlos.molina · March 31, 2023, 12:57pm

@RChilli_Nidhi,

Can you please attach the code snippet you are using?

I am not sure what you mean by conversion until I see your code, because there are multiple kinds of conversion.

RChilli_Nidhi · March 31, 2023, 2:20pm

Below is the code. Also, after converting a large number of documents, memory on the server becomes a concern: it got stuck!

    try {
        String fileDataBase64 = "";
        // Apply the Aspose.PDF license
        com.aspose.pdf.License pdflicense = new com.aspose.pdf.License();

        // Set license from Stream
        pdflicense.setLicense(licenceStream());

        // Decode the incoming Base64 payload into raw PDF bytes
        byte[] fileBytedatanew = com.Base64.decode(fileData);

        try (InputStream streamDatanew = new ByteArrayInputStream(fileBytedatanew);
             com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(streamDatanew)) {

            HtmlSaveOptions saveOptions = new HtmlSaveOptions();
            saveOptions.setPartsEmbeddingMode(HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml);
            saveOptions.setLettersPositioningMethod(LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss);
            saveOptions.setRasterImagesSavingMode(HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground);

            // Create TextAbsorber object to extract text
            TextAbsorber textAbsorber = new TextAbsorber();
            // Accept the absorber for all the pages
            pdfDocument.getPages().accept(textAbsorber);

            // Create an in-memory stream to hold the HTML output
            try (ByteArrayOutputStream bOutput = new ByteArrayOutputStream(textAbsorber.getText().getBytes().length)) {
                pdfDocument.save(bOutput, saveOptions);

                // Free memory held by the document object
                pdfDocument.freeMemory();

                // Convert the HTML byte array into a Base64 string
                byte[] dataString = Base64.encodeBase64(bOutput.toByteArray());
                fileDataBase64 = new String(dataString);
            }
        }
    }
    catch (Exception e) {
        e.printStackTrace();
    }
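On the memory side, the snippet keeps several full copies of the output alive at once: the `ByteArrayOutputStream` buffer, the `toByteArray()` copy, the encoded bytes, and the final `String`. A minimal JDK-only sketch, assuming `pdfDocument.save(...)` accepts any `OutputStream`, encodes to Base64 while the HTML is being written, so the raw HTML bytes are never held in memory as a whole. `StreamingBase64`, `Renderer` and `renderToBase64` are hypothetical names for illustration, not part of Aspose.PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class StreamingBase64 {

    /** Hypothetical stand-in for a renderer such as pdfDocument.save(out, saveOptions). */
    @FunctionalInterface
    public interface Renderer {
        void writeTo(OutputStream out) throws IOException;
    }

    /**
     * Runs the renderer and returns its output as a Base64 string. Bytes are
     * encoded as they are written, so no raw copy of the whole output (like the
     * one toByteArray() creates) is kept alongside the encoded result.
     */
    public static String renderToBase64(Renderer renderer) throws IOException {
        ByteArrayOutputStream encoded = new ByteArrayOutputStream();
        // Closing the wrapping stream writes the final Base64 padding; it also
        // closes 'encoded', which has no effect on a ByteArrayOutputStream.
        try (OutputStream b64 = Base64.getEncoder().wrap(encoded)) {
            renderer.writeTo(b64);
        }
        // Base64 output is plain ASCII.
        return encoded.toString(StandardCharsets.US_ASCII.name());
    }
}
```

In the snippet, the inner `bOutput` block could then become something like `fileDataBase64 = StreamingBase64.renderToBase64(out -> pdfDocument.save(out, saveOptions));`, again assuming `save` writes to an arbitrary `OutputStream`.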

carlos.molina · March 31, 2023, 4:18pm

@RChilli_Nidhi,

This is caused by the rasterization process, which is very costly. Unfortunately, there is no way to speed it up.
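Since the rasterization itself cannot be made faster, one general mitigation on the application side (a JDK-level assumption, not an Aspose.PDF feature) is to give each file a time budget and cap how many conversions run at once, so that one slow document does not stall the server. The sketch below wraps the conversion from the snippet above in a `Callable`; `TimeBoxedConversion` and its methods are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class TimeBoxedConversion {

    // A fixed pool caps how many conversions run at once, which also caps
    // how many documents are held in memory at the same time.
    private final ExecutorService pool;

    public TimeBoxedConversion(int maxConcurrent) {
        this.pool = Executors.newFixedThreadPool(maxConcurrent);
    }

    /**
     * Runs one conversion and waits at most 'timeout' for its result.
     * Returns null if the conversion did not finish in time.
     */
    public String convert(Callable<String> conversion, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        Future<String> future = pool.submit(conversion);
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Requests interruption only; a CPU-bound call that never checks the
            // interrupt flag may keep running until it finishes on its own.
            future.cancel(true);
            return null;
        }
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

Note the caveat in the comment: if the library call ignores interruption, the worker thread stays busy after the timeout, so the pool size still bounds the total load.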