Issues with PDF to HTML images

Hi there,

We have an issue with PDF to HTML conversion.

The images extracted from the PDF don't match the quality of the images in the original PDF. Could you please take a look and advise?

Code snippet:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

import com.aspose.pdf.Document;
import com.aspose.pdf.HtmlSaveOptions;
import com.aspose.pdf.PageSize;
import com.aspose.pdf.facades.PdfPageEditor;

// Class name is a placeholder; the original post omitted the class declaration.
public class Pdf2Html {

    public static void main(String[] args) {
        Document doc = null;
        Document zoomDoc = null;
        try {
            System.out.println("start pdf2html");
            String pdfPath = "C:\\data\\5314051.pdf";
            String outputDirectory = "C:\\Users\\gm69267\\Desktop\\output\\";
            String docId = "5314051";
            doc = new Document(pdfPath);
            // Resize and zoom every page before converting to HTML
            zoomDoc = getZoomPDFDoc(doc, new HashMap<String, Object>());
            String outHtmlFile = outputDirectory + docId + "_.html";
            // Create HtmlSaveOptions with the tested features
            HtmlSaveOptions saveOptions = new HtmlSaveOptions();
            saveOptions.setFixedLayout(true);
            saveOptions.setSplitIntoPages(true);
            saveOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
            saveOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
            // Save output as HTML
            zoomDoc.save(outHtmlFile, saveOptions);
            System.out.println("end pdf2html");
        } catch (Exception e) {
            System.out.println("Error in extracting pdf: " + e);
        } finally {
            // Release both the zoomed copy and the source document
            if (zoomDoc != null) {
                zoomDoc.close();
            }
            if (doc != null) {
                doc.close();
            }
        }
    }

    public static Document getZoomPDFDoc(Document doc, Map<String, Object> map) {
        PdfPageEditor pdfEditor = new PdfPageEditor();
        pdfEditor.bindPdf(doc);

        float baseWidthInPixel = 1075f;

        // Note: 1075 * 1.3333 is about 1433.3; a px-to-pt conversion (x 0.75) would give 806.25
        float newPageWidthInPoint = baseWidthInPixel * 1.3333f;
        // Dimensions of the first page, in points
        float realWidthInPoint = (float) pdfEditor.getDocument().getPages().get_Item(1).getRect().getWidth();
        float realHeightInPoint = (float) pdfEditor.getDocument().getPages().get_Item(1).getRect().getHeight();
        // Keep the aspect ratio of the original page
        float zoom = newPageWidthInPoint / realWidthInPoint;
        float adjustedPageHeightInPoint = zoom * realHeightInPoint;
        pdfEditor.setPageSize(new PageSize(newPageWidthInPoint, adjustedPageHeightInPoint));
        map.put("_WIDTH", baseWidthInPixel);
        map.put("_HEIGHT", adjustedPageHeightInPoint * 1.3333f);
        pdfEditor.setZoom(zoom);

        // Save the file with updated dimensions to an in-memory stream
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        pdfEditor.save(output);

        // Reload the resized PDF as a new Document
        ByteArrayInputStream inputStream = new ByteArrayInputStream(output.toByteArray());
        return new Document(inputStream);
    }
}
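
The unit arithmetic in getZoomPDFDoc can be checked in isolation. The following is a minimal sketch, assuming the usual 96 pixels per inch for HTML and 72 points per inch for PDF (so 1 px = 0.75 pt and 1 pt is about 1.3333 px). The class PageScale and its method names are hypothetical and not part of Aspose.PDF, and the 612 x 792 pt page is only an illustrative size. Under that assumption 1075 px corresponds to 806.25 pt, whereas 1075 multiplied by 1.3333 is roughly 1433.3, so it may be worth confirming which target width is intended.

```java
// Hypothetical helper, not part of Aspose.PDF.
// Assumes 96 px per inch (HTML) and 72 pt per inch (PDF).
final class PageScale {
    static final double POINTS_PER_INCH = 72.0;
    static final double PIXELS_PER_INCH = 96.0; // assumption, not read from any document

    // Convert a length in pixels to points.
    static double pixelsToPoints(double px) {
        return px * POINTS_PER_INCH / PIXELS_PER_INCH;
    }

    // Convert a length in points to pixels.
    static double pointsToPixels(double pt) {
        return pt * PIXELS_PER_INCH / POINTS_PER_INCH;
    }

    // Zoom factor needed to bring a page of realWidthPt to targetWidthPt.
    static double zoomFactor(double targetWidthPt, double realWidthPt) {
        if (realWidthPt <= 0) {
            throw new IllegalArgumentException("page width must be positive");
        }
        return targetWidthPt / realWidthPt;
    }

    // Page height after zooming, preserving the aspect ratio.
    static double scaledHeight(double targetWidthPt, double realWidthPt, double realHeightPt) {
        return zoomFactor(targetWidthPt, realWidthPt) * realHeightPt;
    }

    public static void main(String[] args) {
        double targetWidthPt = pixelsToPoints(1075.0);
        // Illustrative page size only (612 x 792 pt)
        double zoom = zoomFactor(targetWidthPt, 612.0);
        double heightPt = scaledHeight(targetWidthPt, 612.0, 792.0);
        System.out.println("target width (pt): " + targetWidthPt);
        System.out.println("zoom factor: " + zoom);
        System.out.println("scaled height (pt): " + heightPt);
        System.out.println("scaled height (px): " + pointsToPixels(heightPt));
    }
}
```

Keeping the conversions in named methods makes it harder to apply the pt-to-px factor where the px-to-pt factor was meant.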

5314051.zip (973.4 KB)

@gm69267

Thank you for contacting support.

Would you please share a narrowed-down sample application along with the generated files? The shared code snippet references an undefined class, RenditionConstant, so it cannot be compiled. Before sharing the requested data, please ensure that you are using Aspose.PDF for Java 18.12 in your environment.

Hi @Farhan.Raza - I have updated the code snippet. Please let me know if there are any issues.

Thanks

@gm69267

Thank you for the updates.

We have attached the generated data for your reference. Would you please describe your concerns in more detail and mention the specific file names, so that we can proceed to help you?

Generated.zip

Hi @Farhan.Raza, please check 5314051__files/img_04.png and compare it with the same image in the PDF.

The generated output does not match the quality of the input PDF file.
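
One way to make such a comparison concrete is to compare pixel dimensions. The following is a rough sketch using only the standard javax.imageio API. The class ImageSizeCheck is hypothetical, and it assumes a reference copy of the image has been exported from the PDF by some other means; both file paths are supplied on the command line. Pixel counts are only a proxy for quality and do not capture compression artifacts.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: compares the pixel dimensions of two images.
final class ImageSizeCheck {

    // Ratio of total pixel counts; a value below 1.0 means the candidate
    // holds fewer pixels than the reference, which may indicate downsampling.
    static double pixelCountRatio(BufferedImage candidate, BufferedImage reference) {
        double candidatePixels = (double) candidate.getWidth() * candidate.getHeight();
        double referencePixels = (double) reference.getWidth() * reference.getHeight();
        return candidatePixels / referencePixels;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ImageSizeCheck <generated image> <reference image>");
            return;
        }
        BufferedImage candidate = ImageIO.read(new File(args[0]));
        BufferedImage reference = ImageIO.read(new File(args[1]));
        // ImageIO.read returns null when no registered reader supports the file
        if (candidate == null || reference == null) {
            System.err.println("could not decode one of the images");
            return;
        }
        System.out.println("generated: " + candidate.getWidth() + " x " + candidate.getHeight());
        System.out.println("reference: " + reference.getWidth() + " x " + reference.getHeight());
        System.out.println("pixel count ratio: " + pixelCountRatio(candidate, reference));
    }
}
```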

@gm69267

Thank you for elaborating it.

We have logged a ticket with ID PDFJAVA-38314 in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive a notification as soon as the ticket is resolved.

We are sorry for the inconvenience.