Prevent Cropping of Inline Images during Converting Email Message to PDF File using Java | Shrink Image Shape to Fit PDF Page

We are converting email to PDF (via MHTML) in Java. Some emails contain an inline photo which, when rendered on screen, looks as though it would fit on an A4 page. However, when I check the resulting PDF, the image is enlarged, rotated and heavily cropped.

Is there any way I can specify that images should be scaled to fit on the (A4) PDF page?

Thanks, Mark.

@mawheadon,

You can build on the following Aspose.Words for Java code to resize images in the MHTML that are wider than the page so that they fit within the page bounds of the PDF:

PageSetup ps = doc.getFirstSection().getPageSetup();
double effectiveWidth = ps.getPageWidth() - (ps.getLeftMargin() + ps.getRightMargin());

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : (Iterable<Shape>) shapes)
    shape.setWidth((shape.getWidth() > effectiveWidth) ? effectiveWidth : shape.getWidth()); 

In case the problem still remains, then please ZIP and attach the following resources here for testing:

As soon as you get these pieces of information ready, we will start investigation into your particular scenario/issue and provide you more information.

Thanks for that. My JUnit test code converts the email to mhtml and then passes the stream to a new Words Document. Before saving to PDF I iterate through any shapes (as per your suggestion) and if they are too big I try to scale them. The logging looks good, the only problem is that nothing seems to be different in the resulting PDF file. That is, it is the same as it was without any of the shrink to fit code. Perhaps there is something else I have to do or I am missing something.

convert() is called by my JUnit test with the mhtml InputStream and an outputType of “pdf”, which is converted into SaveFormat.PDF by translateToSaveFormat().

//--------------------------------------------------------------------------
/** Converts an office document to another type of document.
 * @param is The office document.
 * @param outputType The type to convert to: doc, docx, htm, html, odt, pdf, tif, tiff, txt.
 * @return The converted document.
 * @throws Exception */
public static byte[] convert(InputStream is, String outputType) throws Exception {
	HtmlLoadOptions htmlLoadOpts = new HtmlLoadOptions();
	// try to prevent very long wait times when it can't load embedded image links.
	htmlLoadOpts.setWebRequestTimeout(5000);	// milliseconds, default is 100 seconds
	log.info("output type is {}", outputType);
	Document doc = new Document(is, htmlLoadOpts);
	int pageCount = doc.getPageCount();
	log.info("Page count is {}", pageCount);
	shrinkImageToFit(doc);
	int saveFormat = translateToSaveFormat(outputType);
	ByteArrayOutputStream baos = new ByteArrayOutputStream();
	doc.save(baos, saveFormat);
	byte[] bytes = baos.toByteArray();
	log.info("Byte count for {} document is {}", outputType, bytes.length);
	return(bytes);
}

//--------------------------------------------------------------------------
private static void shrinkImageToFit(Document doc) {
	PageSetup ps = doc.getFirstSection().getPageSetup();
	double contentWidth = ps.getPageWidth() - (ps.getLeftMargin() + ps.getRightMargin());
	double contentHeight = ps.getPageHeight() - (ps.getTopMargin() + ps.getBottomMargin());
	log.info("Page height is {}, width is {}, content height is {}, width is {}",
			ps.getPageHeight(), ps.getPageWidth(), contentHeight, contentWidth);

	NodeCollection<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
	for (Shape shape : (Iterable<Shape>) shapes) {
		double imageHeight = shape.getHeight();
		double imageWidth = shape.getWidth();
		log.info("Image found, height is {}, width is {}, rotation is {}",
				imageHeight, imageWidth, shape.getRotation());
		double vScale = 1;
		if (imageHeight > contentHeight) {
			vScale = contentHeight / imageHeight;
		}
		double hScale = 1;
		if (imageWidth > contentWidth) {
			hScale = contentWidth / imageWidth;
		}
		double scale = Math.min(hScale, vScale);
		double imageHeight2 = imageHeight * scale;
		double imageWidth2 = imageWidth * scale;
		log.info("hScale is {}, vScale is {}, scale is {}, new height is {}, width is {}",
				hScale, vScale, scale, imageHeight2, imageWidth2);
		try {
			shape.setHeight(imageHeight2);
			shape.setWidth(imageWidth2);
			//shape.setHeight(50);
			//shape.setWidth(50);
			log.info("New height {} and width {} set", imageHeight2, imageWidth2);
		}
		catch (Exception e) {
			log.warn("Unable to scale image to scale {}", scale);
			log.warn("Unable to scale image to scale", e);
		}
	}
}

The logging output is:

Nov 11, 2020 2:35:33 PM com.mycomp.asposeutils.WordsConversion shrinkImageToFit
INFO: Page height is 792.0, width is 612.0, content height is 648.0, width is 468.0
Nov 11, 2020 2:35:33 PM com.mycomp.asposeutils.WordsConversion shrinkImageToFit
INFO: Image found, height is 1188.0000000000002, width is 1584.0, rotation is 0.0
Nov 11, 2020 2:35:33 PM com.mycomp.asposeutils.WordsConversion shrinkImageToFit
INFO: hScale is 0.29545454545454547, vScale is 0.5454545454545453, scale is 0.29545454545454547, new height is 351.00000000000006, width is 468.0
Nov 11, 2020 2:35:33 PM com.mycomp.asposeutils.WordsConversion shrinkImageToFit
INFO: New height 351.00000000000006 and width 468.0 set
Nov 11, 2020 2:35:33 PM com.mycomp.asposeutils.WordsConversion convert
INFO: Byte count for pdf document is 2058676

I tried with the (commented out) hard-coded 50 for height and width but this made no difference. I am using Words 20.10.
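For reference, the scaling arithmetic itself can be checked in isolation, without Aspose.Words. The following is only a minimal sketch: the class and method names (ScaleToFit, scaleFactor) are made up for illustration, and the numbers are the ones from the log above. It should give the same 468 x 351 pt target size that the log reports, which suggests the calculation is fine and the problem lies elsewhere.

```java
/** Illustrative helper, not part of Aspose.Words: uniform scale-to-fit arithmetic. */
final class ScaleToFit {

    /**
     * Returns the factor (never more than 1.0) by which a width x height box
     * must be multiplied to fit inside maxWidth x maxHeight without distortion.
     */
    static double scaleFactor(double width, double height, double maxWidth, double maxHeight) {
        double hScale = width > maxWidth ? maxWidth / width : 1.0;
        double vScale = height > maxHeight ? maxHeight / height : 1.0;
        // The smaller factor is the binding constraint; images are never enlarged.
        return Math.min(hScale, vScale);
    }

    public static void main(String[] args) {
        // Content area and image size as reported in the log (points).
        double contentWidth = 468.0;
        double contentHeight = 648.0;
        double imageWidth = 1584.0;
        double imageHeight = 1188.0;

        double scale = scaleFactor(imageWidth, imageHeight, contentWidth, contentHeight);
        System.out.println("scale = " + scale);
        System.out.println("new width = " + imageWidth * scale + ", new height = " + imageHeight * scale);
    }
}
```

Using a single factor for both dimensions keeps the aspect ratio intact; scaling width and height independently would distort the photo.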

Okay just solved my own problem by looking at another issue (use A4 paper - not solved yet), so I added

doc.updatePageLayout();

to the end of the shrinkImageToFit() method.

Okay, so now the only issue remaining is the rotation of the image. Looking at the EXIF data of the original image, it has an orientation setting of 6. When I view the email in Thunderbird the picture appears portrait and upright, as it should. Once I have converted it to PDF, I have managed to scale it but the image is on its side. I can’t see a way of finding the EXIF orientation setting so that I can rotate it and scale it correctly. Perhaps there is another solution.
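One possible way to find the orientation is to read it straight from the JPEG bytes with plain JDK code. The sketch below is not an Aspose API: the class ExifOrientation and its methods are hypothetical names, and it assumes you can get at the original image bytes (for example from the shape's image data) and that the EXIF block is still present in them. It walks the JPEG segments, finds the APP1 "Exif" block and looks up tag 0x0112 in the first IFD. Mirrored orientations (2, 4, 5, 7) are deliberately not handled.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative EXIF orientation reader using only the JDK (hypothetical helper). */
final class ExifOrientation {

    /** Returns the EXIF orientation (1..8), or 1 if it is absent or cannot be parsed. */
    static int read(byte[] jpeg) {
        if (jpeg == null || jpeg.length < 4
                || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            return 1; // not a JPEG stream
        }
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) return 1;        // lost segment sync
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xDA || marker == 0xD9) return 1;  // pixel data or end of image reached
            // Segment length is big-endian and includes its own two bytes.
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > jpeg.length) return 1;
            if (marker == 0xE1 && length >= 8 && isExifHeader(jpeg, pos + 4)) {
                return parseTiff(jpeg, pos + 10, pos + 2 + length);
            }
            pos += 2 + length;
        }
        return 1;
    }

    private static boolean isExifHeader(byte[] d, int p) {
        return d[p] == 'E' && d[p + 1] == 'x' && d[p + 2] == 'i' && d[p + 3] == 'f'
                && d[p + 4] == 0 && d[p + 5] == 0;
    }

    private static int parseTiff(byte[] data, int tiffStart, int end) {
        if (end - tiffStart < 8) return 1;
        ByteBuffer buf = ByteBuffer.wrap(data, tiffStart, end - tiffStart).slice();
        int byteOrderMark = buf.getShort(0) & 0xFFFF;        // "II" or "MM"
        if (byteOrderMark == 0x4949) buf.order(ByteOrder.LITTLE_ENDIAN);
        else if (byteOrderMark != 0x4D4D) return 1;
        int ifdOffset = buf.getInt(4);
        if (ifdOffset < 8 || ifdOffset > buf.limit() - 2) return 1;
        int entries = buf.getShort(ifdOffset) & 0xFFFF;
        for (int i = 0; i < entries; i++) {
            int entry = ifdOffset + 2 + i * 12;              // each IFD entry is 12 bytes
            if (entry + 12 > buf.limit()) return 1;
            if ((buf.getShort(entry) & 0xFFFF) == 0x0112) {  // Orientation tag
                int value = buf.getShort(entry + 8) & 0xFFFF;
                return (value >= 1 && value <= 8) ? value : 1;
            }
        }
        return 1;
    }

    /** Clockwise rotation needed to show the image upright; mirrored variants return 0 here. */
    static int clockwiseDegrees(int orientation) {
        switch (orientation) {
            case 3: return 180;
            case 6: return 90;
            case 8: return 270;
            default: return 0;
        }
    }

    /** Orientations 5 to 8 mean the displayed width and height are swapped. */
    static boolean swapsWidthAndHeight(int orientation) {
        return orientation >= 5 && orientation <= 8;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        int orientation = read(bytes);
        System.out.println("orientation = " + orientation
                + ", rotate clockwise by " + clockwiseDegrees(orientation) + " degrees");
    }
}
```

If the orientation turns out to be 6 or 8, the width and height would need to be swapped before working out the scale factor, since the upright image occupies a portrait rather than landscape box.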

I think the problem can be reproduced by taking a photo in portrait mode with a smart phone (check the exif orientation is 6) then pasting the photo inline into an HTML email, then converting to PDF.

I do have an example .eml file but would want it to not be public if possible.

Thanks very much, Mark.

@mawheadon,

Yes, you need to call the Document.updatePageLayout method before saving to PDF, because you modify the document (adjusting image sizes etc.) after calling Document.getPageCount, which has already built the page layout.
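To illustrate why the order of calls matters, here is a toy model of a document that caches its layout. It is not Aspose.Words code: ToyDocument and everything in it is invented for this illustration, and the real layout engine is far more involved. The point is only that a layout built on first request is not rebuilt by later edits unless it is refreshed explicitly.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a document with a cached layout; NOT the Aspose.Words API. */
final class ToyDocument {
    private final List<Double> imageHeights = new ArrayList<>();
    private final double contentHeight;
    private Integer cachedPageCount; // null until the layout is first built

    ToyDocument(double contentHeight) {
        this.contentHeight = contentHeight;
    }

    void addImage(double height) {
        imageHeights.add(height);
    }

    /** Edits the model only; the cached layout is deliberately left untouched. */
    void setImageHeight(int index, double height) {
        imageHeights.set(index, height);
    }

    /** Builds the layout on first use, then keeps returning the cached result. */
    int getPageCount() {
        if (cachedPageCount == null) {
            updatePageLayout();
        }
        return cachedPageCount;
    }

    /** Rebuilds the layout from the current state of the model. */
    void updatePageLayout() {
        double total = 0;
        for (double h : imageHeights) {
            total += h;
        }
        cachedPageCount = Math.max(1, (int) Math.ceil(total / contentHeight));
    }

    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument(648.0);
        doc.addImage(1188.0);
        System.out.println("before edit: " + doc.getPageCount());   // layout built and cached here
        doc.setImageHeight(0, 351.0);
        System.out.println("after edit: " + doc.getPageCount());    // still the cached layout
        doc.updatePageLayout();
        System.out.println("after refresh: " + doc.getPageCount()); // reflects the resized image
    }
}
```

In the code from this thread, the equivalent of the first getPageCount() call is the page count logging before shrinkImageToFit(), which is why the resize only showed up once updatePageLayout() was added.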

Secondly, please note that it is safe to attach files in the forum. If you attach your documents/resources here, only you and Aspose staff members can download them. You can also post the ZIP file ‘via private message’. In order to send a private message with attachment, please click on my name and find “Message” button.

Sharing Data with Aspose – Data Security

Thanks for that, I have just sent you a zip file containing an example email, which if converted to PDF should demonstrate my problem.

Forgive me if I add that I was able to see the exif data and orientation setting using ImageMagick.

As a general comment, I would suggest that the default behaviour when converting to PDF should be to scale images to fit on the page where possible and to honour their orientation, just as Thunderbird does.

Thanks, Mark.

@mawheadon,

I am afraid we are unable to reproduce the issue related to ‘rotation of the image’ on our end. We used the following code to generate the attached file (by aspose.words for java 20.10.pdf) on our end:

MailMessage eml = MailMessage.load("C:\\Temp\\non_rotated_inline_image\\non_rotated_inline_image.eml");
ByteArrayOutputStream emlStream = new ByteArrayOutputStream();
eml.save(emlStream, com.aspose.email.SaveOptions.getDefaultMhtml());

HtmlLoadOptions htmlLoadOptions = new HtmlLoadOptions();
htmlLoadOptions.setLoadFormat(LoadFormat.MHTML);
Document doc = new Document(new ByteArrayInputStream(emlStream.toByteArray()), htmlLoadOptions);

shrinkImageToFit(doc); 

doc.save("C:\\temp\\non_rotated_inline_image\\awjava-20.10-shrinkImageToFit.pdf");

private static void shrinkImageToFit(Document doc) {
    PageSetup ps = doc.getFirstSection().getPageSetup();
    double contentWidth = ps.getPageWidth() - (ps.getLeftMargin() + ps.getRightMargin());
    double contentHeight = ps.getPageHeight() - (ps.getTopMargin() + ps.getBottomMargin());

    NodeCollection<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
    for (Shape shape : (Iterable<Shape>) shapes) {
        double imageHeight = shape.getHeight();
        double imageWidth = shape.getWidth();
        double vScale = 1;
        if (imageHeight > contentHeight) {
            vScale = contentHeight / imageHeight;
        }
        double hScale = 1;
        if (imageWidth > contentWidth) {
            hScale = contentWidth / imageWidth;
        }
        double scale = Math.min(hScale, vScale);
        double imageHeight2 = imageHeight * scale;
        double imageWidth2 = imageWidth * scale;

        try {
            shape.setHeight(imageHeight2);
            shape.setWidth(imageWidth2);
            //shape.setHeight(50);
            //shape.setWidth(50);
        } catch (Exception e) {
            // Exceptions are ignored in this test snippet.
        }
    }
}

We have also converted your “non_rotated_inline_image.eml” file to .mhtml format using MS Outlook 2019 (see by ms outlook 2019.zip), and the orientation of the image in this .mhtml matches what is shown in the PDF generated by Aspose.Words. So, this seems to be expected behavior. Can you please open the .eml in Thunderbird and share a screenshot of the image?

We have logged this requirement in our issue tracking system. Your ticket number is:

  • WORDSNET-21394: Provide option to scale large images in MHTML down to fit within the PDF page bounds

We will further look into the details of this requirement and will keep you updated on the status of the linked issue.

@mawheadon

I have created an issue with ID EMAILJAVA-34765 in our issue tracking system to further investigate how Aspose.Email renders the MHT. This thread has been linked to the issue so that you will be notified once it is fixed.

Great, Thanks.