HTML with images to PDF

PS-CL · October 21, 2014, 10:59am

Hello,

I use Aspose.Words to convert HTML to PDF.

This usually works fine but I once got a PDF with a red cross in place of an image.

I retried the conversion from the same HTML and got a correct PDF on every later attempt.

It is critical for me to be sure that the PDF is generated without any missing image, so I’m wondering whether there is a way to know if some images could not be retrieved from the HTML.

I’m basically running this :

Document document = new Document();
DocumentBuilder builder = new DocumentBuilder(document);
builder.insertHtml(data_in);
PdfSaveOptions so = new PdfSaveOptions();
ByteArrayOutputStream result = new ByteArrayOutputStream();
document.save(result, so);

Thanks,

NB: If you have any idea what could have caused the image retrieval to fail, I would be glad to hear it.

awais.hafeez · October 22, 2014, 6:33am

Hi Paulin,

Thanks for your inquiry. Could you please attach your 1) input Word document, 2) output PDF file showing the undesired behavior and 3) HTML string (data_in) here for testing? We will investigate the issue on our end and provide you more information.

Secondly, Aspose.Words for Java depends upon the Java Advanced Imaging (JAI) package from Sun in order to process some image formats such as TIFF. You can find the required packages jai and jai-imageio at these locations:

http://download.java.net/media/jai/builds/release/1_1_3/
http://download.java.net/media/jai-imageio/builds/release/1.1/

Best regards,

PS-CL · October 22, 2014, 8:40am

Hi,

Let me explain my request.

If I run the above code with this input (complete source code is attached to this thread) :

final String data_in = "<p>Here is a valid image</p><p><img src=\"http://www.contract-live.com/img/logo-gray.png\" /></p>"
		+ "<p>Here is a invalid image</p><p><img src=\"http://not_existing_image.png\" /></p>";

I got the attached pdf file.
The first image is correctly loaded into the pdf and the second one is replaced by a red cross (which is completely normal as the image does not exist).

My question is :
Is there a way to detect missing images in the pdf programmatically ?

I’m thinking of something like :

final PdfSaveOptions so = new PdfSaveOptions();
so.setImagesMandatory(true);


try {
	document.save(result, so);
} catch (MissingImageException e) {
	// some images are missing
}
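Independently of Aspose, another angle would be to pull the image URLs out of the HTML and check them before conversion. Below is a minimal sketch of the extraction step only, assuming the images are referenced through plain quoted src attributes. ImgSrcExtractor and extractImageSources are hypothetical names and are not part of Aspose.Words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
class ImgSrcExtractor {
    // Matches the quoted src attribute of an <img> tag (single or double quotes).
    // A regex is only an approximation of HTML parsing: unquoted attributes are
    // not handled, and an attribute such as data-src would also match.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the src values in document order.
    static List<String> extractImageSources(String html) {
        List<String> sources = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            sources.add(m.group(2));
        }
        return sources;
    }

    public static void main(String[] args) {
        String dataIn = "<p>Here is a valid image</p><p><img src=\"http://www.contract-live.com/img/logo-gray.png\" /></p>"
                + "<p>Here is a invalid image</p><p><img src=\"http://not_existing_image.png\" /></p>";
        // Print each URL so it can be checked before the conversion.
        for (String src : extractImageSources(dataIn)) {
            System.out.println(src);
        }
    }
}
```

This only tells me which URLs to check. A URL that answers during the check could still fail during the conversion itself, so it would not replace a proper notification from the library.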

awais.hafeez · October 23, 2014, 2:30am

Hi Paulin,

Thanks for the additional information. The red cross is a 32x32 PNG image (see attachment). I think you can add this file to your project as a resource and then compare it with the image of each shape in the document before saving to PDF. Please try running the following code:

final Document document = new Document();
final DocumentBuilder builder = new DocumentBuilder(document);

final String data_in = "<p>Here is a valid image</p><p><img src=\"http://www.contract-live.com/img/logo-gray.png\" /></p>"
        + "<p>Here is a invalid image</p><p><img src=\"http://not_existing_image.png\" /></p>";

builder.insertHtml(data_in);

BufferedImage crossImg = ImageIO.read(new File(getMyDir() + "resource.png"));
for (Shape s : (Iterable<Shape>) document.getChildNodes(NodeType.SHAPE, true))
{
    BufferedImage img = ImageIO.read(s.getImageData().toStream());

    if (bufferedImagesEqual(crossImg, img)){
        System.out.println("true");
    }
    else{
        System.out.println("false");
    }
}

final PdfSaveOptions so = new PdfSaveOptions();

final ByteArrayOutputStream result = new ByteArrayOutputStream();
document.save(result, so);

final FileOutputStream fos = new FileOutputStream("test.pdf");
try {
    fos.write(result.toByteArray());
} finally {
    fos.close();
}

public static boolean bufferedImagesEqual(BufferedImage img1, BufferedImage img2) {
    if (img1.getWidth() == img2.getWidth() && img1.getHeight() == img2.getHeight()) {
        for (int x = 0; x < img1.getWidth(); x++) {
            for (int y = 0; y < img1.getHeight(); y++) {
                if (img1.getRGB(x, y) != img2.getRGB(x, y))
                    return false;
            }
        }
    } else {
        return false;
    }
    return true;
}

I hope, this helps.
Best regards,

PS-CL · October 28, 2014, 10:20am

Hi,

Thanks for your answer.

Your solution is a bit hacky but it does the job. I guess it means there is no way to detect those missing images at loading time.

My concern is that this code will break if one day you change the red cross image in Aspose.

Regards,

awais.hafeez · October 29, 2014, 9:56am

Hi Paulin,

Thanks for your inquiry. We will most likely not change this red cross image in future; however, please tell us in case you have any troubles in future and we will be glad to look into this further for you.

Best regards,

PS-CL · January 29, 2015, 4:34am

Hi,
I’ve just discovered that the solution you provided has a bug.
If you run the above code with data_in = "<hr />" (or any string that contains it), the generated “test.pdf” file is fine, yet the check detects a “red cross” image.
This is quite critical to me, as I no longer have a way to detect those missing images without also triggering on every HTML input that contains a <hr />.
Thanks
Paulin

awais.hafeez · January 30, 2015, 5:25am

Hi Paulin,

Thanks for your inquiry. You’re right; in this case, the code detects a red cross image. I am in communication with our development team and will get back to you soon.

Best regards,

awais.hafeez · February 4, 2015, 4:33am

Hi Paulin,

Thanks for being patient. We can add a warning using our IWarningCallback system if an image is unavailable. This will help to detect empty images. We have logged an issue in our bug tracking system. The ID of this issue is WORDSNET-11403. Your thread has also been linked to the appropriate issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

PS-CL · February 4, 2015, 4:56am

Hi,

Do you have any time estimate for this issue to be solved ?

Thanks

awais.hafeez · February 6, 2015, 12:37am

Hi Paulin,

Thanks for your inquiry. This issue is pending for analysis and is in the queue. Unfortunately, at the moment we cannot provide you any reliable estimate regarding this issue. However, we will keep you informed of further developments and let you know via this thread once this issue is resolved.

Best regards,

PS-CL · February 19, 2015, 4:39am

Hi Awais,

Any progress on this issue on your side ?

Is there a workaround I can apply on my side to handle this issue while your development is still in progress? As I said before, this issue is quite critical to me.

If I code something like:

if (numberOfRedCross > numberOfHrInHtml) {
    // at least one image is a real red cross
}

Should I expect some cases where this code will fail ?
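To make the idea concrete, here is a minimal sketch of that comparison. It assumes each <hr /> in the HTML produces exactly one shape that matches the red cross, which is what I observed but is not documented behavior. numberOfRedCross is the count obtained from the shape loop posted earlier in this thread; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
class RedCrossHeuristic {
    // Matches <hr>, <hr/>, <hr /> and <hr ...attributes...>, in any letter case.
    private static final Pattern HR_TAG =
            Pattern.compile("<hr\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Counts the <hr> tags found in the raw HTML string.
    static int countHrTags(String html) {
        Matcher m = HR_TAG.matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // True when there are more red-cross shapes than <hr> tags, meaning
    // (under the assumption above) that at least one image is really missing.
    static boolean hasRealRedCross(int numberOfRedCross, String html) {
        return numberOfRedCross > countHrTags(html);
    }

    public static void main(String[] args) {
        String html = "<hr /><p>Here is a invalid image</p><p><img src=\"http://not_existing_image.png\" /></p>";
        // Two red crosses detected, one of them explained by the <hr />.
        System.out.println(hasRealRedCross(2, html));
    }
}
```

The regex works on the raw string, so an <hr> inside an HTML comment or a script block would be counted as well, and that is one case where I expect the check to give a wrong answer.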

Thanks

awais.hafeez · February 20, 2015, 3:50am

Hi Paulin,

Thanks for your inquiry. Unfortunately, your issue is not resolved yet. We have asked our development team for an ETA on this issue and will update you as soon as an estimate or a workaround is available. We apologize for the inconvenience.

Best regards,

aspose.notifier · April 2, 2015, 1:13am

The issues you have found earlier (filed as WORDSNET-11403) have been fixed in this .NET update and this Java update.


PS-CL · May 18, 2015, 10:51am

Hi Awais,

I tried the latest version of Aspose.Words (15.4.0) in order to use the new warning callback feature to detect missing images, but I see some strange behavior.

Again, I tried to convert two very simple HTML strings:
"Here is a valid image<img src=\"http://www.contract-live.com/img/logo-gray.png\" />"
And
"Here is a invalid image<img src=\"http://not_existing_image.png\" />"

I used the following code :

final Document document = new Document();
final DocumentBuilder builder = new DocumentBuilder(document);

builder.insertHtml(html);

final PdfSaveOptions so = new PdfSaveOptions();
so.setWarningCallback(new IWarningCallback() {
	@Override
	public void warning(final WarningInfo info) {
		System.out.println("GOT : " + info.getSource() + ", " + info.getDescription() + ", " + info.getWarningType());
	}
});


final ByteArrayOutputStream result = new ByteArrayOutputStream();
document.save(result, so);


final FileOutputStream fos = new FileOutputStream("test.pdf");
try {
	fos.write(result.toByteArray());
} finally {
	fos.close();
}

I got those two outputs :

GOT : 2, DrawingML shape is replaced with fallback Shape, some formatting might be lost., 65536
GOT : 1, DrawingML shapes are not fully supported. Object could be rendered differently.
	At DrawingML Object 167.25x26.25, Paragraph 3, Section 1, 65536
GOT : 1, DrawingML shapes are not fully supported. Object could be rendered differently.
	At DrawingML Object 167.25x26.25, Paragraph 3, Section 1, 65536

AND

GOT : 2, DrawingML shape is replaced with fallback Shape, some formatting might be lost., 65536
GOT : 1, DrawingML shapes are not fully supported. Object could be rendered differently.
	At DrawingML Object 24x24, Paragraph 3, Section 1, 65536
GOT : 1, DrawingML shapes are not fully supported. Object could be rendered differently.
	At DrawingML Object 24x24, Paragraph 3, Section 1, 65536

I’m a bit confused because I was expecting more differences than just “167.25x26.25” vs “24x24”.

Can you help me on this ? Am I missing something ?

Thanks

NB : setWarningCallback is deprecated but I couldn’t find which method I should call instead. Can you help me on this too ?

awais.hafeez · May 19, 2015, 7:57am

Hi Paulin,

Thanks for your inquiry. You can determine WarningSource, WarningType and Description by using the following code:

DocumentBuilder builder = new DocumentBuilder();
Callback warningsHandler = new Callback();
builder.getDocument().setWarningCallback(warningsHandler);
builder.insertHtml("<p>Here is a invalid image</p><p><img src='http://not_existing_image.png' /></p>");

static class Callback implements IWarningCallback {
    public void warning(WarningInfo info) {
        try {
            System.out.println(info.getSource() + " | " + info.getWarningType() + " | " + info.getDescription());
        } catch (Exception ex) {
            // Ignored in this sample.
        }
    }
}

I hope, this helps.

Best regards,

PS-CL · May 19, 2015, 10:30am

Hi Awais,

Thanks to your help, I managed to get the desired warning.

I was expecting the warning to be raised when converting the document to PDF, whereas it is actually raised when the HTML is inserted into the document.

Thanks !

awais.hafeez · May 20, 2015, 11:01am

Hi Paulin,

Thanks for your feedback. In case you have further inquiries or need any help, please let us know.

Best regards,