PNG extraction from Word doc - resolution degradation

Hi,

I am evaluating Aspose.Words and Aspose.PDF for my company. I am running into a problem generating PDFs which I can trace back to the conversion from Word to AsposePDF format.

The PNG image extracted from the doc seems to have a lower resolution than the one in the document. What is going wrong here, and what should I do to render it properly?

The code I’m using is simply:

Document doc = new Document("test.doc");
doc.Save(args[0] + ".aspose.xml", SaveFormat.AsposePdf);

I tried updating PdfExportMetafileResolution (100,300,3000) but this did not seem to have any effect.

Many thanks,
Andrew

Hello!
Thank you for your interest in Aspose components.
I looked into your document. It contains two images, both Windows enhanced metafiles (EMF). The first one (the horizontal bar) has its own resolution of 600 DPI, and the second one (the image with the double spiral) only 79 DPI, which is very low. Further explanation is given in the following thread:
https://forum.aspose.com/t/107799
Please let me know if I can help you further.
Regards,

Hi Klepus,

thanks for your post. Unfortunately, I can’t really tell what you are suggesting as a solution to this. It is the spiral image that I am concerned about. I can’t change the format of the image as it is part of a widely used template.

I don’t expect every document to be 100% pixel perfect, but if CutePDF can render this image with no problem (attached), I would expect your components to be able to?

thanks
Andrew

If you cannot change the source documents, you can make the same change programmatically. Every image in the Aspose.Words document model contains its image data or information on where the data can be retrieved. You can use the shape.ImageData.ImageBytes property to get the image of a shape. This is an array of bytes, the same as is normally stored in a file. You can create a Stream on that array and an Image from that Stream. After the transformation, insert the modified image back into the shape with the shape.ImageData.SetImage() method. The remaining information, on how to change the resolution, can be found in the referenced post. You don't necessarily have to save the modified document; just use it in the conversion.
This behavior is by design. When converting to PDF, we don't use a higher resolution than the one explicitly specified in the metafile, for the reasons I have already explained. You can remove the resolution property from the metafile or render the metafile to any raster format.
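The rule discussed in this thread can be modelled as a small function. This is only a sketch of the behavior as it is described here (the export never exceeds the resolution stored in the metafile, and a stored resolution of zero falls back to the export setting). ResolutionRule and EffectiveResolution are hypothetical names used for illustration, not part of the Aspose API.

```csharp
using System;

public static class ResolutionRule
{
    // Hypothetical helper: models the behavior described in this thread.
    // storedDpi is the resolution recorded in the metafile (0 means "not specified");
    // exportDpi is the value given in PdfExportMetafileResolution.
    public static double EffectiveResolution(double storedDpi, double exportDpi)
    {
        if (storedDpi <= 0)
            return exportDpi;                   // nothing stored: use the export setting
        return Math.Min(storedDpi, exportDpi);  // never exceed the stored resolution
    }
}
```

Under this model the 79 DPI spiral stays at 79 DPI whether the export setting is 100, 300 or 3000, which is consistent with the setting appearing to have no effect, while the 600 DPI bar would be limited to the export setting.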
Regards,

Hi Viktor,
thanks very much for your help. I’m not quite there though. I now get a black background to my image which I suppose is due to translating to bitmap format with no transparency. The image is still a bit pixellated. I have studied the threads you mentioned. I am not convinced I am getting the transformation stage correct. Is the below code the type of thing you are recommending?

Many thanks,
Andrew

Document doc = new Document("test.doc");

// transform any images with resolution <300dpi
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true, false);

int imageIndex = 0;
foreach (Aspose.Words.Drawing.Shape shape in shapes)
{
    if (shape.HasImage &&
        (shape.ImageData.ImageSize.HorizontalResolution < 300 ||
         shape.ImageData.ImageSize.VerticalResolution < 300))
    {
        MemoryStream imageStream = new MemoryStream(shape.ImageData.ImageBytes);

        System.Drawing.Image metaFileData = System.Drawing.Image.FromStream(imageStream);

        System.Drawing.Bitmap dstImage = new System.Drawing.Bitmap(shape.ImageData.ImageSize.WidthPixels, shape.ImageData.ImageSize.HeightPixels, PixelFormat.Format24bppRgb);
        dstImage.SetResolution(300, 300);

        using (System.Drawing.Graphics gr = System.Drawing.Graphics.FromImage(dstImage))
            gr.DrawImage(metaFileData, 0, 0, dstImage.Width, dstImage.Height);

        shape.ImageData.SetImage(dstImage);
        imageIndex++;
    }
}

Hello!
Thank you for your inquiry.
This code follows the general idea, but I recommend checking some details.
The black background doesn't depend on the destination image format. When you render a transparent metafile, the background pixels are left unchanged. A newly created non-transparent bitmap appears to start out black, and a transparent one starts out fully transparent. I was unable to find any guarantee from Microsoft that this is always the case, so it is up to you whether to rely on it. If you know more, please share it with the community. It may follow from the fact that the .NET Framework always initializes objects to their respective zero values (numeric zero, false, null).
You should either know what the background will be and fill the bitmap before rendering, or create a transparent bitmap. The first approach is not universal. For instance, you cannot determine from the document model what color lies behind a floating image. It could be anything, even another image. The second approach is not perfect either, because Aspose.Pdf makes such images "pixellated", as you put it. This is the so-called transparency moire at the boundary between transparent and non-transparent areas. We face the same dilemma inside Aspose.Words and try to determine the background ourselves wherever possible. I asked the Aspose.Pdf team about rendering quality, but they seem to have postponed the issue.
Regarding the destination image size, I meant the value returned by the Aspose.Words.Drawing.ShapeBase.SizeInPoints property of the current shape. That is what I would start with, because Aspose.Words uses this size when scaling. If you take the image size in pixels from an ImageData that contains a metafile, you get a value calculated from the metafile's own resolution and intended size. That is not what you need, since we are trying to change the resolution. In terms of MS Word and Aspose.Words, an image is a kind of shape, which is why the terminology is a bit confusing.
You can experiment with the Graphics class to improve quality. For instance, try setting its InterpolationMode property to InterpolationMode.HighQualityBicubic.
If you can change the metafile resolution somehow, manually or programmatically, you won't need rendering at all. Resetting the resolution stored in the images to zero makes it default to SaveOptions.PdfExportMetafileResolution, which you originally tried to set. The Aspose.Words API related to image scaling and resolution is admittedly tricky. We tried to make it clearer, but that is not easy either. If you have any ideas on how to improve it, please share them with us.
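The suggestions above can be sketched as a small helper that uses only System.Drawing. It is a minimal sketch, not a tested fix for this document: MetafileRasterizer, PointsToPixels and Render are hypothetical names, and the background color is an assumption the caller has to supply, since it cannot always be determined from the document model.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

public static class MetafileRasterizer
{
    // Converts a length in points (1/72 inch) to whole pixels at the given DPI.
    public static int PointsToPixels(double points, double dpi)
    {
        return Math.Max(1, (int)Math.Round(points / 72.0 * dpi));
    }

    // Renders the source image onto a new bitmap whose pixel size is derived
    // from the shape size in points and the target resolution.
    public static Bitmap Render(Image source, SizeF sizeInPoints, float dpi, Color background)
    {
        int width = PointsToPixels(sizeInPoints.Width, dpi);
        int height = PointsToPixels(sizeInPoints.Height, dpi);

        Bitmap result = new Bitmap(width, height, PixelFormat.Format32bppArgb);
        result.SetResolution(dpi, dpi);

        using (Graphics g = Graphics.FromImage(result))
        {
            // Fill first so that untouched pixels do not keep their initial value.
            g.Clear(background);
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, 0, 0, width, height);
        }
        return result;
    }
}
```

In the loop from the earlier post, the bitmap creation and drawing could then be replaced by something like shape.ImageData.SetImage(MetafileRasterizer.Render(metaFileData, shape.SizeInPoints, 300, Color.White)). This assumes that SizeInPoints returns a SizeF and that a white background suits the document. Passing Color.Transparent instead corresponds to the transparent-bitmap option, with the edge artifacts described above.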
Regards,

Hi Viktor,

Thanks for the information. I will make the change from pixel to points that you suggested.

In terms of comments for improving the API, I would suggest giving the user a way to override your behaviour of using MIN(image resolution, PdfExportMetafileResolution). That is the thing that is preventing me from recommending that we proceed with the product at the moment.

It's a hard sell to stakeholders to say that, to process existing documents (convert doc to PDF), we must pay $xxx for a component AND write image conversion code which will need to be tested against a range of image types. I just want a server-safe version of the Word interop "Save As PDF" function. Desktop converters from doc to PDF do not impose the logic you describe around image resolution, and that is what the output of your component will inevitably be compared against.

It's unfortunate that there is no way to directly override the resolution of an EMF programmatically, as translating images between formats and back again is likely to introduce inconsistencies. It also slows down the whole process.

Even if I do get this particular example working with acceptable output and performance, I would be nervous about whether the solution would stand up without further monitoring and tuning when other documents are processed.

At the moment I have to reluctantly say that this is a show-stopper in proceeding with the Aspose word and PDF components. Please let me know if you have any comments or suggestions, even if you have to recommend an alternative product at this stage.

Many thanks,
Andrew

Hi!
Thank you for your opinion. I agree with you and am inclined to solve this problem. The idea of letting users override the current behavior is an obvious one, but it is an interface extension. We need to consult with the team and consider all the other scenarios, possible pitfalls, legacy support and so on. Finally, we should make the properties related to image resolution more user-friendly and understandable.
Best regards,

Hello Andrew!
Thank you for your patience. After some deeper investigation I managed to change the default behavior. From now on, metafiles will be scaled to the resolution given in SaveOptions. I think that is exactly what you asked for. If other cases prove important, we will consider additional parameters.
The issue is logged as #5277 in our defect database and has been implemented in the current codebase. The new implementation will be available with the next hotfix in about 2-3 weeks.
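With that change, the original conversion would only need the resolution option set before saving. The following is a hedged sketch rather than confirmed usage: it assumes the option is reached through doc.SaveOptions in this generation of the API, and 300 is just an example value.

```csharp
using Aspose.Words;

class ConvertWithMetafileResolution
{
    static void Main(string[] args)
    {
        Document doc = new Document("test.doc");

        // Assumption: PdfExportMetafileResolution is exposed via doc.SaveOptions.
        // 300 is an illustrative value, not a recommendation from this thread.
        doc.SaveOptions.PdfExportMetafileResolution = 300;

        // Same save call as in the original post.
        doc.Save(args[0] + ".aspose.xml", SaveFormat.AsposePdf);
    }
}
```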
Best regards,

The issues you have found earlier (filed as 5277) have been fixed in this update.