EMF to PDF conversion produces garbled document

Hi,

I just downloaded Aspose.Words for .NET 9.5. In my test application I’m trying to convert an EMF file to PDF, but the text in the output is garbled, as if the correct font or encoding were not being recognized. I’m attaching both the source EMF file (source.emf, a screenshot of this website) and the generated PDF (output1.pdf).

Here is the code I’m using (adapted from examples I found in this forum). I tried different combinations of PDF options, to no avail:

```csharp
Document doc = new Document();
doc.RemoveAllChildren();

DocumentBuilder builder = new DocumentBuilder(doc);
builder.PageSetup.LeftMargin = Aspose.Words.ConvertUtil.MillimeterToPoint(10);
builder.PageSetup.RightMargin = Aspose.Words.ConvertUtil.MillimeterToPoint(10);

// read the image from file, ensure it is disposed.
using (Image image = Image.FromFile(sSourceEMFFilePath))
{
	// get the number of frames in the image.
	int framesCount = image.GetFrameCount(FrameDimension.Page);

	// loop through all frames.
	for (int frameIdx = 0; frameIdx < framesCount; frameIdx++)
	{
		// insert a section break before each new page, in case of a multi-frame TIFF.
		if (frameIdx != 0)
			builder.InsertBreak(BreakType.SectionBreakNewPage);

		// select active frame.
		image.SelectActiveFrame(FrameDimension.Page, frameIdx);

		// we want the size of the page to be the same as the size of the image.
		// convert pixels to points to size the page to the actual image size.
		PageSetup ps = builder.PageSetup;
		ps.PageWidth = ConvertUtil.PixelToPoint(image.Width, image.HorizontalResolution);
		ps.PageHeight = ConvertUtil.PixelToPoint(image.Height, image.VerticalResolution);

		// insert the image into the document and position it at the top left corner of the page.
		builder.InsertImage(
				image,
				RelativeHorizontalPosition.Page,
				0,
				RelativeVerticalPosition.Page,
				0,
				ps.PageWidth,
				ps.PageHeight,
				WrapType.None);
	}
}

var pdfOps = new PdfSaveOptions()
{
	JpegQuality = 80,
	//IsPreserveFormFields = false,
	TextCompression = PdfTextCompression.None,
	//IsEmbedTrueTypeFontsForAsciiChars = false,
	EmbedFullFonts = false,
	PrettyFormat = false,
	SaveFormat = SaveFormat.Pdf,
};

// save the document to PDF temp file
sOutPDFFilePath = Path.ChangeExtension(Path.GetTempFileName(), ".pdf");
doc.Save(sOutPDFFilePath, pdfOps);
```
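For reference, the page-sizing step in the loop relies on a point being 1/72 inch, so `ConvertUtil.PixelToPoint(pixels, resolution)` amounts to multiplying the pixel count by 72 and dividing by the resolution in dpi. A sketch of what the two `ps.PageWidth`/`ps.PageHeight` assignments compute:

```latex
\text{PageWidth} = \frac{\text{image.Width} \times 72}{\text{image.HorizontalResolution}},
\qquad
\text{PageHeight} = \frac{\text{image.Height} \times 72}{\text{image.VerticalResolution}}
```

For example, assuming a hypothetical frame 1024 pixels wide at 96 dpi (illustrative numbers, not taken from source.emf), the page would be 1024 × 72 / 96 = 768 pt wide, i.e. about 10.7 inches.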

For testing purposes I also tried rasterizing the image before adding it to the document, using the code below (also found in these forums). That produced a legible PDF, although of far lower quality since the image is no longer vector, and the page has an odd black background. I attached output2.pdf so you can see the result.

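```csharp
// Replaces the builder.InsertImage(image, ...) call inside the frame loop above;
// "image" and "ps" come from that loop.
using (MemoryStream rasterImageStream = new MemoryStream())
{
	image.Save(rasterImageStream, ImageFormat.Jpeg);
	// rewind the stream so it is read from the beginning.
	rasterImageStream.Position = 0;

	// insert the image into the document and position it at the top left corner of the page.
	builder.InsertImage(
			rasterImageStream,
			RelativeHorizontalPosition.Page,
			0,
			RelativeVerticalPosition.Page,
			0,
			ps.PageWidth,
			ps.PageHeight,
			WrapType.None);
}
```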

Any ideas what could be wrong, or how to convert EMF to PDF while keeping the image in vector form? Thank you in advance for your help.

Esteban

Hello

Thank you for reporting this problem to us. I managed to reproduce the problem on my side. Your request has been linked to the appropriate issue. You will be notified as soon as it is resolved. As a temporary workaround you can try using the following code before saving to PDF:

```csharp
// Get all shapes in the document.
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
// Loop through all shapes.
foreach (Shape shape in shapes)
{
    // If the shape contains a vector image (WMF or EMF), convert it to a raster image.
    if (shape.HasImage && (shape.ImageData.ImageType == ImageType.Wmf || shape.ImageData.ImageType == ImageType.Emf))
    {
        using (MemoryStream vectorImageStream = new MemoryStream(shape.ImageData.ImageBytes))
        using (Image image = Image.FromStream(vectorImageStream))
        using (MemoryStream rasterImageStream = new MemoryStream())
        {
            image.Save(rasterImageStream, ImageFormat.Png);
            // Rewind the stream so it is read from the beginning.
            rasterImageStream.Position = 0;
            shape.ImageData.SetImage(rasterImageStream);
        }
    }
}
// Save the document to a PDF file.
doc.Save(@"C:\Temp\out.pdf", pdfOps);
```

Best regards,

Thank you for the confirmation.

Yes, I was familiar with your example; it is the same workaround I described in my original post. As I mentioned there, it not only produces a lower-quality image but also adds a black background to the page.

How do you deal with this black background?

And of course the mandatory question: when do you think this issue will be fixed? Should I check back here periodically? Can you share the issue ID for this case?

Finally, can you tell us a bit more about the source of this problem? I can’t be the first person converting EMF files to PDF, so there must be something unusual about these particular files. If I understood what it is, maybe I could find a way to convert my files to a format that the current build of Aspose.Words supports.

Thank you in advance for any hints.

E

Hi,

A quick note: I initially misread your sample code. I now see that you rasterize the image to PNG, whereas I was saving it as JPEG. Switching to PNG took care of the black background, so thank you for that.

But to emphasize the importance of getting this fixed: the PDF files generated this way are at least 10 times bigger than what I get from end-user tools like PDFCreator, and they take more than 5 times as long to generate. For example, a one-page printout of a web page is 90KB and takes 3 seconds with PDFCreator, whereas my Aspose-powered program produces a 1.3MB file and takes almost 20 seconds. And that is before considering the poorer quality: the page gets pixelated as soon as you zoom in, most likely because of the EMF-to-PNG conversion.

To be clear, I’m not trying to build a program that competes with PDFCreator (or the like); I’m just mentioning it as one example that our users will surely compare us against when they use our software.

So, again: do you have any hints on how to improve this workaround so that its output is closer to what standard software packages produce? And I must ask once more: do you have an estimate for when this issue will be fixed in Aspose.Words?

Thanks in advance!

Esteban

Hello

Thank you for the additional information. Let me clarify. As you know, Aspose.Words is a class library for processing Word documents programmatically. An MS Word document is a flow document: it does not contain any information about how its content is laid out into lines and pages. PDF, on the other hand, is a fixed-page format. Converting a Word document to PDF therefore means converting from a flow model to a fixed-page model. When you save a document as PDF, the document is first laid out into pages (this step takes most of the conversion time), and then each page is saved to PDF.

Regarding the issue, it is difficult to give you a reliable estimate for the fix. You will be notified as soon as it is fixed. Unfortunately, I currently cannot suggest any other workaround for this problem.

Best regards,

Hi Andrey,

Thank you for providing more information about the problem. I’m looking forward to that new version that fixes this issue.

Just for my own education: how come tools like PDFCreator don’t seem to be affected by the flow/fixed layout issue you described when converting a Word document to PDF?

The only reason I’m asking is to see whether I can find a workaround. I love using Aspose.Words, so I want to see whether I could pre-process my source files somehow to accommodate its current limitations. What I actually need is to convert the EMF file to PDF, i.e., I can skip the Word document step altogether if needed. Maybe you can suggest something along these lines?

P.S. Again, I mention PDFCreator only as an example because I happen to have it installed on my end-user PC. I’m in no way attempting to compare it to Aspose’s component.

Thanks in advance,

Esteban

Hi Esteban,
Thanks for your inquiry.
PDFCreator does not run into these problems because it is a virtual printer: it intercepts the document after the printing application has already laid it out into lines and pages (exactly as the document would be sent to a physical printer) and renders that output to PDF instead. It does not store the document’s actual content the way Aspose.Words does.

Aspose.Words, on the other hand, loads documents in various formats and stores them in memory as an editable Document Object Model (DOM). This lets you add, modify and remove content before saving or rendering to a different format.

In this case, the problem occurs during rendering and appears to be related to the font used in the vector image. The only workaround we can suggest for now is to convert the image to a raster format first.

We will post in this thread as soon as a fix for this issue is available.

Thanks,

Hi Adam,

Thank you for the detailed explanation. That makes a lot of sense.

As I said, I’m looking forward to that fix. If you have some beta code you need someone to try, please let me know.

Excellent customer support!

Thanks,
Esteban

Hi Esteban,

Thanks for your inquiry. We will be sure to inform you of any developments regarding this issue.
Best regards,

The issues you have found earlier (filed as 22077) have been fixed in this .NET update and in this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

Thank you for the notification. I will give the new version a try as soon as possible and report my results back here.

Thanks