Converting Word Document To JPEG is very slow


#1

Hello,

I have a use case where I need to convert a Word document to individual JPEG and TIFF images. I am running into performance problems when saving the document as individual files. I have attached a sample file that I trimmed down to 50 pages for the code example.

Here is an example of the code that converts the document to JPEG. It takes about 1 minute 30 seconds for 50 pages; the source document has 1000 pages and takes nearly 10 hours to convert. Is there a better way to perform this operation? This doesn't seem to happen when saving PDF documents to JPG (example below too).

Is there a better way to do this from Word? The same thing happens when saving to individual TIFF files. Saving as a multi-page TIFF does not have this problem, but it is not an option for me. I think it might be rendering the entire document each time a page is imaged.

    public void SaveAsJpg()
    {
        var input = @"D:\input\50pages.doc";

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

        Document doc = new Document(input);

        // Read the page count once and reuse it in the loop condition.
        int pageCount = doc.PageCount;
        for (int pageNumber = 0; pageNumber < pageCount; pageNumber++)
        {
            var file = string.Format(@"d:\output\word_{0}.jpg", pageNumber);
            var options = new ImageSaveOptions(SaveFormat.Jpeg);
            options.PageIndex = pageNumber;
            options.PageCount = 1;
            doc.Save(file, options);
        }

        sw.Stop();
        Console.WriteLine(sw.Elapsed);
        Console.ReadLine();
    }
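
As a rough sanity check on the "renders the entire document each time" guess: if every single-page save costs time proportional to the whole document, total time should grow with the square of the page count. The sketch below is only arithmetic on the timings quoted above (about 90 seconds for 50 pages, nearly 10 hours for 1000 pages). The `ScalingCheck` class and `Project` helper are hypothetical names made up for this illustration, not part of any Aspose API.

```csharp
using System;

public static class ScalingCheck
{
    // Projects a measured duration to a larger page count, assuming the total
    // time grows as (pages ^ exponent). This is an assumed model, not a measurement.
    public static TimeSpan Project(TimeSpan measured, int measuredPages, int targetPages, double exponent)
    {
        double factor = Math.Pow((double)targetPages / measuredPages, exponent);
        return TimeSpan.FromSeconds(measured.TotalSeconds * factor);
    }

    public static void Main()
    {
        TimeSpan measured = TimeSpan.FromSeconds(90); // timing reported for the 50-page sample

        // Linear growth would put 1000 pages at roughly half an hour.
        Console.WriteLine(Project(measured, 50, 1000, 1.0));

        // Quadratic growth puts 1000 pages at roughly ten hours,
        // which is close to what was observed on the full document.
        Console.WriteLine(Project(measured, 50, 1000, 2.0));
    }
}
```

The quadratic projection lands near the observed 10 hours, which is at least consistent with the per-page save redoing work for the whole document. It does not prove that this is what the library does internally.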

Here is an example of saving it to PDF and then imaging the PDF; it runs in about 30 seconds for the attached file. On the original file that took 10 hours, converting it to PDF and then imaging the PDF took about 25 minutes.

    public void SaveAsPdfThenJpeg()
    {
        var input = @"D:\input\50pages.doc";
        var pdfFile = @"D:\output\50pages.pdf";

        System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();

        Document doc = new Document(input);
        doc.Save(pdfFile, SaveFormat.Pdf);

        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfFile);
        var resolution = new Aspose.Pdf.Devices.Resolution(300);
        var jpegDevice = new Aspose.Pdf.Devices.JpegDevice(resolution);

        int pageCount = 0;
        foreach (Aspose.Pdf.Page page in pdf.Pages)
        {
            var file = string.Format(@"d:\output\pdf_{0}.jpg", pageCount);
            jpegDevice.Process(page, file);
            pageCount++;
        }


        sw.Stop();
        Console.WriteLine(sw.Elapsed);
        Console.ReadLine();
    }

50pages.zip (76.3 KB)

Any suggestions would be great.

Thanks

Ed


#2

@ejt66,

Please upgrade to the latest version of Aspose.Words for .NET (18.1) and try the following code:

    Document doc = new Document(MyDir + @"50pages\50pages.doc");

    ImageSaveOptions opts = new ImageSaveOptions(SaveFormat.Jpeg);
    opts.PageIndex = 0;
    opts.PageCount = doc.PageCount;
    opts.UpdateFields = false;
    opts.PageSavingCallback = new HandlePageSavingCallback();

    doc.Save(MyDir + @"50pages\18.1.jpg", opts);

The class implementing the IPageSavingCallback interface is:

    private class HandlePageSavingCallback : IPageSavingCallback
    {
        public void PageSaving(PageSavingArgs args)
        {
            args.PageFileName = string.Format(@"D:\Temp\50pages\Page_{0}.jpg", args.PageIndex);
        }
    }

#3

Beautiful! Took my 10 hour process down to less than 4 minutes!

Thanks for all your help, you guys always get back really quickly.

My project uses Aspose to convert documents (Word, Excel, PowerPoint, etc.) into single-page images (JPEG and TIFF). Is this page saving callback pattern used in other Aspose products?


#4

@ejt66,

Thank you for your feedback. We are looking into your requirement. We will update you very soon about our findings.


#5

@ejt66,

This is to let you know that you can convert PowerPoint documents to images. The Slide.GetThumbnail() method has several overloads that generate the image and return a Bitmap object, which can then be saved to image formats such as JPEG, BMP, PNG or GIF. Please see the article "Creating Slides Thumbnail Image" for details.
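
For illustration, a minimal sketch of that approach might look like the code below. It assumes Aspose.Slides for .NET is referenced and uses the `GetThumbnail(float scaleX, float scaleY)` overload. The `SlideExport` class, the `SaveSlidesAsJpeg` method, and the `slide_{0}.jpg` naming are hypothetical choices for this example, and the scale factors of 1 are just a starting point, so please check the overloads available in your version.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using Aspose.Slides;

public static class SlideExport
{
    // Saves every slide of a presentation as a separate JPEG file.
    // inputPath and outputDir are supplied by the caller; outputDir must already exist.
    public static void SaveSlidesAsJpeg(string inputPath, string outputDir)
    {
        using (Presentation pres = new Presentation(inputPath))
        {
            for (int i = 0; i < pres.Slides.Count; i++)
            {
                // Scale factors of 1 keep the slide's own size; larger values
                // produce a proportionally larger image (assumed starting point).
                using (Bitmap bmp = pres.Slides[i].GetThumbnail(1f, 1f))
                {
                    string file = Path.Combine(outputDir, string.Format("slide_{0}.jpg", i));
                    bmp.Save(file, ImageFormat.Jpeg);
                }
            }
        }
    }
}
```

Unlike the Aspose.Words callback above, this loops over the slides explicitly and saves each returned bitmap itself.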