Hello,
I have a use case where I need to convert a Word document to individual JPEG and TIFF images. I am running into performance problems when saving the document as individual files. I have attached a sample file, trimmed down to 50 pages for the code example.
Converting the 50-page sample to JPEG takes about 1 minute 30 seconds; the original document has 1000 pages and takes nearly 10 hours. The same thing happens when saving to individual TIFF files. Saving as a multi-page TIFF does not have this problem, but that is not an option for me. Converting a PDF to JPEG does not have it either (second example below). I suspect the entire document is being rendered each time a single page is imaged. Is there a better way to do this directly from Word?
Here is the code that converts the document to JPEG:
public void SaveAsJpg()
{
    var input = @"D:\input\50pages.doc";
    System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
    Document doc = new Document(input);
    // Read the page count once and reuse it in the loop.
    int pageCount = doc.PageCount;
    for (int pageNumber = 0; pageNumber < pageCount; pageNumber++)
    {
        var file = string.Format(@"d:\output\word_{0}.jpg", pageNumber);
        // Render a single page (zero-based index) per Save call.
        var options = new ImageSaveOptions(SaveFormat.Jpeg);
        options.PageIndex = pageNumber;
        options.PageCount = 1;
        doc.Save(file, options);
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
    Console.ReadLine();
}
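One way the suspicion above could be checked is to time each page save separately instead of timing the whole loop. The following is only a rough sketch under assumptions: `PageTimingProbe` and `TimeEach` are hypothetical names introduced for illustration, the `probe_{0}.jpg` output name is made up, and the Aspose.Words calls are the same ones used in the method above.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;
using Aspose.Words.Saving;

public static class PageTimingProbe
{
    // Hypothetical helper: runs work(i) for i in [0, count)
    // and returns how long each individual call took.
    public static TimeSpan[] TimeEach(int count, Action<int> work)
    {
        var timings = new TimeSpan[count];
        for (int i = 0; i < count; i++)
        {
            var sw = Stopwatch.StartNew();
            work(i);
            sw.Stop();
            timings[i] = sw.Elapsed;
        }
        return timings;
    }

    public static void Main()
    {
        var doc = new Document(@"D:\input\50pages.doc");
        // Read the page count once, before any page is timed.
        int pageCount = doc.PageCount;

        // Same single-page save as in SaveAsJpg, timed per page.
        TimeSpan[] timings = TimeEach(pageCount, pageNumber =>
        {
            var options = new ImageSaveOptions(SaveFormat.Jpeg);
            options.PageIndex = pageNumber;
            options.PageCount = 1;
            doc.Save(string.Format(@"d:\output\probe_{0}.jpg", pageNumber), options);
        });

        for (int i = 0; i < timings.Length; i++)
        {
            Console.WriteLine("page {0}: {1:F0} ms", i, timings[i].TotalMilliseconds);
        }
    }
}
```

If the per-page time is roughly the same for every page of one document but grows when the same probe is run on a longer document, that would be consistent with the whole document being processed on every Save call. It would not prove it, since other per-call overhead could produce a similar pattern.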
Here is an example of saving the document to PDF and then imaging the PDF. It runs in about 30 seconds for the attached file. On the original file that took 10 hours, converting it to PDF and then imaging the PDF took about 25 minutes.
public void SaveAsPdfThenJpeg()
{
    var input = @"D:\input\50pages.doc";
    var pdfFile = @"D:\output\50pages.pdf";
    System.Diagnostics.Stopwatch sw = System.Diagnostics.Stopwatch.StartNew();
    // Step 1: convert the Word document to a single PDF file.
    Document doc = new Document(input);
    doc.Save(pdfFile, SaveFormat.Pdf);
    // Step 2: render each PDF page to its own JPEG at 300 DPI.
    Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfFile);
    var resolution = new Aspose.Pdf.Devices.Resolution(300);
    var jpegDevice = new Aspose.Pdf.Devices.JpegDevice(resolution);
    int pageIndex = 0; // zero-based counter used only for the output file name
    foreach (Aspose.Pdf.Page page in pdf.Pages)
    {
        var file = string.Format(@"d:\output\pdf_{0}.jpg", pageIndex);
        jpegDevice.Process(page, file);
        pageIndex++;
    }
    sw.Stop();
    Console.WriteLine(sw.Elapsed);
    Console.ReadLine();
}
50pages.zip (76.3 KB)
Any suggestions would be great.
Thanks
Ed