How do I speed up TIFF frame extraction/writing?


#1
Hi Aspose team!

I need to extract individual frames from a multi-page TIFF image. The only library I could find that does not seem to require `RenderedImage`s is Aspose.Imaging. I have the following test code, which serves as a microbenchmark:

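    // Imports this snippet presumably relies on (Guava Stopwatch, Aspose.Imaging for Java):
    import static com.aspose.imaging.fileformats.tiff.TiffFrame.copyFrame;
    import static com.google.common.base.Stopwatch.createStarted;
    import static java.util.concurrent.TimeUnit.MILLISECONDS;
    import com.aspose.imaging.Image;
    import com.aspose.imaging.fileformats.tiff.TiffFrame;
    import com.aspose.imaging.fileformats.tiff.TiffImage;
    import com.google.common.base.Stopwatch;
    import java.io.*;

    final Stopwatch stopwatch = createStarted();
    final File inputFile = new File("../input.tiff");
    // Load the multi-page TIFF once; both resources are closed automatically.
    try ( final InputStream inputStream = new FileInputStream(inputFile);
          final TiffImage sourceTiffImage = (TiffImage) Image.load(inputStream) ) {
        final TiffFrame[] frames = sourceTiffImage.getFrames();
        System.out.printf("%sms: %d frames, %d bytes\n", stopwatch.elapsed(MILLISECONDS), frames.length, inputFile.length());
        // Write every frame to its own single-page TIFF file.
        for ( int i = 0; i < frames.length; i++ ) {
            final File outputFile = new File("../output." + i + ".tiff");
            try ( final OutputStream outputStream = new FileOutputStream(outputFile) ) {
                final TiffFrame sourceTiffFrame = frames[i];
                final TiffFrame destinationTiffFrame = copyFrame(sourceTiffFrame);
                try ( final TiffImage destinationTiffImage = new TiffImage(destinationTiffFrame) ) {
                    // This call dominates the elapsed time per frame.
                    destinationTiffImage.save(outputStream, sourceTiffFrame.getFrameOptions());
                }
                System.out.printf("%sms: frame #%d, %d bytes\n", stopwatch.elapsed(MILLISECONDS), i + 1, outputFile.length());
            }
        }
    }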

The report output:

1176ms: 32 frames, 85797844 bytes
3391ms: frame #1, 3096072 bytes
5959ms: frame #2, 2308852 bytes
7825ms: frame #3, 1867124 bytes
9933ms: frame #4, 5288200 bytes
12022ms: frame #5, 2135232 bytes
13928ms: frame #6, 2043956 bytes
15842ms: frame #7, 4862936 bytes
17804ms: frame #8, 4395816 bytes
19702ms: frame #9, 3376140 bytes
21646ms: frame #10, 4763260 bytes
23551ms: frame #11, 2102252 bytes
25404ms: frame #12, 2179348 bytes
27409ms: frame #13, 4900184 bytes
29333ms: frame #14, 2070264 bytes
31172ms: frame #15, 2168304 bytes
33024ms: frame #16, 2171856 bytes
35128ms: frame #17, 4005108 bytes
37123ms: frame #18, 5468384 bytes
39356ms: frame #19, 3244668 bytes
41187ms: frame #20, 1974792 bytes
43028ms: frame #21, 1877584 bytes
44910ms: frame #22, 1938668 bytes
46838ms: frame #23, 1924736 bytes
49204ms: frame #24, 1994724 bytes
51366ms: frame #25, 1916320 bytes
53313ms: frame #26, 1966052 bytes
55174ms: frame #27, 1867436 bytes
57107ms: frame #28, 1790396 bytes
58909ms: frame #29, 644328 bytes
60842ms: frame #30, 2189824 bytes
63636ms: frame #31, 2479336 bytes
65500ms: frame #32, 608292 bytes

As you can see, the input document contains 32 pages totalling about 82 MB. I'm extracting the frames page by page, and it takes about 2 seconds to extract and write each page. Parsing the multi-page TIFF file up front takes about 1-2 s, which is perfectly fine with me.
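The "about 2 seconds per page" figure can be checked by turning the cumulative stopwatch readings from the report into per-frame durations. The following is only a small arithmetic sketch: the class and method names (`FrameTimings`, `deltas`, `average`) are made up for illustration, and the sample values are the first five readings from the report above.

```java
import java.util.Arrays;

final class FrameTimings {
    /** Converts cumulative stopwatch readings into the duration of each step. */
    static long[] deltas(long[] cumulativeMillis) {
        if (cumulativeMillis.length < 2) {
            return new long[0];
        }
        long[] result = new long[cumulativeMillis.length - 1];
        for (int i = 1; i < cumulativeMillis.length; i++) {
            result[i - 1] = cumulativeMillis[i] - cumulativeMillis[i - 1];
        }
        return result;
    }

    /** Arithmetic mean of the given durations, or 0.0 for an empty array. */
    static double average(long[] values) {
        return Arrays.stream(values).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // First value is the load time; the rest are frames #1 to #4 from the report.
        long[] readings = {1176, 3391, 5959, 7825, 9933};
        long[] perFrame = deltas(readings);
        System.out.println(Arrays.toString(perFrame)); // [2215, 2568, 1866, 2108]
        System.out.println(average(perFrame));         // 2189.25
    }
}
```

Applied to the whole report, the same calculation gives (65500 - 1176) / 32, or roughly 2010 ms per frame.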

The main performance killer here is the destinationTiffImage.save(...) invocation, and I guess it takes so long because of re-encoding (am I understanding correctly what the method does behind the scenes?). Would it be possible to write the source frames directly to single-page TIFF files, without re-encoding or whatever other heavy work happens during the destinationTiffImage.save(...) invocation?

Am I doing something wrong, and is there any way to speed things up? Or is the save method designed to re-encode, so that I have to use another method just to pass the frames through? Any help would be greatly appreciated. Thank you!

(P.S. Another idea of mine was to extract the source TIFF image input streams directly using the DataStreamSupporter.getDataStreamContainer method and then wrap the result somehow to make it a single-image TIFF file (I don't know yet whether that is even possible), but the method always returns null for my files. Besides, I'm afraid it might always return raw data, with no way of carrying the original metadata over to the destination files.)
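For context on the "raw frames" idea: in a classic (non-BigTIFF) file, each page is described by an image file directory (IFD), and the IFDs form a linked list starting from the 8-byte header. The sketch below uses plain Java and no Aspose API to locate those IFDs without decoding any pixel data. The class name `TiffIfdWalker` and the synthetic two-page sample are made up for illustration. Actually splitting a file this way would also require copying the strip or tile data and rewriting the offsets, which is not shown here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

final class TiffIfdWalker {
    /** Returns the file offset of every IFD (one per page) in a classic TIFF. */
    static List<Long> ifdOffsets(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short for a TIFF header");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1 select the byte order: "II" little-endian, "MM" big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte-order mark");
        }
        // Bytes 2-3 hold the magic number 42 (BigTIFF uses 43 and is not handled).
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF file");
        }
        List<Long> offsets = new ArrayList<>();
        long offset = buf.getInt(4) & 0xFFFFFFFFL; // offset of the first IFD
        while (offset != 0) {
            if (offsets.contains(offset)) {
                throw new IllegalArgumentException("IFD chain contains a loop");
            }
            if (offset + 2 > tiff.length) {
                throw new IllegalArgumentException("IFD offset out of range");
            }
            offsets.add(offset);
            int pos = (int) offset;
            int entryCount = buf.getShort(pos) & 0xFFFF;
            // Each IFD: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
            long nextPos = (long) pos + 2 + 12L * entryCount;
            if (nextPos + 4 > tiff.length) {
                throw new IllegalArgumentException("Truncated IFD");
            }
            offset = buf.getInt((int) nextPos) & 0xFFFFFFFFL;
        }
        return offsets;
    }

    /** Builds a synthetic 44-byte buffer with two IFDs holding one dummy entry each. */
    static byte[] twoPageSample() {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);
        b.putShort((short) 1).put(new byte[12]).putInt(26); // IFD at offset 8
        b.putShort((short) 1).put(new byte[12]).putInt(0);  // IFD at offset 26, last one
        return b.array();
    }

    public static void main(String[] args) {
        System.out.println(ifdOffsets(twoPageSample())); // [8, 26]
    }
}
```

For a real file, the byte array could be obtained with `java.nio.file.Files.readAllBytes`, and the size of the returned list should then match the frame count reported by the library.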

#2

Hi Alex,

Thank you for your inquiry about Aspose.Imaging.

I have reviewed your requirements and the sample code you shared. Could you please share the source TIFF file you are using? Once I have it, I will be able to investigate the issue further on my end.

Many Thanks,


#3

Hi Mudassir,


Thank you for the quick reply! I edited my initial question to fix some grammar and improve it a little. I’m not sure if it’s fine to upload such a big file directly to the forum board, so please find the source TIFF file here: https://www.dropbox.com/s/7y68bz0rwygkqjr/input.tiff?dl=0 (the Mass Effect article from the Russian Wikipedia => PDF export => PDF to TIFF converter).

#4

Hi Alex,


Thank you for sharing the source file with us. I have tested it with Aspose.Imaging for Java 17.4 and have been able to reproduce the issue. A ticket with ID IMAGINGJAVA-717 has been added as an enhancement in our issue tracking system to further investigate and resolve the issue. This thread has been linked to the ticket so that we can notify you once the issue is fixed.

We are sorry for the inconvenience,

#5
Hi Muhammad,

That's really great! Thank you very much!

#6

@user33,

Could you please share the files once again? We are unable to access the files you shared.


#7

The issues you have found earlier (filed as ) have been fixed in this Aspose.Words for JasperReports 18.3 update.