Aspose Words & Imaging Performance tuning

hemassridhar · January 22, 2024, 6:07am

We are using Aspose in Java 11.0.16 on Linux (RHEL 8), Words - v22.7, Imaging - v23.8. We deploy the Java application as a container in Kubernetes.

A wide range of documents flows into our app, and there is no way for us to know which fonts a given document will use. We have copied all the fonts from C:/Windows/Fonts/ into the container and set the fonts folder path using FontSettings.getDefaultInstance().setFontsFolder(fontsPath, false) at app startup.

We have two use cases:

1. Read the Word document properly (we set the fonts path as mentioned above and also lock fields with field.isLocked(true);).
2. Read the Word document as in point 1 and then convert it to TIFF.

Is there anything we can do to improve performance for both the use cases please? We are stuck with this issue in PROD.

alexey.noskov · January 22, 2024, 6:16am

@hemassridhar Could you please clarify what the problem is? Does Aspose.Words get stuck on loading or on saving the document? Does the problem occur with all documents or only with a particular document? If it occurs with a particular document, please attach it here for testing.
Also, please note, TIFF format is not directly supported by Aspose.Words. To process TIFF images you should add additional dependencies:
https://docs.aspose.com/words/java/system-requirements/#optional-dependencies

You can try adding the following to the POM file:

<dependency>
    <groupId>javax.media.jai</groupId>
    <artifactId>com.springsource.javax.media.jai.core</artifactId>
    <version>1.1.3</version>
</dependency>

hemassridhar · January 22, 2024, 6:29am

@alexey.noskov, it sometimes takes quite some time to process a document, on the order of minutes (so far we have seen roughly 2 minutes).

Also, we are using JAI as shown below:

    <dependency>
        <groupId>javax.media</groupId>
        <artifactId>jai-core</artifactId>
        <version>1.1.3</version>
    </dependency>

    <dependency>
        <groupId>com.sun.media</groupId>
        <artifactId>jai-codec</artifactId>
        <version>1.1.3</version>
    </dependency>

alexey.noskov · January 22, 2024, 6:35am

@hemassridhar Could you please attach the problematic document and code that will allow us to reproduce the problem? We will check it on our side and provide you more information.

hemassridhar · January 22, 2024, 7:27am

Unfortunately, I will not be able to share the Word document, but I can share the sample code:

ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
LoadOptions loadOptions = new LoadOptions();
loadOptions.setLoadFormat(LoadFormat.AUTO);
Document doc0 = new Document();
for (int i = 0; i < streamsList.size(); i++)
{
    if (i == 0)
    {
        doc0 = new Document(streamsList.get(i), loadOptions);
        for (Field field : doc0.getRange().getFields())
        {
            field.isLocked(true);
        }
    }
    else
    {
        Document doc1 = new Document(streamsList.get(i), loadOptions);
        for (Field field : doc1.getRange().getFields())
        {
            field.isLocked(true);
        }
        doc0.appendDocument(doc1, ImportFormatMode.USE_DESTINATION_STYLES);
    }
}
int totalpagecount = doc0.getPageCount();
doc0.save(outputStream, SaveFormat.DOC);
byte[] bytearr = outputStream.toByteArray();
InputStream inptstream = new ByteArrayInputStream(bytearr);
outputStream.close();
mergestream.setInputstream(inptstream);
mergestream.setToatlapgecnt(totalpagecount);

alexey.noskov · January 22, 2024, 7:33am

@hemassridhar As far as I can see, your code simply concatenates documents into one document. The most time-consuming operation in your code is the following line:

int totalpagecount = doc0.getPageCount();

This is because the document layout has to be built to calculate the number of pages in the final document.
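
One way to confirm which step dominates is to time each stage separately. The sketch below uses only the JDK. The `StageTimer` class and the stage names are hypothetical, and the Aspose.Words calls from the code above are not invoked here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical helper: accumulates wall-clock time per named stage.
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the stage, adds its elapsed time to the stage's total, and returns its result.
    <T> T time(String stage, Callable<T> body) throws Exception {
        long start = System.nanoTime();
        try {
            return body.call();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Milliseconds accumulated for a stage, or 0 if the stage never ran.
    long millis(String stage) {
        return nanosByStage.getOrDefault(stage, 0L) / 1_000_000;
    }

    // One line per stage, in the order the stages were first seen.
    String report() {
        StringBuilder sb = new StringBuilder();
        nanosByStage.forEach((stage, nanos) ->
                sb.append(stage).append(": ").append(nanos / 1_000_000).append(" ms\n"));
        return sb.toString();
    }
}
```

In the merge code, the layout step could be wrapped as `timer.time("layout", () -> doc0.getPageCount())` and each load as `timer.time("load", () -> new Document(stream, loadOptions))`, with `report()` logged once per request.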

hemassridhar · January 22, 2024, 8:03am

@alexey.noskov
Yes, we have noticed that in our app monitoring tools. Is there any other way to get the page count?

I have seen a method, doc.getBuiltInDocumentProperties().getPages(). Is it safe to use this method? What is the difference between the two?

alexey.noskov · January 22, 2024, 8:22am

@hemassridhar MS Word documents are flow documents by nature, so they have no “page” concept. Consumer applications reflow the document content into pages on the fly. Aspose.Words does the same when you convert a document to a fixed-page format such as PDF, XPS, or an image, or when you call Document.getPageCount(). The BuiltInDocumentProperties.Pages property is written by the consumer application and does not always contain up-to-date information. In your case in particular, since you are concatenating several documents, BuiltInDocumentProperties.Pages will contain only the value written by the consumer application for the first document. Also note that a consumer application is not required to write this property at all.

hemassridhar · January 22, 2024, 9:44am

@alexey.noskov, is it possible to join a call with us?

https://jpmchase.zoom.us/my/callhema

Any help would be highly appreciated.

alexey.noskov · January 22, 2024, 9:45am

@hemassridhar Unfortunately, we do not provide support via phone or video calls. The main place for getting support is this forum.

hemassridhar · January 22, 2024, 10:56am

@alexey.noskov, okay sure.

So, all we are doing is merging the multiple streams we get (each stream may have one or more pages). We want to merge all of them into one document and then get the page count. So, in this case, BuiltInDocumentProperties will not have the updated page count; is my understanding right? And to get the right page count, Document.getPageCount() is the go-to method, right?

Also, we are setting the fonts folder at app startup as:
FontSettings.getDefaultInstance().setFontsFolder(fontsPath, false). This folder contains all the Windows fonts, copied over, and the app runs in a Linux container. Is this good practice? We ask because, when Document.getPageCount() is called, the FontSettings method is the one taking a lot of time.

We are also looking for options to preload fonts at startup so that Aspose does not need to read them from the folder every time. There seems to be a caching mechanism available for font loading, but I suspect it builds the search index cache gradually over time. I am not sure whether this helps to solve the problem.

alexey.noskov · January 22, 2024, 12:52pm

@hemassridhar

Yes, you are right.

Yes, it is good practice to set the fonts folder on application start. You can also force Aspose.Words to read and cache the fonts by saving an empty document to PDF on application start.
On the first call, Aspose.Words initializes static resources, such as fonts, which are then reused on subsequent calls. Creating an empty document and saving it as PDF on application start forces Aspose.Words to initialize these resources, which avoids a “cold” start on real requests.
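
A minimal sketch of such a startup hook, using only the JDK so that it can be read on its own. The `AsposeWarmup` class is hypothetical, and the actual Aspose.Words call appears only in a comment. A failed warm-up is logged rather than rethrown, on the assumption that the application should still start.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical startup hook: runs a throwaway job once so later requests skip the cold start.
final class AsposeWarmup {
    private static final AtomicBoolean DONE = new AtomicBoolean(false);

    // Runs the task at most once per JVM; returns elapsed ms, or -1 if it was skipped or failed.
    static long runOnce(Callable<?> warmupTask) {
        if (!DONE.compareAndSet(false, true)) {
            return -1; // a warm-up was already attempted
        }
        long start = System.nanoTime();
        try {
            // In the real application the task would be something like:
            //   new Document().save(new ByteArrayOutputStream(), SaveFormat.PDF);
            warmupTask.call();
            return (System.nanoTime() - start) / 1_000_000;
        } catch (Exception e) {
            // Do not block startup; the first real request pays the cost instead.
            System.err.println("Warm-up failed: " + e);
            return -1;
        }
    }
}
```

Calling `AsposeWarmup.runOnce(...)` after the fonts folder is set and before the application begins accepting requests moves the one-time cost to container start.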

hemassridhar · January 22, 2024, 1:10pm

@alexey.noskov, thanks for the response.

In this case, should we consider this as well:
https://docs.aspose.com/words/java/specify-truetype-fonts-location/?secureweb=Teams&secureweb=Teams#save-and-load-a-font-search-cache

alexey.noskov · January 22, 2024, 1:18pm

@hemassridhar You can consider this as well if you would like to speed up the application start, which is when the fonts are read. But if the application starts only rarely, it is not necessary to cache the fonts.

hemassridhar · January 22, 2024, 1:29pm

@alexey.noskov, would it be possible to share sample code?

Also, the first time Aspose converts a .doc to PDF or just builds a document, it looks for all the fonts in the configured folder path (in our case, the configured folder contains all the Windows fonts from the C drive), and it will not look them up again for subsequent document loads or conversions. Is my understanding accurate?

alexey.noskov · January 22, 2024, 1:41pm

@hemassridhar Here is a code example of using the font cache:

FontSettings.getDefaultInstance().setFontsSources(
        new FontSourceBase[] { new FolderFontSource("C:\\Temp\\fonts", true) });
// Save the parsed font cache.
try (FileOutputStream cache = new FileOutputStream("C:\\temp\\cache")) {
    FontSettings.getDefaultInstance().saveSearchCache(cache);
}

// .......................
// .......................
// Load cache.
FontSettings.getDefaultInstance().setFontsSources(
        new FontSourceBase[] { new FolderFontSource("C:\\Temp\\fonts", true) },
        new FileInputStream("C:\\temp\\cache"));

Yes, you are right. Aspose.Words reads all fonts only once, and the information it has read is then reused in subsequent calls.
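
For a containerized application, the save and load halves can be combined into a single startup step that loads the cache file if one exists and otherwise builds and saves it. The sketch below shows only the file handling, using the JDK. `FontCacheBootstrap`, `CacheLoader`, and `CacheSaver` are hypothetical stand-ins for the two Aspose.Words calls shown above. It assumes that the font set does not change between runs, and it makes no attempt to detect a stale cache.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical bootstrap: reuse a saved font search cache when present, otherwise build and save one.
final class FontCacheBootstrap {
    // Stand-ins for the Aspose.Words calls, so the file handling can be read on its own.
    interface CacheLoader { void load(InputStream cache) throws Exception; }
    interface CacheSaver { void scanAndSave(OutputStream cache) throws Exception; }

    // Returns true if an existing cache file was loaded, false if a new one was written.
    static boolean init(Path cacheFile, CacheLoader loader, CacheSaver saver) throws Exception {
        if (Files.isRegularFile(cacheFile) && Files.size(cacheFile) > 0) {
            try (InputStream in = Files.newInputStream(cacheFile)) {
                // e.g. FontSettings.getDefaultInstance().setFontsSources(sources, in);
                loader.load(in);
            }
            return true;
        }
        // Write to a temp file first so an interrupted write never leaves a truncated cache behind.
        Path dir = cacheFile.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "fontcache", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            // e.g. setFontsSources(sources) followed by saveSearchCache(out)
            saver.scanAndSave(out);
        }
        Files.move(tmp, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        return false;
    }
}
```

Where the cache file lives (baked into the image or on a mounted volume) is a deployment choice that this sketch leaves open.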