- Product: Aspose.Words for Java
- Version: 24.2
- License Type: Licensed
Issue Description:
During the HTML-to-PDF conversion, I am trying to implement a timeout mechanism for document loading using IDocumentLoadingCallback (or ProgressCallback), but the callback is not being invoked during the document loading process.
What I’m trying to achieve:
- Load Word documents with a timeout mechanism
- Stop the loading operation after a certain time (request timeout).
- Cancel the loading operation if it exceeds a specified time limit so that its resources are freed.
Current Behavior:
- The IDocumentLoadingCallback.notify() method is never called during document loading.
- My timeout mechanism cannot work because the callback isn’t invoked.
- Documents that should time out continue loading.
Test Document:
I can provide a test document if needed; please let me know. About the current file:
- The HTML file is approximately 18 MB and contains table-structured data.
- It takes ~20 minutes to complete the loading process.
What I Need:
- Is IDocumentLoadingCallback supposed to work during new Document() loading, or only for specific scenarios?
- Are there specific document formats or loading methods where the callback is invoked?
- Is there a different mechanism I should use for implementing timeouts during document loading?
- Could this be a bug where the callback is accepted but not actually invoked?
Code to Reproduce the Issue:
```java
package com.abc.aspose.service;

import java.util.Date;

import com.aspose.words.Document;
import com.aspose.words.DocumentLoadingArgs;
import com.aspose.words.IDocumentLoadingCallback;
import com.aspose.words.LoadOptions;

public class HtmlToPdfConverter {
    public static void main(String[] args) {
        try {
            setAsposePdfLicense();
            System.out.println("=== Loading with 20 second timeout ===");
            loadDocumentWithTimeout("/home/user/Downloads/test-18MB.html", 20);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void loadDocumentWithTimeout(String filePath, double maxDurationSeconds) {
        try {
            // The callback is expected to abort loading by throwing once the limit is exceeded
            LoadingProgressCallback callback = new LoadingProgressCallback(maxDurationSeconds);
            LoadOptions opts = new LoadOptions();
            opts.setProgressCallback(callback);
            System.out.println("Starting to load: " + filePath);
            System.out.println("Max allowed duration: " + maxDurationSeconds + " seconds");
            Document doc = new Document(filePath, opts);
            System.out.println("Document loaded successfully!");
        } catch (IllegalStateException e) {
            // Timeout reported by LoadingProgressCallback
            System.err.println("Loading cancelled: " + e.getMessage());
        } catch (Exception e) {
            System.err.println("Error loading document: " + e.getMessage());
        }
    }

    private static void setAsposePdfLicense() {
        try {
            com.aspose.words.License license = new com.aspose.words.License();
            license.setLicense("/home/user/a.lic");
            System.out.println("Aspose.Words license applied successfully.");
        } catch (Exception e) {
            // Report the failure instead of silently running in evaluation mode
            System.err.println("Failed to apply Aspose.Words license: " + e.getMessage());
        }
    }
}

class LoadingProgressCallback implements IDocumentLoadingCallback {
    private final Date startTime;
    private final double maxDurationSeconds;

    public LoadingProgressCallback(double maxDurationSeconds) {
        this.startTime = new Date();
        this.maxDurationSeconds = maxDurationSeconds;
        System.out.println("Progress callback initialized at: " + startTime);
    }

    @Override
    public void notify(DocumentLoadingArgs args) {
        System.out.println("notify called");
        // Elapsed time since the callback was created, in seconds
        double elapsedSeconds = (new Date().getTime() - startTime.getTime()) / 1000.0;
        // Abort loading by throwing once the time limit is exceeded
        if (elapsedSeconds > maxDurationSeconds) {
            String msg = String.format("Loading timeout exceeded! Progress: %.1f%%, Time: %.2fs, Max: %.2fs",
                    args.getEstimatedProgress() * 100, elapsedSeconds, maxDurationSeconds);
            throw new IllegalStateException(msg);
        }
    }
}
```
@vvarmora7930
You are trying to implement a timeout mechanism for loading large HTML documents using IDocumentLoadingCallback or ProgressCallback in Aspose.Words for Java 24.2, but the callback is not being invoked during the loading process.
- The IDocumentLoadingCallback set through LoadOptions.setProgressCallback() is intended to provide progress notifications during document loading, but it is not invoked for every input format.
- In Aspose.Words for Java 24.2, this callback is invoked only while loading certain formats, and HTML is not one of them, so it does not fire while your HTML file is loading.
- For implementing a timeout during HTML loading, a more practical approach is to load the document on a separate thread and wait for it with a timeout (e.g., ExecutorService with Future) rather than relying on the loading callback.
Here is a revised version of your code using ExecutorService for timeout handling:
```java
package com.abc.aspose.service;

import java.util.concurrent.*;

import com.aspose.words.Document;
import com.aspose.words.LoadOptions;

public class HtmlToPdfConverter {
    public static void main(String[] args) {
        try {
            setAsposePdfLicense();
            // Load with 20 second timeout
            System.out.println("=== Example 1: Loading with 20 second timeout ===");
            loadDocumentWithTimeout("/home/user/Downloads/test-18MB.html", 20);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void loadDocumentWithTimeout(String filePath, double maxDurationSeconds) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Load on a worker thread so the caller can stop waiting after the time limit
        Future<Document> future = executor.submit(() -> {
            LoadOptions opts = new LoadOptions();
            // Note: ProgressCallback is not used here for timeout control
            return new Document(filePath, opts);
        });
        try {
            Document doc = future.get((long) maxDurationSeconds, TimeUnit.SECONDS);
            System.out.println("Document loaded successfully!");
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker thread; a load that does not
            // check for interruption keeps running until it finishes on its own
            future.cancel(true);
            System.err.println("Loading timed out after " + maxDurationSeconds + " seconds");
        } catch (ExecutionException e) {
            System.err.println("Error loading document: " + e.getCause().getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Restore interrupt status
            System.err.println("Loading interrupted");
        } finally {
            // shutdown() stops accepting new tasks but does not stop a running one
            executor.shutdown();
        }
    }

    private static void setAsposePdfLicense() {
        try {
            com.aspose.words.License license = new com.aspose.words.License();
            license.setLicense("/home/user/a.lic");
            System.out.println("Aspose.Words license applied successfully.");
        } catch (Exception e) {
            // Handle license error appropriately
            System.err.println("Failed to apply Aspose.Words license: " + e.getMessage());
        }
    }
}
```
This approach uses a Future with a timeout, so the time limit is enforced on the caller's side instead of depending on a loading callback that is not triggered for HTML input.
With the solution you suggested, the Future is cancelled, but the loading keeps running in the background and its resources (CPU and memory) are not released.
@vvarmora7930
You are trying to implement a timeout mechanism for loading large HTML documents in Aspose.Words for Java 24.2 using IDocumentLoadingCallback or ProgressCallback, but the callback is not invoked during the loading process. You’ve adopted a workaround using ExecutorService with Future to enforce a timeout, but you’re observing that CPU and memory resources are not being released even after cancellation.
Because the loading callback is not invoked for HTML input in Aspose.Words for Java 24.2, it cannot be used to enforce a loading timeout. ExecutorService with Future does enforce the time limit on the caller's side, but it does not guarantee that CPU and memory are released: Future.cancel(true) only interrupts the worker thread, and if the loading code does not react to interruption, it keeps running until it finishes.
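A minimal, self-contained sketch (plain Java, no Aspose.Words calls) of the behavior described above: Future.cancel(true) merely sets the worker thread's interrupt flag, so a CPU-bound task that never checks for interruption keeps running after the Future reports it as cancelled. The method busyWork is a hypothetical stand-in for a long load, and the stop flag exists only so the demo can terminate. Whether Aspose.Words' HTML loader ignores interruption in the same way is an assumption based on the observation in this thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelDemo {
    // Hypothetical stand-in for a long, CPU-bound load that never checks Thread.interrupted().
    static long busyWork(AtomicBoolean stop) {
        long counter = 0;
        while (!stop.get()) {
            counter++;
        }
        return counter;
    }

    /** Returns true if the worker thread was still running 500 ms after future.cancel(true). */
    public static boolean workerSurvivesCancel() throws Exception {
        AtomicBoolean stop = new AtomicBoolean(false);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<Long> future = executor.submit(() -> busyWork(stop));
        try {
            future.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only sets the worker thread's interrupt flag
        }
        executor.shutdown(); // no new tasks are accepted; the running one is unaffected
        boolean stillRunning = !executor.awaitTermination(500, TimeUnit.MILLISECONDS);
        stop.set(true); // demo-only cooperative stop so the JVM can exit
        executor.awaitTermination(5, TimeUnit.SECONDS);
        return stillRunning;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("worker still running after cancel: " + workerSurvivesCancel());
    }
}
```

Running main should report that the worker is still running: the Future is cancelled, yet the thread keeps consuming CPU until the task itself decides to stop.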
High CPU load and memory consumption during document processing, including HTML-to-PDF conversion, have also been reported in other cases involving Aspose.Words for Java, which suggests potential performance issues with large documents or repeated resource access. This is consistent with your observation that resources are not freed after cancellation.
To mitigate resource consumption, consider:
- Ensuring proper cleanup of the resources you control, such as closing any input streams you open (for example with try-with-resources); note that this cannot stop a load that is still running on another thread.
- Investigating whether caching mechanisms or pre-loading of shared resources can reduce repeated resource access.
- Profiling your application to identify bottlenecks in resource handling or repeated operations.
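Another option, not raised in this thread, is process isolation: run the conversion in a child JVM and kill that process on timeout, so the operating system reclaims its CPU and memory whether or not the library checks for interruption. The sketch below uses only java.lang.ProcessBuilder and Process. The worker class com.abc.aspose.service.ConvertOneFile and the output path /tmp/test-18MB.pdf are hypothetical; such a worker would need its own main method that applies the license, loads the HTML, and saves the PDF.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ProcessTimeout {
    /**
     * Runs a command in a child process and kills it if it exceeds the timeout.
     * Returns the process exit code, or -1 if it was killed because of the timeout.
     */
    public static int runWithTimeout(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the child's stdout/stderr to this console
                .start();
        if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            return process.exitValue();
        }
        // Timed out: killing the process lets the OS reclaim all of its CPU and memory,
        // whether or not the code inside it reacts to interruption.
        process.destroyForcibly();
        process.waitFor();
        return -1;
    }

    public static void main(String[] args) throws Exception {
        String javaBin = System.getProperty("java.home") + "/bin/java";
        List<String> command = List.of(
                javaBin, "-cp", System.getProperty("java.class.path"),
                "com.abc.aspose.service.ConvertOneFile", // hypothetical worker main class
                "/home/user/Downloads/test-18MB.html", "/tmp/test-18MB.pdf");
        int exitCode = runWithTimeout(command, 20);
        System.out.println(exitCode == -1
                ? "Conversion killed after 20 seconds"
                : "Conversion finished with exit code " + exitCode);
    }
}
```

The trade-off is a JVM start-up (and license setup) per conversion. In exchange, each conversion can also be given its own memory ceiling, for example by adding an -Xmx option to the child command.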
A support engineer will follow up in this thread if needed.
@vvarmora7930 Could you please attach your problematic input document here for testing? We will check the issue and provide you more information.
Thanks for reaching out. Please find the test HTML document attached. I have compressed it into a ZIP file since HTML files are not allowed to be uploaded here.
html-table-18mb.html.zip (94.4 KB)
@vvarmora7930 Thank you for the additional information. LoadOptions.ProgressCallback is supported for the Docx, FlatOpc, Docm, Dotm, Dotx, Markdown, Rtf, WordML, Doc, Dot, Odt, and Ott formats. Unfortunately, HTML is not supported yet. The feature request is logged as WORDSNET-29003.
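Given this list, one option is to decide per input format whether cooperative cancellation through the loading callback is available at all, and to fall back to an external timeout otherwise. The sketch below is hypothetical: it matches plain format names as written in the reply above rather than Aspose.Words' own LoadFormat constants, and how the format name of a given file is obtained is left to the caller.

```java
import java.util.Locale;
import java.util.Set;

public class LoadingCallbackSupport {
    // Formats for which LoadOptions.ProgressCallback is invoked, as listed by Aspose support
    // in this thread (Aspose.Words for Java 24.2). HTML is absent (WORDSNET-29003).
    private static final Set<String> SUPPORTED = Set.of(
            "DOCX", "FLATOPC", "DOCM", "DOTM", "DOTX", "MARKDOWN",
            "RTF", "WORDML", "DOC", "DOT", "ODT", "OTT");

    /** Case-insensitive; underscores are ignored so "FLAT_OPC" matches "FlatOpc". */
    public static boolean isLoadingCallbackSupported(String formatName) {
        String key = formatName.toUpperCase(Locale.ROOT).replace("_", "");
        return SUPPORTED.contains(key);
    }

    public static void main(String[] args) {
        for (String format : new String[] {"Docx", "Rtf", "Html"}) {
            String strategy = isLoadingCallbackSupported(format)
                    ? "cancel from IDocumentLoadingCallback.notify()"
                    : "enforce the timeout outside the load (thread or process)";
            System.out.println(format + ": " + strategy);
        }
    }
}
```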
Hi @alexey.noskov
Additionally, I need the IDocumentSavingCallback.notify() method to be invoked when saving an HTML document to PDF. Currently, this callback is not being triggered when saving the document using Document.save() with SaveFormat.PDF.
As per the documentation, SaveFormat.PDF is not supported yet for IDocumentSavingCallback.notify().
Documentation: SaveOptions | Aspose.Words for Java
Questions:
- Why is IDocumentLoadingCallback not supported for HTML loading?
- Why is IDocumentSavingCallback not supported when saving to PDF format?
- Could you please confirm whether we can expect a fix for this issue, as well as for WORDSNET-29003, in the next release?
@vvarmora7930
This feature is not yet implemented for HTML and some other input file formats. We will keep you updated and let you know once it is resolved.
Currently IDocumentSavingCallback is supported only for Docx, FlatOpc, Docm, Dotm, Dotx, Doc, Dot, Html, Mhtml, Epub, XamlFlow, and XamlFlowPack.
IDocumentSavingCallback therefore cannot be used to cancel saving to PDF. However, when saving a document to PDF or any other fixed-page format, most of the time is spent in the document layout process rather than in the save operation itself, and you can interrupt the layout process using IPageLayoutCallback.
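Whichever callback ends up doing the cancelling (the loading callback for supported formats, or IPageLayoutCallback during PDF layout), it needs a cheap time check. Below is a hypothetical Deadline helper, a minimal sketch in plain Java: it uses System.nanoTime(), which, unlike Date, is unaffected by wall-clock adjustments. The assumption, as in the loading-callback code earlier in this thread, is that an exception thrown from notify() aborts the surrounding operation. Registering a layout callback is, as far as we know, done through doc.getLayoutOptions().setCallback(...); please verify this against the API reference for your version.

```java
import java.time.Duration;

/** Hypothetical helper: a monotonic deadline that progress or layout callbacks can poll. */
public final class Deadline {
    private final long deadlineNanos;

    private Deadline(long deadlineNanos) {
        this.deadlineNanos = deadlineNanos;
    }

    public static Deadline after(Duration timeout) {
        return new Deadline(System.nanoTime() + timeout.toNanos());
    }

    public boolean isExpired() {
        // Comparing the difference keeps this correct even if nanoTime() overflows.
        return System.nanoTime() - deadlineNanos >= 0;
    }

    /** Intended to be called from a callback's notify(); throws once the deadline has passed. */
    public void throwIfExpired(String stage) {
        if (isExpired()) {
            throw new IllegalStateException(stage + " exceeded its time limit");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Deadline deadline = Deadline.after(Duration.ofMillis(100));
        System.out.println("expired immediately? " + deadline.isExpired());
        Thread.sleep(150);
        try {
            deadline.throwIfExpired("Layout");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Creating the Deadline once, before calling new Document(...) or doc.save(...), and sharing it between callbacks lets a single time budget cover both loading and layout.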
No, unfortunately, I cannot promise that WORDSNET-29003 will be resolved in the next version of Aspose.Words. This is quite a big and complex feature with fairly low demand, so its implementation is not currently scheduled.