Aspose Words Layout Engine Infinite Hang Bug Report
Issue Summary
Product: Aspose.Words for Java
Version: 26.3
Issue: Layout engine enters infinite loop during getPageCount(), updateFields(), or save() operations
Impact: Document conversion hangs indefinitely (tested > 1 hour)
Reproduction
Reproduction Document
File: stuck.docx
stuck.docx (50.6 KB)
Size: 44KB
Pages: 1 page
Note: Customer-sensitive information has been redacted from this document
Document Characteristics
Based on comparison with a working copy:
- Style Collection: Only 17 styles (normal documents have 30-40+)
- Structure: 1 section, 2 tables, 68 paragraphs, 3 shapes
- History: Document accumulated 53 revisions before sanitization
- Likely Issue: Corrupted internal structure from accumulated revisions
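A style count like the one above can also be gathered without loading the file through the layout engine at all, since a DOCX is a ZIP archive whose styles live in word/styles.xml. The following is a minimal sketch using only the JDK; the class name `DocxStyleCount` is hypothetical, it assumes a transitional (non-Strict) OOXML file, and the number it reports may not match what Aspose.Words reports for its own style collection. A low count is only a heuristic signal, not proof of corruption.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical pre-check helper; it does not use or depend on Aspose.Words. */
public final class DocxStyleCount {

    // Namespace of transitional WordprocessingML; Strict OOXML files use a different one.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    private DocxStyleCount() {}

    /** Returns the number of w:style elements in word/styles.xml, or -1 if that part is absent. */
    public static int countStyles(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!"word/styles.xml".equals(entry.getName())) {
                    continue;
                }
                byte[] xmlBytes = zip.readAllBytes(); // reads only the current entry
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                // Uploaded files are untrusted, so reject DOCTYPE declarations (XXE guard)
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document xml = factory.newDocumentBuilder()
                        .parse(new ByteArrayInputStream(xmlBytes));
                return xml.getElementsByTagNameNS(W_NS, "style").getLength();
            }
        }
        return -1;
    }
}
```

It could be called as `DocxStyleCount.countStyles(Files.newInputStream(Path.of("stuck.docx")))`, with the result logged next to the conversion outcome so that any correlation between low style counts and hangs can be checked across more documents.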
Reproduction Code
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;
import com.aspose.words.PdfSaveOptions;

public class ReproduceLayoutHang {
    public static void main(String[] args) throws Exception {
        // Load document
        LoadOptions loadOptions = new LoadOptions();
        Document doc = new Document("stuck.docx", loadOptions);
        System.out.println("Document loaded successfully");

        // ANY of these operations will hang indefinitely:

        // Option 1: getPageCount() hangs
        int pageCount = doc.getPageCount(); // HANGS HERE

        // Option 2: updateFields() hangs
        // doc.updateFields(); // HANGS HERE

        // Option 3: save() hangs
        // PdfSaveOptions options = new PdfSaveOptions();
        // doc.save("output.pdf", options); // HANGS HERE
    }
}
Expected Behavior
- getPageCount() should return quickly (< 1 second for a 1-page document)
- updateFields() should complete in < 5 seconds
- save() should complete in < 10 seconds
Actual Behavior
- All layout-triggering operations hang indefinitely
- No exceptions thrown
- No error messages
- CPU usage remains steady, which suggests the engine is stuck in a loop rather than making slow progress
Analysis
Root Cause (Suspected)
The document appears to have corrupted internal structure that creates circular dependencies in the layout engine:
- Style collection corruption: Only 17 styles vs 30-40+ in normal documents
- Accumulated revision history: 53 revisions may have caused internal inconsistencies
- Table metadata corruption: Document has 2 tables which are involved in the hang
Diagnostic Tests Performed
We systematically tested various hypotheses:
| Test | Result | Conclusion |
|---|---|---|
| Remove all shapes | Shapes not the issue | |
| Remove all tables | Tables involved but… | |
| Modify table properties | Can’t fix by changing table settings | |
| Copy content to new document | Internal structure is corrupted | |
Workaround Found
Solution: Copy all document content to a new document via importNode():
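import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;
import com.aspose.words.LoadOptions;
import com.aspose.words.PdfSaveOptions;
import com.aspose.words.Section;

LoadOptions loadOptions = new LoadOptions();
// Assumed: default PDF options, as in the reproduction code
PdfSaveOptions saveOptions = new PdfSaveOptions();
Document problematicDoc = new Document("problematic.docx", loadOptions);

// Create a new document and remove its default empty section
Document newDoc = new Document();
newDoc.removeAllChildren();

// Copy all sections (deep clone)
for (int i = 0; i < problematicDoc.getSections().getCount(); i++) {
    Section section = problematicDoc.getSections().get(i);
    Section importedSection = (Section) newDoc.importNode(
            section,
            true, // deep clone
            ImportFormatMode.KEEP_SOURCE_FORMATTING
    );
    newDoc.appendChild(importedSection);
}

// New document converts successfully!
int pageCount = newDoc.getPageCount(); // Works!
newDoc.save("output.pdf", saveOptions); // Works!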
This workaround:
- Successfully resolves the hang
- Preserves all user-visible content
- Strips corrupted internal metadata
Request to Aspose
Primary Request
Please investigate why this document causes an infinite loop in the layout engine and provide a fix in a future release.
Questions
- Is there internal diagnostic tooling to identify the specific corrupt structure?
- Can Aspose provide a document “sanitization” utility to fix such corruption?
- Are there known issues with documents that accumulate many revisions?
- Can the layout engine be made more resilient to corrupted internal structures?
Suggested Fix Approaches
- Add timeout protection to layout engine operations
- Add circular dependency detection to prevent infinite loops
- Improve error reporting when corrupt structures are detected
- Provide a document validation utility to identify corruption before conversion
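Until something along these lines exists inside the library, a caller-side guard can at least stop a request from waiting forever. The sketch below uses only java.util.concurrent; `LayoutGuard` and `runWithTimeout` are hypothetical names, not part of Aspose.Words. It bounds how long the caller waits, but it cannot stop a worker that is spinning in a loop that never checks for interruption.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not part of Aspose.Words): bounds how long a caller waits. */
public final class LayoutGuard {

    private LayoutGuard() {}

    /**
     * Runs the task on a daemon thread and waits at most the given timeout.
     * Throws TimeoutException if the task has not finished by then.
     */
    public static <T> T runWithTimeout(Callable<T> task, Duration timeout) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread worker = new Thread(runnable, "layout-guard");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // only requests interruption; a busy loop may ignore it
            throw e;
        } catch (ExecutionException e) {
            // Surface the task's own failure instead of the wrapper exception
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With the reproduction document this would be called as `LayoutGuard.runWithTimeout(doc::getPageCount, Duration.ofSeconds(10))`. Because a timed-out worker may keep consuming a CPU core, running the conversion in a child process that can be killed is the sturdier option for production services.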
Additional Context
Customer Impact
This issue affects production conversion workflows where customers upload documents with unknown history. Such documents:
- Have been edited extensively (many revisions)
- May have been created in older Word versions
- May have SharePoint metadata
Sanitized Document Details
The attached reproduction document has been sanitized for privacy:
- Customer names, company info redacted
- Custom properties cleared (SharePoint metadata removed)
- Author information anonymized
- Internal structure preserved (still reproduces hang)
System Information
- Java Version: Java 17
- Aspose.Words Version: 26.3
- Original Document: Created in Microsoft Word (version unknown)
- File Format: DOCX (Office Open XML)
Attachments
- stuck.docx: Reproduction document
- Comparison data showing structural differences vs working document
Support Request
Please investigate this issue and provide guidance on:
- How to identify such corrupted documents before conversion
- Whether a fix can be provided in the layout engine
- Whether the importNode() workaround is the recommended approach
Thank you for your support!