Aspose.Words Throwing File Corrupted Exception

Hi Aspose Team,
I am using Aspose.Words version 9.7.0.0 to convert Word files to PDF. When I try to convert a 10 MB Word file with 11424 pages, I get a
FileCorruptedException with the message “The document appears to be corrupted and can’t be loaded”.
I also tested with the latest DLL, 10.0.0.0, and got the same exception. So I would like to know:

  1. What is the maximum size of a Word file that Aspose.Words is able to convert?
  2. What is the maximum number of pages?

Here I am attaching the Word file that gave me the error. Please provide a solution to this problem.

Thanking you
sbjampani

Hi
Thanks for your request. I cannot reproduce the problem on my side. I can successfully open your document using the latest version of Aspose.Words.
There is no limit on document size. The only limit is the amount of memory available on your side. In any case, it is better to use a few small documents than one huge one.
Best regards,

Hi Alexey,
I load the document into a memory stream, which results in 10,454,302 bytes. Aspose reads up to 7,000,000 bytes correctly and then throws the FileCorruptedException. I used the following code.

Document doc = new Document(memorystream);

This gives me the error. Please try it this way and provide me with a solution. Also share the code you used for the conversion and the resulting PDF file.

Regards
sbjampani

Hi
Thanks for your request. Here is code I used for testing:

using (MemoryStream docStream = new MemoryStream(File.ReadAllBytes(@"Test001\in.docx")))
{
    Document doc = new Document(docStream);
    Console.WriteLine("Loaded successfully!");
    doc.Save(@"Test001\out.pdf");
    Console.WriteLine("Converted successfully!");
}

I attached the output PDF. Conversion takes quite a long time (about 5 minutes), but this is expected given the size of your document (11424 pages).
I tested in Windows 7 on Phenom II X6 1090T with 4GB of RAM.
Best regards,

I got the same exception too. I used version 10.6. Here is the exception I got:

2/3/2012 4:00:11 AM Message: HandlingInstanceID: dec1f4d6-cb73-4e04-a741-f62dc33ddf2e
An exception of type ‘Aspose.Words.FileCorruptedException’ occurred and was caught.
2012-02-02 23:00:11Z
Aspose.Words.FileCorruptedException, Aspose.Words, Version=9.0.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56
The document appears to be corrupted and cannot be loaded.
Aspose.Words in file: C:\WINDOWS\TEMP\IntraLinks\3caf44f6-abe1-40e5-a95e-e9ddcb0f80da.docx

System.Collections.ListDictionaryInternal
Void Ԑ(System.IO.Stream, Aspose.Words.LoadFormat, System.String)
at Aspose.Words.Document.Ԑ(Stream ԍ, LoadFormat Ԏ, String ԏ)
at Aspose.Words.Document.Ԍ(Stream ԍ, LoadFormat Ԏ, String ԏ)
at Aspose.Words.Document..ctor(String fileName, LoadFormat loadFormat, String password)
at Aspose.Words.Document..ctor(String fileName)
at Intralinks.Common.Utilities.AsposeConverter.ConvertWordToPdf(String sourceFilePath, String destinationFilePath) in c:\ilbuilds\pdfconv\build_trunk\CommonUtilities\AsposeConverter.cs:line 38
at Intralinks.Common.Utilities.CommonUtilities.ConvertOfficeDocsToPDF(String sourceFile, String destinationFolder, Boolean useOriginalFileName) in c:\ilbuilds\pdfconv\build_trunk\CommonUtilities\CommonUtilities.cs:line 1357

<info name="WindowsIdentity" value="NT AUTHORITY\SYSTEM" />

Hi Viral,

First off, please try to use the latest version at your end and see if it resolves your issue. However, if it still doesn’t resolve your issue then please share the input document file with us. Also, please share the details of the environment where it fails i.e. OS, .NET Framework version, application type (Web, Windows, Service etc.). We’ll investigate the issue at our end and guide you accordingly.

We’re sorry for the inconvenience.

Please find attached the sample Word document you requested. Following are the system details:

OS: Microsoft Windows Server 2003 R2 Standard Edition, Service Pack 2
.NET Framework version: 3.5
Application type: Service

Please do the needful asap.

Hi Viral,

Thanks for your inquiry. I was unable to reproduce this exception on my side. I suggest upgrading to the latest version of Aspose.Words, i.e. v11.0.0.

I hope this helps.

Best Regards,

I tested with the version you mentioned, but it is still failing.

Hi Viral,

Thanks for your inquiry. I’m afraid, I still can’t reproduce this exception on my side using the latest version of Aspose.Words i.e. 11.0.0 and the input document you attached. Could you please double check if you are using the latest version (11.0.0)? You can dynamically check if you’re referencing the correct DLL by using the following code snippet:

// List every assembly loaded in the current AppDomain and print the Aspose.Words version.
System.Reflection.Assembly[] assemblies = AppDomain.CurrentDomain.GetAssemblies();
foreach (System.Reflection.Assembly assembly in assemblies)
{
    System.Reflection.AssemblyName assemblyName = assembly.GetName();
    if (assemblyName.Name.Contains("Aspose.Words"))
        Console.WriteLine("Aspose.Words Version Number: " + assemblyName.Version.ToString());
}

If we can help you with anything else, please feel free to ask.

Best Regards,

It works for you because you are converting with more memory. I just converted with 1.5 GB of available physical memory, and it fails with a 22 MB file. Before I started the conversion, 1.6 GB was in use; when control passed through the Document load it went up to 2.6 GB, so loading the document consumed 1 GB. When control passed through document.Save(), it consumed all 3 GB and crashed with an out-of-memory exception. Sometimes it crashes with FileCorruptedException as the outer exception and OutOfMemoryException as the inner exception. A sample follows. Can you please urgently send me the calculation for the physical memory required for a given source file size? I need it for both .doc and .docx. Based on the test results I got with different sizes, I guess it requires about 100 times more memory than the file size.

Timestamp: 2/28/2012 4:23:42 PM
Message: HandlingInstanceID: 166dc9d8-b605-4eb0-bcf6-091309885ceb
An exception of type ‘Aspose.Words.FileCorruptedException’ occurred and was caught.
2012-02-28 11:23:42Z
Aspose.Words.FileCorruptedException, Aspose.Words, Version=11.0.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56
The document appears to be corrupted and cannot be loaded.
Aspose.Words in file: C:\WINDOWS\TEMP\IntraLinks\d143fb57-f4c6-4093-951f-15a867cd172f.docx
System.Collections.ListDictionaryInternal
Void x5d4db34d48fb3129(System.IO.Stream, Aspose.Words.LoadOptions)
at Aspose.Words.Document.x5d4db34d48fb3129(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)
at Aspose.Words.Document.x5d95f5f98c940295(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)
at Aspose.Words.Document..ctor(String fileName, LoadOptions loadOptions)
at Aspose.Words.Document..ctor(String fileName)
at Intralinks.Common.Utilities.AsposeConverter.ConvertWordToPdf(String sourceFilePath, String destinationFilePath) in c:\ilbuilds\pdfconv\build_trunk\CommonUtilities\AsposeConverter.cs:line 39
at Intralinks.Common.Utilities.CommonUtilities.ConvertOfficeDocsToPDF(String sourceFile, String destinationFolder, Boolean useOriginalFileName) in c:\ilbuilds\pdfconv\build_trunk\CommonUtilities\CommonUtilities.cs:line 1436
xe8730a664ff488a4.xc5e345d2a919c94b, Aspose.Words, Version=11.0.0.0, Culture=neutral, PublicKeyToken=716fcc553a201e56
Cannot extract
Aspose.Words
System.Collections.ListDictionaryInternal
Void x4a77f4b2eb397877(System.String, System.IO.Stream, System.String)
at xe8730a664ff488a4.x990d54f34b2b5118.x4a77f4b2eb397877(String x4abc0735d5951ac3, Stream x7f5d4a91157364b5, String xe8e4b5871d71a79a)
at xfc5388ad7dff404f.xe965bada78e2d6b1.xfc0ead15b6083996(Stream xcf18e5243f8d5fd3)
at Aspose.Words.Document.x5d4db34d48fb3129(Stream xcf18e5243f8d5fd3, LoadOptions x27aceb70372bde46)
System.OutOfMemoryException, mscorlib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089
Exception of type ‘System.OutOfMemoryException’ was thrown.
mscorlib
System.Collections.ListDictionaryInternal
Void set_Capacity(Int32)
at System.IO.MemoryStream.set_Capacity(Int32 value)
at System.IO.MemoryStream.EnsureCapacity(Int32 value)
at System.IO.MemoryStream.Write(Byte[] buffer, Int32 offset, Int32 count)
at xe8730a664ff488a4.x990d54f34b2b5118._x3188c73c7f209072(Stream x9c13656d94fc62d0)
at xe8730a664ff488a4.x990d54f34b2b5118.x4a77f4b2eb397877(String x4abc0735d5951ac3, Stream x7f5d4a91157364b5, String xe8e4b5871d71a79a)
Category: Logging Exception
Priority: 1
EventId: 100
Severity: Error
Title:Enterprise Library Exception Handling
Machine: BOSPDFCONV02
Application Domain: ILPDFConversionService.exe
Process Id: 2172
Process Name: C:\PDFConvService\DEV2.pdfconv\instance1\Release\ILPDFConversionService.exe
Win32 Thread Id: 5376
Thread Name: 
Extended Properties: Product Version - 3.1.26984.0
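
The nested trace above shows an OutOfMemoryException wrapped two levels deep inside the FileCorruptedException. A minimal sketch of how calling code might tell that case apart from a genuinely damaged file is to walk the InnerException chain. The class and method names here (ExceptionProbe, IsCausedByOutOfMemory) are hypothetical and not part of Aspose.Words; only standard .NET types are used.

```csharp
using System;

static class ExceptionProbe
{
    // Returns true if ex, or any exception nested inside it,
    // is an OutOfMemoryException.
    public static bool IsCausedByOutOfMemory(Exception ex)
    {
        for (Exception e = ex; e != null; e = e.InnerException)
        {
            if (e is OutOfMemoryException)
                return true;
        }
        return false;
    }
}

// Hypothetical usage around the load call shown in this thread:
//
// try
// {
//     Document doc = new Document(path);
// }
// catch (Aspose.Words.FileCorruptedException ex)
// {
//     if (ExceptionProbe.IsCausedByOutOfMemory(ex))
//         Console.WriteLine("Load failed for lack of memory; the file itself may be intact.");
//     else
//         throw;
// }
```

Logging the two cases separately would make it clearer whether a failed conversion needs a repaired file or a machine with more memory.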

Hi Viral,

Thanks for your inquiry.

With 4 GB of RAM installed, I managed to reproduce the OutOfMemoryException on my side when converting to PDF. The document you are trying to convert to PDF is probably simply too large. When you render a document, Aspose.Words needs to build two models in memory: one for the document and one for the rendered document. That is why Aspose.Words uses more memory when you render a document than when you simply save it in a flow format.
Note that Aspose.Words usually needs several times more memory than the document size to build a model of the document in memory. For example, if your document is 1 MB, Aspose.Words needs 10-20 MB of RAM to build its DOM in memory. The multiplier depends on the format because some formats are more compact than others. For example, DOCX is more compact than DOC and RTF, and DOC is more compact than RTF.
So I am not sure the issue you reported can be resolved in Aspose.Words. I would rather advise you to use a few small documents instead of one huge document.
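
As a rough illustration of the 10-20x rule of thumb above, the arithmetic can be sketched as follows. The helper name DomMemoryEstimate is hypothetical, and the factors are only the figures quoted in this reply, not measured values. The estimate covers the document model alone; rendering to PDF builds a second model on top of it, and the load reported earlier in this thread (about 1 GB for a 22 MB file) was already above this range.

```csharp
using System;

static class DomMemoryEstimate
{
    // Multipliers from the 10-20x rule of thumb quoted in the reply above.
    // They are an assumption for illustration, not a guarantee.
    const long LowFactor = 10;
    const long HighFactor = 20;

    // Lower bound of the rough DOM size, in bytes.
    public static long LowBytes(long fileBytes)
    {
        return fileBytes * LowFactor;
    }

    // Upper bound of the rough DOM size, in bytes.
    public static long HighBytes(long fileBytes)
    {
        return fileBytes * HighFactor;
    }
}

// Hypothetical usage for the 22 MB file mentioned in this thread:
//
// long fileBytes = 22L * 1024 * 1024;
// Console.WriteLine("DOM estimate: {0} MB to {1} MB",
//     DomMemoryEstimate.LowBytes(fileBytes) / (1024 * 1024),
//     DomMemoryEstimate.HighBytes(fileBytes) / (1024 * 1024));
// // prints "DOM estimate: 220 MB to 440 MB"
```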

If we can help you with anything else, please feel free to ask.

Best regards,

Thanks for progressing this issue. Unfortunately, I still have not received a complete answer. For each format, can you tell me how much memory is required for the given file sizes (15, 20, 20, 40)? I am sure you must have run these tests. It would be great if you could provide me with that table.

Hi

Thanks for your request. Unfortunately, it is impossible to provide such statistics because memory usage depends not only on file size but also on document complexity, i.e. on the content of your document. You can test this on your side with your own documents.
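
One way to run such a test is to record how much the process's private memory grows across the load and save steps. The sketch below uses only standard .NET types; MemoryProbe and Measure are hypothetical names, and the Aspose.Words calls appear only in the usage comment, copied from the code earlier in this thread.

```csharp
using System;
using System.Diagnostics;

static class MemoryProbe
{
    // Private bytes currently committed to this process.
    static long PrivateBytes()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return p.PrivateMemorySize64;
        }
    }

    // Runs one step and returns how much the process's private
    // bytes changed across it (the value may be negative).
    public static long Measure(string label, Action step)
    {
        long before = PrivateBytes();
        step();
        long delta = PrivateBytes() - before;
        Console.WriteLine("{0}: {1:N0} bytes", label, delta);
        return delta;
    }
}

// Hypothetical usage with the calls already shown in this thread:
//
// Document doc = null;
// MemoryProbe.Measure("load", delegate { doc = new Document(@"Test001\in.docx"); });
// MemoryProbe.Measure("save", delegate { doc.Save(@"Test001\out.pdf"); });
```

The delta shows what is still held after each step, not the peak reached during it. Process.PeakWorkingSet64 can be read at the end of the run if the peak matters more.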

Best regards,

When I converted the 22 MB file to PDF on a machine with more RAM (32 GB), it worked but created a 530 MB file. That is a separate issue. But the conversion occupied almost 4.5 GB of RAM, more than 200 times the source file size. Why does it require this much memory? Is there any bulk API for conversion that reduces this memory consumption? Is there any other way I can implement my code, or any option I can set, to bring this memory down? Why is a Dispose method not exposed on the objects? Memory is not cleaned up after the out-of-memory exception is thrown.

Hi,

Thanks for your inquiry.

  1. Please read the following thread on how to reduce the PDF file size:
    https://forum.aspose.com/t/how-to-reduce-pdf-files-size-using-aspose-words/64679

  2. As I mentioned in my previous posts, memory consumption depends entirely on the content of the input Word document. I would suggest you use several small documents instead of one huge document.

  3. No, there is no such bulk API for conversion to reduce memory consumption.

  4. Could you please attach this document, which will allow me to reproduce the memory leak issue?
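
The suggestion in point 2 to work with smaller documents could, for instance, be approached by rendering each section separately. This is only a sketch under assumptions: SectionSplitter and outputPattern are hypothetical names, and it relies on the Sections, ImportNode, RemoveAllChildren and AppendChild members of the Aspose.Words document model, which should be checked against the version in use. The source document still has to be loaded in full, so this can only reduce the memory needed at the rendering stage, and headers or footers that a section inherits from the previous one may not carry over.

```csharp
using System;
using Aspose.Words;

static class SectionSplitter
{
    // Saves each section of the source document as its own PDF.
    // outputPattern is a hypothetical format string such as @"out\part{0}.pdf".
    public static void SaveSectionsAsPdf(string sourcePath, string outputPattern)
    {
        Document source = new Document(sourcePath);

        for (int i = 0; i < source.Sections.Count; i++)
        {
            // Start from a new document and drop its empty default section.
            Document part = new Document();
            part.RemoveAllChildren();

            // Copy one section into the new document and render only that.
            Section copy = (Section)part.ImportNode(source.Sections[i], true);
            part.AppendChild(copy);

            part.Save(string.Format(outputPattern, i + 1));
        }
    }
}
```

Where the source documents can be produced as separate files in the first place, that avoids loading one huge document at all.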

Best Regards,

Hi there,

Thanks for your inquiry.

In addition to what Awais has said, we will consider adding an option to use a temporary folder on disk during conversion to reduce the amount of working memory used.

I have linked your request to the appropriate issue. We will inform you as soon as it is available.

Thanks,