Free Support Forum - aspose.com

Looking for a way to handle bad files

First, let me explain why I am testing this way. We accept many files from many users, so I would like to maintain stability when someone is being purposely mischievous or malicious.

Here is my code that loads either RTF or DOC files.


try
{
    this.highResolutionTimer.Start();
    this.documentPath = documentPath;

    if (ConfigSettings.AsposeInteractsWithMemoryStream)
    {
        using (FileStream fs = new FileStream(this.documentPath, FileMode.Open))
        {
            if (fs.CanRead)
            {
                // Copy the file into memory so Aspose.Words reads from a MemoryStream.
                MemoryStream ms = new MemoryStream();
                fs.CopyTo(ms);
                // CopyTo leaves the position at the end; rewind before loading.
                ms.Position = 0;
                this.wordsDocument = new Aspose.Words.Document(ms);
            }
        }
    }
    else
    {
        // Let Aspose.Words open the file directly from its path.
        this.wordsDocument = new Aspose.Words.Document(this.documentPath);
    }
}
catch (Aspose.Words.FileCorruptedException wordsException)
{
    Logger.LogError("Aspose.Words file corrupted exception reading {0}. Message: {1}", this.documentPath, wordsException.Message);
}
catch (Aspose.Words.IncorrectPasswordException wordsException)
{
    Logger.LogError("Aspose.Words incorrect password reading {0}. Message: {1}", this.documentPath, wordsException.Message);
}
catch (Aspose.Words.UnsupportedFileFormatException wordsException)
{
    Logger.LogError("Aspose.Words unsupported format exception reading {0}. Message: {1}", this.documentPath, wordsException.Message);
}
catch (Exception ex)
{
    Logger.LogError("System exception reading {0}. Message: {1}", this.documentPath, ex.Message);
}
finally
{
    // Always stop the timer and log how long the load attempt took.
    this.highResolutionTimer.Stop();
    Logger.LogInfo("Aspose.Words took {0} seconds to open {1}", this.highResolutionTimer.ElapsedTime, this.documentPath);
}

Here are the documents. These files are purposely random junk.
https://www.dropbox.com/s/buce03a9anokvvi/RandomJunk.rtf?dl=0
https://www.dropbox.com/s/adr84pgmpgtpbmz/RandomJunk.doc?dl=0

My guess is that Aspose keeps allocating pages.

MS Word basically stops allocating pages and stops at its page limit relatively quickly. Aspose.Words allocates a lot of memory and keeps processing for many minutes.

Is there any way to limit the number of pages Aspose.Words will process?


Hi Kent,

Thanks for your inquiry. Perhaps you are using an older version of Aspose.Words; with Aspose.Words v14.8.0, I am unable to reproduce this problem on my side. I suggest you upgrade to the latest version of Aspose.Words, i.e. v14.8.0, and let us know how it goes on your side. I hope this helps.

AA.Engineering:

Is there any way to limit the number of pages Aspose.Words will process?

Aspose.Words cannot load only a limited number of pages into the Aspose.Words DOM. Once you have loaded the document into the Aspose.Words DOM, you can work with document pages using the Aspose.Words.Layout API. The Aspose.Words.Layout namespace provides classes that give access to information such as which page a particular document element is on, and where on that page it is positioned, once the document has been formatted into pages. Please let us know if you have any more queries.

I am using 14.8.0.

Let me be more specific. I was hoping you would have noticed that Aspose.Words.FileCorruptedException is not thrown in this case.

It is my assumption that a file full of random bytes should throw Aspose.Words.FileCorruptedException.

What does throw, in code like the example above, is the this.wordsDocument.PageCount property, which raises System.ArgumentOutOfRangeException. That exception is thrown only after allocating 5.2 GB of memory and spending 8+ minutes of CPU time. Resource consumption on this scale threatens system stability.


Thanks


Hi Kent,

Thanks for your inquiry. Please note that you cannot load an incomplete document into the Aspose.Words DOM. Aspose.Words.FileCorruptedException is thrown during document load, when the document appears to be corrupted and is impossible to load. Regarding your Document.PageCount query, you can only get the page count once the document is loaded into the Aspose.Words DOM.

Are you facing any issue while using Aspose.Words? If yes, please share some more details about your query. We will investigate the issue and provide you with more information. Please also share your working environment.

What environment are you running on?

  • OS (Windows Version or Linux Version)
  • Architecture (32 / 64 bit)
  • .NET Framework version

Windows 7 x64
64-bit
.NET Framework 4.5.1

Here is a short video of the issue. It always successfully loads the file.

https://www.dropbox.com/s/gsipcqrgtp9ahe4/AsposeWordsCorruptedFile.mp4?dl=0

If you see my error, let me know.


Hi Kent,

Thanks for sharing the details. It would be great if you could share the following for further investigation:

  • Please attach your input Word document.
  • Please create a standalone, runnable simple application (for example, a Console Application project) that demonstrates the Aspose.Words code you used to generate your output document.

Unfortunately, it is difficult to say what the problem is without the document(s) and a simplified application. We need your document(s) and a simple project to reproduce the problem. As soon as you get these pieces of information to us, we’ll start our investigation into your issue.

Thanks for your cooperation.

https://www.dropbox.com/s/uriht2o5cy00rbr/AsposeTests.7z?dl=0

This 7z file has both input files (RandomJunk.rtf and RandomJunk.doc) and two batch files (RandomJunkRtfTest.bat and RandomJunkDocTest.bat) that run the release builds.


I have abbreviated the test so that it catches the exception on wordsDocument.PageCount as early as possible.

For me, the Aspose.Words.FileCorruptedException is not thrown. A System.ArgumentOutOfRangeException is thrown during execution of the PageCount property.

Hi Kent,

Thanks for sharing the details. I am working on your query and will update you as soon as possible.

Hi Kent,

Thanks for your patience. I have tested the scenario and managed to reproduce the same issue on my side. So that it can be corrected, I have logged this problem in our issue tracking system as WORDSNET-10916. I have linked this forum thread to that issue, and you will be notified via this thread once it is resolved.

We apologize for the inconvenience.

The issues you have found earlier (filed as WORDSNET-10916) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.
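
For readers who need a guard on versions without the WORDSNET-10916 fix, one option is to reject files whose first bytes are not a known signature before handing them to Aspose.Words. The following is a minimal sketch, not part of Aspose.Words: the FileSignature class and the LooksLikeDocOrRtf method are hypothetical names, and it assumes that only binary .doc (OLE2 compound file header) and .rtf (text starting with "{\rtf") inputs are accepted. It will not catch junk that happens to start with a valid signature, so it narrows the problem rather than solving it.

```csharp
using System;
using System.IO;

// Hypothetical helper, not an Aspose.Words API.
static class FileSignature
{
    // OLE2 compound file header used by binary .doc files.
    static readonly byte[] Ole2 = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // ASCII bytes of "{\rtf", the start of every RTF file.
    static readonly byte[] Rtf = { 0x7B, 0x5C, 0x72, 0x74, 0x66 };

    // Returns true if the stream starts with a .doc or .rtf signature.
    // Reads from the stream's current position and does not rewind it.
    public static bool LooksLikeDocOrRtf(Stream stream)
    {
        byte[] header = new byte[Ole2.Length];
        int read = 0;

        // Stream.Read may return fewer bytes than requested, so loop.
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        return StartsWith(header, read, Ole2) || StartsWith(header, read, Rtf);
    }

    // Compares the first signature.Length bytes of buffer with signature.
    static bool StartsWith(byte[] buffer, int length, byte[] signature)
    {
        if (length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (buffer[i] != signature[i]) return false;
        }
        return true;
    }
}
```

A caller would run this check on the FileStream before constructing Aspose.Words.Document and, if the same stream is then passed on, set its Position back to 0 first.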