Corrupt Documents

Hi there,

Nearly all the Documents I am creating with Aspose.Words are corrupt according to MS Word 2007. I’ve attached a screenshot of the error message Word is providing (corrupt.png).

I don’t encounter any exceptions from Aspose.Words when I call Document.Save(), and the “corrupt” document (corrupt.docx) can be opened in Aspose.Words, again without any errors.

Are you able to look at this document (which was created with Aspose.Words) and tell me why MS Word is claiming it is corrupt?

Hi Alex,

Thanks for your inquiry. Could you please attach the complete source code you used to generate this document, so that we can test it? I will investigate the issue on my side and provide you with more information.

Best regards,

That will take a hell of a lot longer than just opening the document and inspecting its contents. Can you please investigate the previously attached Documents’ structure and explain why Aspose.Words will happily open and save the Document while MS Word will not? Thank you.

Hi Alex,

Thanks for your inquiry.

Aspose.Words throws Aspose.Words.FileCorruptedException during document load, when the document appears to be corrupted and impossible to load. In case Aspose.Words encounters a problem that can be resolved upon loading a document, it recovers a document silently and does not throw any exceptions. Re-saving such documents back to the disk using Aspose.Words might resolve your issues. If we can help you with anything else, please feel free to ask.

Best regards,

awais.hafeez:
Re-saving such documents back to the disk using Aspose.Words might resolve your issues. If we can help you with anything else, please feel free to ask.

Hi Awais,

Re-saving the silently recovered document fixes it, but I can’t open and re-save every document I create.

Why is this happening in the first place? If there is something wrong with the document, why does Aspose.Words not report this when the document is saved? Why would Aspose.Words happily write a corrupt document without telling anybody it is corrupt? That seems absurd.

Once I have created a document and added all its elements, how can I determine if the document is corrupt or not before I save it? How do I determine what is wrong with the document?

Hi Alex,

Thanks for your inquiry.

You can implement the IWarningCallback interface if you want your own custom method to be called whenever a loss-of-fidelity warning occurs during document loading or saving. For example, you can capture warnings while loading a document by using the following code snippet:

using System;
using Aspose.Words;

// Route warnings raised during loading to a custom callback.
Aspose.Words.LoadOptions loadOptions = new Aspose.Words.LoadOptions();
loadOptions.LoadFormat = LoadFormat.Docx;
loadOptions.WarningCallback = new HandleDocumentWarnings();
Document doc = new Document(@"C:\Temp\corrupt.docx", loadOptions);

// Prints the type and description of every warning it receives.
public class HandleDocumentWarnings : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        Console.WriteLine(info.WarningType + ": " + info.Description);
    }
}

I hope this helps.

Best regards,

Hey there,

I converted your code to Java and received this warning when trying to open a document saved by Aspose.Words.

16777216: Threading information is not supported by Aspose.Words.

What does this warning mean, and why does this document not open in Microsoft Word?

Hi Alex,

Thanks for your inquiry. First of all, please find below a complete list of the warning types that Aspose.Words can issue during document loading or saving, with the numeric value of each in parentheses:

  • DataLossCategory: Some text/char/image or other data will be missing from either the document tree following load, or from the created document following save. (255)
  • DataLoss: Generic data loss, no specific code. (1)
  • MajorFormattingLossCategory: The resulting document or a particular location in it might look substantially different compared to the original document. (65280)
  • MajorFormattingLoss: Generic major formatting loss, no specific code. (256)
  • MinorFormattingLossCategory: The resulting document or a particular location in it might look somewhat different compared to the original document. (16711680)
  • MinorFormattingLoss: Generic minor formatting loss, no specific code. (65536)
  • FontSubstitution: Font has been substituted. (131072)
  • UnexpectedContentCategory: Some content in the source document could not be recognized (i.e. is unsupported). This may or may not cause issues or result in data/formatting loss. (251658240)
  • UnexpectedContent: Generic unexpected content, no specific code. (16777216)
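
Since the Java callback printed the warning type as a bare number (16777216), a small lookup can make such output easier to read. The following is only a minimal sketch and does not use the Aspose.Words API: the class WarningTypeDecoder and its methods are hypothetical names, and the constants are copied from the list above. It also assumes that each specific type belongs to the category whose mask covers its value, which the numbers suggest but which is not stated above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Aspose.Words: translates the numeric
// warning type values from the list above into readable names.
final class WarningTypeDecoder {

    // Category masks, copied from the list above.
    private static final Map<Integer, String> CATEGORIES = new LinkedHashMap<>();
    // Specific warning types, copied from the list above.
    private static final Map<Integer, String> TYPES = new LinkedHashMap<>();

    static {
        CATEGORIES.put(0x000000FF, "DataLossCategory");            // 255
        CATEGORIES.put(0x0000FF00, "MajorFormattingLossCategory"); // 65280
        CATEGORIES.put(0x00FF0000, "MinorFormattingLossCategory"); // 16711680
        CATEGORIES.put(0x0F000000, "UnexpectedContentCategory");   // 251658240

        TYPES.put(0x00000001, "DataLoss");            // 1
        TYPES.put(0x00000100, "MajorFormattingLoss"); // 256
        TYPES.put(0x00010000, "MinorFormattingLoss"); // 65536
        TYPES.put(0x00020000, "FontSubstitution");    // 131072
        TYPES.put(0x01000000, "UnexpectedContent");   // 16777216
    }

    // Returns the first category whose mask overlaps the value
    // (assumption: a type belongs to the category that covers its bits).
    static String category(int warningType) {
        for (Map.Entry<Integer, String> e : CATEGORIES.entrySet()) {
            if ((warningType & e.getKey()) != 0) {
                return e.getValue();
            }
        }
        return "Unknown";
    }

    // Returns the name of a specific warning type, or a placeholder
    // for values that are not in the list.
    static String name(int warningType) {
        return TYPES.getOrDefault(warningType, "Unknown(" + warningType + ")");
    }

    static String describe(int warningType) {
        return name(warningType) + " [" + category(warningType) + "]";
    }

    public static void main(String[] args) {
        // The value reported earlier in this thread.
        System.out.println(describe(16777216));
    }
}
```

With this sketch, calling WarningTypeDecoder.describe(info.getWarningType()) inside the callback would print the name instead of the number, assuming the Java callback exposes the warning type as an int, as the output quoted earlier in this thread suggests.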

Secondly, could you please attach the source code you used to generate this document, so that we can test it? Otherwise, I am afraid we will not be able to investigate your issue and raise a ticket. Thanks for your cooperation.

Best regards,

I didn’t expect it to be such a huge issue - I kind of assumed Aspose.Words would tell me why this file was corrupt. Now that I have that information (“Generic unexpected content”) I have a much better platform to debug from. I’m probably inserting nodes in the wrong places or something.

If it looks like I’m not doing anything “illegal”, I’ll come back to you. Thanks.

Hi Alex,

Thanks for your cooperation. Sure, we will wait for your input. Also, this issue sounds like a bug, and we will gladly look into fixing it as soon as we can reproduce it on our side.

Best regards,