Aspose.Words can't detect file format via stream

Hi Team

When we pass a PDF as byte[] content with LoadFormat set to Docx, no error is thrown. Instead, an incorrect page count is returned.

private int GetWordDocumentPageCount(byte[] content, LoadFormat format, string filename)
{
    try
    {
        int pageCount = 0;

        using (MemoryStream filestream = new MemoryStream(content))
        {
            filestream.Position = 0;
            LoadOptions op = new LoadOptions();
            op.Encoding = System.Text.Encoding.Default;
            op.LoadFormat = format;

            Aspose.Words.Document WordDocument = new Document(filestream, op);

            pageCount = WordDocument.PageCount;
        }
        return pageCount;
    }
    catch (Exception ex)
    {
        Log.Error("GetWordDocumentPageCount Error {Error} Inner Ex {InnerEx}", ex.Message, ex.InnerException?.Message);

        throw new Exception("File is corrupted or file extension is incorrect. Filename: " + filename, ex);
    }
}

@Gpatil This is expected behavior. If the load format is specified explicitly, Aspose.Words tries to load the document with that format; if the specified format does not match the content, Aspose.Words falls back to auto-detecting the load format.
Regarding the incorrect document layout: please note that Aspose.Words is designed to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed-page format documents. While loading a PDF document, Aspose.Words converts the fixed-page document structure into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity.

Hi @alexey.noskov
I understand. In our case we allow the user to send a stream and a filename, but in certain scenarios the user sends a PDF file in the stream and the filename is not present.
Is there any way for any of the Aspose APIs to identify the file format from the incoming stream?

@Gpatil You can use FileFormatUtil to detect document format using Aspose.Words. For example:

FileFormatInfo info = FileFormatUtil.DetectFileFormat(inputStream);
Console.WriteLine(info.LoadFormat);
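
To tie this back to the page-count method from the question, here is a minimal sketch of how the detection could be used to reject non-Word content before loading. The class name `PageCounter`, the allow-list of `Doc` and `Docx`, and the choice of `NotSupportedException` are assumptions made for illustration, not part of the Aspose.Words API or the original code; adjust them to the formats your service actually accepts.

```csharp
using System;
using System.IO;
using Aspose.Words;

public static class PageCounter
{
    // Hypothetical helper: detects the real format from the bytes instead of
    // trusting a caller-supplied LoadFormat or filename.
    public static int GetWordDocumentPageCount(byte[] content)
    {
        using (MemoryStream stream = new MemoryStream(content))
        {
            // Inspect the stream content to find out what it really is.
            FileFormatInfo info = FileFormatUtil.DetectFileFormat(stream);

            // Assumed allow-list: only DOC and DOCX are treated as Word documents here.
            bool isWord = info.LoadFormat == LoadFormat.Doc
                       || info.LoadFormat == LoadFormat.Docx;
            if (!isWord)
            {
                throw new NotSupportedException(
                    "Expected a Word document but detected: " + info.LoadFormat);
            }

            // Rewind defensively before handing the stream to the loader.
            stream.Position = 0;
            Document document = new Document(stream);
            return document.PageCount;
        }
    }
}
```

With a check like this, a PDF sent without a filename would fail fast with a clear message rather than being loaded and returning a misleading page count.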