Output DOC file formatting is lost completely when converting from HTML to DOC using a stream

I am using the following code to convert a PDF to HTML and then from HTML to DOC using a stream in one go, but the formatting of the output Word document (attached to this post) is lost completely. Please find the input PDF and the output DOC in the attachment.

using Aspose.Pdf;
using System.IO;

namespace TestPDFASpose
{
    class Program
    {
        static MemoryStream ss = new MemoryStream();
        static void Main(string[] args)
        {
            PDFtoHTMLStream();

        }

        public static void PDFtoHTMLStream()
        {
            Document doc = new Document(@"E:\ExternalTestsData\Farms_Ryan_20160516152944_report.pdf");

            // tune conversion params
            HtmlSaveOptions newOptions = new HtmlSaveOptions();
            newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
            newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
            newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
            newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
            newOptions.SplitIntoPages = false;// force write HTMLs of all pages into one output document
            newOptions.FixedLayout = true;
            newOptions.FontEncodingStrategy = HtmlSaveOptions.FontEncodingRules.Default;
            newOptions.HtmlMarkupGenerationMode = HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml;
            newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(SavingToStream);
            newOptions.PagesFlowTypeDependsOnViewersScreenSize = true;
            // we can use a non-existing path as the result file name - all real saving is done
            // in our custom method SavingToStream() (it follows this one)
            string outHtmlFile = @"E:\SomeNonExistingFolder\SomeUnexistingFile.html";
            doc.Save(outHtmlFile, newOptions);

        }

        private static void SavingToStream(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
        {
            byte[] resultHtmlAsBytes = new byte[htmlSavingInfo.ContentStream.Length];
            htmlSavingInfo.ContentStream.Read(resultHtmlAsBytes, 0, resultHtmlAsBytes.Length);
            // here You can use any writable stream, file stream is taken just as example
            // string fileName = "stream_out.html";
            // Stream ss = File.OpenWrite(fileName);
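            ss.Write(resultHtmlAsBytes, 0, resultHtmlAsBytes.Length);
            ss.Position = 0; // rewind so the HTML is read from the beginning of the stream
            Aspose.Words.Document docWord = new Aspose.Words.Document(ss);
            docWord.Save(@"E:\test\test.doc"); // verbatim string, otherwise "\t" is treated as a tab character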
            // return outStream;
        }
    }

}

Hi Sandeep,

Thanks for contacting support.

I have tested the scenario and observed that the HTML is not being converted to DOC properly. However, when saving the PDF file to HTML format using Aspose.Pdf for .NET 11.6.0, I did not notice any problem. So the formatting issue appears to occur when Aspose.Words converts the HTML file to DOC format. I am therefore moving this thread to the Aspose.Words forum, so that my colleagues on that team can look into the matter further and reply accordingly.

For your reference, I have also attached the resultant HTML generated at my end.

[C#]

Document doc = new Document(@"c:\pdftest\Farms_Ryan_20160516152944_report.pdf");
// tune conversion params
HtmlSaveOptions newOptions = new HtmlSaveOptions();
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.SplitIntoPages = false;// force write HTMLs of all pages into one output document
newOptions.FixedLayout = true;
newOptions.FontEncodingStrategy = HtmlSaveOptions.FontEncodingRules.Default;
newOptions.HtmlMarkupGenerationMode = HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml;
// newOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy(SavingToStream);
newOptions.PagesFlowTypeDependsOnViewersScreenSize = true;
// the custom saving strategy above is commented out here, so the HTML is written
// directly to the file path below; SavingToStream() is shown after it for reference
string outHtmlFile = @"c:\pdftest\Farms_Ryan_20160516152944_report.html";
doc.Save(outHtmlFile, newOptions);
private static void SavingToStream(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
{
    byte[] resultHtmlAsBytes = new byte[htmlSavingInfo.ContentStream.Length];
    htmlSavingInfo.ContentStream.Read(resultHtmlAsBytes, 0, resultHtmlAsBytes.Length);
    // here You can use any writable stream, file stream is taken just as example
    // string fileName = "stream_out.html";
    // Stream ss = File.OpenWrite(fileName);
    MemoryStream ss = new MemoryStream();
    ss.Write(resultHtmlAsBytes, 0, resultHtmlAsBytes.Length);
    ss.Position = 0; // rewind so the HTML is read from the beginning of the stream
    Aspose.Words.Document docWord = new Aspose.Words.Document(ss);
    docWord.Save("c:\\pdftest\\Farms_Ryan_20160516152944_report_test.doc");
    // return outStream;
}

Hi Sandeep,

Thanks for your inquiry. We have tested the scenario and managed to reproduce the same issue at our end. We have logged this problem in our issue tracking system as WORDSNET-13670. You will be notified via this forum thread once the issue is resolved.

We apologize for the inconvenience.

Hi Sandeep,

Thanks for your patience. Please note that Aspose.Words mimics the behavior of MS Word. You are converting the PDF to fixed-layout HTML using Aspose.Pdf. If you open that HTML document in MS Word, its layout will not look good either.

In your case, we suggest converting the PDF to a Word document directly using Aspose.Pdf. Please check the following code example. Hope this helps.

Aspose.Pdf.Document doc = new Aspose.Pdf.Document(MyDir + @"input.pdf");
Aspose.Pdf.DocSaveOptions options = new Aspose.Pdf.DocSaveOptions();
options.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
doc.Save(MyDir + "Out.doc", options);
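
Since the original question was about working with streams, the same conversion can probably be done without intermediate files. The following is only a minimal sketch: it assumes that your Aspose.Pdf version provides the Document(Stream) constructor and the Save(Stream, SaveOptions) overload, and the helper name ConvertPdfToDoc and the file names are hypothetical, chosen for illustration only.

```csharp
using System.IO;

class PdfToDocViaStreams
{
    // Hypothetical helper (not part of Aspose): reads a PDF from a stream
    // and returns the converted DOC in a MemoryStream.
    static MemoryStream ConvertPdfToDoc(Stream pdfInput)
    {
        // Assumption: the Document(Stream) constructor is available in your version.
        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfInput);

        // Same options as in the file-based example above.
        Aspose.Pdf.DocSaveOptions options = new Aspose.Pdf.DocSaveOptions();
        options.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;

        // Assumption: the Save(Stream, SaveOptions) overload is available in your version.
        MemoryStream docOutput = new MemoryStream();
        pdf.Save(docOutput, options);

        // Rewind so that the caller reads the DOC from the beginning.
        docOutput.Position = 0;
        return docOutput;
    }

    static void Main()
    {
        // Placeholder file names; replace them with your own paths or streams.
        using (FileStream input = File.OpenRead("input.pdf"))
        using (MemoryStream doc = ConvertPdfToDoc(input))
        using (FileStream output = File.Create("Out.doc"))
        {
            doc.CopyTo(output);
        }
    }
}
```

Note that the output stream is rewound before it is returned. After a write, the position of a MemoryStream is at its end, so a consumer that reads from the current position would otherwise see no data.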