Free Support Forum - aspose.com

PDF to HTML conversion takes time

Business problems and Solutions (1).pdf (2.1 MB)

// Requires: using System.IO; using Aspose.Words.Saving;
// destFileName (source PDF path) and path (output file prefix) are defined elsewhere.
Aspose.Words.Document pdfDocument = new Aspose.Words.Document(destFileName);

int index = 0;
for (int page = 0; page < pdfDocument.PageCount; page++)
{
    using (MemoryStream pageStream = new MemoryStream())
    {
        // Extract the current page as a separate document.
        Aspose.Words.Document extractedPage = pdfDocument.ExtractPages(page, 1);

        // Embed all resources so that each page is a single self-contained HTML file.
        HtmlFixedSaveOptions htmlFixedSaveOptions = new HtmlFixedSaveOptions();
        htmlFixedSaveOptions.ExportEmbeddedCss = true;
        htmlFixedSaveOptions.ExportEmbeddedFonts = true;
        htmlFixedSaveOptions.ExportEmbeddedImages = true;
        htmlFixedSaveOptions.ExportEmbeddedSvg = true;
        htmlFixedSaveOptions.ExportFormFields = true;
        //htmlFixedSaveOptions.ExportGeneratorName = true;

        // Use a page-specific CSS class prefix.
        string cssprefix = "aspose_doc" + page;
        htmlFixedSaveOptions.CssClassNamesPrefix = cssprefix;
        htmlFixedSaveOptions.AllowEmbeddingPostScriptFonts = true;

        htmlFixedSaveOptions.FontFormat = ExportFontFormat.Ttf;
        htmlFixedSaveOptions.SaveFormat = Aspose.Words.SaveFormat.HtmlFixed;
        //htmlFixedSaveOptions.UseTargetMachineFonts = true;
        //htmlFixedSaveOptions.PrettyFormat = true;
        //htmlFixedSaveOptions.PageHorizontalAlignment = HtmlFixedPageHorizontalAlignment.Center;
        //htmlFixedSaveOptions.OptimizeOutput = false;

        // Render the page to memory, then write it to "<path><index>.html".
        extractedPage.Save(pageStream, htmlFixedSaveOptions);
        File.WriteAllBytes(path + index + ".html", pageStream.ToArray());
        //extractedPage.Save(path + index + ".html", htmlOptions);

        index++;
    }
}

This conversion takes around 108 sec, while DOC/DOCX to HTML conversion is very fast. Why does a PDF take so long to convert to HTML?

@pooja.jayan

Your query is related to the Aspose.Words APIs, so we have moved this thread to the Aspose.Words forum, where you will be guided appropriately.

@pooja.jayan When a PDF document is loaded into the Aspose.Words DOM, Aspose.Words converts the fixed-page PDF into a flow document representation, which is more native for MS Word documents. This conversion takes some time. The next step in your code is conversion to the HtmlFixed format, which is a fixed-page format, so Aspose.Words needs to lay out the flow document into fixed pages again.
Most of the conversion time in your case is taken by loading the PDF document into the Aspose.Words.Document object, i.e. by the conversion from the fixed-page format to the flow document model.
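
To check where the time goes for a particular file, the load and save stages can be timed separately. The following is a minimal sketch, not a definitive benchmark: the class name ConversionTiming, the Time helper, and the default "input.pdf" path are assumptions made for illustration, and the save step uses default HtmlFixedSaveOptions rather than the options from the question.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;

public static class ConversionTiming
{
    // Hypothetical helper: runs one stage and returns its result with the elapsed time.
    public static (T Result, TimeSpan Elapsed) Time<T>(Func<T> stage)
    {
        Stopwatch watch = Stopwatch.StartNew();
        T result = stage();
        watch.Stop();
        return (result, watch.Elapsed);
    }

    public static void Main(string[] args)
    {
        // Assumed input path; pass the real PDF path as the first argument.
        string pdfPath = args.Length > 0 ? args[0] : "input.pdf";

        // Stage 1: load the PDF (fixed-page to flow document conversion).
        var (document, loadTime) = Time(() => new Document(pdfPath));

        // Stage 2: save as HtmlFixed (the flow document is laid out into fixed pages again).
        var (htmlLength, saveTime) = Time(() =>
        {
            using (MemoryStream output = new MemoryStream())
            {
                document.Save(output, new HtmlFixedSaveOptions());
                return output.Length;
            }
        });

        Console.WriteLine($"Load: {loadTime.TotalSeconds:F1} s");
        Console.WriteLine($"Save: {saveTime.TotalSeconds:F1} s ({htmlLength} bytes)");
    }
}
```

The first stage measured in a process may also include one-time start-up costs, so repeating the measurement gives a more reliable picture.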

Hi,
Got it.
I tried loading the PDF document into an Aspose.Pdf.Document object and converting it to HTML, but it still takes a long time. Is that because I am trying to convert a document from one fixed-page format to another fixed-page format?

Business problems and Solutions (1).pdf (2.1 MB)
I tried converting this document to HTML and it took around 14 sec; when I tried it after converting the document to DOCX, it took only 7 sec.

Is there anything I can do to reduce the conversion time from pdf to html?

@pooja.jayan Conversion is fast when the source and target formats are similar. For example, converting between MS Word formats such as DOCX and DOC is fast because their models are almost identical and the only difference is how the same elements are written in each format.
But when you convert between formats with different models, such as fixed-page PDF, where all elements are absolutely positioned, and HTML or MS Word formats, which are flow formats, model conversion and layout calculation are involved, and these take memory and processor resources.
Aspose.PDF converts PDF to DOCX fast because it does not actually convert the fixed-page model into a flow document; it simply writes the content of the source PDF document as absolutely positioned elements, such as frames. This makes the resulting document harder to edit in MS Word or another editor.
Regarding the difference in time between PDF-to-DOCX and PDF-to-HTML conversion using Aspose.PDF, please ask in the Aspose.PDF forum; my colleagues will be glad to help you.
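
For anyone who wants concrete numbers to take to the Aspose.PDF forum, the two direct Aspose.PDF conversions can be timed side by side. This is only a rough sketch under stated assumptions: the class name PdfDirectConversion and the file names "input.pdf", "output.html" and "output.docx" are hypothetical, and default save options are used apart from the DOCX format setting.

```csharp
using System;
using System.Diagnostics;
using Aspose.Pdf;

public static class PdfDirectConversion
{
    public static void Main(string[] args)
    {
        // Assumed input path; pass the real PDF path as the first argument.
        string pdfPath = args.Length > 0 ? args[0] : "input.pdf";

        using (Document pdf = new Document(pdfPath))
        {
            // PDF to HTML with Aspose.PDF.
            Stopwatch watch = Stopwatch.StartNew();
            pdf.Save("output.html", new HtmlSaveOptions());
            Console.WriteLine($"PDF to HTML: {watch.Elapsed.TotalSeconds:F1} s");

            // PDF to DOCX with Aspose.PDF.
            watch.Restart();
            pdf.Save("output.docx", new DocSaveOptions { Format = DocSaveOptions.DocFormat.DocX });
            Console.WriteLine($"PDF to DOCX: {watch.Elapsed.TotalSeconds:F1} s");
        }
    }
}
```

Both saves reuse one loaded document, so the printed times cover only the save step and not the time needed to open the PDF.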