Converting a PDF File to HTML and Then HTML to DOCX

We are converting a PDF file to HTML and then converting that HTML to DOCX, but we are facing a few styling issues in the output.


Please refer to the code below:

private void mnuConvertWord_Click(object sender, RoutedEventArgs e)
{
    // Convert document
    try
    {
        string fullfilename = @"C:\Users\Desktop\HTML\InputFie.pdf";
        Aspose.Pdf.License license = new Aspose.Pdf.License();
        license.SetLicense("Aspose.Total.lic");
        license.Embedded = true;

        Aspose.Pdf.Document doc = new Aspose.Pdf.Document(fullfilename);
        // Instantiate HTML Save options object
        HtmlSaveOptions htmlOptions = new HtmlSaveOptions();

        // Enable option to embed all resources inside the HTML
        htmlOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

        // This is just optimization for IE and can be omitted
        htmlOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
        htmlOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
        htmlOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
        htmlOptions.HtmlMarkupGenerationMode = HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml;

        // Output file path
        string outHtmlFile = @"C:\Users\Desktop\HTML\Test.html";
        doc.Save(outHtmlFile, htmlOptions);

        // Load the generated HTML with Aspose.Words and save it as DOCX
        Aspose.Words.Document doc123 = new Aspose.Words.Document(outHtmlFile);
        doc123.Save(@"C:\Users\Desktop\HTML\OutputFile.docx");
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

Hi Arunakumar,

Your input PDF file looks corrupt. Could you please ZIP the file and attach it again so that we can investigate further?

Please also share why it is not feasible for you to convert PDF to DOCX directly, instead of going through the PDF to HTML to DOCX route.
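
For comparison, a direct PDF to DOCX conversion with Aspose.Pdf might look like the minimal sketch below. The method name ConvertPdfToDocxDirectly and the file paths are placeholders, and the choice of Flow recognition mode is an assumption; whether it gives better styling than the HTML route for your particular document would need to be tested.

```csharp
// Sketch only: method name and paths are hypothetical; adjust to your setup.
private void ConvertPdfToDocxDirectly()
{
    // Load the source PDF (placeholder path)
    Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"C:\Temp\Input.pdf");

    // Configure saving in a Word format
    Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();

    // Write DOCX rather than the older DOC format
    saveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;

    // Assumption: Flow mode favours editable, flowing text over exact positioning
    saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;

    // Save straight to DOCX, with no intermediate HTML file (placeholder path)
    pdfDocument.Save(@"C:\Temp\Output.docx", saveOptions);
}
```

This keeps the whole conversion inside Aspose.Pdf, so the intermediate HTML file and the Aspose.Words load step are not needed.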

Best Regards,