We have evaluated the Aspose DOC/PDF JARs for converting doc/docx/pdf files to HTML.
We found the quality of the HTML after conversion to be unsatisfactory.
We then tried converting doc/docx to PDF first and then to HTML. With this approach
the quality improves a little, but the output size increases to around
8-10 times.
Please find attached the documents we used for this POC …
We are embedding all resources, such as images and CSS, in the HTML…
Is it possible to separate out common resources such as the CSS, which would be the same for all converted documents?
Also, is there any control over fonts? A few converted CVs have the font
size in px, while others have it in pt … We would like to convert all documents to
HTML using a single font-size unit …
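To illustrate the kind of normalization we mean, here is a minimal sketch of a post-processing step in plain Java. It does not use the Aspose API. The class name FontUnitNormalizer and the method ptToPx are hypothetical, and it assumes the sizes appear as simple "font-size: Npt" declarations in the generated HTML, which may not hold for every converted file.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Aspose): rewrites pt font sizes as px.
final class FontUnitNormalizer {
    // Matches declarations such as "font-size: 12pt" or "font-size:10.5pt".
    private static final Pattern PT_SIZE = Pattern.compile(
            "(font-size\\s*:\\s*)(\\d+(?:\\.\\d+)?)pt", Pattern.CASE_INSENSITIVE);

    // CSS defines 1in = 96px = 72pt, so 1pt = 96/72 px.
    private static final double PX_PER_PT = 96.0 / 72.0;

    static String ptToPx(String html) {
        Matcher m = PT_SIZE.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            double px = Double.parseDouble(m.group(2)) * PX_PER_PT;
            // Keep the original "font-size:" prefix, swap only value and unit.
            String replacement = m.group(1)
                    + String.format(Locale.ROOT, "%.2f", px) + "px";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Declarations that are already in px are left untouched, so running this over every converted file would leave all font sizes in one unit. We would still prefer an option in the library itself if one exists.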
Please let us know whether the results will be better in the paid version, as we are planning
to buy it. Before that, we need to check the quality and the size of the
converted CVs.
Please find below the code we are using for the conversion:
Document doc = new Document(inputFile);
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();
// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// Output file path
doc.save(outputFile, newOptions);