I’ve begun to notice that some user submitted PDFs produce extra abnormal characters when converted to HTML.
My best guess is Aspose.PDF is unable to handle font encodings such as Roman on Windows.
What is the best way to handle this?
I have attached example PDFs that produce extra characters when converted.
CV-AndrewFord-12-Jul-2014.pdf (163.9 KB)
Web_Developer_Resume_1.pdf (260.8 KB)
Best Regards,
James
C# Code Used To Reproduce
Document doc = new Document(inputPath);
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();
// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();
// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// Saving
doc.Save(OutputPath, newOptions);