Free Support Forum - aspose.com

Font issues in converting Docx to Html

I tried converting a PDF document to DOCX format, and it came out well: the PDF was converted to DOCX without any alignment or font issues.

But when I convert this DOCX file to HTML, I get font issues: some text is not displayed properly, and some text overlaps other text.

This is what I am getting: issue.PNG (6.9 KB)

The document is Business problems and Solutions (1).docx (1.5 MB)

Code used to convert PDF to DOCX:

DocSaveOptions saveOption = new DocSaveOptions();
saveOption.Mode = DocSaveOptions.RecognitionMode.Textbox;
saveOption.Format = DocSaveOptions.DocFormat.DocX;
saveOption.RecognizeBullets = true;
saveOption.RelativeHorizontalProximity = 2.5f;
//this line resolves font issue
saveOption.AddReturnToLineEnd = false;
saveOption.MaxDistanceBetweenTextLines = 2.5f;
saveOption.CloseResponse = true;
saveOption.ExtractOcrSublayerOnly = true;
saveOption.TryMergeAdjacentSameBackgroundImages = false;
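// Load the source PDF; the type is fully qualified because Aspose.Words also defines a Document class
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(destFileName);
// Save the file in MS Word (DOCX) format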
pdfDocument.Save(Path.Combine(path, "pdfToWord", fileName + ".docx"), saveOption);

Code used to convert DOCX to HTML:

HtmlFixedSaveOptions htmlFixedSaveOptions = new HtmlFixedSaveOptions();
htmlFixedSaveOptions.ExportEmbeddedCss = true;
htmlFixedSaveOptions.ExportEmbeddedFonts = true;
htmlFixedSaveOptions.ExportEmbeddedImages = true;
htmlFixedSaveOptions.ExportEmbeddedSvg = true;
htmlFixedSaveOptions.ExportFormFields = true;
htmlFixedSaveOptions.ExportGeneratorName = true;
string cssprefix = "aspose_doc" + page;
htmlFixedSaveOptions.CssClassNamesPrefix = cssprefix;
htmlFixedSaveOptions.AllowEmbeddingPostScriptFonts = true;
//htmlFixedSaveOptions.UseTargetMachineFonts = true;
htmlFixedSaveOptions.SaveFormat = Aspose.Words.SaveFormat.HtmlFixed;
htmlFixedSaveOptions.PrettyFormat = true;
htmlFixedSaveOptions.PageHorizontalAlignment = 
htmlFixedSaveOptions.OptimizeOutput = true;
// Fully qualified to avoid ambiguity with Aspose.Pdf.Document
Aspose.Words.Document doc = new Aspose.Words.Document(docxFile);
doc.Save(path, htmlFixedSaveOptions);

@pooja.jayan Thank you for reporting this problem to us. I have logged it as WORDSNET-23252. We will keep you informed and let you know once it is resolved.
Also, please note that Aspose.Words can load PDF documents itself. You can use the following simple code to open a PDF document with Aspose.Words:

Document doc = new Document("in.pdf");

You may notice that in the DOCX document produced by Aspose.Pdf, all text content is placed in absolutely positioned frames, which makes the document hard to edit in MS Word. Aspose.Words, on the other hand, loads a PDF document as a flow document, which is more natural for MS Word documents.
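As a rough illustration of that suggestion, the sketch below loads the PDF with Aspose.Words and saves it straight to fixed-layout HTML, skipping the intermediate DOCX produced by Aspose.Pdf. The file names "in.pdf" and "out.html" and the class name are placeholders, and only a few of the save options from the original post are repeated. Depending on the target framework and Aspose.Words version, PDF loading may require additional packages, so please treat this as a starting point rather than a verified fix.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical class name, used only for this sketch.
class PdfToFixedHtmlSketch
{
    static void Main()
    {
        // Load the PDF directly; Aspose.Words imports it as a flow document.
        // "in.pdf" is a placeholder path.
        Document doc = new Document("in.pdf");

        // Reuse a few of the options from the original post.
        HtmlFixedSaveOptions options = new HtmlFixedSaveOptions
        {
            ExportEmbeddedCss = true,
            ExportEmbeddedFonts = true,
            ExportEmbeddedImages = true
        };

        // "out.html" is a placeholder output path.
        doc.Save("out.html", options);
    }
}
```

Whether the output matches the original PDF closely enough depends on the document, so it is worth comparing it against the two-step conversion.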

When converting DOCX to HTML, some text that uses the “MJLHSE+HelveticaLTStd-LightCond” font in the DOCX is changed to “Times New Roman” in the HTML. What can I do about this? I want the HTML to show the same font as the DOCX.

@pooja.jayan The font is changed while the document is loaded into the Aspose.Words DOM. We are currently analyzing the issue and will provide more information once the analysis is finished.
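While the analysis is pending, one way to see which fonts Aspose.Words replaces during rendering is to attach a warning callback and log font substitution warnings. This is only a diagnostic sketch: the class names and file paths are placeholders, and since the change reported here happens at load time, these layout-time warnings may not capture it.

```csharp
using System;
using Aspose.Words;

// Hypothetical helper; logs only font substitution warnings.
class FontSubstitutionLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
        {
            Console.WriteLine("Font substitution: " + info.Description);
        }
    }
}

// Hypothetical class name, used only for this sketch.
class FontDiagnosticsSketch
{
    static void Main()
    {
        // "in.docx" and "out.html" are placeholder paths.
        Document doc = new Document("in.docx");

        // Warnings raised while the document is laid out and saved go to the callback.
        doc.WarningCallback = new FontSubstitutionLogger();

        doc.Save("out.html", SaveFormat.HtmlFixed);
    }
}
```

If a substitution is reported for a font that is installed on the machine, that detail is worth adding to the forum thread.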

The issues you found earlier (filed as WORDSNET-23252) have been fixed in the Aspose.Words for .NET 22.1 update, which is also available on NuGet.

Thank you for your support. It is working fine!