Hi,
I have a DOCX file that I am trying to convert to HTML.
Sample file:
Lorem ipsum dolor sit amet. In impedit dolor non animi unde et odit odit aut nisi natus et impedit repellendus. Ut consequuntur ipsa qui autem laudantium ut repellendus esse!
1. Aut officia doloremque aut eveniet porro id vitae sunt.
** 1.1 random text**
** 1.2 random text**
2. Eos assumenda dolorum et aliquid earum.
3. Ut doloremque mollitia sit natus delectus.
4. Et Quis mollitia qui consequatur eius in incidunt aspernatur.
5. Ab eveniet ipsum ad voluptas odio.
Sit minima fuga sed consectetur velit ut suscipit iste ea reiciendis nihil ut architecto dignissimos. Nam tempora quia quo aperiam obcaecati ad dolore possimus est ratione earum. Aut quia quibusdam aut vitae quibusdam qui omnis ducimus non sint debitis. Et labore distinctio ut doloremque mollitia in quia iure!
When I convert this to HTML, the numbered items are not exported as list tags (<ol>).
I also tried checking IsListItem, but that returns false too.
Is there any way to parse this the right way?
Thanks
@randomuser123 Can you please attach the file?
@randomuser123 I tried, but I cannot reproduce your issue. I use the following code:
var wordtohtml = new Aspose.Words.Document("C:\\Temp\\input.docx");
wordtohtml.AcceptAllRevisions();
HtmlFixedSaveOptions htmlSaveOptions = new HtmlFixedSaveOptions();
htmlSaveOptions.ExportEmbeddedCss = true;
htmlSaveOptions.ExportEmbeddedFonts = true;
htmlSaveOptions.PrettyFormat = true;
htmlSaveOptions.AllowEmbeddingPostScriptFonts = true;
htmlSaveOptions.OptimizeOutput = true;
htmlSaveOptions.UseHighQualityRendering = true;
htmlSaveOptions.UseAntiAliasing = true;
wordtohtml.Save("C:\\Temp\\output\\output.html", htmlSaveOptions);
For this document:
input.docx (14.6 KB)
Hi @eduardo.canal,
randomtext.docx (17.1 KB)
I have attached the Word document. We receive it as a stream and convert it to HTML. Below is the code we use for the conversion:
var filePath = Path.Combine(Directory.GetCurrentDirectory(), "Uploads");
var stream = new MemoryStream();
Document doc = new Document(fileStream);
HtmlSaveOptions saveOptions = new HtmlSaveOptions(SaveFormat.Html)
{
    ExportTextInputFormFieldAsText = true,
    ImagesFolder = filePath
};
saveOptions.ExportListLabels = ExportListLabels.ByHtmlTags;
doc.Save(stream, saveOptions);
stream.Position = 0;
return stream;
I also tried this:
var htmlText = new HTMLDocument(fileStream, "");
but with this approach, htmlText.Body contains Unicode characters.
@randomuser123 Sorry, I still can't replicate your issue. This is the code that I'm using, with version 23.2.0 of the Aspose.Words API:
var doc = new Aspose.Words.Document("C:\\Temp\\input.docx");
MemoryStream stream = new MemoryStream();
doc.Save(stream, Aspose.Words.SaveFormat.Docx);
var wordtohtml = new Aspose.Words.Document(stream, new Aspose.Words.Loading.LoadOptions()
{
    LoadFormat = Aspose.Words.LoadFormat.Docx
});
HtmlSaveOptions htmlSaveOptions = new HtmlSaveOptions()
{
    ExportListLabels = ExportListLabels.ByHtmlTags,
    ExportTextInputFormFieldAsText = true,
    ImagesFolder = "C:\\Temp\\output\\img"
};
wordtohtml.Save("C:\\Temp\\output\\output.html", htmlSaveOptions);
and this is what I got:
Sorry if my description was confusing.
The HTML of your output document does not have <ol> tags. It has <h3> and <h1> tags for the specific headings. Is there a way to get this in the form of <ol> tags?
@randomuser123 Unfortunately, in this case there is no way to export lists applied through heading styles to HTML using <ol> tags. Aspose.Words exports to HTML preserving as much roundtrip information as possible. In this particular case the heading styles are preserved using <h1>..<h6> tags and the lists are preserved using special -aw-xxx attributes. So when you import the document back into an Aspose.Words Document, both the heading and the list formatting are preserved.
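To see how each paragraph is represented in the document model before saving, a minimal sketch along these lines may help. It assumes the Aspose.Words for .NET package is referenced and reuses the C:\\Temp\\input.docx path from the earlier post. The class name and the layout of the printed line are only illustrative.

```csharp
using System;
using Aspose.Words;

class ListInspection
{
    static void Main()
    {
        // Path reused from the earlier post; replace it with your own file.
        Document doc = new Document("C:\\Temp\\input.docx");

        // Recalculate list labels so that ListLabel.LabelString is populated.
        doc.UpdateListLabels();

        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            bool isListItem = para.IsListItem;
            bool isHeading = para.ParagraphFormat.IsHeading;
            string styleName = para.ParagraphFormat.StyleName;

            // Only read the label when the paragraph reports list formatting.
            string label = isListItem ? para.ListLabel.LabelString : "";

            // Illustrative output: one line per paragraph.
            Console.WriteLine(
                $"style={styleName} heading={isHeading} list={isListItem} " +
                $"label='{label}' text='{para.GetText().Trim()}'");
        }
    }
}
```

This only reports what the document model contains for each paragraph (style, heading flag, list flag, label). It does not change how the paragraphs are exported to HTML.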