We are currently using Converter.ConvertHTML to create a DOCX file from HTML content.
// Use a MemoryStreamProvider to handle the in-memory conversion
var streamProvider = new MemoryStreamProvider();
// Initialize an HTML document
var document = new HTMLDocument(html.Model, ".");
// Save options for DOCX output
var docSaveOptions = new DocSaveOptions
{
    PageSetup =
    {
        AdjustToWidestPage = true,
        PageLayoutOptions = Aspose.Html.Rendering.PageLayoutOptions.ScaleToPageWidth,
        AnyPage = new Page(new Size(request.Width, request.Height))
    }
};
// Convert HTML to DOCX using the MemoryStreamProvider
Converter.ConvertHTML(document, docSaveOptions, streamProvider);
When a DOCX file is created and opened, it opens in compatibility mode.
When the file is converted (File > Info > Convert) all text content is lost.
Similarly, when the file is opened on a Mac, no text content is present in the DOCX file.
We have tried the following to diagnose / resolve this issue:
- Explicitly setting the FontEmbeddingRule = Aspose.Html.Rendering.Doc.FontEmbeddingRule.Full
- Using a primary font and fallback fonts in the CSS of the HTML:
  body {
    font-family: Helvetica, sans-serif;
    font-size: 12px;
    margin: 0;
    padding: 0;
    padding-bottom: 50px;
    page-break-before: always;
    box-sizing: border-box;
  }
- Removing all CSS altogether.
- Simplifying the HTML content to only contain one element.
- Explicitly setting DocumentFormat = Aspose.Html.Rendering.Doc.DocumentFormat.DOCX
This is now causing a business impact to one of our clients.
Any advice or assistance in this matter will be greatly appreciated.
Kind regards
Scotty
@Scott.Robson
It seems you are experiencing issues with the conversion of HTML to DOCX using Aspose.HTML, particularly with the resulting DOCX file opening in compatibility mode and losing text content. Here are some suggestions to help you troubleshoot and resolve this issue:
- Ensure Proper Document Format: Make sure you are explicitly setting the document format to DOCX in your DocSaveOptions. This is crucial as it ensures that the output file is recognized as a DOCX file. You can do this by setting options.DocumentFormat = Aspose.Html.Rendering.Doc.DocumentFormat.DOCX;
- Check HTML Content: Simplifying your HTML content is a good approach. Ensure that the HTML is well-formed and does not contain any unsupported tags or attributes that might cause issues during conversion. You mentioned trying a single <p> element; ensure that this element contains valid text.
- Font Embedding: Since you are already setting the FontEmbeddingRule to Full, ensure that the fonts specified in your CSS are available on the system where the DOCX is being opened. If the fonts are not available, it may lead to rendering issues.
- Test with Different CSS: While you have tried removing CSS, consider testing with minimal CSS that only includes essential styles. Sometimes, complex styles can interfere with the conversion process.
- Use a Different Output Method: Instead of using a MemoryStreamProvider, try saving the DOCX file directly to a file path. This can sometimes yield different results and help identify if the issue is related to the in-memory handling.
Here is a simplified example of how you might structure your code:
using Aspose.Html;            // HTMLDocument
using Aspose.Html.Converters; // Converter
using Aspose.Html.Saving;     // DocSaveOptions

// Initialize an HTML document
var htmlDocument = new HTMLDocument("path/to/your/input.html");

// Set up save options
var docSaveOptions = new DocSaveOptions
{
    DocumentFormat = Aspose.Html.Rendering.Doc.DocumentFormat.DOCX,
    FontEmbeddingRule = Aspose.Html.Rendering.Doc.FontEmbeddingRule.Full
};

// Convert HTML to DOCX, writing directly to a file path
Converter.ConvertHTML(htmlDocument, docSaveOptions, "path/to/your/output.docx");
- Check Compatibility Settings: If the DOCX file opens in compatibility mode, it may be due to the way the document is structured. Ensure that the conversion process does not introduce any legacy elements that could trigger compatibility mode.
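One way to check what might be triggering compatibility mode is to look inside the generated package. The following is a minimal diagnostic sketch, not part of Aspose.HTML: it assumes the output is a standard DOCX zip package and reads the compatibilityMode setting from word/settings.xml. The class name DocxCompatCheck and the method ReadCompatibilityMode are hypothetical names chosen for illustration. As far as we understand, recent versions of Word treat a document as being in compatibility mode when this setting is absent or has a value below 15.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not an Aspose API.
static class DocxCompatCheck
{
    // WordprocessingML main namespace used by word/settings.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the compatibilityMode value from word/settings.xml,
    // or null if the part or the setting is missing.
    public static string ReadCompatibilityMode(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/settings.xml");
            if (entry == null)
            {
                return null;
            }

            using (Stream stream = entry.Open())
            {
                XDocument settings = XDocument.Load(stream);
                return settings
                    .Descendants(W + "compatSetting")
                    .Where(e => (string)e.Attribute(W + "name") == "compatibilityMode")
                    .Select(e => (string)e.Attribute(W + "val"))
                    .FirstOrDefault();
            }
        }
    }
}
```

For example, Console.WriteLine(DocxCompatCheck.ReadCompatibilityMode("path/to/your/output.docx") ?? "(not set)"); prints the value, or "(not set)" when the setting is missing. Running it against both the generated file and a file saved by Word itself may show whether the two packages differ in this setting.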
If these suggestions do not resolve the issue, consider reaching out to Aspose support with specific details about your HTML content and the environment in which you are testing. This will help them provide more targeted assistance.
If you need further help, feel free to ask!
Thank you for your response and the suggestions provided.
I confirm that I have already attempted all of the suggested solutions, including:
- Ensuring the document format is explicitly set to DOCX.
- Simplifying the HTML content to its most basic form (including testing with a single <p> element).
- Verifying font availability and explicitly setting FontEmbeddingRule to Full.
- Testing with minimal CSS, and also removing all CSS entirely.
- Testing by saving the DOCX file directly to a file path, instead of using a MemoryStreamProvider.
Unfortunately, none of these steps have resolved the issues we are experiencing.
To reiterate the core problems:
- DOCX Compatibility Mode: The generated DOCX files consistently open in compatibility mode in Microsoft Word. This is not the desired behaviour, and suggests an underlying issue with how the DOCX file is being structured during the conversion process.
- Loss of Text Content:
- When the DOCX file is opened and then “converted” within Microsoft Word (File > Info > Convert), all text content is lost. The document structure remains, but the actual text disappears.
- Crucially, this issue also occurs when the DOCX file is opened on a macOS device. The DOCX file opens, but it contains no text. This cross-platform inconsistency is a significant problem for our client.
These issues are causing a business impact for our client. The inability to reliably convert HTML to DOCX with consistent content across different operating systems (Windows and macOS) and without triggering compatibility mode is a critical concern.
Could we please escalate this issue to a member of your support team who can provide more in-depth assistance?
We are available to provide any other information that may be helpful in diagnosing the root cause of these problems.
Kind regards,
Scotty
@Scott.Robson
Would you kindly share your sample HTML in .zip format with us? We will test the scenario in our environment and address it accordingly.
Hi @asad.ali,
Thank you for getting back to me.
I can indeed provide an HTML file showing the desired final output.
However, as I am able to replicate this exact issue simply using a new HTML file containing only <p>test</p>, I am not sure of the benefit or relevance of providing the full output at this time.
Is this a known issue?
Kind regards,
Scotty
@Scott.Robson
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): HTMLNET-6343
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.