I’ve been trying to convert Markdown to DOCX.
I’ve managed to perform the conversion using the following code block:
Method 1:
var rootPath = Path.Combine(AppContext.BaseDirectory, "../../../");
var tempMarkdownFilePath = Path.Combine(rootPath, "Temp" + markdownId + ".md");
File.WriteAllText(tempMarkdownFilePath, content);
Document tempMarkdown = new Document(tempMarkdownFilePath);
var tempPath = Path.Combine(rootPath, markdownId + ".docx");
tempMarkdown.Save(tempPath, SaveFormat.Docx);
However, I am exploring doing the conversion with a MemoryStream instead, and it is not working.
This is the code snippet I am using:
Method 2:
byte[] byteArray = Encoding.UTF8.GetBytes(content);
MemoryStream markdownStream = new MemoryStream(byteArray);
//Create Doc to Convert to Docx Components
Document tempMarkdown = new Document(markdownStream);
MemoryStream docxStream = new MemoryStream();
var rootPath = Path.Combine(AppContext.BaseDirectory, "../../../");
var tempPath = Path.Combine(rootPath, markdownId + ".docx");
tempMarkdown.Save(tempPath, SaveFormat.Docx);
The Markdown is just replicated into the output as plain text and not converted.
I suspect the conversion is done based on the file extension.
Do you have a better way of doing this?
These are the input and output I am getting
- My expected output using Method 1 : Underline-Expected.docx
- My Result with File Stream using Method 2: Underline-Incorrect-Result.docx
- Input: Underline.md
Archive.zip (13.7 KB)
@h1021010231023a
To convert Markdown to DOCX using a MemoryStream, you need to ensure that the conversion process is correctly set up. The issue you’re facing, where the Markdown content is simply replicated in the output, is likely due to the way the Document class processes the input stream.
Here’s a refined approach to achieve the conversion using MemoryStream:
- Convert Markdown to HTML: First, you need to convert the Markdown content to an HTML document.
- Convert HTML to DOCX: Then, you can convert the resulting HTML document to DOCX format.
Here’s an example of how you can implement this:
using Aspose.Html;
using Aspose.Html.Converters;
using Aspose.Html.Saving;
using System.IO;
using System.Text;
// Your Markdown content
string content = "# Sample Markdown\nThis is a sample markdown content.";
// Step 1: Convert Markdown to HTML
using var markdownStream = new MemoryStream(Encoding.UTF8.GetBytes(content));
var document = Converter.ConvertMarkdown(markdownStream);
// Step 2: Prepare to save as DOCX
var docxStream = new MemoryStream();
var options = new DocSaveOptions();
// Step 3: Convert HTML to DOCX
Converter.ConvertHTML(document, options, docxStream);
// Save the DOCX to a file
var rootPath = Path.Combine(AppContext.BaseDirectory, "../../../");
var tempPath = Path.Combine(rootPath, "output.docx");
using (var fileStream = new FileStream(tempPath, FileMode.Create, FileAccess.Write))
{
    docxStream.Position = 0; // Reset stream position
    docxStream.CopyTo(fileStream);
}
Explanation:
- Markdown to HTML Conversion: The Converter.ConvertMarkdown method is used to convert the Markdown content from the MemoryStream to an HTML document.
- HTML to DOCX Conversion: The Converter.ConvertHTML method takes the HTML document and converts it to DOCX format, which is then stored in another MemoryStream.
- Saving the DOCX: Finally, the DOCX content is saved to a file by copying the content from the MemoryStream to a file stream.
This approach should resolve the issue of the Markdown content being replicated in the output, as it ensures that the conversion processes are correctly applied.
For more detailed information on converting Markdown to DOCX, you can refer to the official documentation on the Aspose website.
If you have any further questions or need additional assistance, feel free to ask!
@h1021010231023a The problem occurs because Aspose.Words cannot detect the input as Markdown when the document is loaded from a stream: unlike a file path, a stream has no file extension to hint at the format. You can explicitly specify the load format in this case:
byte[] byteArray = Encoding.UTF8.GetBytes(File.ReadAllText(@"C:\Temp\in.md"));
MemoryStream markdownStream = new MemoryStream(byteArray);
//Create Doc to Convert to Docx Components
Document tempMarkdown = new Document(markdownStream, new LoadOptions() { LoadFormat = LoadFormat.Markdown });
tempMarkdown.Save(@"C:\Temp\out.docx", SaveFormat.Docx);
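If the goal is to keep the output in memory as well, a minimal sketch along the same lines might look like the following. It assumes the Aspose.Words Document.Save(Stream, SaveFormat) overload; the content string is a placeholder for the Markdown text from the question, and the docxBytes name is only illustrative.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;

// Placeholder Markdown text; in the question this is supplied by the caller.
string content = "# Sample heading\n\nSome *emphasized* text.";

using var markdownStream = new MemoryStream(Encoding.UTF8.GetBytes(content));

// A stream carries no file extension, so state the load format explicitly.
var loadOptions = new LoadOptions { LoadFormat = LoadFormat.Markdown };
var document = new Document(markdownStream, loadOptions);

// Save into a second stream rather than to a path on disk.
using var docxStream = new MemoryStream();
document.Save(docxStream, SaveFormat.Docx);

// Illustrative: the DOCX bytes can now be returned, stored, or written out later.
byte[] docxBytes = docxStream.ToArray();
```

This avoids the temporary .md and .docx files from Method 1 entirely. If the stream itself is passed on instead of the byte array, its Position should be reset to 0 before reading.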