How to get the converted Markdown content as a MemoryStream

sharook · August 4, 2023, 1:33pm

I am writing a method that converts HTML files to Markdown files using the Aspose.HTML DLL.
The Converter.ConvertHTML() method requires an output path for saving the converted files. Is there another method that converts HTML to Markdown and returns the Markdown content as a MemoryStream?

asad.ali · August 4, 2023, 9:34pm

@sharook
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): HTMLNET-4797

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

sharook · August 7, 2023, 7:17am

Hi @asad.ali, thanks for the reply. I just wanted to know whether this is impossible with the current version of the Aspose.HTML DLL.

asad.ali · August 7, 2023, 11:57am

@sharook

In order to control where the document and its associated resources are saved during Markdown conversion, you can use the following code:

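// Namespaces assumed from the Aspose.HTML for .NET API
using System.Collections.Generic;
using System.IO;
using Aspose.Html;
using Aspose.Html.IO;
using Aspose.Html.Saving;

internal class Program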
{
    static void Main()
    {
        // Open the HTML document to be saved
        using var doc = new HTMLDocument("Input.htm");
        var storage = new MyStorage();
        var options = new MarkdownSaveOptions();
        // Save HTML to Markdown
        doc.Save(storage, options);
        // Memory stream containing converted Markdown
        var mdStream = storage.Streams[0];
    }

    class MyStorage : IOutputStorage
    {
        public MyStorage()
        {
            Streams = new List<MemoryStream>();
        }
        // This property will contain all saved MemoryStreams.
        public List<MemoryStream> Streams { get; }

        public OutputStream CreateStream(OutputStreamContext context)
        {
            // Create stream for the saved resource
            var stream = new MemoryStream();
            Streams.Add(stream);

            if (Streams.Count == 1)
                // The first resource is always a main document.
                return new OutputStream(stream, "file:///out.md");
            // All subsequent resources are images, scripts, referenced pages etc.
            return new OutputStream(stream, "file:///resource" + (Streams.Count - 1));
        }

        public void ReleaseStream(OutputStream stream)
        {
        }
    }
}

If you do not need associated resources, then you can disable saving them, as shown in the following example:

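// Namespaces assumed from the Aspose.HTML for .NET API
using System.IO;
using Aspose.Html;
using Aspose.Html.IO;
using Aspose.Html.Saving;

internal class Program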
{
    static void Main()
    {
        // Open the HTML document to be saved
        using var doc = new HTMLDocument("Input.htm");
        var storage = new MyStorage();
        var options = new MarkdownSaveOptions
        {
            // Disable resource handling
            ResourceHandlingOptions =
            {
                Default = ResourceHandling.Ignore,
                JavaScript = ResourceHandling.Ignore
            }
        };
        // Save HTML to Markdown
        doc.Save(storage, options);
        // Memory stream containing converted Markdown
        var mdStream = storage.MDStream;
    }

    class MyStorage : IOutputStorage
    {
        public MemoryStream MDStream { get; private set; }

        public OutputStream CreateStream(OutputStreamContext context)
        {
            // Create stream for the saved document as there will be no resources
            MDStream = new MemoryStream();
            return new OutputStream(MDStream, "file:///out.md");
        }

        public void ReleaseStream(OutputStream stream)
        {
        }
    }
}

sharook · August 10, 2023, 9:06am

@asad.ali, thanks for the reply.
In the example above, you are saving the converted file to a temporary path, then reading its content and converting it to a memory stream. That is not my requirement: I need to get the stream directly as the result of an Aspose method.
For example, Converter.ConvertHTML() needs a file path as an input parameter for saving the converted file and returns void. In my case, I need a method that returns a memory stream of the converted content, rather than saving the file and then reading it back.

So I don’t want to store the file anywhere temporarily

asad.ali · August 10, 2023, 7:12pm

@sharook

We have updated the ticket information as per your feedback and will investigate from this perspective. We will inform you once we have some updates.

asad.ali · August 16, 2023, 10:53pm

@sharook

In the example above, the files are not stored in a temporary path; they are written to the Stream passed to the OutputStream constructor, which is a MemoryStream. The path is used only to rewrite link text in the saved files, not as a location to save to. For example, if a CSS file contains the import “@import url;”, then “url” is replaced with the path passed to the OutputStream constructor when the resource located at that “url” is saved.
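
Since the Markdown ends up in a MemoryStream, its text can be read back without touching the file system. The following is only a minimal sketch of that step. It assumes the converted output is UTF-8 text. ReadMarkdown is a hypothetical helper and not part of Aspose.HTML, and the stream built in Main is a stand-in for storage.MDStream (or storage.Streams[0]) after doc.Save has completed.

```csharp
using System;
using System.IO;
using System.Text;

internal static class MarkdownStreamDemo
{
    // Hypothetical helper, not part of Aspose.HTML.
    // Assumes the stream holds UTF-8 text (StreamReader also honors a BOM if present).
    static string ReadMarkdown(MemoryStream stream)
    {
        // ToArray copies the full contents regardless of Position
        // and still works after the stream has been closed.
        using var copy = new MemoryStream(stream.ToArray());
        using var reader = new StreamReader(copy, Encoding.UTF8);
        return reader.ReadToEnd();
    }

    static void Main()
    {
        // Stand-in for storage.MDStream after doc.Save(storage, options).
        var mdStream = new MemoryStream();
        byte[] bytes = Encoding.UTF8.GetBytes("# Title\n\nSome text.");
        mdStream.Write(bytes, 0, bytes.Length);

        string markdown = ReadMarkdown(mdStream);
        Console.WriteLine(markdown);
    }
}
```

Copying through ToArray avoids depending on where the writer left the stream's Position. If you prefer to read the original stream directly, set its Position to 0 first.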

aspose.notifier · September 5, 2023, 8:34pm

The issues you have found earlier (filed as HTMLNET-4797) have been fixed in this update. This message was posted using Bugs notification tool by avpavlysh