HTML to RTF - Saving images

I have HTML with images that I want to convert to RTF. When I save the stream in RTF format, I get everything but the images.

Here is my code:

' The HTML declares charset=utf-8, so encode it as UTF-8 (ISO-8859-1 cannot represent characters such as “ and ”).
Dim HTML As Byte() = System.Text.Encoding.UTF8.GetBytes(oEditor.Html)
Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream(HTML)
Dim doc As New Document(memoryStream)
Dim options As New RtfSaveOptions()
Dim dstStream As New MemoryStream()
doc.Save(dstStream, options)
doc.Save("c:\temp\rtf\saved-from-ae.rtf", options)

Hi Tim,

Thanks for your inquiry.

Could you please attach your HTML document here? We will gladly provide further feedback.

Thanks,

Hi,

Here is the HTML (kind of ugly); it contains an <img> that references a file on disk:

<html>
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Style-Type" content="text/css" />
    <meta name="generator" content="Aspose.Words for .NET 9.6.0.0" />
    <title></title>
</head>
<body>
    <div>
        <p style="margin:0pt">
            <span style="color:#ff0000; font-family:Cambria; font-size:12pt; font-weight:bold">Evaluation Only. Created with Aspose.Words. Copyright 2003-2010 Aspose Pty Ltd.</span>
        </p>
        <h2 style="font-weight:normal; margin:10pt 0pt 0pt; page-break-after:avoid; page-break-inside:avoid">
            <span style="color:#4f81bd; font-family:Cambria; font-size:13pt; font-weight:bold">Testing that the Applications are Prepared for HTTPS Communication</span>
        </h2>
        <p style="margin:0pt 0pt 10pt 36pt"><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal">Replace “yourdomain.com” with the domain name that the SSL certificate was issued for.</span></p><p style="margin:0pt 0pt 10pt 36pt"><img src="/Adept8.3/tmp/ADM/Aspose.Words.3173c47b-e356-42c3-9ee9-240f37134dc6.001.png" width="290" height="174" alt="" /></p><p style="margin:0pt 0pt 0pt 72pt; text-indent:-18pt">
            <span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal">o</span><span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal">              </span><span style="color:#7030a0; font-family:'Times New Roman'; font-size:12pt; font-weight:normal">TEST: GlassFish</span><br />
            <span style="color:#0000ff; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:underline">https://yourdomain.com:8181/wsclient/servlet/DMS &lt;https://localhost:8181/wsclient/servlet/DMS&gt;</span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none"> - The page should load without error and the certificate should not cause a warning.</span>
        </p>
        <p style="margin:0pt 0pt 0pt 72pt; text-indent:-18pt">
            <span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">o</span><span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">              </span><span style="color:#7030a0; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none">TEST: GlassFish</span><br /><span style="color:#0000ff; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:underline">https://yourdomain.com:8181/wsclient/servlet/VueServlet &lt;https://localhost:8181/wsclient/servlet/VueServlet&gt;</span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none"> - The page should load without error and the certificate should not cause a warning.</span>
        </p>
        <p style="margin:0pt 0pt 0pt 72pt; text-indent:-18pt">
            <span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">o</span><span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">              </span><span style="color:#7030a0; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none">TEST: Adept Web Services</span><br /><span style="color:#0000ff; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:underline">https://yourdomain.com/Adept/BluePrintWebService/BPWS.asmx?op=getDmsConfig &lt;https://localhost/Adept/BluePrintWebService/BPWS.asmx?op=getDmsConfig&gt;</span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none"> - The page should load without error and the certificate should not cause a warning. </span>
        </p>
        <p style="margin:0pt 0pt 0pt 72pt; text-indent:-18pt">
            <span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">o</span><span style="color:#7030a0; font-family:'Courier New'; font-size:12pt; font-weight:normal; text-decoration:none">              </span><span style="color:#7030a0; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none">TEST: Adept Web Services</span><br /><span style="color:#0000ff; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:underline">https://yourdomain.com/Adept/Service_Adept.asmx?op=HelloWorld &lt;https://localhost/Adept/Service_Adept.asmx?op=HelloWorld&gt;</span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none"> </span>
        </p>
        <p style="margin:0pt 0pt 10pt 108pt; text-indent:-18pt">
            <span style="color:#000000; font-family:Wingdings; font-size:12pt; font-weight:normal; text-decoration:none">§</span><span style="color:#000000; font-family:Wingdings; font-size:12pt; font-weight:normal; text-decoration:none">              </span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none">Login with AE and View a file.</span>
        </p>
        <p style="margin:0pt">
            <span style="color:#7030a0; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none">TEST: AutoVue.  This test will not load a file, but will test whether the viewer has been configured properly.</span><span style="color:#000000; font-family:'Times New Roman'; font-size:12pt; font-weight:normal; text-decoration:none"> </span>
        </p>
    </div>
</body>
</html>
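Separately from the images, this HTML declares charset=utf-8 and contains characters such as “ and ” that ISO-8859-1 cannot represent, so turning the string into bytes with GetEncoding("iso-8859-1") loses them while UTF-8 keeps them. A minimal C# round-trip check (EncodingCheck is a hypothetical helper, not part of Aspose.Words):

```csharp
using System.Text;

public static class EncodingCheck
{
    // Encodes the text with the given encoding and decodes it again;
    // characters the encoding cannot represent do not survive the trip.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }
}
```

With the sentence from the HTML above, the ISO-8859-1 round trip comes back altered, while the UTF-8 round trip returns the original string.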

Hi

Thanks for your request. I suspect the problem occurs because Aspose.Words cannot find the image at the specified location: the path to the image is relative. Have you tried specifying the full path to the image? That should fix the problem.

Best regards,

Let me give more background on what I’m testing.

I have a small RTF document stored in a database (its raw text is stored in a varchar column in a table).

Using your component, I’m converting the RTF to HTML, so I can display it in an HTML editor control:

Dim sMemo As String = docrec.GetTextMemoValue()
Dim RTF As Byte() = System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(docrec.GetTextMemoValue())
Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream(RTF)
Dim doc As New Document(memoryStream)
' Configure HTML export: images are written to ImagesFolder and referenced in the HTML through ImagesFolderAlias.
Dim options As New HtmlSaveOptions(SaveFormat.Html)
options.ExportTextInputFormFieldAsText = True
options.ImagesFolder = "C:\inetpub\wwwroot\Adept8.3\tmp\ADM"
options.ImagesFolderAlias = "/Adept8.3/tmp/ADM/"
Dim dstStream As New MemoryStream()
doc.Save(dstStream, options)
Dim pos = dstStream.Position
dstStream.Position = 0
Dim reader As New StreamReader(dstStream)
Dim str = reader.ReadToEnd()
oEditor.Html = str
oEditor.ID = "MemoField"

The user can edit the text in the HTML editor and when they click “Save” I need to convert the HTML back to RTF:

' The HTML declares charset=utf-8, so encode it as UTF-8 (ISO-8859-1 cannot represent characters such as “ and ”).
Dim HTML As Byte() = System.Text.Encoding.UTF8.GetBytes(oEditor.Html)
Dim memoryStream As System.IO.MemoryStream = New System.IO.MemoryStream(HTML)
Dim doc As New Document(memoryStream)
Dim options As New RtfSaveOptions()
Dim dstStream As New MemoryStream()
doc.Save(dstStream, options)
doc.Save("c:\temp\rtf\saved-from-ae.rtf", options)

The problem is that the images are saved to disk by your component:

options.ImagesFolder = "C:\inetpub\wwwroot\Adept8.3\tmp\ADM"
options.ImagesFolderAlias = "/Adept8.3/tmp/ADM/"

If I change the ImagesFolderAlias path to:

options.ImagesFolderAlias = "http://localhost/Adept8.3/tmp/ADM/"

I could get burned down the road, because I cannot guarantee that the host is http://localhost/: it could be a fully qualified domain name, or the site could use HTTPS.

Is there a way to specify the directory where images are located?
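One way to avoid hard-coding http://localhost/ is to take the scheme, host and port from the URL of the current request, so a fully qualified domain name or HTTPS is picked up automatically. A minimal C# sketch, assuming the page URL is available as a System.Uri (in ASP.NET, Request.Url); BaseUriHelper and the example host names are hypothetical:

```csharp
using System;

public static class BaseUriHelper
{
    // Returns "scheme://host[:port]/" for the given request URL, e.g.
    // https://intranet.example.com/Adept8.3/page.aspx -> https://intranet.example.com/
    public static string FromRequestUrl(Uri requestUrl)
    {
        // GetLeftPart(UriPartial.Authority) keeps the scheme, the host and any
        // non-default port, and drops the path and query.
        return requestUrl.GetLeftPart(UriPartial.Authority) + "/";
    }
}
```

The returned string could then serve as the base URI when the HTML is loaded back, instead of a fixed "http://localhost/".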

Hi Tim,

Thanks for your inquiry.

I think you need to set BaseUri in the LoadOptions when loading the HTML document, so that relative image paths are resolved against it. Please see the LoadOptions.BaseUri page in the API reference for details.
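For illustration, this is how a base URI turns site-relative image paths, such as the ones produced by ImagesFolderAlias, into absolute URLs under standard URI resolution rules. The sketch uses System.Uri directly; that Aspose.Words resolves BaseUri the same way is an assumption, and ImageUrlResolver and the file names are hypothetical:

```csharp
using System;

public static class ImageUrlResolver
{
    // Resolves an <img> src against a base URI using standard URI rules:
    //   "/Adept8.3/..."  replaces the whole path of the base,
    //   "tmp/ADM/x.png"  is resolved relative to the base's folder,
    //   "https://..."    is already absolute and is returned as is.
    public static string Resolve(string baseUri, string src)
    {
        var baseUrl = new Uri(baseUri, UriKind.Absolute);

        // Simplified check: treat src as absolute only when it carries a scheme.
        // Parsing it explicitly as relative otherwise keeps "/Adept8.3/..." from
        // being read as a local file path on non-Windows runtimes.
        if (src.Contains("://"))
            return new Uri(src, UriKind.Absolute).AbsoluteUri;

        return new Uri(baseUrl, new Uri(src, UriKind.Relative)).AbsoluteUri;
    }
}
```

For example, resolving "/Adept8.3/tmp/ADM/image.001.png" against https://intranet.example.com/Adept8.3/page.aspx gives https://intranet.example.com/Adept8.3/tmp/ADM/image.001.png, whatever the page path was.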

Thanks,

I'm working with the Aspose.Words component to convert HTML to RTF, and local images are not converted properly. The image is physically located in c:\inetpub\wwwroot, but the resulting RTF shows a broken image. Please advise.

string sHtml = "";
string sText = string.Empty;
string sBaseUrl = null;
System.Web.HttpRequest oRequester = HttpContext.Current.Request;
#region Convert HTML to RTF with Aspose.Words
byte[] HTML = System.Text.Encoding.GetEncoding("iso-8859-1").GetBytes(sHtml);
System.IO.MemoryStream memoryStream = new System.IO.MemoryStream(HTML);
LoadOptions loadOptions = new LoadOptions(Aspose.Words.LoadFormat.Html, "", "http://localhost/");
Document doc = new Document(memoryStream, loadOptions);
RtfSaveOptions options = new RtfSaveOptions();
MemoryStream dstStream = new MemoryStream();
doc.Save(dstStream, options);
dstStream.Position = 0;
StreamReader reader = new StreamReader(dstStream);
sText = reader.ReadToEnd();
#endregion

/* Resulting RTF (sText):
{\rtf1\ansi\ansicpg1252\uc0\stshfdbch0\stshfloch0\stshfhich0\stshfbi0\deff0\adeff0{\fonttbl{\f0\fnil\fcharset0 Times New Roman;}}{\colortbl;}{\stylesheet{\s0\snext0\styrsid8412110\sqformat\spriority0\ltrpar\li0\lin0\ri0\rin0\ql\faauto\rtlch\afs24\ltrch\fs24 Normal;}{\*\cs10\additive\ssemihidden\spriority0 Default Paragraph Font;}}{\*\generator Aspose.Words for .NET 9.6.0.0;}{\info\version0\edmins0\nofpages0\nofwords0\nofchars0\nofcharsws0}\deflang1033\deflangfe2052\adeflang1025\jexpand\showxmlerrors1\validatexml1\viewscale100\fet0\widowctrl\nospaceforul\nolnhtadjtbl\alntblind\lyttblrtgr\nogrowautofit\dntblnsbdb\noxlattoyen\wrppunct\nobrkwrptbl\expshrtn\snaptogridincell\asianbrkrule\htmautsp\noultrlspc\useltbaln\splytwnine\ftnlytwnine\lytcalctblwd\allowfieldendsel\newtblstyruls\lnbrkrule\formshade\nojkernpunct\dghspace180\dgvspace180\dghorigin1800\dgvorigin1440\dghshow1\dgvshow1\dgmargin\pgbrdrhead\pgbrdrfoot\sectd\ltrsect\sectdefaultcl\pard\plain\itap0\s0\ltrpar\li0\lin0\ri0\rin0\ql\faauto\rtlch\afs24\ltrch\fs24{\rtlch\afs24\ltrch\fs24{\*\shppict{\pict{\*\picprop\shplid1025{\sp{\sn fLayoutInCell}{\sv 1}}{\sp{\sn posrelh}{\sv 2}}{\sp{\sn posrelv}{\sv 2}}{\sp{\sn shapeType}{\sv 75}}}\pngblip\picw847\pich847\picwgoal480\pichgoal480\picscalex100\picscaley100\piccropl0\piccropr0\piccropt0\piccropb0\bliptag1766401221{\*\blipuid 694924c50c2e81711f287ad60b8ef0e5}89504e470d0a1a0a0000000d494844520000002000000020080300000044a48ac600000300504c5445000000ffffff808080c0c0c0ffd84498c0000000097048597300000ec300000ec301c76fa8640000004249444154789c63602400188699022614804d01923666868153c0c202c1b84d606101c9e3b3022c4f810984dd40c817b456c08c04b0296040015814e000a30a20000034d404691ed77d330000000049454e44ae426082}}{\nonshppict{\pict\pngblip\picw847\pich847\picwgoal480\pichgoal480\picscalex100\picscaley100\piccropl0\piccropr0\piccropt0\piccropb0\bliptag1766401221{\*\blipuid 694924c50c2e81711f287ad60b8ef0e5}89504e470d0a1a0a0000000d494844520000002000000020080300000044a48ac600000300504c5445000000ffffff808080c0c0c0ffd84498c0000000097048597300000ec300000ec301c76fa8640000004249444154789c63602400188699022614804d01923666868153c0c202c1b84d606101c9e3b3022c4f810984dd40c817b456c08c04b0296040015814e000a30a20000034d404691ed77d330000000049454e44ae426082}}}{\rtlch\afs24\ltrch\fs24\par}{\*\latentstyles\lsdstimax267\lsdlockeddef0\lsdsemihiddendef1\lsdunhideuseddef1\lsdqformatdef0\lsdprioritydef99{\lsdlockedexcept}}}
*/

Hello

Thanks for your inquiry. Could you please attach your input HTML and output RTF here for testing? I’ll check the problem on my side and provide you more information.

Best regards,

Both are already posted. The HTML is in a string (very simple), and the resulting RTF is in a comment at the end.

Hello

Thank you for the additional information. I cannot reproduce the problem on my side using the latest version of Aspose.Words (10.6.0). Here are my steps:

  1. I created a virtual directory on my machine.
  2. I added images to this directory.
  3. I converted the HTML to RTF.

I also tried the following code, and it works fine too:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Optional prefix for the image path (empty here, so the relative path is used as is).
string baseUri = "";
builder.InsertHtml("<img alt='' src='" + baseUri + "../images/test.jpg' />");
doc.Save("C:\\Temp\\out.rtf");

Best regards,

I found that your sample code does work, but I am using a web application with explicit permissions, which is quite different from a virtual directory.

I’m confident that this is a permissions issue, as images from other domains and virtual directories do work, but images in my web application do not.

Can you please help resolve this?

Quite simply, the virtual directory is converted to a web application that runs in the ASP.NET v4.0 Classic application pool under a specific user, with Windows authentication.
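Since the images are on the same machine, one way to sidestep the authenticated HTTP request altogether is to point each image at its file on disk before loading the HTML, in line with the earlier suggestion to use a full path. A minimal C# sketch of the mapping, assuming the URL prefix /Adept8.3/tmp/ADM/ corresponds to C:\inetpub\wwwroot\Adept8.3\tmp\ADM as in the ImagesFolder and ImagesFolderAlias settings above; ImagePathMapper is hypothetical, and inside ASP.NET Server.MapPath can perform the same mapping:

```csharp
using System;

public static class ImagePathMapper
{
    // Maps a site-relative image URL such as "/Adept8.3/tmp/ADM/img.png" to the matching
    // Windows file path under physicalFolder. Returns null for URLs outside urlPrefix
    // (or containing ".."), so the caller can leave those images untouched.
    public static string ToPhysicalPath(string src, string urlPrefix, string physicalFolder)
    {
        if (!src.StartsWith(urlPrefix, StringComparison.OrdinalIgnoreCase))
            return null;

        // Decode %20 and similar escapes before checking for "..".
        string relative = Uri.UnescapeDataString(src.Substring(urlPrefix.Length));
        if (relative.Contains(".."))
            return null; // refuse to climb out of the image folder

        // Build the Windows path explicitly so the result is the same on any OS
        // (Path.Combine would use '/' as the separator on Unix).
        return physicalFolder.TrimEnd('\\') + "\\" + relative.Replace('/', '\\');
    }
}
```

Reading the file directly means the image no longer depends on the application pool identity being allowed through Windows authentication, only on it having read access to the folder.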

Hi

Thank you for the additional information. I think you can use the approach suggested here to work around the problem:

https://forum.aspose.com/t/58948

Hope this helps.

Best regards,

Thank you for the suggestion; however, it doesn’t seem to work: the regular expression doesn’t find any of the image tags in my HTML.

I’m not great at regex and I tried a couple of variations:

<img[^>]+src[\\s='\"]+([^\"'>\\s]+)/is

"<img\\s+src\\s* =\\s*[\"']([^\"']+)[\"']\\s*/*>"

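For reference, against the <img> tag in the HTML posted earlier, the two attempts fail for different reasons. The first is written in Perl/PHP style, so in .NET the trailing /is is matched as literal text rather than read as options. The second requires a space before the = and nothing but whitespace and slashes between the closing quote and >, whereas the tag has src= with no space and is followed by width, height and alt (it would also miss tags where src is not the first attribute). A minimal C# sketch of a pattern that tolerates attribute order and either quote style; ImgSrcFinder is hypothetical, a regex is not a full HTML parser, and unquoted src values are not handled:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImgSrcFinder
{
    // <img, then any attributes, then whitespace + src = followed by a quoted value.
    // The opening quote is captured as "q" and must be matched by the same quote.
    // \s before src keeps attributes such as data-src from matching.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*(?<q>[""'])(?<src>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the src value of every <img> tag, in document order.
    public static List<string> FindSources(string html)
    {
        var sources = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
            sources.Add(m.Groups["src"].Value);
        return sources;
    }
}
```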
So, with the issue at hand, are you saying that there is a known problem with websites that use authentication?

If I get your sample working, won’t I have to manage each of the image types, e.g. gif, png, jpg, bmp, etc.? Why doesn’t the component handle this automatically?
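On the image-type question: rewriting the src attribute only changes where the image is read from, so the file extension never needs to be inspected, and decoding GIF, PNG, JPEG or BMP is still left to the component as before. A minimal C# sketch that rewrites every <img> src through a caller-supplied mapping (ImgSrcRewriter is hypothetical; the mapping could be a prefix replacement or Server.MapPath):

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImgSrcRewriter
{
    // Group 1: "<img ... src=", group 2: the quote character, group 3: the src value.
    private static readonly Regex ImgSrc = new Regex(
        @"(<img\b[^>]*?\ssrc\s*=\s*)([""'])(.*?)\2",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces every <img> src with map(src), keeping the original quoting.
    // When map returns null the tag is left unchanged. The extension plays no role.
    public static string Rewrite(string html, Func<string, string> map)
    {
        return ImgSrc.Replace(html, m =>
        {
            string newSrc = map(m.Groups[3].Value);
            if (newSrc == null)
                return m.Value;
            return m.Groups[1].Value + m.Groups[2].Value + newSrc + m.Groups[2].Value;
        });
    }
}
```

Images outside the mapped prefix (for example on other domains, which already work) can simply be left alone by returning null from the mapping.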

OK, forget my last post: I figured out how to make it work using some of your suggestions.

Let’s consider this resolved.

Hi

It is great that you managed to resolve the problem. Please let us know if you need more assistance; we will be glad to help.

Best regards,