Problems with converting a .doc to html

I have problems with converting a .doc to HTML. I use the following code:

Dim DocPath As String = Me.Page.Request.QueryString("PathFile")
Dim Document As Aspose.Words.Document = New Document(DocPath) ' a Word document with a table and images
Document.MailMerge.Execute(MyRow) ' fill the original Word doc with data
DocPath = Left(DocPath, DocPath.Length - 4) ' strip the extension
DocPath = DocPath + ".htm" ' add an extension
Document.Save(DocPath, SaveFormat.FormatHtml)
email.AddAttachment(DocPath)
' send it

The received document does not contain images! The HTML file does not even refer to them. See attachment.

The Aspose.Words manual says that the included pictures are saved in the same directory as the original document. But when I inspect that directory, there are no pictures at all. This directory is created by an ASP.NET program. The Network Service account has full rights.

Please advise.

Somehow, the document failed to be attached. Please reattach it.

Best regards,

Vladimir,

I do not understand you … the doc is uploaded to the site and afterwards saved as .htm.

The first version on the client side (a .doc) is OK.

The next version (a .doc), after upload, on the server side is OK.

Opening it with Word and saving it as .htm gives a good version (.htm). This .htm file differs a lot from the version produced by Aspose.Words. Even when the .htm is not emailed but only stored, it goes wrong: the pictures are lost.

What is your advice?

Please attach the files before and after conversion. I mean the initial doc file and the html file converted from the initial file by Aspose.Words. After seeing these files I can check what is wrong and give recommendations.

In your first letter you said "See attachment", but there is no attachment to your letter. We need the files illustrating the problem before I can give any advice.

Best regards,

Herewith the files.

The pictures in your document are 'included', which means that they are stored externally, in files outside the document. Our current implementation of HTML export copies these files locally, naming them xxx.001.jpeg, xxx.002.jpeg, etc., where xxx is the name of the resulting HTML file without extension. To have the resulting HTML display correctly after being e-mailed, you have to attach these picture files to the e-mail together with the HTML file.

It would perhaps be more logical if we just copied the reference paths (C:\htmlemail_files\image002.jpg) from the document to the resulting HTML. But this would be even worse for your scenario, as your correspondent would receive HTML referencing picture files that are stored locally on your computer.

MS Word also does not handle this situation nicely. It actually converts absolute picture file references in your document, like "C:\htmlemail_files\image002.jpg" to relative references like “…/…/…/…/…/htmlemail_files/image001.jpg” which is also not good for your scenario.

So, all in all, it seems that the current implementation of included-picture export to HTML is the best that you can get. You only need to include the created picture files in the e-mail.
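As a rough illustration of "include the created picture files", the sketch below lists the pictures that sit next to an exported HTML file. It assumes the xxx.001.jpeg, xxx.002.jpeg naming described above and uses only System.IO; the module and function names (ExportedImages, FindExportedImages) are made up for this example and are not part of Aspose.Words.

```vbnet
Imports System
Imports System.IO

Module ExportedImages
    ' Returns the picture files stored in the same folder as the exported HTML
    ' file, assuming they are named <html name without extension>.NNN.jpeg.
    ' Any other file that happens to match <name>.*.jpeg is returned as well.
    Function FindExportedImages(ByVal htmlPath As String) As String()
        Dim folder As String = Path.GetDirectoryName(Path.GetFullPath(htmlPath))
        Dim baseName As String = Path.GetFileNameWithoutExtension(htmlPath)
        Dim files As String() = Directory.GetFiles(folder, baseName & ".*.jpeg")
        ' Sort so that .001 comes before .002, and so on.
        Array.Sort(files, StringComparer.OrdinalIgnoreCase)
        Return files
    End Function
End Module
```

Each returned path could then be attached in a loop in the same way the original post attaches the .htm file (email.AddAttachment in that code).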

Please note that I have tested everything described above in Aspose.Words 4.0, which is released today. I am not sure that the previous versions exported included pictures to HTML correctly, so to get correct results please download the latest version.

Best regards,

Vladimir,

Thanks very much for your very fast reply. As always, I'm impressed by your service. I still have one question. In my situation the doc file from which the .htm file is derived is actually stored in a directory structure like C:\Inetpub\wwwroot\appname\mailingupload\1. So each client belonging to domain 1 stores all its files below subdir …\1.

What you are saying about c:\htmlemail_files is (as I see it) only valid if document.doc is converted by Word. I mentioned this scenario only as a way to prove that Aspose.Words (my version) does not handle the conversion process correctly. But this step will never occur in my scenario:

a. The user uploads the .doc to C:\Inetpub\wwwroot\appname\mailingupload\1\document.doc

b. Aspose.Words fetches the doc and stores it as \Inetpub\wwwroot\appname\mailingupload\1\.htmdocument.htm

c. Aspose.Network mails it.

Does your answer imply that I can refer to xxx.001.jpeg, xxx.002.jpeg as

\Inetpub\wwwroot\appname\mailingupload\1\.htmdocument.htm.001.jpeg? And that I need to iterate through the directory?

Aat Jan

Well, if you are converting a document with included pictures to HTML with Aspose.Words 4.0, then you have the following scenario:

For example the initial file is called Geachte.doc. It references external pictures in c:\htmlemail_files.

When you convert it to HTML with Aspose.Words 4.0 the file Geachte.html is created. It references the images Geachte.001.jpeg and Geachte.002.jpeg which are copied locally from the images in c:\htmlemail_files to the same dir as where Geachte.html is saved.

Whether you "can refer to xxx.001.jpeg, xxx.002.jpeg as \Inetpub\wwwroot\appname\mailingupload\1.htmdocument.htm.001.jpeg" I cannot say, because it is not entirely clear what the nature of this reference is. Do you mean a reference in the user application, in the e-mail, in the HTML file, or somewhere else? Please clarify.

Best regards,

Vladimir, again many thanks for your quick response and high-quality explanation!

What I do not understand is how and when the directory c:\htmlemail_files is created. Is it created by Aspose.Words or by MS Word? And is it created at document creation time by Word, when the document is uploaded by the user, or when Aspose is doing its job?

This is important, because we need to know whether pictures that come from other uploaded documents may (or will?) overwrite pictures that are already in c:\htmlemail_files … and so disrupt other documents …

With regards,

Aat Jan

The c:\htmlemail_files is a part of a path to included picture files, which was already set in the document that you have attached to your post, so you should know better how and by whom it was created. You can see it in your document yourself. Just open the doc in MS Word, set the pictures layout to “Inline With Text” and turn field codes on by pressing Alt + F9. You will see the following field code:

INCLUDEPICTURE "C:\htmlemail_files\image002.jpg" \* MERGEFORMAT \d

Best regards,

Vladimir, it worked as you described, but … now the email has three attachments, and the HTML file shows an empty box. The two pictures are separate attachments without any relation to the HTML document …

How do I tell the HTML that the attachments are related to its content?

With regards,

Aat jan

If you save the HTML in the same directory as the picture files, then it will show them correctly, as their filenames are written inside the HTML as the src attribute of the img element. Please mind that Aspose.Words needs to have access to the files stated in the document's INCLUDEPICTURE field at the time of conversion. Otherwise the 'no picture' images (red crosses) will be shown. Also make sure you are using Aspose.Words version 4.0.

Best regards,

Hi Vladimir, thanks for your explanation. I saw it. The pictures are shown in the generated .htm. They are not shown in the .htm that is transferred by email. I guess I need to do the trick of embedding objects as shown in the Aspose.Network manual …

With regards Aat Jan

This does not seem to work. I have been in contact with one of your colleagues from the Aspose.Network product. He will contact you. In the meantime I tried another option: replacing the absolute paths in the produced .htm with relative ones, using the following code:

Document.MailMerge.Execute(MyRow)
DocPath = Left(DocPath, DocPath.Length - 4)
DocPath = DocPath + ".htm"
Document.SaveOptions.ExportImagesFolder = DocPath
Document.Save(DocPath, SaveFormat.Html)
' the file is now personalized and the pictures have to be put into it
' read the file into a string
Dim sErr As String
Dim sContents As String = GetFileContents(DocPath, sErr)
Dim ImagePath As String = "c:\inetpub\wwwroot\apoonline3\mailingupload\" + CType(Me.Page.Session("idApotheek"), String) + "\"
Dim AllJpegs As String() = System.IO.Directory.GetFiles(ImagePath, "*.jpeg")
Dim ArrIndex As Integer
For ArrIndex = 0 To AllJpegs.Length - 1
    Document = New Document(DocPath) ' the HTML is opened
    Dim picturename As String = "'cid:picture" + CType(ArrIndex, String) + "'"
    Dim ImagePathToReplace As String = ImagePath + "\"
    Dim PictureNameNew As String = AllJpegs(ArrIndex).Replace(ImagePath, "")
    Document.Range.Replace(picturename, PictureNameNew, False, False)
Next
Document.Save(DocPath, SaveFormat.Html)
Me.Page.Session("statusmailing") = "word documenten eruit"

One way or another, the saved HTML still contains absolute paths instead of the cid:picture references …

Do you see what I cannot see (anymore)?

With regards,

Aat Jan

Sorry, I still don't understand what you are trying to achieve. Could you please manually edit the HTML file to the state that you want it to be in and attach it here for illustration? And please tell me what 'cidpicture' means. I am not acquainted with this term.

Best regards,

I took that approach from the Aspose.Network example on sending HTML with pictures by email :)

From the Aspose.Network manual, the part on sending email with an HTML attachment that contains pictures:

'add a single embedded object
Dim embedded As Aspose.Network.Mail.Attachment
embedded = New Attachment("D:\\web\\1.jpg")
embedded.ContentId = "1.jpg"
msg.AddEmbeddedObject(embedded)

' add an array of embedded objects
Dim embeddeds() As Aspose.Network.Mail.Attachment
embeddeds = New Attachment(2) {}
embeddeds(1) = New Attachment("D:\\web\\2.jpg")
embeddeds(1).ContentId = "2.jpg"
embeddeds(2) = New Attachment("D:\\web\\3.jpg")
embeddeds(2).ContentId = "3.jpg"
msg.AddEmbeddedObjects(embeddeds)

Dim html As TextHtmlBody = New TextHtmlBody() 
html.Content = ""
msg.Body = html

I want to replace the <img src="geachte.001.jpeg"> in the produced "geachte.htm" with <img src='cid:1.jpg'>.

But opening geachte.htm and then replacing it does not seem to work, and the embedded-objects statement does not work either (this is confirmed by your colleague, see my posts in the Aspose.Network forum). Sending an email with an HTML attachment with Aspose.Network also does not seem to work …

Aat Jan

I am also in contact with

Team Lead
Aspose Guangzhou Team

about the mailing part of this issue, good guys…

Well, as HTML is a simple text file, I think you don't need Aspose.Words to change image paths in it. You can do it with a simple string replacement. For example:

// read html file to string
StreamReader sr = System.IO.File.OpenText(htmlFileName);
string text = sr.ReadToEnd();
sr.Close();

// do necessary replacements
text.Replace(@"<img src=""geachte.001.jpeg""", "<img src='cid:1.jpg'");

// save string back to html file
StreamWriter sw = System.IO.File.CreateText(htmlFileName);
sw.WriteLine(text);
sw.Close();

Hope this helps,

That also did not work. The names of the JPEGs are not replaced. The only thing the code does is insert ^A^A^A strings at certain places. Is the .htm write-protected, or what …?

Aat Jan

It seems that there was a small mistake in the previous code. Here is the corrected code:

' requires Imports System.IO at the top of the file
' read html file to string
Dim sr As StreamReader = System.IO.File.OpenText(htmlFileName)
Dim text As String = sr.ReadToEnd()
sr.Close()

' do necessary replacements; Replace returns a new string, so assign the result
' (in VB a quote inside a string literal is escaped by doubling it)
text = text.Replace("<img src=""geachte.001.jpeg""", "<img src='cid:1.jpg'")

' save string back to html file
Dim sw As StreamWriter = System.IO.File.CreateText(htmlFileName)
sw.WriteLine(text)
sw.Close()

Best regards,
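The corrected code above handles one hard-coded file name. A minimal sketch of the same idea for every image in the file follows, assuming the exported HTML references its pictures with plain src attributes as described earlier in the thread. The names CidRewriter and RewriteImageSources are hypothetical, and using the file name itself as the content id is a choice made for this example, not something Aspose.Words or Aspose.Network prescribes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module CidRewriter
    ' Rewrites each <img ... src="file"> to src='cid:<file name>' and collects
    ' the file names it has seen. Sources that already start with cid: or
    ' http(s): are left untouched.
    Function RewriteImageSources(ByVal html As String, ByVal contentIds As List(Of String)) As String
        Dim pattern As String = "(<img\b[^>]*?\bsrc\s*=\s*)([""'])(?!cid:|https?:)([^""']+)\2"
        Return Regex.Replace(html, pattern,
            Function(m As Match) As String
                ' Keep only the file name so the content id contains no folder part.
                Dim fileName As String = Path.GetFileName(m.Groups(3).Value)
                If Not contentIds.Contains(fileName) Then contentIds.Add(fileName)
                Return m.Groups(1).Value & "'cid:" & fileName & "'"
            End Function,
            RegexOptions.IgnoreCase)
    End Function
End Module
```

The collected names would then have to match the ContentId set on each embedded attachment, as in the Aspose.Network manual excerpt quoted earlier (embedded.ContentId = "1.jpg").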