Link only partially translated from Word to HTML

Greetings,
we have a Word document (see attachment) that contains a link. The link text consists of two “parts” separated by a colon. Strangely, in the generated HTML only the first part is a link; the second part is underlined but is not a link. The behaviour is reproducible.
I can’t see anything special about the link so I don’t know what could trigger this.
Your help is appreciated… thanks in advance

Hi Rolf,

Thanks for your inquiry.

While using Aspose.Words for .NET 10.5.0, I was unable to reproduce this issue. I would suggest downloading and using the latest version of Aspose.Words, i.e. 10.5.0, from the following link:

https://releases.aspose.com/words/net

I hope this helps.

Best Regards,

Hello Awais,
thanks for your fast reply and suggestion.
But I’m afraid your suggestion did not work for me. I changed the library from version 10.0.0.0 to 10.5.0.0 as suggested and the result was about the same.
The generated html for version 10.0.0.0 is:

<meta content="text/css" http-equiv="Content-Style-Type" />
<meta content="Aspose.Words for .NET 10.0.0.0" name="generator" />
<div>
 <p style="line-height: 115%; margin: 0pt 0pt 10pt; font-size: 12pt">
  <span style="font-family: arial; color: #ff0000; font-size: 12pt; font-weight: bold">Evaluation Only. Created with Aspose.Words. Copyright 2003-2010 Aspose Pty Ltd.</span></p>
 <p style="margin: 0pt">
  <span style="font-family: arial; font-size: 10pt">&nbsp;</span></p>
 <p style="margin: 0pt">
  <a href="https://xxxSecretLinkInDocumentxxx.pdf"><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">Organisation:</span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline"> </span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">T</span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">est Test Test</span></a></p>
 <p style="margin: 0pt">
  <span style="font-family: arial; font-size: 10pt">&nbsp;</span></p>
</div>

The generated html for version 10.5.0.0 is:

<meta content="text/css" http-equiv="Content-Style-Type" />
<meta content="Aspose.Words for .NET 10.5.0.0" name="generator" />
<div>
 <p style="margin: 0pt">
  <span style="font-family: arial; font-size: 10pt">&nbsp;</span></p>
 <p style="margin: 0pt">
  <a href="https://xxxSecretLinkInDocumentxxx.pdf"><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">Organisation</span></a><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">:</span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline"> </span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">T</span><span style="font-family: arial; color: #0000ff; font-size: 10pt; text-decoration: underline">est Test Test</span></p>
 <p style="margin: 0pt">
  <span style="font-family: arial; font-size: 10pt">&nbsp;</span></p>
</div>

The code I use is just simple:

public string GetHtmlFromWord(Stream stream)
{
    // Load the entire document into memory.
    Document doc = null;
    try
    {
        doc = new Document(stream);
    }
    catch (UnsupportedFileFormatException uffe)
    {
        throw new UnsupportedFileFormatSDLException();
    }
    // You can close the stream now, it is no longer needed because the document is in memory.
    stream.Close();
    // Create a new memory stream.
    MemoryStream outStream = new MemoryStream();
    // Create the HTML save options (no form-field export option is actually set here).
    HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
    // Save the document to stream.
    doc.Save(outStream, options);
    // Convert the saved bytes to a UTF-8 string.
    byte[] docBytes = outStream.ToArray();
    System.Text.UTF8Encoding enc = new System.Text.UTF8Encoding();
    string html = enc.GetString(docBytes);
    return html;
}

Do you have any more suggestions?
Thanks and Greetings
Rolf

Hi,

Thanks for the additional information.

Using your code, I was unable to reproduce this problem on my side. Please see the attached zip file that contains input (TwoLinks_Input.doc) and output (out.html) files for your reference. The input file contains two links separated by a delimiter (e.g. colon) and the same was observed in output HTML file.

Please share if you mean something else and we shall be happy to discuss and help you out.

Best Regards,

Hi Awais,
using your TwoLinks_Input.doc I also get a link for the whole line, as in your out.html.
I don’t know what our customers did to generate a Word file that causes the problem… I tried to create a new Word file that reproduces the problem but failed.
The problem only occurs with the supplied file Anschreiben_Test.doc.
When downloading the latest Aspose.Words version (Aspose.Words_DLLS_10500.zip) from your homepage, I use the dll and xml from the folder net3.5_ClientProfile… perhaps you could reproduce the behaviour with this dll?
Greetings

Hi,

Thanks for the additional information.

I’m afraid I still can’t reproduce the issue using the latest version of Aspose.Words i.e. 10.5.0 and the input document you attached. I even tested it by using the dll that resides inside net3.5_ClientProfile folder.

Moreover, could you please create and attach here a simple application that enables us to reproduce this issue on our side?

Please let us know if you need more information; we are always glad to help.

Best Regards,

Greetings Awais,
I tried to reproduce the issue with a console test program and failed, just as you mentioned.
So I was able to trace the cause back to the type of stream I used. We are using Aspose in a web application and got an InputStream, which was converted to a MemoryStream by the following code:

using(MemoryStream fileMStream = new MemoryStream())
{
    byte[] buffer = new byte[2048];
    int readBytes = 0;
    do {
        readBytes = Request.Files[0].InputStream.Read(buffer, 0, buffer.Length);
        fileMStream.Write(buffer, 0, readBytes);
    } while (readBytes != 0);
    // ... (the code that uses fileMStream was cut off in the post)
}

After passing the InputStream directly (Request.Files[0].InputStream), the problem no longer occurred (I can’t remember why we originally implemented the faulty conversion to MemoryStream).
So the issue is resolved. Thanks for your good support. I hope this also helps someone else with a similar problem.
Greetings
Rolf
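
A minimal sketch of the MemoryStream conversion discussed above, assuming the copy is meant to be read from its start afterwards. The thread does not confirm what exactly was faulty in the original conversion; one common pitfall with this pattern is that the MemoryStream's Position is left at the end after writing, so the sketch rewinds it. The class and method names are hypothetical. The manual loop is kept because Stream.CopyTo is only available from .NET Framework 4 onwards.

```csharp
using System.IO;

public static class StreamCopySketch
{
    // Hypothetical helper: copies all bytes from 'input' into a new
    // MemoryStream and rewinds the copy so it can be read from the start.
    public static MemoryStream ToRewoundMemoryStream(Stream input)
    {
        MemoryStream copy = new MemoryStream();
        byte[] buffer = new byte[2048];
        int readBytes;

        // Read returns 0 once the end of the input stream is reached.
        while ((readBytes = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            copy.Write(buffer, 0, readBytes);
        }

        // After writing, Position points at the end of the stream.
        // Reset it so that a consumer reads the copied bytes.
        copy.Position = 0;
        return copy;
    }
}
```

The returned stream could then be passed to GetHtmlFromWord(stream) in place of Request.Files[0].InputStream, which makes it possible to compare both paths on the same upload.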

Hello again Awais,

the problem doesn’t seem to be resolved completely.
Processing the Word document always works fine on my development machine and our test server, but not on our customer’s production servers.
It’s very “interesting” that it works in about 40% of the cases. In the other 60%, the link is still only partially rendered as a link.

Do you have any suggestions as to what the cause could be or what we could do to resolve the issue?

Greetings
Rolf

Hi Rolf,
Thank you for the additional information. However, we have no idea how we can reproduce this problem. Maybe the problem occurs because the link consists of several runs. If so, calling JoinRunsWithSameFormatting should help resolve the problem:
https://reference.aspose.com/words/net/aspose.words/document/joinrunswithsameformatting/
Please let me know if this helps.
Best regards,
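
A rough sketch of how the suggestion above might be wired into the conversion, assuming the Aspose.Words assembly is referenced. Document.JoinRunsWithSameFormatting() is the method from the linked reference page; the class name, method name, and the idea of reading its return value for diagnostics are illustrative assumptions, not part of the original code.

```csharp
using System.IO;
using System.Text;
using Aspose.Words;
using Aspose.Words.Saving;

public static class JoinRunsSketch
{
    // Hypothetical variant of GetHtmlFromWord that merges adjacent runs
    // with identical formatting before saving to HTML.
    public static string ConvertWithJoinedRuns(Stream stream, out int joinedRuns)
    {
        Document doc = new Document(stream);

        // Returns the number of joins performed; can be logged to see
        // whether the document was fragmented into many small runs.
        joinedRuns = doc.JoinRunsWithSameFormatting();

        using (MemoryStream outStream = new MemoryStream())
        {
            doc.Save(outStream, new HtmlSaveOptions(SaveFormat.Html));
            return Encoding.UTF8.GetString(outStream.ToArray());
        }
    }
}
```

Joining runs only reduces the number of span elements in the output; whether it affects where the link's closing tag ends up is exactly what the suggestion is meant to test.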

Hi Alexey,

thanks for your suggestion and sorry for the delay of my reply (as the error only occurs on our client’s server there’s a “little bit” of overhead).

Sadly, the problem still occurs in the same way as before.

Do you have any suggestions as to what else we could try or what we could do to help you reproduce the problem?

Greetings
Rolf

Hi Rolf,
Thank you for the additional information. The difference may also lie in the culture used on the server and on your side. Could you please check which culture is used on the server? Maybe this will help us reproduce the problem.

Console.WriteLine(Thread.CurrentThread.CurrentCulture);

Best regards,
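
Since the problem only shows up on some servers, it may help to capture more than the culture in one go. The following is a small sketch under that assumption; the class name, method name, and the choice of fields are hypothetical, and only standard .NET properties are used.

```csharp
using System;
using System.Text;
using System.Threading;

public static class EnvironmentReport
{
    // Hypothetical helper: collects a few environment facts that can be
    // written to a log on each server and compared side by side.
    public static string Describe()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine("Culture=" + Thread.CurrentThread.CurrentCulture.Name);
        sb.AppendLine("UICulture=" + Thread.CurrentThread.CurrentUICulture.Name);
        sb.AppendLine("ClrVersion=" + Environment.Version);
        sb.AppendLine("OSVersion=" + Environment.OSVersion);
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        sb.AppendLine("ProcessBits=" + (IntPtr.Size * 8));
        return sb.ToString();
    }
}
```

Writing the result of EnvironmentReport.Describe() next to each generated HTML file would allow failing and working conversions to be compared by environment.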

Hi Alexey,
finally I got the needed feedback from our customer:
Their culture is: de-DE
On our test systems (where the problem does not occur) the culture is also: de-DE
Greetings
Rolf

Hi Rolf,
Thank you for the additional information. Unfortunately, I still have no luck. The issue is not reproducible on my side, and I have no clue what could cause it on your side.
Best regards,

Hi Alexey,
thanks for your response.

What are the next steps that you suggest to satisfy our customer?
Greetings
Rolf

Hi Rolf,
Thanks for your request. Unfortunately, I do not know what could cause the problem. I tried different approaches and still cannot reproduce it.
By the way, when you convert the document to other formats, does the hyperlink look correct?
Best regards,

Hello Alexey,

I totally understand; we can’t reproduce the issue on our side either (it only occurs on our customer’s servers).
We haven’t tried converting to other file formats and have no way to do so in our application.

What are the next steps you propose for us to do?

Greetings
Rolf

Hi Rolf,
Thank you for the additional information. Unfortunately, I do not have any more ideas. I think the problem might occur because the document is pre-processed or post-processed by some other tool. Could you please make sure once more that your customer does not pre-process or post-process the document before or after converting it to HTML using Aspose.Words?
Best regards,

Hi Alexey,
thanks for your patience.
I tested the issue with a test method (code below and attached) that loads data from the local file system and writes the result back to the file system, to exclude any pre- or post-processing.

public ActionResult TestLoadLocalWordText()
{
    string tempFilepath = HttpRuntime.AppDomainAppPath + "/Ressources/Anschreiben_Test2.doc";
    bool embeddedObjectFound = false;
    string html = BusinessLayerFactory.CreateUtilityServices().GetHtmlFromWord(tempFilepath, out embeddedObjectFound);
    string guid = Guid.NewGuid().ToString();
    using(StreamWriter file = new System.IO.StreamWriter(HttpRuntime.AppDomainAppPath + "/Ressources/" + guid + ".log"))
    {
        file.Write(html);
    }
    return Json(new ResultContainer
    {
        Success = true, ErrorMessage = null
    }, "text/html");
}

I hope you can spot something that could cause the problems.
Greetings
Rolf

Hi,
Thank you for the additional information. As far as I can see, your code looks correct. But I would check the produced HTML string before and after sending it as a JSON string. Maybe the problem occurs when the JSON response is decoded on the client side.
Best regards,

Hi Alexey,

to verify that neither the JSON nor any post-processing is the problem, I implemented the file.Write(html); call. In the written file the closing </a> tag is also in the wrong position in about 50% of the cases. Furthermore, I really can’t think of any decoding that could move a closing tag to another position in the DOM tree.

What next steps could we take to resolve the problem at our customer’s site? Is there any further type of support you could provide?
They are “slowly getting nervous” about when the problem will be solved, and we want to provide good support to such a globally operating corporate group ourselves.

Best Regards
Rolf
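
To count failing conversions on the customer's server without inspecting each log file by hand, a check along the following lines might help. It is only a sketch: the class and method names are hypothetical, the regular expression is a rough approximation rather than a real HTML parser, and HTML entities are not decoded.

```csharp
using System.Text.RegularExpressions;

public static class LinkTextCheck
{
    // Hypothetical helper: returns the visible text of the first <a>
    // element in the HTML, or null if there is no such element.
    public static string FirstAnchorText(string html)
    {
        Match m = Regex.Match(
            html,
            @"<a\b[^>]*>(.*?)</a>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);
        if (!m.Success)
        {
            return null;
        }
        // Strip nested tags such as <span ...> to leave only the text.
        return Regex.Replace(m.Groups[1].Value, "<[^>]+>", "");
    }

    // True if the first link covers the whole expected text,
    // false if it was cut short or no link was found.
    public static bool LinkIsComplete(string html, string expectedText)
    {
        return FirstAnchorText(html) == expectedText;
    }
}
```

For the two outputs quoted earlier in this thread, FirstAnchorText would give "Organisation: Test Test Test" for the correct case and "Organisation" for the truncated one, so LinkIsComplete(html, "Organisation: Test Test Test") could be logged alongside each generated file.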