Tabs convert to spaces in Word > HTML conversion

I am converting documents from MS Word to HTML using the code below. During the conversion, the tabs are changed to spaces, so the HTML text is no longer aligned (see samples below). What can be done to prevent this? Thank you.

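// Requires: using System.IO; using System.Text; using Aspose.Words;
private static string GetHtml(Document doc, SaveFormat ThisSaveFormat)
{
    string html = "";
    // Work on a copy so that saving does not modify the caller's document.
    Document docClone = doc.Clone();
    using (MemoryStream stream = new MemoryStream())
    {
        docClone.Save(stream, ThisSaveFormat);
        // GetBuffer() may be larger than the data written, so decode only stream.Length bytes.
        html = Encoding.UTF8.GetString(stream.GetBuffer(), 0, (int)stream.Length);
    }
    return html;
}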

The original document text was aligned as follows using a hard tab in the document:
PHYSICAL EXAMINATION:
Height: 151 cm (<3rd percentile).
Weight: 64 kilograms (75th percentile).
Pulse rate: 80/minute.
Blood pressure: 108/70 mmHg.
General: Well appearing and in no distress.

In HTML, the document showed up as follows:
PHYSICAL EXAMINATION:
Height: 151 cm (<3rd percentile).
Weight: 64 kilograms (75th percentile).
Pulse rate: 80/minute.
Blood pressure: 108/70 mmHg.
General: Well appearing and in no distress.

Hi

Thanks for your request. You are right: tabs are converted into a sequence of whitespace characters during conversion to HTML, because HTML has no equivalent of the tab character. MS Word does the same when it saves a document as HTML.
Regarding the text position, please attach your source document here for testing. I will check it on my side and provide you with more information.
Best regards.

Attached is the original document I’m working with.
Thank you.

Hi Melissa,

Thank you for the additional information. I managed to reproduce your problem on my side. Your request has been linked to the appropriate issue, and you will be notified as soon as it is resolved.
The only workaround I can suggest at the moment is restructuring your document. For instance, you can use a table instead of tabs.
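
As a rough illustration of the table approach, the sketch below builds a two-column table from tab-separated lines with DocumentBuilder and saves the result as HTML. The class name, the SplitRows and WriteTable helpers, the sample text, and the output file name are all hypothetical; it assumes the Aspose.Words package is referenced and is not taken from the attached document.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

public static class TabsToTableSketch
{
    // Hypothetical helper: splits each line on its first tab into a label cell and a value cell.
    public static List<string[]> SplitRows(string text)
    {
        var rows = new List<string[]>();
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
            rows.Add(line.Split(new[] { '\t' }, 2));
        return rows;
    }

    // Writes one table row per input row and one cell per value at the builder's current position.
    public static void WriteTable(DocumentBuilder builder, List<string[]> rows)
    {
        builder.StartTable();
        foreach (string[] cells in rows)
        {
            foreach (string cell in cells)
            {
                builder.InsertCell();
                builder.Write(cell);
            }
            builder.EndRow();
        }
        builder.EndTable();
    }

    public static void Main()
    {
        // Sample text is an assumption modelled on the lines quoted earlier in the thread.
        string text = "Height:\t151 cm\r\nWeight:\t64 kilograms";

        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        WriteTable(builder, SplitRows(text));
        doc.Save("Examination.html", SaveFormat.Html);
    }
}
```

Because the cells become ordinary HTML table cells, the alignment no longer depends on how tab stops are exported.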
Best regards.

Any approximate timeframe as to when we can expect this to be resolved?

Hi Melissa,

Thanks for your request. Unfortunately, I cannot give you a reliable estimate for this issue at the moment. Exporting tabs is a complex issue, so I cannot promise a fast fix.
Best regards.

Hi,

I too am having a similar issue where a document with a series of tab stops isn’t saving to HTML correctly…

Can I first ask exactly what the issue is that this post is linked to? Is it about getting text to align correctly and is this related to a couple of other posts I’ve seen regarding the style attributes for “tab-stops” and “mso-tab-count”?

I know you’ve mentioned that these attributes are MS specific and not standard to HTML, but are you looking to provide an option to use them?

Either way, has there been any progress here? We are trying to save documents to HTML for sending as HTML emails, and this issue may prevent take-up of this functionality by our clients.

Hi

Thanks for your request. The issue this thread is linked to concerns improving the quality of tab position export to HTML. It is still unresolved.
Could you please attach the document you are having the problem with? I will investigate the problem on my side and provide you with more information.
Best regards,

No problems, attached is a zip containing the docx and the converted html files.

As you can see, we have a header row with about 6 columns at varying tab positions and some corresponding data rows.

The reason we have done it this way and not used a table is because we are using mail merge with a single merge field for the tab separated data. Our clients did this sort of thing with Word mail merging before our server side document management system and they have hundreds of these sorts of templates that they would not be too keen on changing just to work around a formatting issue with html. Documents and PDFs are all formatted fine.

It’s also a lot simpler for us to call the Aspose document MailMerge() function than to programmatically iterate over the data and build up a table.

Hi

Thank you for the additional information. Could you also attach your template and data source here for testing? I think that in your case you can easily use Mail Merge with Regions. I will check your template and data and try to provide you with a simple code example.
Best regards.

Thanks for the reply…

Attached is a basic example of our word templates that uses a single merge field to simulate a table structure and the below code is essentially what we are doing. Our CreateDocument method normally takes in the string[] and object[] as parameters which are generated from data retrieved from the database and other sources.

public void CreateDocument()
{
    Document doc = new Document("Template.dotx");
    doc.MailMerge.Execute(
        new string[] { "MemberNo", "MembershipPersons" },
        new object[] { "12345678", "Aleisha\tNasser\tMember\tFemale\t14/07/1956\t27/05/1995\r\nPetrina\tRose\tSpouse\tFemale\t25/12/1956\t1/01/2000" }
    );
    doc.UpdateFields();
    doc.Save("Document.docx", SaveFormat.Docx);
}

A second method simply takes this saved document when required and saves it to HTML to embed as an email. It is at this point in the HTML that the tab separated list doesn’t display properly.

Hi

Thank you for the additional information. I created a simple code example that shows how you can achieve the same result using Mail Merge with Regions (the modified template is attached):

// Requires: using System; using System.Data; using Aspose.Words;
// The data comes in a format like the following. We will convert the string to a DataTable.
string dataString = "Aleisha\tNasser\tMember\tFemale\t14/07/1956\t27/05/1995\r\nPetrina\tRose\tSpouse\tFemale\t25/12/1956\t1/01/2000";
string[] rows = dataString.Split(new string[] { "\r\n" }, StringSplitOptions.None);
// Create structure of DataTable.
DataTable data = new DataTable("MyData");
data.Columns.Add("FirstName");
data.Columns.Add("SecondName");
data.Columns.Add("Relationship");
data.Columns.Add("Gender");
data.Columns.Add("BirthDate");
data.Columns.Add("JoinDate");
// Add rows into the table.
foreach (string row in rows)
    data.Rows.Add(row.Split(new string[] { "\t" }, StringSplitOptions.None));
// Open the template and execute mail merge with regions.
Document doc = new Document(@"Test001\in.dotx");
doc.MailMerge.ExecuteWithRegions(data);
doc.Save(@"Test001\out.doc");
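
The modified template is not reproduced in this thread. As an assumption about its layout rather than a copy of the attachment, a mail merge region for the code above is typically a single table row that starts with a TableStart:MyData merge field and ends with TableEnd:MyData, where MyData matches the DataTable name and the other merge fields match its column names:

```text
| «TableStart:MyData»«FirstName» | «SecondName» | «Relationship» | «Gender» | «BirthDate» | «JoinDate»«TableEnd:MyData» |
```

With a layout like this, ExecuteWithRegions repeats the row once for each row in the DataTable.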

Hope this helps.
Best regards,

The issues you have found earlier (filed as WORDSNET-1035) have been fixed in this .NET update and this Java update.


This is definitely working better. Thank you for the update. I am still having trouble with numbered lists that go beyond 10 items. It comes out like this (10 and subsequent numbers have no space between the number and the word):

  1. Humulin.
  2. Oxycodone.
  3. Norvasc.

Hi Melissa,

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input Word document.
  • Aspose.Words generated output HTML which shows the undesired behavior.
  • Please create a standalone (runnable) console application that helps us reproduce your problem on our end and attach it here for testing.

As soon as you get these pieces of information ready, we’ll start investigation into your issue and provide you more information.

Best regards,