HTML Conversion issues with Word HTML tabs

In my project I have Word-formatted HTML documents in a database; the content was written with FreeTextBox. I have moved from using Word Automation to insert the HTML into a document template to using Aspose.Words to create the document on the server side.
I am experiencing a formatting issue when reading the Word-formatted HTML into Aspose.Words: wherever there should be a tab, it is replaced by about 8 spaces.

The HTML segment that is giving me issues is:

<P class=MsoPlainText style="MARGIN: 0in 0in 0pt 0.5in; TEXT-INDENT: -0.5in; TEXT-ALIGN: justify"><SPAN style="FONT-SIZE: 12pt; FONT-FAMILY: 'Times New Roman'">B.<SPAN style="mso-tab-count: 1">         </SPAN>

Any ideas on how to rectify this issue? Will Aspose.Words read the Word CSS?

Hello!
Thank you for your request.
Tabulation and tab positions are always a problem in HTML. Standard HTML 4.01 and CSS 2.1 have no notion of them. Microsoft Word preserves such features with its own specific attributes, such as mso-tab-count. They are not proprietary, but other applications may not support them, and many people don’t like to deal with Microsoft Office “magic” in HTML. Our main goal is to work with “pure” HTML without such “magic”. We do roundtrip some features as exceptions to this rule. For instance, we export and import section breaks via mso-break-type, because there is no other means to do that. Some other “magic” features are optional, such as exporting document properties.
A possible workaround is to pre-process or post-process the documents. If these tabs are produced by an application or control that you cannot configure, this would be the only option.
I have linked your request to the appropriate known issue. We’ll notify you of any progress. Please feel free to share any feedback; it’s much appreciated.
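As an illustration of the pre-processing idea, here is a minimal sketch in plain C# (no Aspose.Words calls). It assumes that a literal tab character would not survive HTML import, so it swaps each mso-tab-count span for a visible placeholder token instead. The class name `WordTabPreprocessor`, the method `MarkTabs`, and the token `[[TAB]]` are hypothetical names chosen for this example, and the regular expression assumes the tab span contains no nested spans.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class WordTabPreprocessor
{
    // Hypothetical marker: choose any text that cannot occur in your content.
    public const string TabToken = "[[TAB]]";

    // Matches a <span> whose opening tag declares mso-tab-count, together
    // with its padding content. Assumes no nested <span> inside it.
    private static readonly Regex TabSpan = new Regex(
        @"<span\b[^>]*\bmso-tab-count\s*:\s*(\d+)[^>]*>.*?</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each Word tab span with one placeholder token per tab.
    public static string MarkTabs(string html)
    {
        return TabSpan.Replace(html, m =>
        {
            int count = int.Parse(m.Groups[1].Value);
            return new StringBuilder().Insert(0, TabToken, count).ToString();
        });
    }
}
```

After the marked HTML has been inserted into the document, the placeholder could then be replaced with a real tab in a post-processing step, for example with the document's find-and-replace facility. Whether the token passes through the import unchanged is an assumption that should be checked against your own documents.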
Regards,

Is there any way I can send a tab (\t) to Aspose.Words by modifying Word's “magic” HTML string?

I have also tried converting the document with Aspose.Words via the SaveType HTML function, but this has even worse formatting. It appears that the SaveType uses FreeTextBox as well, but the formatting comes out differently than when I pass the rich text directly to FTB and convert it to HTML.
Thanks for the prompt response.

There is no way to force a tabulation by outputting a ‘\t’ character directly to HTML. You can output it, but any number of tabs and spaces between words will be replaced by exactly one space when the document layout is built.
Besides writing its own attributes, Microsoft Word simulates tabs with padding sequences. This is needed so that documents can be viewed in any browser that knows nothing about the special “magic” attributes. Each padding sequence consists of non-breaking spaces plus one ordinary space. The number of characters is calculated according to rules that are, in general, unknown. I tried simulating tabs the same way in the HTML export module, but this is a very complex task: we would have to calculate all tab positions in a paragraph and measure the width of every part between tabs.
Hopefully you can tune the Free Text Box control or post-process its output to produce a similar layout without tabs. This doesn’t cover all cases, but if you show me a sample of more than one line, I will try to come up with a programmatic workaround. Maybe a table-driven approach could be applied here, or you could try spans with the “display:inline-block” CSS attribute.
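The padding sequences described here (a run of non-breaking spaces plus one ordinary space) can be recognised with a simple pattern. The following is only a sketch on plain strings: the class `TabPadding`, the method `ToTabs`, and the minimum run length of two non-breaking spaces are assumptions made for illustration, not anything Word or Aspose.Words defines.

```csharp
using System.Text.RegularExpressions;

public static class TabPadding
{
    // Two or more non-breaking spaces (U+00A0) followed by one ordinary space.
    // The lower bound of two is an assumed heuristic to avoid touching a
    // single deliberate non-breaking space.
    private static readonly Regex Padding = new Regex("\u00A0{2,} ");

    // Collapses each padding sequence in the text into a single tab character.
    public static string ToTabs(string text)
    {
        return Padding.Replace(text, "\t");
    }
}
```

For example, `TabPadding.ToTabs("B.\u00A0\u00A0\u00A0 Text")` yields `"B.\tText"`. Because the length of a padding sequence varies, a pattern like this is more tolerant than matching a fixed number of spaces, but it can also match padding that was never meant to be a tab.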
Regards,

Previously I was using Word VBA to insert the HTML data into the bookmarks. When the HTML file is opened with Word and converted from HTML, the tabs are present in the document. The mso-tab-count is stripped out by Aspose.Words.
Here is the HTML segment we have for the tab:

<SPAN style="mso-tab-count: 1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </SPAN>

It is replaced with eight non-breaking spaces followed by one ordinary space.

I decided simply to do a replace after the HTML is inserted into the document:

// Eight non-breaking spaces plus one ordinary space stand in for one Word tab after import.
string tabPadding = new string(ControlChar.NonBreakingSpaceChar, 8) + ControlChar.SpaceChar;
doc.Range.Replace(tabPadding, ControlChar.Tab, false, false);

Thanks

Hello!
Thank you for sharing your experience.
This workaround will work until we change the way we output tabs, so you should check the results the next time you upgrade Aspose.Words.
Regards,

The issues you have found earlier (filed as WORDSNET-1973) have been fixed in this .NET update and this Java update.
