Aspose.Words convert to html problem

ikorachredmatch · July 9, 2008, 5:27am

Hello, I bought the Aspose.Words component yesterday.
I wanted to extract the text of a Word document into my database and convert the document to HTML.
The text is extracted correctly, but the HTML conversion is not done properly. The conversion runs, but the HTML is not styled correctly: the colons are not aligned in a straight line the way they were in the document, or the way they were when I converted with Word automation. I downloaded the latest version of Aspose.Words (Aspose.Words for .NET 5.2.1), but nothing changed.
I need your help as soon as possible. I am attaching an example HTML file and document to illustrate the problem. Note that in the document used to create this HTML, everything was aligned correctly.

Klepus · July 9, 2008, 7:42am

Hello!
Thank you for choosing Aspose.Words.
Conversion to HTML has some limitations, and we do not guarantee full fidelity. MS Word produces closer results, but it relies on some “magic” that is uncommon in traditional HTML. Many people complain about the HTML output of MS Word, which is why we avoid or minimize that “magic”.
Let’s consider the particular issues that matter to you. For instance, paragraphs with tabs are not output well, so we see horizontal misalignment near the word “Organization” and in similar places. You can replace these fragments with invisible tables, as is usually done in the HTML world. Let me know whether you can edit the documents manually or whether only programmatic workarounds are applicable.
See this spreadsheet for information about what the converter supports:
https://releases.aspose.com/words/net
Regards,

ikorachredmatch · July 9, 2008, 8:05am

Hi,
Thanks for your reply, but I still need to convert my documents to HTML as close as possible to the original.
The main problem is the “colons” (I guess this is the tab problem), as you can see in the result. This is critical and can’t be compromised. I have many documents like these, so I need a programmatic solution or workaround. I would appreciate it if you can help me with this.
Thanks

Klepus · July 9, 2008, 9:20am

Thank you for the clarification.
Programmatic workarounds work well only if you can formalize what you’d like to change. I see that fragments like this need restructuring:
2. Organization : Ranbaxy Laboratories Ltd
(Global Consumer Healthcare Division)
Designation : Area sales Executive
Tenure : Since Nov. 2002
We can assume that such a fragment always starts with a number followed by a period and a tab character, and that it ends at the first empty paragraph. With these assumptions we can parse the contents to perform some “data mining” and create a table with 4 columns. The column widths would have to be tuned experimentally. The number of rows depends on what we find in the fragment, namely the number of colon-delimited pairs. Row height should be automatic. After that, the whole fragment can be replaced with the table. Let me know whether this approach is suitable for all (or most) of the documents you have.
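The parsing step described above could be sketched roughly as follows. This is only an illustration in plain Python and does not use the Aspose.Words API: the function name `fragment_to_rows`, the sample paragraphs (including where the tab characters sit), and the choice of the 4 columns as number, label, colon, and value are all assumptions. Building the actual table in the document and tuning the column widths is not shown.

```python
import re

# Assumed start marker: a number, a period, then a tab character.
START = re.compile(r"^(\d+)\.\t")


def fragment_to_rows(paragraphs):
    """Return rows of [number, label, colon, value] for the first fragment.

    `paragraphs` is a list of paragraph texts (hypothetical input; in
    practice these would be read from the document).
    """
    rows = []
    in_fragment = False
    for text in paragraphs:
        if not in_fragment:
            match = START.match(text)
            if not match:
                continue  # still before the numbered fragment
            in_fragment = True
            number = match.group(1) + "."
            text = text[match.end():]
        else:
            number = ""
            if not text.strip():
                break  # the first empty paragraph ends the fragment
        label, colon, value = text.partition(":")
        if colon:
            rows.append([number, label.strip(), ":", value.strip()])
        else:
            # No colon: assume the line continues the previous value.
            rows.append([number, "", "", text.strip()])
    return rows


# Hypothetical paragraphs modelled on the fragment quoted above.
sample = [
    "2.\tOrganization\t:\tRanbaxy Laboratories Ltd",
    "\t\t\t(Global Consumer Healthcare Division)",
    "\tDesignation\t:\tArea sales Executive",
    "\tTenure\t:\tSince Nov. 2002",
    "",
]
rows = fragment_to_rows(sample)
for row in rows:
    print(row)
```

Each returned row would map to one table row, so the row count follows from the number of colon-delimited pairs (plus any continuation lines) found in the fragment.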
Regards,