Word To HTML Conversion for document with revisions

Hi,

We use Aspose.Words at work. We have a requirement to convert a Word document containing revisions (without accepting the revisions) to HTML. We are able to convert it, but the classes on the tags in the HTML (which affect the styles) are not correct.
I’m attaching the samples in a zip (to_aspose.zip). The zip contains the following:
1. Header.doc is the sample document (note that the same happens with .docx as well).
2. Header_aspose_html_conversion.doc.html is the output from the Aspose.Words conversion to HTML.
3. Manual_Header_html_conversion.html is the output from opening the document in MS Word and using File => Save As (HTML), without Aspose.

We can clearly see the difference in the bullets between the two HTML files. Also, the class for “Sub Item 1” should be “outlinetxt3”, but it is “outlinetxt2” instead. These mismatches are causing a lot of alignment issues for us.

Can someone help us on how to proceed?

to_aspose.zip (12.1 KB)

@gouthamrangarajan
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26627

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Thank you. While you are working on this item, can you suggest a good workaround for now?

@gouthamrangarajan Unfortunately, we cannot currently suggest a workaround for the problem. The issue is in the queue for analysis; once the analysis is done, we will be able to provide more information or a fix. Please accept our apologies for the inconvenience.

If the output HTML is for viewing purposes only, i.e. it is not supposed to be edited or processed, you can consider using the HtmlFixed format. In this case, the output looks exactly the same as it does in MS Word:

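using Aspose.Words;
using Aspose.Words.Saving;

// Load the source document; its revisions are left as they are (not accepted).
Document doc = new Document(@"C:\Temp\in.doc");

// Embed CSS, fonts, images and SVG so the output is a single self-contained HTML file.
HtmlFixedSaveOptions opt = new HtmlFixedSaveOptions();
opt.ExportEmbeddedCss = true;
opt.ExportEmbeddedFonts = true;
opt.ExportEmbeddedImages = true;
opt.ExportEmbeddedSvg = true;

// Save in fixed-layout HTML format.
doc.Save(@"C:\Temp\out.html", opt);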

out.zip (32.2 KB)

The HtmlFixed format is designed to preserve the original document layout for viewing purposes. So if your goal is to display the HTML on a page, this format can be considered as an alternative. Unfortunately, it does not support roundtrip to DOCX at all.

Thank you, will try this.
Also, can you explain further what you mean by “roundtrip to docx”?
Do you mean that it does not understand the docx format?

@gouthamrangarajan I mean that an HtmlFixed document cannot be converted back to a DOCX document.

I understand, thank you. We will check whether the code suggested in the previous comment works for us.

Hi,
We tried the proposed workaround, but it did not work out for us. We have to parse the HTML for our processing, and the styles are important for this processing (not just the content). I see that the status of WORDSNET-26627 is Analysis Complete. Can I get the output of the analysis?

@gouthamrangarajan Yes, we have completed analyzing the issue and determined the root cause of the problem. The test document has list items with insert and delete revisions. When Aspose.Words handles a list item, it gets the extended paragraph attributes without the Revised flag, so an incorrect list level is detected for the second list item and the numbering becomes broken. We will keep you updated and let you know once the issue is resolved. Unfortunately, the issue is not yet scheduled for development, so there are no estimates available right now.
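
To see which list levels the document model reports for revised list items, a small diagnostic along the following lines may help. It is only a sketch: it does not reproduce or work around the HTML export problem, the input path is hypothetical, and it assumes that switching Document.RevisionsView between Original and Final is enough to show the difference in list level and label for paragraphs carrying insert or delete revisions.

```csharp
using System;
using Aspose.Words;

class ListLevelDump
{
    static void Main()
    {
        // Hypothetical input path; substitute the document that shows the problem.
        Document doc = new Document(@"C:\Temp\in.doc");

        // Compute list labels so that ListLabel.LabelString is populated.
        doc.UpdateListLabels();

        // Inspect the list paragraphs in both the original and the revised view.
        foreach (RevisionsView view in new[] { RevisionsView.Original, RevisionsView.Final })
        {
            doc.RevisionsView = view;
            Console.WriteLine($"--- {view} ---");

            foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
            {
                // Skip paragraphs that are not part of a list.
                if (!para.IsListItem)
                    continue;

                Console.WriteLine(
                    $"level={para.ListFormat.ListLevelNumber} " +
                    $"label=\"{para.ListLabel.LabelString}\" " +
                    $"inserted={para.IsInsertRevision} deleted={para.IsDeleteRevision} " +
                    $"text=\"{para.GetText().Trim()}\"");
            }
        }
    }
}
```

If the level printed for an item such as “Sub Item 1” differs between the two views, that would be consistent with the root cause described above, where the level is taken from the paragraph attributes without the Revised flag.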

Thank you for responding.

Hi,
I got the quote for paid support, and unfortunately it seems high for us.
Can I get a confirmation that WORDSNET-26627 will be fixed with high priority if we go through paid support?
If so, I can push for it with confidence at my workplace.

@gouthamrangarajan Paid support services allow issues to be moved up in the queue, and their priority is raised. Our development team works on priority issues first.

Thanks for responding.

Hi,

Is there any possibility of getting a timeline for the resolution (deployment of the fix) once this issue is raised to priority (when I go through paid support)? I’m still working to get the paid support approved at my workplace.

@gouthamrangarajan I will consult with our development team and provide more information once I get an answer from them.

@gouthamrangarajan The fix can be simple: we can localize it and provide revised paragraph attributes just for the list’s marker. We are going to check the rest of the code for places where we missed revised paragraph attributes, but we can do this in a separate issue. So this particular issue can be resolved in the next 24.4 (April) release.

Thanks for responding. Just to confirm: after the fix, the class names, tags and indentation in the output HTML will be correct, right? (Currently they are incorrect.) We are not looking for any additional attributes that we would have to handle further. Also, if possible, can I get an example of how the output looks after the fix?

@gouthamrangarajan I will consult with our development team and let you know once I get an answer from them.

@gouthamrangarajan Here is the output that will be produced after the fix:
Demo.zip (894 Bytes)

Thank you so much, let me look into it.
