Using Aspose.Words for Java, we load Word documents, save them as XHTML, then run them through XSLT to produce our required outputs.
In a previous version of Aspose.Words there was an issue where the process errored if the Word document did not have all of the revisions in the document “accepted”.
The workaround was to use “AcceptAllRevisions” in the Java Aspose layer prior to exporting to HTML.
While this worked it meant we did not have access to the revision data (e.g. deleted text or markers for inserted text).
Since then you have fixed the issue and documents with revisions no longer crash. However, when the developer ran a test, the HTML output contained all revisions (including deleted text) in elements etc. but without any indication of whether they were deletions or insertions.
What we would like to achieve is the ability to have class information on the spans or other elements indicating whether they are deletions, so that we can choose to display them with different formatting (e.g. with strikethrough and/or in a different colour) or optionally drop the deleted text.
Is there a setting when creating HTML output that can do this?
Or is there an easy way to preprocess the revisions in the DOM so that the standard HTML output (which works well) can pick up on this information?
Hi
Thanks for your inquiry. I will consult with the developer responsible for the HTML module in Aspose.Words and provide you with more information regarding this issue.
Best regards.
I’m looking at how MS Word saves revisions in HTML format. They look like this:
<p class=MsoNormal>
<span lang=EN-US style='mso-ansi-language:EN-US'>
This is the
<span class=msoIns>
<ins cite="mailto:Ace%20Target" datetime="2009-02-23T02:15">
second
</ins>
</span><span class=msoDel>
<<?xml :namespace prefix=st1 ns="urn:schemas-microsoft-com:office:smarttags" ?>
<st1:State w:st="on"><st1:place w:st="on">del</st1:place></st1:State> cite="mailto:Ace%20Target"
datetime="2009-02-23T02:15">first </del>
</span>edition.
</span>
</p>
We can also output deletions and insertions in <del></del> and <ins></ins> and/or use special character styles. I have created a new issue, #7611, to support revisions semantically. I cannot give any time estimate, since the feature does not seem to be a high priority.
You can work around this by assigning custom styles to delete and insert revisions. To detect revisions in the document content, you can use the IsDeleteRevision and IsInsertRevision properties of the Inline class: https://reference.aspose.com/words/net/aspose.words/inline/
Thanks for the prompt response. We will try the workaround, and I will post here if we have any issues.
When you come to work on issue #7611, please make sure that you provide an option to output the semantic information rather than simply using strikethrough/underline character styles. If those are used, we will not be able to distinguish text that has been deleted from text that has manual strikethrough applied.
Hi!
Thank you. That’s right. Of course we’ll output them semantically. Ideally, all document elements with different purposes should have different styles, even if they happen to look the same at the moment. I showed how MS Word outputs revisions. We’ll basically use the same ideas while avoiding MS Office-specific magic in the HTML output. Please let us know if you have any other ideas.
Regards,
Reflecting MS markup should be fine (provided that all attributes are quoted and it works for HTML/XHTML output).
I am not sure why, in the previous example, the Microsoft markup has both <st1:state w:st="on"><st1:place w:st="on">del, as this will make post-processing more complex. I am happy to have a span with a known class (e.g. msoDel) or a “del” element, but cannot see the need for both.
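To illustrate the kind of post-processing we have in mind, here is a minimal sketch using only the standard javax.xml.transform API. It assumes the exported XHTML marks deletions either with a "del" element or with a span whose class is exactly msoDel, as in the MS Word sample; the markup that Aspose.Words eventually produces may differ. The class name RevisionFilter and the method dropDeletions are hypothetical names for this example, not part of any Aspose API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RevisionFilter {

    // Identity transform plus one empty template that swallows deletions.
    // local-name() is used so the rules match with or without the XHTML namespace.
    // Assumption: deletions appear as <del> or as <span class="msoDel">.
    private static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"@*|node()\">"
      + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
      + "</xsl:template>"
      + "<xsl:template match=\"*[local-name()='del']"
      + " | *[local-name()='span'][@class='msoDel']\"/>"
      + "</xsl:stylesheet>";

    /** Returns the given XHTML fragment with all deleted-revision content removed. */
    public static String dropDeletions(String xhtml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers are not forced to handle it.
            throw new IllegalStateException("XSLT post-processing failed", e);
        }
    }

    public static void main(String[] args) {
        String sample =
            "<p>This is the <span class=\"msoIns\"><ins>second </ins></span>"
          + "<span class=\"msoDel\"><del>first </del></span>edition.</p>";
        // Prints the paragraph with the deleted word removed and the insertion kept.
        System.out.println(dropDeletions(sample));
    }
}
```

With a single known marker, either the class or the element, the second template would only need one pattern. To show deletions with strikethrough instead of dropping them, the empty template could be replaced with one that copies the content into a span carrying our own class.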