When comparing xHTML documents with Aspose.Words, the compared output shows that Aspose.Words changes the xHTML and removes class names. It also adds hundreds of unnecessary style attributes to the HTML elements; I have removed these generically, since they add nothing to the styling of the elements.
Examples of these are as follows:
1 - <em> and/or <strong> tags are changed to <span style="font-style:italic">and italic as em</span> and/or <span style="font-weight:bold">some strong</span>.
While this does not necessarily change how the page is presented in a browser, it has a large impact on screen readers: the semantic emphasis is lost, so they present a completely different experience that is not reader-friendly.
QUESTION: Is there a flag/setting/configuration that tells Aspose to NOT change html elements?
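2 - IDs and existing style attributes are removed (for example, id="thisWillBeRemoved" on the first <div class="p"> below does not appear in the output).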
QUESTION: Is there a flag/setting/configuration that tells Aspose to NOT remove IDs and existing attributes from html elements?
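3 - <li> items that differ between the versions are split out of their <ul> (see image). For example, the deleted 5th list item in the output below is rendered as a <p> between two separate <ul> blocks.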
Other questions in the images…
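If no such setting exists, one possible workaround for issue 1 is to post-process the saved output. The following is only a minimal sketch under some assumptions: the output is well-formed XML (an unescaped & or a named entity such as &nbsp; would make parsing fail), and a span is converted only when its style is exactly font-weight:bold or font-style:italic. The class and method names are hypothetical, and it uses System.Xml.Linq rather than any Aspose.Words API.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Aspose.Words.
public static class SemanticTagRestorer
{
    // Renames <span style="font-weight:bold"> to <strong> and
    // <span style="font-style:italic"> to <em>. Spans with any other
    // style value (or no style) are left untouched.
    public static string Restore(string xhtml)
    {
        XElement root = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // ToList() takes a snapshot so elements can be renamed while looping.
        foreach (XElement span in root.DescendantsAndSelf()
                                      .Where(e => e.Name.LocalName == "span")
                                      .ToList())
        {
            // Normalise "font-weight: bold;" to "font-weight:bold".
            string style = ((string)span.Attribute("style") ?? "")
                .Replace(" ", "")
                .TrimEnd(';');

            string newName = style == "font-weight:bold" ? "strong"
                           : style == "font-style:italic" ? "em"
                           : null;
            if (newName == null) continue;

            // Keep the element's namespace (e.g. the XHTML namespace).
            span.Name = span.Name.Namespace + newName;
            span.Attribute("style").Remove();
        }

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

This cannot tell whether the original markup used <strong> or <b> (or <em> or <i>), and it does nothing for the removed IDs and class names, so it is a stopgap rather than a replacement for a proper setting.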
OLD VERSION:
<div>
<div class="p" id="thisWillBeRemoved">
[heading 3]
<ul id="UL_TT1_DDR_PKB">
<li id="SL18311645-100489">[First list item]</li>
<li id="SL18311647-100489">[Second list item]</li>
<li id="SL18311649-100489">[Third list item] <fn>example: <em>emphasis</em></fn></li>
<li id="SL18311647-100489">[Fourth list item]</li>
<li id="SL18311650-100489">[5th list item] <a href="url" data-scope="internal">Link</a> bottom of the page</li>
<li id="SL18311650-100489">[Last list item]</li>
</ul>
</div>
<div class="p">link to symbol <a href="url" data-scope="internal">@#$%^&*()</a> </div>
<div class="p">[link] <fn><a href="javascript:;" target="_blank" data-scope="external">Target Space</a>, <em id="GUID-67535493-B6CC-4952-95F3-4FB9807480C9">emphasis</em>.</fn></div>
<div class="p">Only <a href="url" data-scope="internal">Link to SPACE</a></div>
<div class="p">Lorem ipsum dolor sit amet, <strong>some strong</strong> aecenas aliquam justo et neque eleifend, id vulputate ligula dictum. Maecenas eget lacinia est.</div>
<div class="p">Fusce iaculis pharetra ex, <em>and italic as em</em> et vestibulum metus fringilla et. Sed condimentum risus vitae dapibus congue.</div>
<div class="p">This is ONLY in the OLD version. Duis molestie velit eu ligula venenatis, ac tincidunt massa semper. Ut ultrices risus orci, facilisis sollicitudin tellus pretium et. Donec a velit eleifend,</div>
<div class="p">This is changed in each version, Vestibulum at congue. Quisque non massa id nibh ornare vel eget quam.</div>
<ul>
<li>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</li>
<li>ONLY in OLD. Lorem ipsum dolor sit amet, consectetur adipiscing elit.</li>
<li>Changed in both Vestibulum ex suscipit ante convallis.</li>
<li>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</li>
</ul>
<div id="imgblock">
<div class="figure-block-body">
<figcaption>Image Caption</figcaption>
<img src="2dcee8b-af1b-4cdd-9430-0195c52297e2_1_en-us.jpg" id="image1" alt="Alternate text for image" />
</div>
</div>
<div class="p">In closing, we are done.</div>
</div>
NEW VERSION:
<div>
<div class="p" id="thisWillBeRemoved">
[heading 3]
<ul id="UL_TT1_DDR_PKB">
<li id="SL18311645-100489">[First list item]</li>
<li id="SL18311647-100489">[Second list item]</li>
<li id="SL18311649-100489">[Third list item] <fn>example: <em>emphasis</em></fn></li>
<li id="SL18311647-100489">[Fourth list item]</li>
<li id="SL18311650-100489">[Last list item]</li>
</ul>
</div>
<div class="p">[link] <fn><a href="javascript:;" target="_blank" data-scope="external">Target Space</a>, <em id="GUID-67535493-B6CC-4952-95F3-4FB9807480C9">emphasis</em>.</fn></div>
<div class="p">Lorem ipsum dolor sit amet, <strong>some strong</strong> aecenas aliquam justo et neque eleifend, id vulputate ligula dictum. Maecenas eget lacinia est.</div>
<div class="p">Fusce iaculis pharetra ex, <em>and italic as em</em> et vestibulum metus fringilla et. Sed condimentum risus vitae dapibus congue.</div>
<div class="p">This is ONLY in the NEW version. Suspendisse viverra, elit nec porttitor porta, arcu sem suscipit turpis, non viverra turpis neque ac nisi.</div>
<div class="p">This is changed in each version, Vestibulum at ligula. Quisque massa id nibh ornare pellentesque vel eget quam.</div>
<ul>
<li>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</li>
<li>ONLY in NEW. Donec finibus arcu ac feugiat iaculis.</li>
<li>Changed in both Vestibulum eleifend ex ante condimentum.</li>
<li>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</li>
</ul>
<div id="imgblock">
<div class="figure-block-body">
<figcaption>Image Caption</figcaption>
<img src="1122a87f-5953-48a3-93f8-e9d1cfc85e0f_1_en-us.jpg" id="image2" alt="Alternate text for image" />
</div>
</div>
<div class="p">In closing, we are done.</div>
</div>
OUTPUT:
<div>
<p>
<span>[heading 3] </span>
</p>
<ul>
<li>
<span>[First list item]</span>
</li>
<li>
<span>[Second list item]</span>
</li>
<li>
<span>[Third list item] example: </span><span>emphasis</span>
</li>
<li>
<span>[Fourth list item]</span>
</li>
</ul>
<p>
<del><span style="font-family:Symbol"></span></del><span>&#xa0;&#xa0; </span><span>[</span><del><span>5th list item] </span></del>
<del><span style="text-decoration:underline">Link</span></del><del><span style="-aw-import:spaces">&#xa0;</span><span>bottom of the page</span></del>
</p>
<ul>
<li>
<del><span>[</span></del><span>Last list item]</span>
</li>
</ul>
<p>
<del><span>link to symbol </span></del><del><span style="text-decoration:underline">@#$%^&*()</span></del>
</p>
<p>
<span>[link] </span><a href="javascript:;" target="_blank" style="text-decoration:none"><span style="text-decoration:underline">Target Space</span></a><span>, </span> <span style="font-style:italic">emphasis</span><span>.</span>
</p>
<p>
<del><span>Only </span></del><del><span style="text-decoration:underline">Link to SPACE</span></del>
</p>
<p>
<span>Lorem ipsum dolor sit amet, </span><span style="font-weight:bold">some strong</span><span> aecenas aliquam justo et neque eleifend, id vulputate ligula dictum. Maecenas eget lacinia est.</span>
</p>
<p>
<span>Fusce iaculis pharetra ex, </span><span style="font-style:italic">and italic as em</span><span> et vestibulum metus fringilla et. Sed condimentum risus vitae dapibus congue.</span>
</p>
<p>
<span>This is ONLY in the </span><del><span>OLD</span></del><ins><span>NEW</span></ins><span> version. </span><del><span>Duis molesti</span></del><ins><span>Suspendiss</span></ins><span>e v</span><del><span>elit eu ligula venenatis, ac tincidunt massa semper. Ut ultrices ri</span></del><ins><span>iverra, elit nec porttitor porta, arcu sem </span></ins><span>sus</span><del><span style="-aw-import:spaces">&#xa0;</span><span>orci, facilisis sollicitudin tellus pretium et. Donec a velit eleifend,</span></del><ins><span>cipit turpis, non viverra turpis neque ac nisi.</span></ins>
</p>
<p>
<span>This is changed in each version, Vestibulum at </span><del><span>congue</span></del><ins><span>ligula</span></ins><span>. Quisque </span><del><span>non </span></del><span>massa id nibh ornare </span><ins><span>pellentesque </span></ins><span>vel eget quam.</span>
</p>
<ul>
<li>
<span>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</span>
</li>
<li>
<span>ONLY in </span><del><span>OLD. Lorem ipsum dolor sit amet, consectetur adipiscing elit</span></del><ins><span>NEW. Donec finibus arcu ac feugiat iaculis</span></ins><span>.</span>
</li>
<li>
<span>Changed in both Vestibulum e</span><del><span>x suscipit </span></del><ins><span>leifend ex </span></ins><span>ante con</span><del><span>vallis</span></del><ins><span>dimentum</span></ins><span>.</span>
</li>
<li>
<span>Same. Fusce malesuada ligula eu nisl finibus, ut semper metus rhoncus.</span>
</li>
</ul>
<p>
<span>Image Caption</span>
</p>
<p>
<ins><img src="/images/cache/diff/Aspose.Words.34251782-4ac9-421b-a55a-60d03c25e90d.001.jpeg" width="624" height="110" alt="Alternate text for image" style="-aw-left-pos:0pt; -aw-rel-hpos:column; -aw-rel-vpos:paragraph; -aw-top-pos:0pt; -aw-wrap-type:inline" /></ins><del><img src="/images/cache/diff/Aspose.Words.34251782-4ac9-421b-a55a-60d03c25e90d.002.jpeg" width="800" height="92" alt="Alternate text for image" style="-aw-left-pos:0pt; -aw-rel-hpos:column; -aw-rel-vpos:paragraph; -aw-top-pos:0pt; -aw-wrap-type:inline" /></del>
</p>
<p>
<span>In closing, we are done.</span>
</p>
</div>
CODE: (Split out into functions in the project, but shown together here to show the options used)
// Step 1: load the xHTML into an Aspose.Words Document.
// HtmlDocument is the HtmlAgilityPack type; it is populated with the xHTML elsewhere (not shown).
HtmlDocument htmlDocument = new HtmlDocument();
htmlDocument.OptionOutputAsXml = true;
Document asposeDocument;
using (var stream = new MemoryStream())
{
    htmlDocument.Save(stream);
    stream.Position = 0; // rewind so Aspose.Words reads from the start
    asposeDocument = new Document(stream);
    asposeDocument.AutomaticallyUpdateStyles = false;
}
return asposeDocument; // end of the loading function
// Step 2: compare the two loaded documents; the revisions are recorded in docOld.
CompareOptions compareOptions = new CompareOptions();
compareOptions.Granularity = Granularity.CharLevel;
docOld.Compare(docNew, "Compare", DateTime.Now, compareOptions);
// Step 3: save the compared document back to xHTML.
HtmlSaveOptions options = new HtmlSaveOptions();
options.HtmlVersion = HtmlVersion.Xhtml;
options.CssStyleSheetType = CssStyleSheetType.Inline;
options.ExportHeadersFootersMode = ExportHeadersFootersMode.None;
options.ExportImagesAsBase64 = false;
options.ExportOriginalUrlForLinkedImages = true;
options.ExportPageMargins = false;
options.ExportXhtmlTransitional = true;
options.ImagesFolderAlias = AsposeImagesPath;
options.ImagesFolder = $"{BaseFilePath}{AsposeImagesPath}";
options.PrettyFormat = true; //Can disable this later!
options.SaveFormat = SaveFormat.Html;
options.ScaleImageToShapeSize = false;
docOld.Save(streamCompare, options);
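For reference, the generic removal of style attributes mentioned at the top can be sketched roughly as below. This is an illustration rather than the project's actual code: it assumes the saved output is well-formed XML, the class and method names are hypothetical, and it uses System.Xml.Linq instead of any Aspose.Words API.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of Aspose.Words.
public static class StyleStripper
{
    // Removes every style attribute from the root element and all of its descendants.
    public static string StripStyles(string xhtml)
    {
        XElement root = XElement.Parse(xhtml, LoadOptions.PreserveWhitespace);

        // Attributes("style") selects the style attribute of each element;
        // Remove() detaches all of them from their parent elements.
        root.DescendantsAndSelf().Attributes("style").Remove();

        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Note that this also removes font-weight:bold and font-style:italic, so any emphasis that Aspose.Words moved into a style attribute is lost completely once the styles are stripped.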
COMPARISON IMAGES WITH ISSUES/QUESTIONS:
aspose-issues.jpg (914.1 KB)
aspose-img.jpg (605.8 KB)