Hello,
when saving a document to HTML, the link changes.
Is the link wrong? Can you take a look at it for me…
It changes & to &amp;amp; even though it's a parameter separator, and then the link doesn't work.
Word LINK:
https://linkproxy.cezdata.corp/SSD_ISSPDEDU_DOC?pSSDArray=%7B9999;2021-10-12T10:17:13%7D&docID=3000&url=spd%2Fdocmgr%2FdetailDoc.do%3Bjsessionid%3DPNOHNMOPBCAB%3FdocID%3D3000%26amp%3Bview%3Dnull
Link in HTML
https://linkproxy.cezdata.corp/SSD_ISSPDEDU_DOC?pSSDArray=%7B9999;2021-10-12T10:17:13%7D&docID=3001&url=spd%2Fdocmgr%2FdetailDoc.do%3Bjsessionid%3DPNOHNMOPBCAB%3FdocID%3D3001%26amp%3Bview%3Dnull
@benestom
Summary
When Aspose.Words saves a document as HTML, it HTML-encodes characters inside hyperlink URLs (e.g., & becomes &amp;amp;). The encoded URL works in a browser, but if you copy the raw <a href="…"> markup into Markdown or another plain-text format, the entity stays literal and the link breaks.
Why it happens
This answer attributes the behaviour to an HtmlSaveOptions property, ExportUrlEncoded, said to be true by default; confirm the property name against the API reference for your Aspose.Words version. Escaping characters that have special meaning in HTML markup (&, <, >, etc.) follows the HTML specification, so the resulting HTML contains &amp;amp; inside the href attribute. Browsers decode the entity automatically, but downstream processors that treat the markup as plain text do not, which leaves the URL malformed.
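To illustrate the encode/decode round trip described above, here is a minimal sketch that uses only the .NET standard library (System.Net.WebUtility), not Aspose.Words. The URL and the class and method names are hypothetical and chosen for illustration; they do not come from the thread.

```csharp
using System;
using System.Net;

static class EntityDemo
{
    // What an HTML writer places inside href="...": '&' becomes "&amp;".
    public static string ToMarkup(string url) => WebUtility.HtmlEncode(url);

    // What a browser does when it reads the attribute value back.
    public static string FromMarkup(string attributeValue) => WebUtility.HtmlDecode(attributeValue);

    static void Main()
    {
        // Hypothetical URL with two query parameters.
        string url = "https://example.com/doc?docID=3000&view=null";

        string inMarkup = ToMarkup(url);
        Console.WriteLine(inMarkup);   // the '&' separator is written as "&amp;"

        // Decoding restores the original URL, which is why the link works in a browser.
        Console.WriteLine(FromMarkup(inMarkup) == url);
    }
}
```

A tool that copies the attribute value without decoding it keeps the "&amp;" text, and that is where the link stops working.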
What to do next
- Disable URL encoding when saving, so that the hyperlink is written exactly as it appears in the source document:

  ```csharp
  // Load the source document.
  var doc = new Aspose.Words.Document(@"C:\Input\MyDoc.docx");

  // Configure HTML save options.
  var htmlOptions = new Aspose.Words.Saving.HtmlSaveOptions
  {
      // Prevent Aspose.Words from converting '&' to '&amp;' etc.
      // Property name as given in this answer; verify it for your version.
      ExportUrlEncoded = false
  };

  // Save as HTML.
  doc.Save(@"C:\Output\MyDoc.html", htmlOptions);
  ```
- If you need to keep the default encoding (e.g., for strict HTML compliance) and only require a clean URL for later processing, post-process the generated file:

  ```csharp
  using System.IO;

  string html = File.ReadAllText(@"C:\Output\MyDoc.html");
  // Replace the most common entity; extend the logic if other entities appear.
  // Note: this also rewrites "&amp;" in body text, not only inside href values.
  html = html.Replace("&amp;", "&");
  File.WriteAllText(@"C:\Output\MyDoc_clean.html", html);
  ```
- Validate the output:
  - Open the saved HTML in a web browser; the link should work as before.
  - Copy the <a href="..."> tag into your Markdown document; the URL should now contain plain & characters and function correctly.
If you notice any other encoded characters, add corresponding replacements in the post-processing step.
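If only the URLs are needed for a plain-text context, a narrower alternative to replacing entities across the whole file is to decode just the href values. The sketch below uses only the .NET standard library and assumes double-quoted href attributes; HrefExtractor, ExtractDecodedUrls, and the sample HTML are hypothetical names and data for illustration. A regular expression is a simplification here, and an HTML parser would be more robust for real documents.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class HrefExtractor
{
    // Assumes attributes of the form href="..." with double quotes.
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns every href value with HTML entities decoded (e.g. "&amp;" -> "&").
    public static List<string> ExtractDecodedUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            urls.Add(WebUtility.HtmlDecode(match.Groups[1].Value));
        }
        return urls;
    }
}

class Program
{
    static void Main()
    {
        // Hypothetical fragment: the "&amp;" in the body text is left untouched,
        // because only the href value is decoded.
        string html = "<p>Fish &amp; chips: " +
                      "<a href=\"https://example.com/doc?docID=3000&amp;view=null\">link</a></p>";

        foreach (string url in HrefExtractor.ExtractDecodedUrls(html))
        {
            Console.WriteLine(url);
        }
    }
}
```

This leaves the saved HTML file valid and unchanged, and produces decoded URLs that can be pasted into Markdown or other plain-text formats.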
These steps let you control whether hyperlink URLs are HTML‑encoded, ensuring they remain usable in downstream plain‑text contexts.
@benestom Could you please attach the problematic input and output documents here for testing? We will check the issue and provide you more information.