We have a defect on HTML to JSON conversion for hyperlinks

Hi Team,
When converting HTML to JSON, we see that the href values of hyperlinks are missing from the converted data. A sample is given below.

HTML:

| CVE ID | Affected Third-Party Component | Affected Versions | Summary | Severity | Mitigation |
| --- | --- | --- | --- | --- | --- |
| CVE-2022-21233 | 2022.2 IPU - Intel® Processor Advisory | Dell Server PowerEdge BIOS versions prior to 1.7.5 | INTEL-SA-00657 | Medium | No |

JSON:
{
  "CVE ID": "CVE-2022-21233",
  "Affected Third-Party Component": "2022.2 IPU - Intel? Processor Advisory",
  "Affected Versions": "Dell Server PowerEdge BIOS versions prior to 1.7.5",
  "Summary": "INTEL-SA-00657",
  "Severity": "Medium",
  "Mitigation": "No"
}

In the HTML, INTEL-SA-00657 is a hyperlink with the URL "https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00657.html".
This hyperlink is missing after conversion to JSON.

The same issue occurs for all anchor tags that wrap text: the URLs are lost during the conversion process.

Input

Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.

\n
\n Note: The Dell operating system recovery image must be used on Dell computers only. It is not designed or tested for use on non-Dell computers.\n
\n

To learn about downloading the operating system recovery image using a non-Windows computer, see the Dell knowledge base article How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux.

output
[
  {
    "Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.": "\n \n Note: The Dell operating system recovery image must be used on Dell computers only. It is not designed or tested for use on non-Dell computers.\n "
  },
  {
    "Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.": "\n To learn about downloading the operating system recovery image using a non-Windows computer, see the Dell knowledge base article How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux."
  }
]

The hyperlink (Laptops, PC, Desktop Computers & Monitors | Dell India) is missing here.

@JagdishPadala, @KKRIYAZ,

How are you importing/converting HTML to JSON with the Aspose.Cells for .NET APIs? Could you please share the runnable sample code that produces the undesired JSON data? We will check it soon.

@Amjad_Sahi:
We are using the same code that is shown on the website.

using Aspose.Cells;
// Load the HTML file into a workbook.
var workbook = new Workbook("tab.html");
// Save the workbook as JSON.
workbook.Save("tab.json");

tab.html file contains

Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.

\n
\n Note: The Dell operating system recovery image must be used on Dell computers only. It is not designed or tested for use on non-Dell computers.\n
\n

To learn about downloading the operating system recovery image using a non-Windows computer, see the Dell knowledge base article How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux.

Here “How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux” has href as “How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux | Dell US”.

Since HTML is not an accepted file format for upload, I have pasted the HTML code above.

We are using the same snippet to create the workbook and save it as JSON. The HTML file is attached:
tab.zip (543 Bytes)

using Aspose.Cells;
var workbook = new Workbook("tab.html");
workbook.Save("tab.json");

@KKRIYAZ, @JagdishPadala,

I tested your scenario using your sample tab.html file and noticed that the hyperlink (URL) is missing. Could you also share your expected JSON data? We will then log an appropriate ticket for your issue in our database for investigation.

Here are two examples.
1.
Input

Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.

\n
\n Note: The Dell operating system recovery image must be used on Dell computers only. It is not designed or tested for use on non-Dell computers.\n
\n

To learn about downloading the operating system recovery image using a non-Windows computer, see the Dell knowledge base article How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux.

expected output
[
  {
    "Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.": "\n \n Note: The Dell operating system recovery image must be used on Dell computers only. It is not designed or tested for use on non-Dell computers.\n "
  },
  {
    "Dell customization of the operating system recovery image includes Windows, Ubuntu, or Linux operating system and all the factory-installed device drivers for that specific platform.": "\n To learn about downloading the operating system recovery image using a non-Windows computer, see the Dell knowledge base article <a data-lightning-target=\"_subtab\" href=\"https://www.dell.com/support/kbdoc/000132294/how-to-use-the-dell-hosted-recovery-image-of-linux\" target=\"_blank\"> How to Download and Use the Dell Operating System Recovery Image in Ubuntu or Linux</a>."
  }
]

2
input

| CVE ID | Affected Third-Party Component | Affected Versions | Summary | Severity | Mitigation |
| --- | --- | --- | --- | --- | --- |
| CVE-2022-21233 | 2022.2 IPU - Intel® Processor Advisory | Dell Server PowerEdge BIOS versions prior to 1.7.5 | INTEL-SA-00657 | Medium | No |

expected output
{
  "CVE ID": "CVE-2022-21233",
  "Affected Third-Party Component": "2022.2 IPU - Intel? Processor Advisory",
  "Affected Versions": "Dell Server PowerEdge BIOS versions prior to 1.7.5",
  "Summary": "<a href=\"https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00657.html\">INTEL-SA-00657 </a>",
  "Severity": "Medium",
  "Mitigation": "No"
}

@KKRIYAZ,

Thanks for the expected JSON data.

We have already reproduced the issue by converting your sample HTML file(s) to JSON, and we noticed that hyperlinks are missing in the HTML to JSON conversion. We have logged a ticket with the ID "CELLSNET-52344" for your issue. We will evaluate it in detail and try to resolve it (if possible) soon.

Once we have an update on it, we will let you know.

@KKRIYAZ @JagdishPadala

  1. In Excel, a hyperlink can only be applied to a whole cell, so a hyperlink that covers only part of a cell's text is removed when the content is loaded into Excel.
    Please arrange your file as in sheet001.zip (1.5 KB)

  2. JSON has no hyperlink definition, so there is currently no such feature available when exporting Excel to JSON.
    We can try to export hyperlinks as you requested through a specific setting, but we need more time to check whether this feature has been requested by many people.

@simon.zhao @Amjad_Sahi
cc: @KKRIYAZ

We are facing an issue when converting the HTML text below to JSON.

newtest.zip (307 Bytes)

We have attached the html file which we are using to convert to json.

Pasted html code below for reference.

Affected products 
EMC Data Protection Advisor versions prior to 6.2.3 patch 359


Summary 
EMC Data Protection Advisor 6.2.3 patch 359 contains fixes for multiple Oracle Java Runtime Environment (JRE) security vulnerabilities. 

Please keep us posted.

@JagdishPadala,

We evaluated your requested feature in detail. We are sorry, but we cannot support this case because we cannot reliably find the JSON property names when the labels and values are arranged vertically.

@KKRIYAZ @JagdishPadala
We cannot detect "Affected products" and "Summary" as the names of the JSON properties.
Currently, we can only treat the first row as the header row.
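
To illustrate the limitation described above, here is a minimal, self-contained sketch of a "first row is the header row" conversion. It is not Aspose.Cells code and does not claim to reproduce its exact output; `HeaderRowDemo` and `RowsToJson` are hypothetical names used only for this illustration. It shows why a vertical layout, where labels and values alternate down a single column, cannot yield "Affected products" and "Summary" as separate property names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class HeaderRowDemo
{
    // Uses rows[0] as the property names and turns each later row into one object.
    public static string RowsToJson(IReadOnlyList<string[]> rows)
    {
        string[] header = rows[0];
        var records = new List<Dictionary<string, string>>();
        foreach (string[] row in rows.Skip(1))
        {
            var record = new Dictionary<string, string>();
            for (int i = 0; i < header.Length && i < row.Length; i++)
            {
                record[header[i]] = row[i];
            }
            records.Add(record);
        }
        return JsonSerializer.Serialize(records);
    }

    public static void Main()
    {
        // Horizontal layout: the first row holds the labels.
        var horizontal = new List<string[]>
        {
            new[] { "CVE ID", "Severity" },
            new[] { "CVE-2022-21233", "Medium" },
        };
        Console.WriteLine(RowsToJson(horizontal));

        // Vertical layout: labels and values alternate down one column,
        // so "Summary" ends up as a value under "Affected products".
        var vertical = new List<string[]>
        {
            new[] { "Affected products" },
            new[] { "EMC Data Protection Advisor versions prior to 6.2.3 patch 359" },
            new[] { "Summary" },
            new[] { "EMC Data Protection Advisor 6.2.3 patch 359 contains fixes for multiple Oracle Java Runtime Environment (JRE) security vulnerabilities." },
        };
        Console.WriteLine(RowsToJson(vertical));
    }
}
```

Under this assumption, rearranging the HTML so that the labels sit in the first row and the values in the second would let both labels become property names.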

Thanks for the response. Could you please look into the issue we discussed earlier (hyperlinks with text)?

@KKRIYAZ,

As we told you earlier, there is no hyperlink definition in JSON, so there is no such feature available when exporting Excel to JSON. However, we will try to export hyperlinks as you requested through a specific setting, but this will take some time to implement.

Once we have any new information available, we will update you.

@KKRIYAZ
In the next version, 22.12, we have added JsonSaveOptions.ExportHyperlinkType to control how hyperlinks are exported, as in the following code:

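using Aspose.Cells;
// Folder that holds the sample file; also used for the output (assumed).
string dir = @"D:\FileTemp\";
Workbook workbook = new Workbook(dir + "sheet001.htm");
JsonSaveOptions saveOptions = new JsonSaveOptions();
// Export each hyperlink as an HTML string in the JSON value.
saveOptions.ExportHyperlinkType = Aspose.Cells.Json.JsonExportHyperlinkType.HtmlString;
workbook.Save(dir + "dest.json", saveOptions);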

The HTML data should be arranged as a table, as in the attached sheet001.zip (1.2 KB).
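
Once hyperlinks are exported as HTML strings, a consumer of the JSON may want to split a value back into its URL and display text. The sketch below assumes the exported value looks like the expected output given earlier in this thread (`<a href="...">text</a>`); the exact string produced by version 22.12 is not confirmed here. `HyperlinkValueDemo` and `TryParseAnchor` are hypothetical names, and the regular expression is a shortcut for simple anchors, not a general HTML parser.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

public static class HyperlinkValueDemo
{
    // Matches a single <a ... href="..."> ... </a> element with a double-quoted href.
    private static readonly Regex Anchor = new Regex(
        @"<a\s[^>]*?href\s*=\s*""(?<url>[^""]*)""[^>]*>(?<text>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the URL and trimmed link text, or null when the value has no anchor.
    public static (string Url, string Text)? TryParseAnchor(string value)
    {
        Match m = Anchor.Match(value);
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups["url"].Value, m.Groups["text"].Value.Trim());
    }

    public static void Main()
    {
        // Sample JSON shaped like the expected output from this thread (assumed format).
        string json = "{\"Summary\": \"<a href=\\\"https://www.intel.com/content/www/us/en/security-center/advisory/intel-sa-00657.html\\\">INTEL-SA-00657 </a>\"}";

        using JsonDocument doc = JsonDocument.Parse(json);
        string summary = doc.RootElement.GetProperty("Summary").GetString() ?? "";

        var link = TryParseAnchor(summary);
        if (link != null)
        {
            Console.WriteLine(link.Value.Text + " -> " + link.Value.Url);
        }
    }
}
```

Values without an anchor, such as "Medium", return null and can be used as plain text.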

Thanks @simon.zhao for the quick turnaround.
Could you let us know when 22.12 will be available for us to use?

@KKRIYAZ,

Sure, we will notify you once the latest version is available.

@Amjad_Sahi Hello Amjad, when exactly is the new version release planned? Do you have a timeline which you can share?

@LakshmiLeeladhar,

We plan to publish the next release this week, so we will try to publish the new version before the end of this week or early next week.

@Amjad_Sahi That’s great news. I hope the new release will contain the fix for the discussed issue?