We have a requirement to read some legacy reports pushed out in HTML and convert them to Excel. I don't see this feature available in Aspose.Cells, but was wondering if you could offer any guidance to accomplish this task.
We're open to purchasing other Aspose products to assist with the conversion if required. Obviously, we need to preserve the formatting of the text in the HTML.
Thanks for the quick reply. By way of background, our goal is to take HTML reports from a legacy system and use Aspose.Cells to convert those reports to Excel. We DO NOT want to use Microsoft Office: these conversions must take place on a server farm, and we want to avoid the license costs.
We conducted a test with 7.0.3.4 as you suggested. The newer version does a better job. However, there were some issues. Attached is a zip file with two files: (1) the input HTML file (7.html) and (2) the resulting output produced by Aspose.Cells (Output7.html.xlsx).
As you can see, the table values made it to Excel. However:
The underlining on the column headers was removed.
The number formats were removed.
The HTML header values are not included in the Excel file.
Are the missing items a result of bugs in Aspose.Cells or unsupported features, or was there something wrong with our test?
We will look into whether your HTML report can be handled correctly when it is converted to an XLSX report. We have logged this issue in our database and will update you as soon as possible.
I noticed the original post in this thread indicates that issue CELLSNET-40183 has been Resolved. I didn't get notified, so please confirm that this problem is in fact resolved.
Secondly, the CELLSNET-40183 hyperlink takes me to a log-in page for a Nanjing JIRA bug tracking system. Is this just for Aspose personnel or do clients have access to the system?
Thank you for the quick reply to this issue. As you suggested, I ran a test with the latest 7.1 version. The test demonstrated good progress. You guys did address the three explicit issues I had identified earlier. However, I believe your conversion logic still requires a little tweaking. For example,
1. The numeric columns in the spreadsheet (C, D & E) lost their formatting. They were comma delimited in the HTML. The cells should be of Number type.
2. The percentage column (F) did not convert as a Percentage type.
With respect to item 1, the conversion should respect the prevailing culture setting of the thread performing the work. It should recognize numeric values even if the number is delimited by something other than a comma (e.g. some cultures use periods as thousand-delimiters). For the record, I didn’t test this but want to make sure it’s covered on your end.
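To illustrate the behavior described above, here is a minimal sketch in plain Python of separator-aware parsing of cell text. It is not Aspose.Cells code and says nothing about how the library works internally; the function name `parse_cell_text` and its `group_sep` / `decimal_sep` parameters are hypothetical, chosen only for this example. The caller is assumed to supply the separators that match the thread's culture.

```python
from decimal import Decimal, InvalidOperation


def parse_cell_text(text, group_sep=",", decimal_sep="."):
    """Classify HTML cell text as a number, a percentage, or plain text.

    Hypothetical helper for illustration only. Returns (value, kind),
    where kind is 'number', 'percent', or 'text'.
    """
    raw = text.strip()
    kind = "number"
    if raw.endswith("%"):
        kind = "percent"
        raw = raw[:-1].strip()

    # Drop the grouping separator first, then normalize the decimal
    # separator to '.', which is what Decimal expects.
    normalized = raw.replace(group_sep, "").replace(decimal_sep, ".")

    try:
        value = Decimal(normalized)
    except InvalidOperation:
        return text, "text"
    if not value.is_finite():
        # Strings such as "NaN" or "Infinity" are treated as text here.
        return text, "text"

    if kind == "percent":
        # Excel stores 12.5% as 0.125 with a percentage number format.
        value = value / Decimal(100)
    return value, kind
```

With this sketch, `parse_cell_text("1.234,56", group_sep=".", decimal_sep=",")` and `parse_cell_text("1,234.56")` both yield the same numeric value, which is the outcome the conversion should ideally produce for cultures with different separators.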
For your convenience, I am attaching the test HTML input file and XLSX output file from my test.
Again, I appreciate the improvements made by 7.1 and look forward to you guys addressing the aforementioned issues.
Kindly confirm that the above will be issues you will look into. If you feel these are not valid issues please let me know.
I noticed the CELLSNET-40183 issue is flagged as RESOLVED. Does Aspose still plan to look into, and hopefully resolve, the remaining issues reported (see previous post)? If so, why is the issue flagged as Resolved? I have a pending project waiting on the ability of Aspose.Cells to accurately convert HTML reports to Excel.
Judging from the output it looks like you addressed the formatting issues I previously raised.
The only item I noticed is the "Total:" literal on column "A", row "11". On the spreadsheet it's normal font even though in the source HTML it's bold. I didn't pick up on this in my last review and neither did your developers. Kindly fix this bug. Nevertheless, this is forward progress.
The v7.1.0.1 link doesn't work for me. I presume this is because it is not yet publicly available. I can wait for the next release, especially if you can address the aforementioned font issue.
Thanks for providing access to 7.1.0.2. I ran a test. Unfortunately, the field in question is still not bold in the spreadsheet, as it is in the original source HTML file.
Attached is a zip file containing (1) the original HTML, (2) a JPG with the field circled in red, and (3) the XLSX output file annotated to again highlight the erroneous field.
Let me know if I can be of any further assistance to resolve the problem.
That's great news. I appreciate all your help. The tool is at a point that it's definitely usable for us.
FYI, there's one other cosmetic issue that was brought to my attention yesterday by a fellow team member. The Aspose.Cells HTML-to-Excel conversion is apparently ignoring the HTML table cell alignment attribute (e.g. align="right"). In the sample report I gave you in my previous post, all of the column headings have explicit align values, but they're not aligned accordingly in the Excel output. This won't be a showstopper for us, but that added fidelity and attention to detail just makes the product that much better.
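As a way to check which cells carry an explicit alignment before comparing against the Excel output, the following is a minimal sketch using only Python's standard `html.parser` module. It is not Aspose.Cells code; the class `CellAlignCollector` and the function `collect_alignments` are hypothetical names for this example, and the sketch assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class CellAlignCollector(HTMLParser):
    """Collect (cell text, align value) pairs from <td>/<th> elements."""

    def __init__(self):
        super().__init__()
        self.cells = []     # list of (text, align) tuples
        self._align = None  # align value of the cell being read
        self._text = None   # None while outside a table cell

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            align = dict(attrs).get("align")
            # Normalize case so align="RIGHT" and align="right" match.
            self._align = align.lower() if align else None
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            self.cells.append(("".join(self._text).strip(), self._align))
            self._text = None


def collect_alignments(html):
    """Return a list of (text, align) for every table cell in the HTML."""
    parser = CellAlignCollector()
    parser.feed(html)
    parser.close()
    return parser.cells
```

Running `collect_alignments` over the source report gives a list that can be compared cell by cell with the horizontal alignment seen in the generated spreadsheet. Cells without an `align` attribute are reported with `None`.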
I look forward to the next release with the stated bug fixes.