Html to Excel conversion (in java) not retaining the richtext formats in excel output

I am trying to convert a simple html file to excel file using the following 4 lines of code. It does convert but the output excel doesn’t have the formattings same as html. For example formatting like bold/italics/background color are lost in Excel

     File file = new File("test1.html");
    //Create byte array input stream from the byte array
   ByteArrayInputStream baisHtml = new ByteArrayInputStream(FileUtils.readFileToByteArray(file));

    //Create html load options
   HtmlLoadOptions opts = new HtmlLoadOptions();
   opts .setAutoFitColsAndRows(true);
 //Create workbook from your HTML string
   Workbook wb = new Workbook(baisHtml, opts);

//Save the workbook in output format
   wb.save("test1.xlsx");

We are evaluating your product, this would be one of the decision criteria for us. Hoping to see the response soon

@sunithaprabhu,

Thanks for the code segment and details.

To evaluate your issue precisely, we need your sample HTML and output Excel file. Please zip your html file and output file and attach the archive, we will check it soon.

Archive.zip (11.6 KB)

Attached the html and the corresponding xlsx output file. I would want the formatting as well as the cells borders to be enabled in excel output as how the usual excel looks like.

Please note this is a conventional html not the MS excel specification Html (as I have seen you mentioned in other blogs). I want the conventional html to be converted to Excel. Thanks in advance

Appreciate your quick response
Sunitha

@sunithaprabhu,

Thanks for the zipped archive.

Generally, Aspose.Cells supports MS Excel oriented HTML files for rendering or parsing. It follows Ms Excel standards and specifications when parsing or rendering HTMLs. When I opened your test2.html file manually in MS Excel, it does not display precisely (it does not display same as browser type display). So, it is not an issue with Aspose.Cells. See the screenshot for your reference:
sc_shot1.png (73.1 KB)

Thanks for the quick response.

Last html I sent was kind of created from aspose. But this time I created a html file in conventional way and created excel using aspose api as mentioned earlier. It does seem like preserving the formats to some extent (except the background color not exactly grey and the foreground color). So it seems like working with minor issues. Please see the attached files. FYI this html file doesn’t open in excel.

Why do you say it works only for MS excel oriented HTML file? Is there any reason for it to support only MS excel oriented HTML (although I do see it works for regular html files)

In either way, is it possible to have the ‘grids on’ on the output excel sheet?

thanks in advace
SunithaArchive 2.zip (27.0 KB)

Awaiting for your response

Would you please respond to my last message…
Why does aspose works only for MS excel oriented HTML file? Is there any reason for it to support only MS excel oriented HTML (although I do see it works for regular html files)

In either way, is it possible to have the ‘grids on’ on the output excel sheet?

@sunithaprabhu,
We are analyzing your query and will share our feedback soon.

@sunithaprabhu,

Your second html file has base64 image which MS Excel may not support so it could not open the file properly into it. If we remove it from the source (i.e., “<p><!--<img src="data:image/png;base64......”) and re-save it, MS Excel will open it fine. As I said Aspose.Cells mostly supports MS Excel oriented HTML files. For your question on general/common htmls support, Aspose.Cells does not support all HTML tags, similar limits are there enforced by MS Excel. Although Aspose.Cells still incorporates some common htmls for parsing on some users’ demand from time to time but this is not always feasible for us.

Anyways, for your requirements, I have logged an investigation ticket with an id “CELLSJAVA-43164” for your issue. We will try to figure out your issue (if possible) to support your needs.

Once we have an update on it, we will let you know.

You may try the following line of code before rendering to Excel file:
wb.getWorksheets().get(0).getPageSetup().setPrintGridlines(true);

The gridlines will be shown once you take the print or print preview of the sheet. Moreover, by default, there will be no grid lines in output Excel file if there are no grids in the source HTML file.

@sunithaprabhu,
We have investigated it more and noticed that the first file puts all the styles at the end of the file, because we deal with file streams in sequence, so we recommend putting all the styles at the head of the file.
About the rich text formats, we will provide the fix version before April 23, 2020.

First file was not the ideal file as I pulled that info from the html generated by aspose. If you can figure out the issue on the second file which is the conventional Html file, that would be great. Thanks for filing ticket and working on it. Appreciate it.
By the way the your recommendation of grid on didn’t work as I see in the below code, Workbook object has html content, only while saving it saves as excel. If there is a way to provide this option while saving that might work I think. But with the below code it didn’t work.

    HtmlLoadOptions opts = new HtmlLoadOptions();
   opts.setAutoFitColsAndRows(true);
   //Create workbook from your HTML string
   Workbook wb = new Workbook(baisHtml, opts);
   wb.getWorksheets().get(0).getPageSetup().setPrintGridlines(true);
  //Save the workbook in excel format
   wb.save(destFile);

@sunithaprabhu,
Thank you for the feedback. We will consider it while working on this issue.

As I told you. The gridlines will be shown only once you take the print or print preview of the sheet. Moreover, by default, there will be no grid lines in output Excel file if there are no grids in the source HTML file.

@sunithaprabhu,
You can use the following code to open/close the gridlines.

wb.Worksheets[0].IsGridlinesVisible = true;

If you want to set gridlines automatically when importing html to excel file, please let us know your feedback.

@sunithaprabhu,

This is to inform you that we have fixed your issue (logged earlier as “CELLSJAVA-43164”). We will soon provide you the fixed version after performing QA and incorporating other enhancements and fixes.

Thanks Appreciate your effort. By the way can you send me a code snippet to convert an html file to doc and html to pdf?

It complains there is no reference ‘Worksheets’ in wb for the code ‘wb.Worksheets[0].IsGridlinesVisible = true;’

Tried converting html to doc using the following snippet which didn’t work. Attached both files for your reference. Expecting response.
com.aspose.words.Document doc = new Document(htmlfile);
doc.save(destFile, SaveFormat.DOCX);Archive 3.zip (52.9 KB)

@sunithaprabhu,
Please give a try to the following sample code which adds grid lines to MS Excel file which are visible in the printout and print preview of the MS Excel file. Similarly it uses Aspose.Words for Java which converts the HTML file to DOCX and PDF.

File file = new File("test2.html");
byte[] array = Files.readAllBytes(file.toPath());
//Create byte array input stream from the byte array
ByteArrayInputStream baisHtml = new ByteArrayInputStream(array);
   

//Create html load options
HtmlLoadOptions opts = new HtmlLoadOptions();
opts .setAutoFitColsAndRows(true);
//Create workbook from your HTML string
Workbook wb = new Workbook(baisHtml, opts);
wb.getWorksheets().get(0).getPageSetup().setPrintGridlines(true);
//Save the workbook in output format
wb.save("test2.xlsx");

Document doc = new Document("test2.html");
doc.save("test2.docx", com.aspose.words.SaveFormat.DOCX);
doc.save("test2.pdf", com.aspose.words.SaveFormat.PDF);