Free Support Forum - aspose.com

HTML to Excel Conversion missing text

When I convert HTML to Excel using the method below, some of the text is missing.
The input HTML file and the generated Excel output file are attached.

Check cell A2 in the Excel file; its text is incomplete.

I also tested with Aspose Java version 20.11, and the issue still exists.


{
    // Apply the license before loading any workbook
    License license = new License();
    license.setLicense("C:\\export\\rschapps\\rschdistpreview\\config\\Aspose.Total.Java.lic");

    String inputHtmlPath = "C:\\export\\rschapps\\test\\HTML_to_Excel.html";
    String outputExcelPath = "C:\\export\\rschapps\\test\\HTML_to_Excel.xlsx";

    // Load the HTML file as a workbook and save it as XLSX
    LoadOptions loadOpts = new LoadOptions(LoadFormat.HTML);
    Workbook disclosureWorkbook = new Workbook(inputHtmlPath, loadOpts);
    disclosureWorkbook.save(outputExcelPath);
}

html_excel_test.zip (10.2 KB)

@jinesh.parikhmca1983,
Please try the following sample code and share your feedback.

String inputHtmlPath = "HTML_to_Excel.html";
String outputExcelPath = "HTML_to_Excel_Java.xlsx";
LoadOptions loadOpts = new LoadOptions(LoadFormat.HTML);
Workbook disclosureWorkbook = new Workbook(inputHtmlPath, loadOpts);
Worksheet _worksheet = disclosureWorkbook.getWorksheets().get(0);

// Create a Style object using CellsFactory class
CellsFactory cf = new CellsFactory();
Style st = cf.createStyle();
st.setTextWrapped(true);
StyleFlag styleFlag = new StyleFlag();
styleFlag.setWrapText(true);
_worksheet.getCells().getColumns().get(0).applyStyle(st, styleFlag);
_worksheet.getCells().getColumns().get(0).setWidth(250);
_worksheet.autoFitRows();
disclosureWorkbook.save(outputExcelPath);

Thanks for the suggestion. However, it doesn’t resolve the problem.

Please note that the issue is not with formatting; the content itself is missing.
To verify, run the following lines of code:
{
    // inputHtmlPath is the same path as in the first snippet
    LoadOptions loadOpts = new LoadOptions(LoadFormat.HTML);
    Workbook disclosureWorkbook = new Workbook(inputHtmlPath, loadOpts);

    // Print the value that was actually loaded into cell A2
    Worksheet inputWs = disclosureWorkbook.getWorksheets().get(0);
    System.out.println(inputWs.getCells().get("A2").getValue());
}

The expected output would be:
“Unless specified to the contrary,… Firm’s proprietary electronic distribution platforms.”
2322 characters

However, the actual output is:
"Unless specified to the contrary, within EU Member States, the Product is made available by Citigroup Global Markets Limited, which is authorised by the PRA and regulated by the FCA and the PRA. "
195 characters

The same can be verified by comparing the text shown in the Excel file with the text in the HTML, even after applying wrap text and auto-fit.
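
For anyone who wants to make that comparison mechanical, here is a minimal sketch in plain Java. It makes no Aspose calls; the class and method names (TruncationCheck, commonPrefixLength, isTruncated) are hypothetical, and the two strings in main are placeholders. In practice the expected string would be the text taken from the HTML and the actual string would be the value read from cell A2.

```java
// Hypothetical helper, not part of Aspose.Cells: compares the text expected
// from the HTML source with the value actually read from a cell.
final class TruncationCheck {

    // Number of leading characters the two strings have in common.
    static int commonPrefixLength(String expected, String actual) {
        int limit = Math.min(expected.length(), actual.length());
        int i = 0;
        while (i < limit && expected.charAt(i) == actual.charAt(i)) {
            i++;
        }
        return i;
    }

    // True when actual is shorter than expected and matches its beginning,
    // i.e. the content was cut off rather than altered.
    static boolean isTruncated(String expected, String actual) {
        return actual.length() < expected.length()
                && commonPrefixLength(expected, actual) == actual.length();
    }

    public static void main(String[] args) {
        String expected = "First sentence. Second sentence."; // placeholder for the HTML text
        String actual = "First sentence. ";                   // placeholder for the A2 value
        System.out.println("expected length: " + expected.length());
        System.out.println("actual length:   " + actual.length());
        System.out.println("common prefix:   " + commonPrefixLength(expected, actual));
        System.out.println("truncated:       " + isTruncated(expected, actual));
    }
}
```

The comparison is exact, character by character, so whitespace differences between the HTML source and the loaded cell value may need to be normalized first.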

@jinesh.parikhmca1983,
We will look into these details and share our feedback after detailed analysis.

@jinesh.parikhmca1983,
We have observed the scenario where text is missing during HTML to Excel conversion. The issue has been reproduced and logged in our database for further investigation. You will be notified here once an update is available.

This issue is logged as:
CELLSJAVA-43358 - Text is missing while HTML to Excel conversion

@jinesh.parikhmca1983,

This is to inform you that we have now fixed your issue. We will soon provide the fixed version after performing QA and incorporating other enhancements and fixes.

The issues you have found earlier (filed as CELLSJAVA-43358) have been fixed in this update. This message was posted using Bugs notification tool by johnson.shi