Converting HTML into CSV in Java

Hi Team,

I want to check whether there is any way to convert HTML into CSV using Aspose.
Which product should I use? Will Aspose.Cells work? Could you please share some sample code?

Thanks.

@nvn16,

Yes, Aspose.Cells supports reading, writing, and converting the HTML and CSV file formats, so you may try it. Let us know if you run into any issue or have other queries; we will be happy to assist you.

Hi Amjad,

Could you please share a code sample?

Thanks,
Akshay

@nvn16,

Here is the simplest code to convert an HTML file to the CSV file format, for your reference.
Sample code:

// filePath is the path to the source HTML file.
// You may set different HTMLLoadOptions properties (if you want).
HTMLLoadOptions options = new HTMLLoadOptions();
Workbook workbook = new Workbook(filePath, options);
// You may set different attributes of TxtSaveOptions for your needs.
TxtSaveOptions opts = new TxtSaveOptions();
workbook.save("C:\\Users\\sg\\Desktop\\test.csv", opts);

Hope this helps a bit.

Hi Amjad,

The code below worked for a very simple HTML file.
But it is not able to convert a complex HTML file whose data contains Chinese characters.

try {
	String path = "D:/Akshay/";
	HtmlLoadOptions options = new HtmlLoadOptions();
	options.setAutoFitColsAndRows(true);
	options.setCheckDataValid(false);
	options.setEncoding(Encoding.getUnicode());

	//options.setDeleteRedundantSpaces(true);
	Workbook workbook = new Workbook(path + "sample.html", options);
	TxtSaveOptions opts = new TxtSaveOptions(SaveFormat.CSV);
	workbook.save(path + "outputchinese.csv", opts);
} catch (Exception e) {
	// TODO Auto-generated catch block
	e.printStackTrace();
}

I am not getting any exception, but the output file is blank. Is any additional configuration needed for a complex HTML file with Chinese data?

@nvn16,

Please note that Aspose.Cells can parse and convert MS Excel oriented HTML, that is, any HTML that MS Excel can open properly or render as a Web Page. Complex HTML might not be supported in MS Excel either. In any case, for your issue, please make sure that Chinese fonts are installed on your machine before running your sample code. If you still see the issue, kindly zip and attach your template HTML file and the output CSV file, and we will check it soon.

sample1.zip (265 Bytes)

Please find the attached HTML file. It contains Chinese data. Please refer to the code in my comment above. Let us know if we need to make any changes.

@nvn16,

I checked, and the encoding type you set is not right for your file. That's why you are getting blank data. I removed the following line, and now it works fine and the Chinese data is rendered correctly.

options.setEncoding(Encoding.getUnicode());
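
To illustrate why a wrong encoding setting can produce unusable output, here is a minimal standard-library sketch that does not use Aspose.Cells at all. It assumes the HTML file is stored as UTF-8 and that `Encoding.getUnicode()` selects UTF-16 little-endian (as the name does in .NET); both are assumptions, and the class name and sample string are hypothetical. Decoding UTF-8 bytes as UTF-16 pairs up unrelated bytes, so the table markup is no longer recognizable.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Aspose.Cells.
final class EncodingMismatchDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the word for "Chinese", written with escapes
        // so the source file itself stays ASCII-only.
        String html = "<td>\u4e2d\u6587</td>";

        String right = roundTrip(html, StandardCharsets.UTF_8);
        String wrong = roundTrip(html, StandardCharsets.UTF_16LE);

        System.out.println(right.equals(html));     // prints "true"
        System.out.println(wrong.contains("<td>")); // prints "false"
    }
}
```

With the markup scrambled like this, a parser has no table to extract, which would be consistent with the blank CSV reported above.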

Thanks, Amjad.

For complex HTML, I am getting:
java.lang.IllegalArgumentException: Invalid column index.
at com.aspose.cells.zarg.a(Unknown Source)
at com.aspose.cells.Cells.get(Unknown Source)
at com.aspose.cells.zakl.n(Unknown Source)
at com.aspose.cells.zakl.a(Unknown Source)
at com.aspose.cells.zakm.a(Unknown Source)
at com.aspose.cells.zaks.o(Unknown Source)
at com.aspose.cells.zaks.l(Unknown Source)
at com.aspose.cells.zakt.a(Unknown Source)
at com.aspose.cells.zjw.a(Unknown Source)
at com.aspose.cells.Workbook.a(Unknown Source)
at com.aspose.cells.Workbook.<init>(Unknown Source)
at TestAspose.main(TestAspose.java:19)

Line 19 => Workbook workbook = new Workbook(path + "sample.html", options);

Can we convert HTML containing nested table tags into CSV?
If yes, can you share sample code or a reference?

TIA

@nvn16,
You may load your complex HTML file into MS Excel and check whether it can convert the file to CSV the way you want. If yes, please share your sample HTML file with us. We will try to provide alternative code for Aspose.Cells, as it mimics the behavior of MS Excel.

As far as we can tell, no information or example is available for reading or writing nested tables from HTML; however, we will investigate further once we have your feedback.

Hi,
Please find the attached HTML file, and please help convert it to CSV using Aspose.Cells.

Please share code.

sample.zip (194.2 KB)

TIA

@nvn16,

There are some scripts in the file. I tried to open your file in IE. IE first restricted the scripts in the file and finally showed a blank page. Could you please tell me which encoding type I should set to view the file properly in IE?

File Encoding - HTML document, Non-ISO extended-ASCII text, with CRLF line terminators
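
A description like "Non-ISO extended-ASCII text" usually suggests that the bytes are neither plain ASCII nor valid UTF-8, so the actual encoding has to be found by trial. As a rough sketch (standard library only, hypothetical class and method names, not an Aspose.Cells feature), a strict decoder can report whether a candidate charset decodes the file without errors:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for probing a file's encoding.
final class CharsetProbe {

    // Returns true if every byte sequence is valid in the given charset.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Reads the file named on the command line, or falls back to sample bytes
        // (0x93 cannot start a UTF-8 sequence).
        byte[] bytes = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : new byte[] {(byte) 0x93, 'A'};

        System.out.println("US-ASCII: " + decodesCleanly(bytes, StandardCharsets.US_ASCII));
        System.out.println("UTF-8:    " + decodesCleanly(bytes, StandardCharsets.UTF_8));
    }
}
```

A clean decode does not prove the charset is the right one (single-byte charsets such as ISO-8859-1 accept any byte sequence), so the decoded text still needs a visual check.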

@nvn16,

How do I set the encoding, and which "Encoding" type should I select, to view the file properly in IE?

I do not set any encoding. I just double-click to open this file in Chrome. I don't see any error while opening the file.

Can we have a call? I can share my screen to show you.

Please let me know.

@nvn16,

Although IE does not display the file correctly, Google Chrome displays it fine. I have logged a ticket with the ID "CELLSJAVA-43724" for your issue. We will look into it soon.

Once we have an update on it, we will let you know.

We do not provide technical support by phone as part of free forum support. If we require more details, we will ask you here.

Any update on this?

@nvn16,
This issue was logged only recently and is in the queue for detailed analysis. Analysis takes 3 to 5 days for normal issues and longer for complex ones. We will write back here once we have an update on your issue to share.

Hello Team,

Any update?

@nvn16,
We are collecting information in this regard and will share our feedback soon.