Html to Excel and PDF - Column width and row height not proper/ content chopped out after conversion

I am using Aspose.Cells 19.12 and Aspose.PDF 19.8. After converting HTML to Excel and PDF, the column widths and row heights are altered, and the output does not look good. Please see the challenges I am facing below.

Challenges faced:

Xls:

  1. Column width is being altered.
  2. Row height changes unexpectedly for a few rows.
  3. In some places the table borders are not shown.
  4. In some places, content written on two lines in one cell is lost.
  5. The whole content becomes so wide that it cannot fit within the print area.

Pdf:

  1. In the PDF, content is being chopped off.
  2. In some places the table borders are not shown.
  3. The whole content becomes so wide that it is not suitable for printing.

I am using both options, i.e. HTML to XLS and HTML to PDF. Please refer to the attached HTML file.

Note: The HTML looks good in a browser, but when it is converted to XLS and PDF, neither file looks good.

FYI: I’m using .NET Framework 4.6.1.
Please help with this.

Thanks in advance…

aspose_sample_html.zip (4.0 KB)

@SutarM

Thanks for contacting support.

We were able to replicate the issue in our environment while converting HTML to PDF using Aspose.PDF for .NET 20.1. We have logged it as PDFNET-47649 in our issue tracking system for further investigation and will keep you posted on its status. Please allow us some time.

We are sorry for the inconvenience.

The following code was used for testing with Aspose.PDF:

StringBuilder htmlPage = new StringBuilder();
htmlPage.Append(File.ReadAllText(dataDir + "aspose_sample_html.html"));
byte[] bytes = Encoding.UTF8.GetBytes(htmlPage.ToString());
var streamHtml = new MemoryStream(bytes);
var objLoadOptions = new Aspose.Pdf.HtmlLoadOptions(dataDir);
// Set page margins
objLoadOptions.PageInfo.Margin = new MarginInfo(50, 50, 50, 50);
// You can also set page height/width (swapped here for landscape Letter)
objLoadOptions.PageInfo.Height = PageSize.PageLetter.Width;
objLoadOptions.PageInfo.Width = PageSize.PageLetter.Height;
// Load the HTML stream with the options above, then save it as PDF
var doc = new Aspose.Pdf.Document(streamHtml, objLoadOptions);
doc.Save(dataDir + "out20.1.pdf");

Regarding Aspose.Cells, we are testing the scenario and will share our findings with you shortly.

Thanks for the quick update on Aspose.PDF. I hope there will be an update on Aspose.Cells soon as well.

@SutarM,

Regarding HTML to Excel conversion with the Aspose.Cells APIs, please try the following sample code; it may suit your needs:

Sample code:

HtmlLoadOptions opts = new HtmlLoadOptions(LoadFormat.Html);
opts.AutoFitColsAndRows = true;
Workbook workbook = new Workbook("e:\\test2\\aspose_sample_html.html", opts);
string output = "e:\\test2\\out1.xlsx";
workbook.Save(output);

Even when you open the HTML file in MS Excel manually, you will see that some data in cells is cut off or not properly visible. We added the HtmlLoadOptions.AutoFitColsAndRows property to display the content fully, so you may try it.

Hope this helps a bit.

Hi Team,

Thank you for the help so far. :slight_smile:
I tried your suggestions for Aspose.Cells by commenting out part of my code, i.e. the two statements below:

options.AutoFitterOptions = new AutoFitterOptions
{
    AutoFitMergedCellsType = AutoFitMergedCellsType.EachLine
};
options.SetPaperSize(PaperSizeType.PaperA4);

However, the row heights are still too large when there is more content in the 2nd column (“INQUIT NOMINE” in the HTML). There are also a few rows with extra height: rows 6, 7, 10 and 13. In row 15, the last column heading is on two lines in the HTML but is not shown on two lines in the output (the same applies to the second table below).

Below is the code snippet I’m using (before applying your suggestions):

string postData = await Request.Content.ReadAsStringAsync();
MemoryStream inputStream = new MemoryStream(Encoding.UTF8.GetBytes(postData));
Aspose.Cells.HtmlLoadOptions options = new Aspose.Cells.HtmlLoadOptions(Aspose.Cells.LoadFormat.Html);
options.AutoFitColsAndRows = true;
options.AutoFitterOptions = new AutoFitterOptions
{
    AutoFitMergedCellsType = AutoFitMergedCellsType.EachLine
};
options.SetPaperSize(PaperSizeType.PaperA4);

Workbook workbook = new Workbook(inputStream, options);
workbook.Save(@"C:/abc.xls", Aspose.Cells.SaveFormat.Excel97To2003);

Kindly help me…

@SutarM,

As we mentioned, even MS Excel cannot accomplish this task accurately. When you use the HtmlLoadOptions.AutoFitColsAndRows property, row heights or column widths might be extended a bit, but this is far better than getting chopped or trimmed content in the output. If you still think this is an issue, please provide the following:

  1. Your output Excel file generated by Aspose.Cells.
  2. Your expected Excel file (you may use MS Excel to open the HTML file and save it as a Web page (.html)).

PS. Please zip the files before attaching.

@Amjad_Sahi,

Please find attached a zip with the requested files, i.e. the output Excel file generated by Aspose.Cells and the expected Excel file.
Thanks in advance…

xlsOrg_Exp_files.zip (17.8 KB)

@SutarM,

Thanks for the sample files.

When I open your sample HTML file in MS Excel and then save it in XLSX format, the result is not good either; see the attached file.
files1.zip (13.3 KB)

I think you have manually updated your desired (expected) file by extending the rows (e.g. 19 - 31, etc.) to display the trimmed data/contents. In short, there is no automatic operation in MS Excel to display the contents fully and precisely in one go.

We recommend trying the following sample code; it should largely suit your needs. We may not be able to reproduce your manually customized expected file 100%:

Sample code:

HtmlLoadOptions opts = new HtmlLoadOptions(LoadFormat.Html);
opts.AutoFitColsAndRows = true;
Workbook workbook = new Workbook("e:\\test2\\aspose_sample_html.html", opts);

AutoFitterOptions options = new AutoFitterOptions();
options.AutoFitMergedCellsType = AutoFitMergedCellsType.EachLine;
options.AutoFitWrappedTextType = AutoFitWrappedTextType.Paragraph;
workbook.Worksheets[0].AutoFitRows(options);
string output = "e:\\test2\\out1.xlsx";
workbook.Save(output);

@Amjad_Sahi,

Thank you for the help so far. :slight_smile:

And yes, the latest code snippet displays the data much better.
It doesn’t trim or hide content; it displays all the contents.

Could you please update the status of ticket PDFNET-47649?

Thanks again to the team for the effort put into the analysis.

@SutarM,

Good to know that the code segment works for your needs.

Regarding PDFNET-47649, we will update you soon.

@SutarM,

I would like to inform you that the ticket with ID PDFNET-47649 was added to our issue tracking system only recently. As per company policy, first priority for investigation is given to Paid Support, i.e. Enterprise and Priority Support. After that, issues from the normal support forum are scheduled for investigation on a first come, first served basis. We request your patience and will share good news with you soon.

Hi Team,

Any update on this?

@SutarM,

I am afraid the issue is not resolved yet. There are other priority tasks/issues on hand that need to be sorted out first, and your issue will be addressed later.

Once we have an update on it, we will let you know.

Hi Team,

Did you get a chance to look at the PDF issue?
Please post an update here.

@SutarM,

I would like to inform you that this issue was added to our issue tracking system only recently. As per our company policy, first priority for investigation is given to Paid Support, i.e. Enterprise and Priority Support. After that, issues from the normal support forum are scheduled for investigation on a first come, first served basis. We request your patience and will share good news with you soon.

@Adnan.Ahmad,

It has been one month since I started following up on this.
Could you please let me know the tentative time needed to resolve this,
so that I can plan accordingly?

@SutarM,

I would like to inform you that we are looking into this issue and will share good news with you soon. We request your patience.