We are currently using the Aspose.Words 15.8.0 jar to extract data from an HTML file in Java. Testing with a sample document, we found that when a table in the HTML document contains only Chinese characters, the cell width returned by the Aspose API is wrong. With English content the cell widths are fine and their sum equals the total table width, whereas with Chinese text the sum of the cell widths far exceeds the specified table width.
The code used to reproduce this is given below:
import java.io.FileInputStream;
import java.io.InputStream;
import com.aspose.words.Cell;
import com.aspose.words.CellCollection;
import com.aspose.words.Document;
import com.aspose.words.LoadOptions;
import com.aspose.words.NodeList;
import com.aspose.words.RowCollection;
import com.aspose.words.Table;

public class AsposeChineseCellWidthIssue {
    public static void main(String[] args) {
        // Read the document from a path and open it as a stream.
        // Taking the path from args[0] is a placeholder; the original post omitted this step.
        try (InputStream documentStream = new FileInputStream(args[0])) {
            LoadOptions loadOptions = new LoadOptions();
            Document doc = new Document(documentStream, loadOptions);
            NodeList tableNodes = doc.selectNodes("//Table"); //No I18N
            // Iterate by index and cast each node, since selectNodes returns generic nodes.
            for (int t = 0; t < tableNodes.getCount(); t++) {
                Table table = (Table) tableNodes.get(t);
                RowCollection tableRows = table.getRows();
                for (int i = 0; i < tableRows.getCount(); i++) {
                    CellCollection cellCols = tableRows.get(i).getCells();
                    // Sum the widths of all cells in the current row.
                    double totalCellWidth = 0;
                    for (int j = 0; j < cellCols.getCount(); j++) {
                        Cell curCell = (Cell) cellCols.get(j);
                        double curCellWidth = curCell.getCellFormat().getWidth();
                        totalCellWidth += curCellWidth;
                    }
                    System.out.println("Total cell width is::" + totalCellWidth
                            + " pts But actual table width is::"
                            + table.getPreferredWidth().getValue() + " pts");
                }
            }
        } catch (Exception e) {
            // Report the failure instead of silently swallowing it.
            e.printStackTrace();
        }
    }
}
FYI, the sample documents are attached: files.zip (2.3 KB)
@Anbu2Win In your code you compare the PreferredWidth of the table with the sum of the cell widths. A width specified explicitly in HTML, such as style="width:2in", is imported as PreferredWidth, so instead of curCell.getCellFormat().getWidth() you should use curCell.getCellFormat().getPreferredWidth().getValue().
You can learn more about ways of specifying table width in Aspose.Words Documentation.
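Because the code above prints values in points while the HTML specifies widths in inches, it can help to convert between the two before comparing. The following is a minimal sketch in plain Java, not an Aspose.Words API; the class and method names are made up for illustration, and it relies only on the standard definition of 1 inch = 72 points.

```java
public class HtmlWidthToPoints {
    // Standard typographic definition: 1 inch = 72 points.
    static final double POINTS_PER_INCH = 72.0;

    // Converts a width given in inches (e.g. style="width:2in") to points.
    static double inchesToPoints(double inches) {
        return inches * POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        System.out.println(inchesToPoints(2.0)); // prints 144.0
        System.out.println(inchesToPoints(6.5)); // prints 468.0
    }
}
```

So a cell declared with style="width:2in" would be expected to show a preferred width of about 144 pts.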
After switching to getPreferredWidth, the sum of the cell widths and the table width still differ slightly. What is the reason?
The sum of the cell widths, when rounded, comes to about 6.5" (the actual width in the HTML), whereas the table width is slightly greater. Why is that so?
@Anbu2Win The reason is that Aspose.Words takes padding and border width into account in its cell width computation algorithm when importing HTML tables.
So, in order of precedence, the first preference is the preferred width, which denotes the explicitly specified cell width. If it is not specified or is auto, we can use the getCellFormat().getWidth() API to obtain the automatically rendered cell width based on content. Is this correct?
@Anbu2Win CellFormat.getWidth() is calculated on document load, but its value might not be accurate. Please see the API Reference for more information.
Since Aspose doesn’t have an API to get GridColumns, I am trying to derive the column count from the number of unique cell widths in the table.
But in HTML documents, generally only the table width is specified and not the cell width. For such documents, getPreferredWidth returns a cell width of 0.
Also, consider the case where the first cell in a column has a width of 2in and the second cell in the same column has auto width. Here the preferred width of the second cell will be zero, so the column count based on unique cell widths will be 2 (where it is actually one).
So preferred width will still cause issues. Is there any way to get the cell width when it is auto or not specified?
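To make the problem concrete, here is a minimal sketch in plain Java with no Aspose.Words calls. The class and method names are hypothetical, and the widths are hard-coded stand-ins for what getPreferredWidth().getValue() is described as returning above (144 pts for 2in, 0 for auto).

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class UniqueWidthColumnCount {
    // Counts distinct width values across all cells: the heuristic described above.
    static int countUniqueWidths(List<double[]> rows) {
        TreeSet<Double> unique = new TreeSet<>();
        for (double[] row : rows) {
            for (double width : row) {
                unique.add(width);
            }
        }
        return unique.size();
    }

    public static void main(String[] args) {
        // One column, two rows: first cell is 2in (144 pts), second cell is auto (reported as 0).
        List<double[]> rows = Arrays.asList(new double[] {144.0}, new double[] {0.0});
        System.out.println(countUniqueWidths(rows)); // prints 2, although the table has one column
    }
}
```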
@Anbu2Win Auto cell width is calculated on the fly, for example while building the document layout. You can try using the LayoutEnumerator class to get the calculated width of cells in the tables. See LayoutEntityType.CELL.
But note that if the table in your document is rendered improperly, for example when converting to PDF, LayoutEnumerator will also return incorrect values for cell width.
My queries are:
So using LayoutEnumerator to get the cell width is similar to using cell.getCellFormat().getWidth(), right? Both return the rendered width of the cell on document load?
So preferred width is not suitable for getting the cell width when the width is not specified or is auto, right? The only options are LayoutEnumerator or the cell.getCellFormat().getWidth() API?
Requirement:
Is there any way to get the grid columns and their widths for a table?
@Anbu2Win
No, unfortunately, there is no way to achieve this.
Not exactly. cell.getCellFormat().getWidth() is calculated on load, while the value returned by LayoutEnumerator is calculated by the Aspose.Words layout engine, which uses a different algorithm. In some cases the values might be the same.
Yes, preferred width is optional, and if it is omitted, the width of the cell must be calculated.
There were several attempts to improve the algorithms that calculate the width of cells in the table (the table grid). One of these attempts is the Document.updateTableLayout() method, but it does not always work properly, so it is not normally recommended. The most accurate algorithm at the moment is the one used in the Aspose.Words layout engine, i.e. the values returned by LayoutEnumerator. Table grid calculation is an extremely complex task, and we are continuously working on improving our grid calculation algorithm.
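Since there is no API for grid columns, one possible workaround is to reconstruct them from cell widths that have already been resolved, for instance values collected through LayoutEnumerator. The sketch below is plain Java and not part of Aspose.Words. The class name, the method gridColumnWidths, and the 0.5 pt tolerance are all assumptions made for illustration. It collects the right edge of every cell in every row and treats each distinct edge as a grid column boundary, so a cell spanning several columns does not add a column of its own. The result can only be as accurate as the widths fed into it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class GridColumnsFromCellWidths {
    // Derives grid column widths from per-row cell widths (all in points).
    // Edges closer together than the tolerance are treated as the same boundary.
    static List<Double> gridColumnWidths(double[][] rows, double tolerance) {
        List<Double> edges = new ArrayList<>();
        for (double[] row : rows) {
            double x = 0;
            for (double width : row) {
                x += width;   // right edge of the current cell
                edges.add(x);
            }
        }
        Collections.sort(edges);

        List<Double> widths = new ArrayList<>();
        double previous = 0;
        for (double edge : edges) {
            if (edge - previous > tolerance) {
                widths.add(edge - previous);
                previous = edge;
            }
        }
        return widths;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {144.0, 144.0, 180.0},
            {288.0, 180.0}        // first cell spans two grid columns
        };
        System.out.println(gridColumnWidths(rows, 0.5)); // prints [144.0, 144.0, 180.0]
    }
}
```

The size of the returned list is the grid column count, three in this example.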