Hello,
I am trying to stop long HTML tables from repeating their header rows when they spill across multiple pages.
With the code below, this works for both HTML and PDF output.
However, when the document is very large (700+ pages in my examples), it still works for Word output, but the headers keep repeating in the PDF output.
The table itself does not span many pages.
It looks like a bug: the same code works everywhere except in large documents, so it does not seem to be a configuration issue.
Can you help?
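To disable header repetition:
```csharp
using System.Linq;
using Aspose.Words;
using Aspose.Words.Tables;

// Extension method: must be declared inside a static class.
public static void DisableTableHeaderRepetition(this Document document)
{
    var tables = document.GetChildNodes(NodeType.Table, true);
    foreach (var table in tables.Cast<Table>())
    {
        foreach (var row in table.Rows.Cast<Row>())
        {
            // Clear "Repeat as header row at the top of each page".
            row.RowFormat.HeadingFormat = false;
        }
    }
}
```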
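One possible explanation, which nobody in this thread has confirmed: Aspose.Words builds the page layout the first time a document is rendered to a fixed-page format (for example by an earlier PDF save or by reading `Document.PageCount`), and if the document is modified afterwards the layout is not rebuilt automatically; `Document.UpdatePageLayout()` has to be called before rendering again. If some step in the pipeline (not shown here) triggers layout before `DisableTableHeaderRepetition` runs, the PDF could still show the old heading rows, while a freshly loaded document (such as in a separate test) would not. A minimal sketch under that assumption; the class and method names are hypothetical:

```csharp
using System.Linq;
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableHeaderExtensions // hypothetical class name
{
    // Clears the heading-row flag on every row of every table, then rebuilds
    // the page layout so a later PDF save does not reuse a layout that was
    // computed while the flags were still set.
    // Assumption: an earlier step already built the layout (e.g. PageCount).
    public static void DisableTableHeaderRepetitionAndRelayout(this Document document)
    {
        foreach (var table in document.GetChildNodes(NodeType.Table, true).Cast<Table>())
        {
            foreach (var row in table.Rows.Cast<Row>())
            {
                row.RowFormat.HeadingFormat = false;
            }
        }

        // Format the document into pages again, replacing any cached layout.
        document.UpdatePageLayout();
    }
}
```

If this hypothesis is right, calling the method above in place of `DisableTableHeaderRepetition` should be enough; if the PDF still repeats headers, the cause lies elsewhere.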
For saving, I use the same document instance for each format:
```csharp
if (saveDocx)
{
    using var docxStream = documentResult.Document.SaveToStream(SaveFormat.Docx);
    var fileName = $"{fileNameWithoutExtension}{FileExtensions.DOCX}";
    await assetFileService.UploadFileAsync(docxStream, documentVersionId, trainingPackageCode, fileName, fileUploadOptions, cancellationToken);
    filesSaved.Add(fileName);
}
if (savePdf)
{
    using var pdfStream = documentResult.Document.SaveToStream(SaveFormat.Pdf);
    var fileName = $"{fileNameWithoutExtension}{FileExtensions.PDF}";
    await assetFileService.UploadFileAsync(pdfStream, documentVersionId, trainingPackageCode, fileName, fileUploadOptions, cancellationToken);
    filesSaved.Add(fileName);
}
```
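To narrow down where the problem appears, a quick count right before the PDF save can help. The helper below is hypothetical (it is not part of Aspose.Words) and only reads `RowFormat.HeadingFormat`: a result of 0 immediately before `SaveToStream(SaveFormat.Pdf)` while the PDF still shows repeated headers would point at the rendering step rather than at the document model.

```csharp
using System.Linq;
using Aspose.Words;
using Aspose.Words.Tables;

public static class TableDiagnostics // hypothetical helper, not part of Aspose.Words
{
    // Returns how many rows anywhere in the document (body, headers/footers,
    // nested tables) are still marked to repeat as heading rows.
    public static int CountHeadingRows(this Document document) =>
        document.GetChildNodes(NodeType.Row, true)
            .Cast<Row>()
            .Count(row => row.RowFormat.HeadingFormat);
}
```

For example, logging `documentResult.Document.CountHeadingRows()` just before the PDF branch shows whether any flag survived or was set again by a later step.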
@wadekeenan
Would you please share your sample HTML in zip format with us so that we can test the scenario in our environment and address it accordingly?
@wadekeenan Could you please attach your problematic input and output documents here for testing? We will check the issue and provide you with more information.
Thanks @asad.ali and @alexey.noskov, files attached.
Downloads.zip (6.2 MB)
@wadekeenan Thank you for the additional information. Unfortunately, the problem is not reproducible on my side using the following simple code:
```csharp
using Aspose.Words;
using Aspose.Words.Tables;

Document doc = new Document(@"C:\Temp\in.html");
foreach (Table table in doc.GetChildNodes(NodeType.Table, true))
{
    foreach (Row row in table.Rows)
        row.RowFormat.HeadingFormat = false;
}
doc.Save(@"C:\Temp\out.docx");
doc.Save(@"C:\Temp\out.pdf");
```
out.docx (992.2 KB)
out.pdf (1.5 MB)
If possible, please create a simple console application that will allow us to reproduce the problem on our side.
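A minimal console sketch along those lines is shown below. It is only an illustration under several assumptions: the 700 filler pages and the 80-row table are arbitrary stand-ins for the real input, it requires the Aspose.Words NuGet package, and a license should be applied (evaluation mode adds a watermark and may limit output). The `buildLayoutFirst` switch is a hypothetical toggle that builds the page layout (via `Document.PageCount`) before the heading flags are cleared, to test whether a cached layout is involved.

```csharp
using System;
using System.Linq;
using Aspose.Words;
using Aspose.Words.Tables;

public class Program
{
    public static void Main()
    {
        var doc = new Document();
        var builder = new DocumentBuilder(doc);

        // Filler content so the document runs to several hundred pages.
        for (int i = 0; i < 700; i++)
        {
            builder.Writeln($"Filler page {i + 1}");
            builder.InsertBreak(BreakType.PageBreak);
        }

        // One table long enough to spill onto a second page, with a heading row.
        builder.StartTable();
        builder.RowFormat.HeadingFormat = true;
        builder.InsertCell();
        builder.Write("Header");
        builder.EndRow();
        builder.RowFormat.HeadingFormat = false;
        for (int r = 0; r < 80; r++)
        {
            builder.InsertCell();
            builder.Write($"Row {r + 1}");
            builder.EndRow();
        }
        builder.EndTable();

        // Hypothetical toggle: build the layout before changing the rows.
        bool buildLayoutFirst = true;
        if (buildLayoutFirst)
            Console.WriteLine($"Pages before change: {doc.PageCount}");

        // Same header-clearing logic as in the thread.
        foreach (var row in doc.GetChildNodes(NodeType.Row, true).Cast<Row>())
            row.RowFormat.HeadingFormat = false;

        doc.Save("out.docx");
        doc.Save("out.pdf");
    }
}
```

Running it once with `buildLayoutFirst` set to `true` and once with `false`, then comparing the two PDFs, would show whether layout caching plays a role.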
Thanks @alexey.noskov. While creating the sample, I found a workaround: saving the document to HTML first, loading it back, calling the FixTableFormatting() method, and then saving to PDF stops the headers from repeating in the PDF. However, images are missing from the HTML file and consequently from every file saved from it. Are the options I initially use to save to HTML wrong?
```csharp
using var htmlStream = documentResult.Document.SaveToStream(SaveFormat.Html);
var myDoc = new Aspose.Words.Document(htmlStream, new Aspose.Words.Loading.HtmlLoadOptions());
myDoc.FixTableFormatting();

using var newHtmlStream = myDoc.SaveToStream(SaveFormat.Html);
var htmlFileName = $"{fileNameWithoutExtension}{FileExtensions.HTML}";
await assetFileService.UploadFileAsync(newHtmlStream, documentVersionId, trainingPackageCode, htmlFileName, fileUploadOptions, cancellationToken);
filesSaved.Add(htmlFileName);
htmlStream.Position = 0;

if (saveDocx)
{
    using var docxStream = myDoc.SaveToStream(SaveFormat.Docx);
    var fileName = $"{fileNameWithoutExtension}{FileExtensions.DOCX}";
    await assetFileService.UploadFileAsync(docxStream, documentVersionId, trainingPackageCode, fileName, fileUploadOptions, cancellationToken);
    filesSaved.Add(fileName);
}
if (savePdf)
{
    using var pdfStream = myDoc.SaveToStream(SaveFormat.Pdf);
    var fileName = $"{fileNameWithoutExtension}{FileExtensions.PDF}";
    await assetFileService.UploadFileAsync(pdfStream, documentVersionId, trainingPackageCode, fileName, fileUploadOptions, cancellationToken);
    filesSaved.Add(fileName);
}
```
```csharp
public static void FixTableFormatting(this Document document)
{
    var tables = document.GetChildNodes(NodeType.Table, true);
    foreach (var table in tables.Cast<Table>())
    {
        table.PreferredWidth = PreferredWidth.FromPercent(100);
        table.LeftIndent = 0;
        foreach (var row in table.Rows.Cast<Row>())
        {
            row.RowFormat.HeadingFormat = false;
        }
        // Clear cell borders and set padding, ensuring no cell borders are reapplied
        foreach (var cell in table.GetChildNodes(NodeType.Cell, true).Cast<Cell>())
        {
            cell.CellFormat.Borders.ClearFormatting(); // Remove individual cell borders
            cell.CellFormat.Borders.LineStyle = LineStyle.None; // Explicitly disable cell borders
            cell.CellFormat.LeftPadding = 5;
            cell.CellFormat.RightPadding = 5;
            cell.CellFormat.TopPadding = 5;
            cell.CellFormat.BottomPadding = 5;
        }
        // Replace the table borders with a single 1 pt red line
        table.SetBorders(LineStyle.Single, 1, System.Drawing.Color.Red);
    }
}
```
```csharp
public static Stream SaveToStream(this Document document, SaveFormat saveFormat)
{
    var stream = new MemoryStream();
    SaveToStream(document, stream, saveFormat);
    return stream;
}

public static void SaveToStream(this Document document, Stream stream, SaveFormat saveFormat)
{
    if (saveFormat is SaveFormat.Html)
    {
        var htmlSaveOptions = new HtmlSaveOptions
        {
            ExportImagesAsBase64 = true,
        };
        document.Save(stream, htmlSaveOptions);
    }
    else if (saveFormat is SaveFormat.Pdf)
    {
        var options = new PdfSaveOptions
        {
            EmbedFullFonts = true,
        };
        document.Save(stream, options);
    }
    else
    {
        document.Save(stream, saveFormat);
    }
    if (stream.CanSeek)
    {
        stream.Seek(0, SeekOrigin.Begin);
    }
}
```
@wadekeenan The code looks correct, and images should be preserved in the HTML stream. Could you please try saving the output HTML stream to a file and attach it here? Have you tried saving the output as DOCX first? Does it contain the images?
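Along the lines of that suggestion, a round trip through DOCX in memory would also hand the PDF export a freshly loaded document, as the HTML round trip does, while keeping images, which DOCX stores natively. This is a sketch that assumes the fresh load, not the HTML conversion itself, is what fixed the PDF headers; the helper name is hypothetical.

```csharp
using System.IO;
using Aspose.Words;

public static class RoundTripExtensions // hypothetical helper, not part of Aspose.Words
{
    // Saves the document to DOCX in memory and loads it back as a new Document.
    // The copy keeps embedded images and starts without any cached page layout.
    public static Document ReloadViaDocx(this Document document)
    {
        using var stream = new MemoryStream();
        document.Save(stream, SaveFormat.Docx);
        stream.Position = 0;          // rewind before loading
        return new Document(stream);  // format is detected from the content
    }
}
```

In the workaround, `var myDoc = documentResult.Document.ReloadViaDocx();` could then replace the HTML load, followed by `myDoc.FixTableFormatting();` and the existing DOCX/PDF saves.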
@alexey.noskov Here's a zip with the HTML output and the DOCX/PDF files converted from that HTML, as described above. I've also attached a version saved directly to DOCX, in which the image is retained.
html docx pdf.zip (1.8 MB)
RRR44551_R1.docx (32.2 KB)
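To see at which step the image disappears, one could count image-bearing shapes in the original document and again in the document reloaded from the HTML stream. The helper below is a hypothetical diagnostic: it only counts `Shape` nodes whose `HasImage` is true, and splits them into main-body shapes and shapes inside a header or footer, since knowing where the missing image lives may help explain why the HTML export drops it.

```csharp
using System.Linq;
using Aspose.Words;
using Aspose.Words.Drawing;

public static class ImageDiagnostics // hypothetical helper, not part of Aspose.Words
{
    // Counts shapes that carry image data, split into main-body shapes and
    // shapes located inside a header or footer.
    public static (int Body, int HeaderFooter) CountImages(this Document document)
    {
        var images = document.GetChildNodes(NodeType.Shape, true)
            .Cast<Shape>()
            .Where(shape => shape.HasImage)
            .ToList();

        int inHeaderFooter = images.Count(shape => shape.GetAncestor(NodeType.HeaderFooter) != null);
        return (images.Count - inHeaderFooter, inHeaderFooter);
    }
}
```

Comparing `documentResult.Document.CountImages()` with `myDoc.CountImages()` after the HTML load would show whether the image is lost during the HTML save/load or later.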