Different formats - different output

Another problem: the PDF created with Aspose is very different from all the other formats, as you can see in the attachments. We build our PDF document in the following sequence: create HTML to show on the web page -> insert into doc -> save as PDF. I have also attached a PDF created with Word 2007.

Hi

Thanks for your request. Strangely, I cannot open the PDF document you attached. However, when I convert your HTML to PDF on my side, the produced PDF opens without any issues.
Regarding the formatting problem in the PDF document: this occurs because there are merged cells in your HTML document. This is issue #7739 in our defect database. As a workaround, you can try removing content from merged cells, as shown in the following code:

Document doc = new Document(@"Test138\Form+for+check.htm", LoadFormat.Html, "");
// Remove content from merged cells
// Get the collection of all cells in the document
NodeCollection cells = doc.GetChildNodes(NodeType.Cell, true);
foreach(Cell cell in cells)
{
    // Check whether the cell is merged with the previous one (horizontally or vertically)
    if (cell.CellFormat.HorizontalMerge == CellMerge.Previous ||
        cell.CellFormat.VerticalMerge == CellMerge.Previous)
    {
        // Remove content from the cell
        cell.RemoveAllChildren();
    }
}
doc.SaveToPdf(@"Test138\out.pdf");

Hope this helps.
Best regards.

Will this code remove the cell formatting used (background etc.)? Will this fix be available within the next 2-3 months? If so, I’ll wait.

Hi

Thank you for your inquiry. No, this code will not change the formatting in your document; the formatting of cells will not be lost. This code just removes content from merged cells (you do not see this content anyway when you open the document in MS Word). Unfortunately, I cannot give you any estimate regarding the issue. You will be notified as soon as it is resolved.
Best regards.

I see no difference in my PDF document. Here is the code I use (maybe it is not right?):

With doc
    .SaveOptions.PdfExportImagesFolder = System.Configuration.ConfigurationManager.AppSettings("PathImages")
    Dim cells As NodeCollection = .GetChildNodes(NodeType.Cell, True)

    For Each cell As Aspose.Words.Tables.Cell In cells
        If cell.CellFormat.HorizontalMerge = Aspose.Words.Tables.CellMerge.Previous And cell.CellFormat.VerticalMerge = Aspose.Words.Tables.CellMerge.Previous Then
            cell.RemoveAllChildren()
        End If
    Next
    .Save("Form.pdf", SaveFormat.Pdf, SaveType.OpenInBrowser, Response)
End With
Response.End()

Hi

Thanks for your request.

  1. You do not need to specify PdfExportImagesFolder when you use the SaveToPdf method or SaveFormat.Pdf to save your document as PDF. This option is needed only if you use the legacy method to convert documents to PDF (Aspose.Words+Aspose.Pdf).
  2. Please attach your output document and explain in a bit more detail what you mean by “difference in my pdf document”. Note that it is extremely hard to produce PDFs that look exactly the same as the HTML.

Best regards.

  1. Ok, thanks
  2. I understand that it is hard to get an exact match, but surely you understand that the row height is not acceptable for the texts you see with the grey background in my first attachment. This height is many times higher than the original one, making the whole document 2-3 times longer.

Hi

Thank you for the additional information. Could you please attach your current output document? After executing the code I suggested, the output document looks fine on my side. It is only four pages long, as expected.
Best regards.

I do get the error in the PDF. I wonder if you are testing the whole sequence: if you open the Word document directly, then you have already skipped one conversion in the sequence. What I do is the following:

doc = New Document("mytemplate.doc", LoadFormat.Doc, Nothing)
Dim docBuilder As DocumentBuilder = New DocumentBuilder(doc)

'Now we have the template document with a bookmark 'Form'
If docBuilder.MoveToBookmark("Form", True, True) Then

    'Insert the HTML string (see attachment [Form for check.txt](https://forum.aspose.com/t/92242))
    docBuilder.InsertHtml(tstr)

    'We remove all hyperlinks from the document:
    removeHyperlinkFormatting(doc)
    'We insert pictures to the document if they exist (not in this test case!)
    insertFormPictures(doc, formNumber)
    'We replace some other user dependent texts (not in this test case!)
    'The code used is doc.Range.Replace("%R%", vCopy(HttpContext.Current.Session("xxxxxxx").ToString, ""), False, False)
    replaceVariables(doc)

    'Now save the pdf:
    With doc
        .SaveOptions.PdfExportImagesFolder = System.Configuration.ConfigurationManager.AppSettings("PathImages")
        Dim cells As NodeCollection = .GetChildNodes(NodeType.Cell, True)

        For Each cell As Aspose.Words.Tables.Cell In cells
            If cell.CellFormat.HorizontalMerge = Aspose.Words.Tables.CellMerge.Previous And cell.CellFormat.VerticalMerge = Aspose.Words.Tables.CellMerge.Previous Then
                cell.RemoveAllChildren()
            End If
        Next
        .Save("Form.pdf", SaveFormat.Pdf, SaveType.OpenInBrowser, Response)
    End With
    Response.End()
End If

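The code does not work properly because it was translated to VB incorrectly: my C# condition uses || (Or), but your VB version uses And, so content is removed only from cells that are merged both horizontally and vertically. Please try using the following code snippet: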

'Now save the pdf:
With doc
    Dim cells As NodeCollection = .GetChildNodes(NodeType.Cell, True)

    For Each cell As Aspose.Words.Tables.Cell In cells
        If cell.CellFormat.HorizontalMerge = Aspose.Words.Tables.CellMerge.Previous Or cell.CellFormat.VerticalMerge = Aspose.Words.Tables.CellMerge.Previous Then
            cell.RemoveAllChildren()
        End If
    Next
    .Save("C:\Temp\Form.pdf")
End With
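
To illustrate why the operator matters, here is a minimal, self-contained C# sketch. It does not use Aspose.Words: the CellMerge enum below is only a stand-in declared for this example, and MergeCheckDemo, IsMergedEither and IsMergedBoth are hypothetical names. It compares the two conditions for a cell that continues a horizontal merge only.

```csharp
using System;

// Stand-in for the library's CellMerge enum, declared here only
// so that this sketch compiles without Aspose.Words (assumption).
enum CellMerge { None, First, Previous }

static class MergeCheckDemo
{
    // Condition from the C# sample: a merge in either direction is enough.
    public static bool IsMergedEither(CellMerge horizontal, CellMerge vertical) =>
        horizontal == CellMerge.Previous || vertical == CellMerge.Previous;

    // Condition from the VB translation that used And: both directions are required.
    public static bool IsMergedBoth(CellMerge horizontal, CellMerge vertical) =>
        horizontal == CellMerge.Previous && vertical == CellMerge.Previous;

    static void Main()
    {
        // A cell that continues a horizontal merge but is not merged vertically.
        CellMerge horizontal = CellMerge.Previous;
        CellMerge vertical = CellMerge.None;

        Console.WriteLine(IsMergedEither(horizontal, vertical)); // True: content would be removed
        Console.WriteLine(IsMergedBoth(horizontal, vertical));   // False: content would be kept
    }
}
```

With the And condition, a cell like this one is skipped, so its content stays in the document and the output does not change.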

Best regards.

Works! I’m happy… hopefully for a long time forward

The issues you have found earlier (filed as 7739) have been fixed in this update.