On Save as PDF, Header and Footer go to another page

igor.guerra · August 10, 2023, 5:56pm

When saving a document in PDF format, the footer breaks onto another page. I’ll put the sample code below, and here’s a description of what I’m doing:

Create 2 documents using Document Builder. Each document has a simple body, footer and header;
Merge 2 documents together;
Save them as Word or PDF.

When saving as Word, everything works. When saving as PDF, the footer problem happens.

Here’s the code:

[Theory]
[InlineData(DocumentFormat.Pdf, LoadFormat.Pdf)]
[InlineData(DocumentFormat.Docx, LoadFormat.Docx)]
public void When_SavingMergedDocuments_Should_KeepHeadersOnSamePage(DocumentFormat format, LoadFormat expectedFormat)
{
    // Arrange
    List<MemoryStream> documentList = new()
    {
        GetAsposeDocument(header: "First Document Header", footer: "First Document Footer"),
        GetAsposeDocument(header: "Second Document Header", footer: "Second Document Footer"),
    };
    var document = MergeDocuments(documentList);

    // Act
    var documentSaved = SaveDocument(document, format);

    // Assert
    Assert.NotNull(documentSaved);

    var documentAsAspose = new Aspose.Words.Document(documentSaved);

    Assert.NotNull(documentAsAspose);
    Assert.Equal(2, documentAsAspose.PageCount);
    Assert.Equal("First Document Header", documentAsAspose.ExtractPages(0, 1).FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary].GetText().Trim());
    Assert.Equal("First Document Footer", documentAsAspose.ExtractPages(0, 1).FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetText().Trim());
    Assert.Equal("Second Document Header", documentAsAspose.ExtractPages(1, 1).FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary].GetText().Trim());
    Assert.Equal("Second Document Footer", documentAsAspose.ExtractPages(1, 1).FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetText().Trim());
}

public static MemoryStream GetAsposeDocument(string header = null, string footer = null)
{
    AsposeExtensions.SetLicense();
    Aspose.Words.Document document = new();

    var identifier = Guid.NewGuid();
    DocumentBuilder builder = new(document);

    builder.Writeln("Static content");

    if (header is not null)
    {
        builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
        builder.Write(header);
    }

    if (footer is not null)
    {
        builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
        builder.Write(footer);
    }

    MemoryStream ms = new();
    document.Save(ms, SaveFormat.Docx);
    return ms;
}

public Aspose.Words.Document MergeDocuments(List<MemoryStream> documents)
{
    List<MemoryStream> savedFiles = new();
    foreach (var document in documents)
    {
        MemoryStream ms = new();
        savedFiles.Add(ms);
        new Aspose.Words.Document(document).Save(ms, SaveFormat.Docx);
    }

    var doc = new Aspose.Words.Document(savedFiles.First());
    savedFiles.Skip(1).ToList().ForEach(document => doc.AppendDocument(new Aspose.Words.Document(document), ImportFormatMode.KeepSourceFormatting));

    // Restart page numbering of the new merged document
    doc.FirstSection.PageSetup.RestartPageNumbering = true;
    doc.FirstSection.PageSetup.PageStartingNumber = 1;

    return doc;
}

public MemoryStream SaveDocument(Aspose.Words.Document document, DocumentFormat documentFormat)
{
    MemoryStream ms = new();

    document.Save(ms, Enum.Parse<SaveFormat>(documentFormat.ToString(), true));
    ms.Seek(0, SeekOrigin.Begin);

    return ms;
}

Konstantin.Kornilov · August 10, 2023, 7:37pm

@igor.guerra
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-25786

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Konstantin.Kornilov · August 10, 2023, 7:41pm

@igor.guerra Please note that PDF is not a native format for MS Word or Aspose.Words. Importing from PDF is a complex procedure, so a DOCX->PDF->DOCX roundtrip may introduce some formatting differences. Please check the Convert PDF to Other Document Formats article for more info.

igor.guerra · August 10, 2023, 8:27pm

Hi Konstantin,

Thanks for getting back so quickly. As far as I know, I’m not doing the roundtrip. I’m creating 2 Word documents, merging them as Word Documents and saving them as PDF.

Also, do you think using Aspose.PDF library would solve this? Would you be able to test it or provide me the code so I can test on my side?

Thanks,

Igor

Konstantin.Kornilov · August 10, 2023, 8:51pm

@igor.guerra To be precise, you do the roundtrip DOM->PDF->DOM before asserting the document content, which is technically the same as DOCX->PDF->DOCX. The var documentSaved = SaveDocument(document, format) line saves the document to PDF, and the var documentAsAspose = new Aspose.Words.Document(documentSaved) line loads it back into the Aspose.Words DOM.
Aspose.PDF also provides the PDF->DOCX conversion functionality. You could try to use it as an alternative to Aspose.Words PDF->DOCX conversion. Please check the Convert PDF to Microsoft Word Documents in .NET article.

igor.guerra · August 14, 2023, 1:43pm

Hi @Konstantin.Kornilov,

Thanks for your answer but I still think we have the wrong idea here.

I’m creating two documents in memory as DOCX. Then I’m merging them together as DOCX. Then I save them as PDF.

There’s no PDF to DOCX in my sample. It’s a merging of DOCX and then saving the result as PDF.

Does that make sense?

Konstantin.Kornilov · August 14, 2023, 2:14pm

@igor.guerra Sorry for the misunderstanding. I will try to explain my point more clearly. In your test there is a line var documentSaved = SaveDocument(document, format); The format is DocumentFormat.Pdf in the problematic test case. This method returns a memory stream with the PDF document. In the next line, var documentAsAspose = new Aspose.Words.Document(documentSaved); the Document ctor accepts the memory stream with the PDF document and loads it into the Aspose.Words DOM. At this stage the problem appears: the headers in the PDF document are not recognized correctly.

[Theory]
[InlineData(DocumentFormat.Pdf, LoadFormat.Pdf)]
[InlineData(DocumentFormat.Docx, LoadFormat.Docx)]
public void When_SavingMergedDocuments_Should_KeepHeadersOnSamePage(DocumentFormat format, LoadFormat expectedFormat)
{
    // Arrange
    List<MemoryStream> documentList = new()
    {
        GetAsposeDocument(header: "First Document Header", footer: "First Document Footer"),
        GetAsposeDocument(header: "Second Document Header", footer: "Second Document Footer"),
    };
    var document = MergeDocuments(documentList);

    // Act
    // !!!Conversion of document to PDF!!!
    var documentSaved = SaveDocument(document, format);

    // Assert
    Assert.NotNull(documentSaved);

    // !!!Conversion of PDF to the Aspose.Words DOM!!!
    var documentAsAspose = new Aspose.Words.Document(documentSaved);

    Assert.NotNull(documentAsAspose);
    Assert.Equal(2, documentAsAspose.PageCount);
    Assert.Equal("First Document Header", documentAsAspose.ExtractPages(0, 1).FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary].GetText().Trim());
    Assert.Equal("First Document Footer", documentAsAspose.ExtractPages(0, 1).FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetText().Trim());
    Assert.Equal("Second Document Header", documentAsAspose.ExtractPages(1, 1).FirstSection.HeadersFooters[HeaderFooterType.HeaderPrimary].GetText().Trim());
    Assert.Equal("Second Document Footer", documentAsAspose.ExtractPages(1, 1).FirstSection.HeadersFooters[HeaderFooterType.FooterPrimary].GetText().Trim());
}

igor.guerra · August 14, 2023, 2:30pm

OMG, you are absolutely right.

If I execute document.Save("C:\\TestResults\\before.pdf", SaveFormat.Pdf); before the SaveDocument method, the PDF generated works perfectly.

I’ll see if I can apply this solution to my code.

In the meantime, would it be possible to keep this ticket open? If the PDF -> Word conversion worked, that would be extremely helpful.

Thanks a lot for your help and your patience.

Konstantin.Kornilov · August 14, 2023, 2:45pm

@igor.guerra Yes, sure. The ticket remains open. You will be notified here when the fix is released.
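
For readers who hit the same failure, here is a minimal sketch of a test that avoids the PDF import step discussed above. It checks headers and footers on the merged Aspose.Words DOM, then only confirms that the saved stream is detected as PDF. The class name and the Build and AssertPage helpers are hypothetical and not part of the original test. The sketch assumes xUnit and a licensed Aspose.Words (evaluation mode may inject extra content), and it appends the documents directly instead of round-tripping each one through DOCX first.

```csharp
using System.IO;
using Aspose.Words;
using Xunit;

public class MergedDocumentPdfSketch
{
    [Fact]
    public void MergedDocument_KeepsHeadersAndFooters_AndSavesAsPdf()
    {
        // Build and merge two in-memory documents, as in the thread.
        Document merged = Build("First Document Header", "First Document Footer");
        merged.AppendDocument(
            Build("Second Document Header", "Second Document Footer"),
            ImportFormatMode.KeepSourceFormatting);

        // Assert on the DOM itself, so no PDF -> DOM import is involved.
        Assert.Equal(2, merged.PageCount);
        AssertPage(merged, 0, "First Document Header", "First Document Footer");
        AssertPage(merged, 1, "Second Document Header", "Second Document Footer");

        // Save as PDF and only confirm that the output is detected as a PDF file.
        using MemoryStream pdf = new();
        merged.Save(pdf, SaveFormat.Pdf);
        pdf.Position = 0;
        Assert.Equal(LoadFormat.Pdf, FileFormatUtil.DetectFileFormat(pdf).LoadFormat);
    }

    // Hypothetical helper: a one-page document with body text, a primary header and a primary footer.
    private static Document Build(string header, string footer)
    {
        Document document = new();
        DocumentBuilder builder = new(document);
        builder.Writeln("Static content");
        builder.MoveToHeaderFooter(HeaderFooterType.HeaderPrimary);
        builder.Write(header);
        builder.MoveToHeaderFooter(HeaderFooterType.FooterPrimary);
        builder.Write(footer);
        return document;
    }

    // Hypothetical helper: extracts one page (zero-based index) and compares its header and footer text.
    private static void AssertPage(Document document, int pageIndex, string header, string footer)
    {
        Section section = document.ExtractPages(pageIndex, 1).FirstSection;
        Assert.Equal(header, section.HeadersFooters[HeaderFooterType.HeaderPrimary].GetText().Trim());
        Assert.Equal(footer, section.HeadersFooters[HeaderFooterType.FooterPrimary].GetText().Trim());
    }
}
```

If the content of the generated PDF itself must be verified, that still requires a PDF import, which is the part tracked by WORDSNET-25786.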