PDF to HTML to PDF conversion results in different page margins

Hello Aspose Team,

I am working on a round-trip scenario where I convert a PDF to HTML, make a small edit in the HTML, and then convert it back to PDF.

I’ve attached the following files for reference:

Aspose.PDF issue conversion html.zip (1.5 MB)

  1. Original-PDF.pdf – the source file
  2. Original-HTML-After-Conversion-With-AsposePDF.html – HTML generated by Aspose.PDF
  3. Edited-HTML.html – the same HTML with a small style change
  4. Edited-PDF.pdf – the final PDF after converting the edited HTML back

Issue

The output PDF (Edited-PDF.pdf) shows margins that are noticeably different from the original. This happens even though I haven’t changed anything related to page size or margins in the HTML.

  • Is there a way to automatically preserve the original PDF’s PageInfo (size and margins) during round-trip conversion?
  • Are there any additional flags or workarounds to ensure the output PDF matches the original layout exactly?

Thank you for your help!

@gabriel.vega

To preserve the original PDF’s page size and margins during the round-trip conversion from PDF to HTML and back, you can use the following approaches:

  1. Set Page Size and Margins Manually: When converting the edited HTML back to PDF, ensure that you explicitly set the page size and margins to match those of the original PDF. You can do this using the PageInfo properties in Aspose.PDF.

    Here’s a sample code snippet:

    // Load the edited HTML
    Document pdfDocument = new Document("Edited-HTML.html");
    
    // Set the page size and margins to match the original PDF
    pdfDocument.PageInfo.Width = originalWidth; // Set original width
    pdfDocument.PageInfo.Height = originalHeight; // Set original height
    // MarginInfo takes its arguments in the order (left, bottom, right, top)
    pdfDocument.PageInfo.Margin = new MarginInfo(originalLeftMargin, originalBottomMargin, originalRightMargin, originalTopMargin);
    
    // Save the document as PDF
    pdfDocument.Save("Edited-PDF.pdf");
    
  2. Use HTML Load Options: When converting HTML to PDF, you can use HtmlLoadOptions to control how the HTML is processed. This can help in maintaining the layout.

    HtmlLoadOptions options = new HtmlLoadOptions();
    options.PageSize = new Size(originalWidth, originalHeight);
    options.Margin = new MarginInfo(originalLeftMargin, originalBottomMargin, originalRightMargin, originalTopMargin);
    
    Document pdfDocument = new Document("Edited-HTML.html", options);
    pdfDocument.Save("Edited-PDF.pdf");
    
  3. Check CSS Styles: Ensure that the CSS styles in your edited HTML do not inadvertently alter the layout. Sometimes, styles can affect margins and padding, leading to discrepancies in the final PDF.

By following these steps, you should be able to achieve a more consistent output that closely matches the original PDF’s layout. If you continue to experience issues, consider reviewing the original PDF’s properties and ensuring that all relevant settings are applied during the conversion process.

Hi team,

Thank you for the suggestion. I tried using new Aspose.Pdf.HtmlLoadOptions(); but I can confirm that the properties PageSize and Margin do not exist in HtmlLoadOptions.

I also attempted the alternative solution of manually setting the page size and margins, but that did not work either.

Could you please check if there is another supported way to set these values when loading HTML to PDF?
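
One possible route, offered as a sketch rather than a confirmed fix: HtmlLoadOptions has no PageSize or Margin members, but it does expose a PageInfo object, and its Width, Height and Margin are applied while the HTML is being rendered. The class name HtmlRoundTrip, the helper LoadHtmlWithGeometry and its parameters are hypothetical; the values are assumed to come from the original PDF and to be expressed in points.

```csharp
using Aspose.Pdf;

internal static class HtmlRoundTrip
{
    // Hypothetical helper: loads an HTML file with the page geometry applied
    // at load time instead of after the pages have already been laid out.
    // widthPt and heightPt are in points (1/72 inch).
    internal static Document LoadHtmlWithGeometry(
        string htmlPath, double widthPt, double heightPt, MarginInfo margin)
    {
        var options = new HtmlLoadOptions();
        options.PageInfo.Width = widthPt;
        options.PageInfo.Height = heightPt;
        options.PageInfo.Margin = margin;

        return new Document(htmlPath, options);
    }
}
```

A call such as HtmlRoundTrip.LoadHtmlWithGeometry("Edited-HTML.html", width, height, new MarginInfo(left, bottom, right, top)) followed by Save("Edited-PDF.pdf") would then write the result. Whether this reproduces the original layout exactly still depends on the CSS in the generated HTML.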

Hi team,

I previously saved the original values of MarginTop, MarginBottom, MarginLeft, MarginRight, as well as Width and Height from the source PDF.

After editing the document with HTML, I regenerate the PDF with Aspose.PDF and reapply those values as follows:


var pdfSaveOptions = new PdfSaveOptions();

foreach (var page in Document.Pages)
{
    page.SetPageSize(config.PageDetails.Width, config.PageDetails.Height);

    page.PageInfo.Margin.Top = config.PageDetails.MarginTop;
    page.PageInfo.Margin.Bottom = config.PageDetails.MarginBottom;
    page.PageInfo.Margin.Left = config.PageDetails.MarginLeft;
    page.PageInfo.Margin.Right = config.PageDetails.MarginRight;
}

Document.Save(result, pdfSaveOptions);

With this approach, the page size and all margins look slightly different compared to the original file.

I’m attaching the original PDF, the generated HTML, and the output PDF for reference.

issue html to pdf.zip (373.4 KB)

My question is: what other information should I save from the original document to ensure that the margins remain exactly the same as in the source PDF?
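
A point that may be relevant here: the PDF format has no concept of page margins. Content is placed at absolute coordinates inside the page boxes, so the margins seen in a viewer are simply the empty space around that content. With that in mind, the sketch below records the MediaBox and CropBox of each page, which are values a regenerated page would need to match. The names PageGeometry, PageBoxes and Capture are hypothetical, and this is an assumption about what is worth saving, not a confirmed recipe.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;

internal static class PageGeometry
{
    // Hypothetical record of one page's boxes, all values in points.
    internal sealed record PageBoxes(
        int Number,
        double MediaWidth, double MediaHeight,
        double CropLlx, double CropLly, double CropUrx, double CropUry);

    // Reads the media and crop boxes of every page in the given PDF.
    internal static List<PageBoxes> Capture(string pdfPath)
    {
        var result = new List<PageBoxes>();
        using var document = new Document(pdfPath);

        foreach (Page page in document.Pages)
        {
            Rectangle media = page.MediaBox;
            Rectangle crop = page.CropBox;
            result.Add(new PageBoxes(
                page.Number,
                media.Width, media.Height,
                crop.LLX, crop.LLY, crop.URX, crop.URY));
        }

        return result;
    }
}
```

Comparing the captured boxes of the original and the regenerated PDF should at least show whether the difference lies in the page size itself or in where the content is placed on the page.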

Hi @gabriel.vega, thank you for your patience, and apologies for the delay. We’ll be looking into your issue, but please note that it might take a little time. You can expect a reply from us within 1 day.

@gabriel.vega thank you for your patience once again!

Here is an example of how to perform a PDF-HTML-PDF conversion. It works for your test document, but it is likely to have issues with more complex documents. Please let us know what your final goal is; we may be able to propose a better solution than a PDF-HTML-PDF conversion:

using System.IO;
using Aspose.Pdf;

internal static class ForumTask
{
    internal static void Solution01()
    {
        // Convert the source PDF to fixed-layout HTML, one file per page.
        var document = new Document(@"Original-PDF.pdf");
        var saveOptions = new HtmlSaveOptions
        {
            SplitIntoPages = true,
            FixedLayout = true
        };
        document.Save("test.html", saveOptions);

        // Edit the generated HTML before converting it back.
        DoSomeModifications();

        // Rebuild the PDF with zero margins, adding each HTML page as a fragment.
        var editedDocument = new Document();
        editedDocument.PageInfo.Margin = new MarginInfo { Top = 0, Bottom = 0, Left = 0, Right = 0 };
        foreach (var file in Directory.GetFiles(".", "test*.html"))
        {
            var page = editedDocument.Pages.Add();
            var htmlFragment = new HtmlFragment(File.ReadAllText(file));
            page.Paragraphs.Add(htmlFragment);
        }

        editedDocument.Save("Recreated-PDF.pdf");
    }

    private static void DoSomeModifications()
    {
        var html = File.ReadAllText("test1.html");
        html = html.Replace("And more text.", "<strong>And more text.</strong>");
        File.WriteAllText("test1.html", html);
    }
}

Hi @andriy.andrukhovski

Thanks for your answer. I will check this solution and let you know the results after testing.
