Hello Aspose Team,
I am working on a round-trip scenario where I convert a PDF to HTML, make a small edit in the HTML, and then convert it back to PDF.
I’ve attached the following files for reference:
Aspose.PDF issue conversion html.zip (1.5 MB)
- Original-PDF.pdf – the source file
- Original-HTML-After-Conversion-With-AsposePDF.html – HTML generated by Aspose.PDF
- Edited-HTML.html – the same HTML with a small style change
- Edited-PDF.pdf – the final PDF after converting the edited HTML back
Issue
The output PDF (Edited-PDF.pdf) shows margins that are noticeably different from the original. This happens even though I haven’t changed anything related to page size or margins in the HTML.
- Is there a way to automatically preserve the original PDF’s PageInfo (size and margins) during round-trip conversion?
- Are there any additional flags or workarounds to ensure the output PDF matches the original layout exactly?
Thank you for your help!
@gabriel.vega
To preserve the original PDF’s page size and margins during the round-trip conversion from PDF to HTML and back, you can use the following approaches:
- Set Page Size and Margins Manually: When converting the edited HTML back to PDF, explicitly set the page size and margins to match those of the original PDF. You can do this using the PageInfo properties in Aspose.PDF.
Here’s a sample code snippet:
// Load the edited HTML
Document pdfDocument = new Document("Edited-HTML.html");
// Set the page size and margins to match the original PDF
pdfDocument.PageInfo.Width = originalWidth; // Set original width
pdfDocument.PageInfo.Height = originalHeight; // Set original height
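// Named properties avoid depending on the MarginInfo constructor's argument order (left, bottom, right, top)
pdfDocument.PageInfo.Margin = new MarginInfo { Left = originalLeftMargin, Right = originalRightMargin, Top = originalTopMargin, Bottom = originalBottomMargin };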
// Save the document as PDF
pdfDocument.Save("Edited-PDF.pdf");
- Use HTML Load Options: When converting HTML to PDF, you can use HtmlLoadOptions to control how the HTML is processed. This can help in maintaining the layout.
HtmlLoadOptions options = new HtmlLoadOptions();
options.PageSize = new Size(originalWidth, originalHeight);
options.Margin = new MarginInfo { Left = originalLeftMargin, Right = originalRightMargin, Top = originalTopMargin, Bottom = originalBottomMargin };
Document pdfDocument = new Document("Edited-HTML.html", options);
pdfDocument.Save("Edited-PDF.pdf");
- Check CSS Styles: Ensure that the CSS styles in your edited HTML do not inadvertently alter the layout. Sometimes, styles can affect margins and padding, leading to discrepancies in the final PDF.
By following these steps, you should be able to achieve a more consistent output that closely matches the original PDF’s layout. If you continue to experience issues, consider reviewing the original PDF’s properties and ensuring that all relevant settings are applied during the conversion process.
Hi team,
Thank you for the suggestion. I tried using new Aspose.Pdf.HtmlLoadOptions(); but I can confirm that the properties PageSize and Margin do not exist in HtmlLoadOptions.
I also attempted the alternative solution of manually setting the page size and margins, but that did not work either.
Could you please check if there is another supported way to set these values when loading HTML to PDF?
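For readers hitting the same compile error: as far as I know, HtmlLoadOptions does not have PageSize or Margin, but it does expose a PageInfo property that carries Width, Height and Margin. The following is only a sketch under stated assumptions, not a confirmed fix for this thread. It assumes the file names from the first post, that every page should get the size of the first page of the source PDF, and that zero margins are appropriate (the margin values are placeholders and may need tuning).

```csharp
using Aspose.Pdf;

internal static class HtmlToPdfWithPageInfo
{
    internal static void Main()
    {
        // Read the page size (in points) from the first page of the source PDF.
        // Pages is a 1-based collection.
        double width, height;
        using (var original = new Document("Original-PDF.pdf"))
        {
            var rect = original.Pages[1].Rect;
            width = rect.Width;
            height = rect.Height;
        }

        // Configure the page geometry before the HTML is loaded.
        var options = new HtmlLoadOptions();
        options.PageInfo.Width = width;
        options.PageInfo.Height = height;

        // Assumption: zero margins. Replace with the values you need.
        options.PageInfo.Margin.Top = 0;
        options.PageInfo.Margin.Bottom = 0;
        options.PageInfo.Margin.Left = 0;
        options.PageInfo.Margin.Right = 0;

        using var edited = new Document("Edited-HTML.html", options);
        edited.Save("Edited-PDF.pdf");
    }
}
```

The point of setting these values on the load options rather than on the document afterwards is that the HTML is laid out into pages while it is being loaded, so geometry applied later may not reflow the content.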
Hi team,
I previously saved the original values of MarginTop, MarginBottom, MarginLeft, MarginRight, as well as Width and Height from the source PDF.
After editing the document with HTML, I regenerate the PDF with Aspose.PDF and reapply those values as follows:
var pdfSaveOptions = new PdfSaveOptions();
foreach (var page in Document.Pages)
{
    page.SetPageSize(config.PageDetails.Width, config.PageDetails.Height);
    page.PageInfo.Margin.Top = config.PageDetails.MarginTop;
    page.PageInfo.Margin.Bottom = config.PageDetails.MarginBottom;
    page.PageInfo.Margin.Left = config.PageDetails.MarginLeft;
    page.PageInfo.Margin.Right = config.PageDetails.MarginRight;
}
Document.Save(result, pdfSaveOptions);
With this approach, the page size and all margins look slightly different compared to the original file.
I’m attaching the original PDF, the generated HTML, and the output PDF for reference.
issue html to pdf.zip (373.4 KB)
My question is: what other information should I save from the original document to ensure that the margins remain exactly the same as in the source PDF?
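One way to narrow this down might be to compare the page boxes and rotation of the two files directly, since the visible page size comes from those boxes rather than from PageInfo. This is a diagnostic sketch only, not a fix. The PageBoxReport class is hypothetical, and the file names are taken from the first post (the names inside the second attachment are not known here).

```csharp
using System;
using Aspose.Pdf;

internal static class PageBoxReport
{
    // Prints the size-related properties of every page so that two files
    // can be compared side by side.
    internal static void Dump(string path)
    {
        using var document = new Document(path);
        Console.WriteLine(path);

        // Pages is a 1-based collection.
        for (var i = 1; i <= document.Pages.Count; i++)
        {
            var page = document.Pages[i];
            Console.WriteLine($"  page {i}: rotate={page.Rotate}");
            Print("MediaBox", page.MediaBox);
            Print("CropBox", page.CropBox);
        }
    }

    // Writes the lower-left and upper-right corners of a box, in points.
    private static void Print(string name, Rectangle box) =>
        Console.WriteLine($"    {name}: [{box.LLX}, {box.LLY}, {box.URX}, {box.URY}]");

    internal static void Main()
    {
        Dump("Original-PDF.pdf");
        Dump("Edited-PDF.pdf");
    }
}
```

If the boxes already match and the content still sits differently on the page, the offset is more likely to come from how the HTML is laid out than from the page geometry.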
Hi @gabriel.vega, thank you for your patience, and apologies for the delay. We’ll be looking into your issue, but please note that it might take a little time. You can expect a reply from us within 1 day.
@gabriel.vega thank you for your patience once again!
Here is an example of how to do a PDF-HTML-PDF conversion. It works for your test document, but it is likely to have issues with more complex documents. Please let us know what your final goal is; we may be able to propose a better solution than PDF-HTML-PDF conversion:
using System.IO;
using Aspose.Pdf;

internal static class ForumTask
{
    internal static void Solution01()
    {
        // Step 1: export the PDF as fixed-layout HTML, one file per page.
        var document = new Document(@"Original-PDF.pdf");
        var saveOptions = new HtmlSaveOptions
        {
            SplitIntoPages = true,
            FixedLayout = true
        };
        document.Save("test.html", saveOptions);

        // Step 2: edit the generated HTML.
        DoSomeModifications();

        // Step 3: rebuild the PDF with zero margins, one page per HTML file.
        var editedDocument = new Document();
        editedDocument.PageInfo.Margin = new MarginInfo { Top = 0, Bottom = 0, Left = 0, Right = 0 };
        // Note: GetFiles does not guarantee page order; sort the files for multi-page documents.
        foreach (var file in Directory.GetFiles(".", "test*.html"))
        {
            var page = editedDocument.Pages.Add();
            var htmlFragment = new HtmlFragment(File.ReadAllText(file));
            page.Paragraphs.Add(htmlFragment);
        }
        editedDocument.Save("Recreated-PDF.pdf");
    }

    // Wraps one sentence on the first page in <strong> as a sample edit.
    private static void DoSomeModifications()
    {
        var html = File.ReadAllText("test1.html");
        html = html.Replace("And more text.", "<strong>And more text.</strong>");
        File.WriteAllText("test1.html", html);
    }
}
Hi @andriy.andrukhovski
Thanks for your answer. I will check this solution and let you know the results after testing.