Mixed content in final PDF when using HTML to PDF

jrb-ngs · July 11, 2017, 10:59pm

@jrb-ngs,
Old thread: https://forum.aspose.com/t/mixed-content-in-final-pdf-when-using-html-pdf/118137/1

jr-ngs:

We use Aspose.PDF for .NET to generate PDF documents using HTML->PDF. We have a multi-tenant website (multiple companies/users) and on rare occasions we see resultant PDF documents containing wrong content - content from REPORT X found in REPORT B - even for completely different companies. Below is some pseudo-code of how we generate our PDFs:

class Report_A
{
    void function_A()
    {
        Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();
        pdf.PageSetup.PageHeight = Aspose.Pdf.Kit.PageSize.A4.Height;
        pdf.PageSetup.PageWidth = Aspose.Pdf.Kit.PageSize.A4.Width;
        pdf.PageSetup.Margin.Top = 20;
        pdf.PageSetup.Margin.Bottom = 20;
        pdf.PageSetup.Margin.Left = 5;
        pdf.PageSetup.Margin.Right = 5;

        Aspose.Pdf.License pdfLicense = new Aspose.Pdf.License();
        pdfLicense.SetLicense(/*path*/);

        Aspose.Pdf.Generator.Section section = pdf.Sections.Add();
        Aspose.Pdf.Generator.Image logoImage = new Aspose.Pdf.Generator.Image(section);
        logoImage.ImageInfo.File = logoName;
        logoImage.ImageInfo.ImageFileType = Aspose.Pdf.Generator.ImageFileType.Unknown;
        logoImage.ImageInfo.FixWidth = 150;
        logoImage.ImageInfo.FixHeight = 35;
        section.Paragraphs.Add(logoImage);
        logoImage.ImageInfo.Alignment = Aspose.Pdf.Generator.AlignmentType.Left;

        string htmlContent = string.Empty;
        //set htmlContent header

        Aspose.Pdf.Generator.Text textPdfHeader = new Aspose.Pdf.Generator.Text(section, htmlContent);
        textPdfHeader.IsHtmlTagSupported = true;
        section.Paragraphs.Add(textPdfHeader);

        htmlContent = string.Empty;
        //reset html content body

        Aspose.Pdf.Generator.Text textPdfBody = new Aspose.Pdf.Generator.Text(section, htmlContent);
        textPdfBody.IsHtmlTagSupported = true;
        section.Paragraphs.Add(textPdfBody);

        htmlContent = string.Empty;
        //reset html content footer

        Aspose.Pdf.Generator.Text textPdfFooter = new Aspose.Pdf.Generator.Text(section, htmlContent);
        textPdfFooter.IsHtmlTagSupported = true;
        section.Paragraphs.Add(textPdfFooter);

        string fileName = string.Empty;
        //set filename
        pdf.Save(fileName + ".pdf", SaveType.OpenInBrowser, Response);
    }
}

class Report_B
{
    void function_B()
    {
        MarginInfo marginInfo = new MarginInfo();
        marginInfo.Top = margin;
        marginInfo.Left = margin;
        marginInfo.Right = margin;
        marginInfo.Bottom = margin;

        Pdf pdfDocument = new Pdf();
        pdfDocument.Security = new Aspose.Pdf.Generator.Security();
        pdfDocument.Security.IsFormFillingAllowed = false;
        pdfDocument.IsLandscape = false;
        pdfDocument.PageSetup.PageWidth = 790F;
        pdfDocument.PageSetup.Margin = marginInfo;

        Section section = pdfDocument.Sections.Add();
        section.PageInfo.PageWidth = pageWidth;
        section.PageInfo.Margin = marginInfo;

        StringBuilder reportHTML = new StringBuilder();
        Text pdfText = null;

        reportHTML.Clear();
        //add html to section
        pdfText = new Text(string.Format("{0}", reportHTML.ToString()));
        pdfText.IsHtmlTagSupported = true;
        pdfText.IsFirstParagraph = includePageBreakBetweenSections;
        section.Paragraphs.Add(pdfText);

        reportHTML.Clear();
        //add html to section
        pdfText = new Text(string.Format("{0}", reportHTML.ToString()));
        pdfText.IsHtmlTagSupported = true;
        pdfText.IsFirstParagraph = includePageBreakBetweenSections;
        section.Paragraphs.Add(pdfText);

        string exportedPdfName = string.Format("{0}_{1}{2}_Report",
            //customer name,
            DateTime.Now.ToString("MMddyyyy", CultureInfo.InvariantCulture),
            DateTime.Now.ToString("hhmmss", CultureInfo.InvariantCulture));

        exportedPdfName = exportedPdfName.TrimStart('_').Replace(", ", "");

        using (var streamMemory = new MemoryStream())
        {
            Response.Clear();
            pdfDocument.Save(streamMemory);
            Response.ContentType = "application/pdf";
            Response.AddHeader("content-disposition", string.Format("attachment;filename={0}.pdf", exportedPdfName));
            Response.Buffer = true;
            var bytes = streamMemory.ToArray();
            Response.BinaryWrite(bytes);
            Response.End();
        }
    }
}

jr-ngs:

Thank you for your reply Imran.

Can you please tell me in what version the DOM approach was introduced to see if it is available in the versions we use?

We are not necessarily “converting” one file to the other. The code you see above is sanitized code from two areas of our website. As you can see, we are using Text sections, setting the ‘IsHtmlTagSupported’ value to true, and then directly writing the HTML (in memory), which then gets flushed out using variations of Pdf.Save(). There is no ‘source’ document/file being read. The HTML is generated dynamically by us and held in memory in a StringBuilder.

Attached is a screenshot of a single PDF file with mixed content (report B content somehow embedded in report A), but please let me know if you need more info. I don’t know if it is possible, but if you need to review more of the code, could we start a private conversation?

image.png (90.5 KB)

Imran Rafique:

@jr-ngs,
Thank you for the inquiry. We began incorporating the new DOM (Document Object Model) approach years ago, and we document each new feature, including API changes, in the release notes of Aspose.Pdf for .NET API.

Using the HtmlFragment class (available in version 9.5.0 or higher), you can insert an HTML string into the PDF document without following the old legacy approach of setting IsHtmlTagSupported and maintaining templates. To better understand the difference between the new DOM approach and the old legacy approach, please refer to these help topics:

Introduction to the DOM API
Introduction to DOM (legacy)
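
As a rough illustration of the HtmlFragment approach described above, here is a minimal sketch, assuming the DOM classes Aspose.Pdf.Document, Page, HtmlFragment and MarginInfo from a 9.5.0-or-later build. The ReportPdfSketch class and the BuildPdf helper are hypothetical names made up for this example, and the margin values are simply copied from the Report_A pseudo-code. Exact members can differ between versions, so please check the API reference for the release you use.

```csharp
using System.IO;
using Aspose.Pdf;

public static class ReportPdfSketch
{
    // Hypothetical helper (not part of Aspose.Pdf): renders each HTML string
    // as its own paragraph and returns the finished PDF as a byte array.
    public static byte[] BuildPdf(params string[] htmlParts)
    {
        // A fresh Document per call; no state is kept between calls.
        var document = new Document();
        Page page = document.Pages.Add();

        // Margins taken from the Report_A pseudo-code (assumed values).
        page.PageInfo.Margin = new MarginInfo
        {
            Top = 20,
            Bottom = 20,
            Left = 5,
            Right = 5
        };

        // HtmlFragment takes the HTML string directly, so there is
        // no IsHtmlTagSupported flag to set.
        foreach (string html in htmlParts)
        {
            page.Paragraphs.Add(new HtmlFragment(html));
        }

        using (var stream = new MemoryStream())
        {
            document.Save(stream);
            return stream.ToArray();
        }
    }
}
```

Each call creates its own Document and MemoryStream, so nothing is shared between requests. The returned bytes could then be written to the response the same way function_B does with Response.BinaryWrite.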
jr-ngs:
Attached is a screenshot of a single PDF file with mixed content (report B content somehow embedded in report A), but please let me know if you need more info. I don’t know if it is possible, but if you need to review more of the code, could we start a private conversation?
We need to track the complete details of the use case, including the source documents and code. Please strip out any unrelated information and share the details of the use case with minimal code and source files, so that we can replicate this problem in our environment. That way, we will be able to investigate and share our findings with you. You can edit your first post and mark this thread as private; then only you and Aspose staff will be able to view this thread.

Best Regards,

This Topic is created by imran.rafique using the Email to Topic plugin.

imran.rafique · July 11, 2017, 11:02pm

Hi Imran,

Sorry for the delayed response. Unfortunately, there was a typo in my email address when I signed up for the forums and my password does not work, so I have been unable to log back in to continue our conversation. I had to create a new account to be able to reply so I am unable to make this chat private. Are you able to help me recover the account or help me make this chat private somehow?

However, our Web Forms-based website is a large code base (hundreds of megabytes in size, not including shared DLLs), so it would be very difficult to share that. The code that I posted in the original thread was from 2 pages/areas of our website. The only things I changed were the class/function names, and I excluded the content that generates the HTML. For example, if you see the following

"reportHTML.Clear();

//add html to section"

then you could replace “//add html to section” with code that writes the html to the variable (based on values from a database).

For example (had to remove open tag (less than sign) as this editor was removing my content):

reportHTML.Append("!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN'+'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'");
reportHTML.Append("html xmlns='http://www.w3.org/1999/xhtml'><head><title></title");
reportHTML.Append("/head><body");

As far as reproducing the issue, we run hundreds, if not thousands of these reports per day, and it happens very rarely (we get a report from our customers once every few days), so it is not something that can be reproduced on demand. The only common thing between the two reports/pages is the fact that they are using the PDF class. I was wondering if something was not thread-safe and if the right conditions were hit, 2 threads could end up incorrectly flushing their data to the same temporary location and causing the mixed content?

I know that you mentioned we are using an old mechanism to create our pdf documents. Do you think that re-writing the code to use the DOM approach would alleviate this problem? Thank you for your continued support. If there is anything I can do to help provide more info, please let me know. We are in serious need of fixing this issue.

Please note that it is quite difficult to investigate and fix an issue without replicating it in our environment. We suggest that you track the problematic use case, create a small application project that reproduces this problem in your environment, and share its Zip file with us. It may be a bug in the old version of Aspose.Pdf for .NET API, and we do not provide fixes for the old code base.

Kindly let us know which Aspose.Pdf for .NET API version you are using, and we will try to convert your code to the new DOM approach with the same old version.

Do you think that re-writing the code to use the DOM approach would alleviate this problem?

Yes, we think so, because it is the current way of manipulating PDF documents. You can get a 30-day temporary license from the purchase portal, which lets you evaluate the latest version (17.7) of Aspose.Pdf for .NET API in your environment.

We have marked the old thread as private. The old thread with your incorrect email will only be accessible by the Aspose staff (if the email id is invalid).

Best Regards,

Imran Rafique