For PDF conversion to HTML - Is the HTML XSS safe?

Hello,

We use Aspose.PDF for Java to convert PDF documents to HTML.
Can we assume that the resulting HTML is safe from cross-site scripting attacks?

I ask because our code currently runs HTML docs (which could have been uploaded by a user) through an HTML ‘sanitizer’ to protect against cross-site scripting attacks. For example, the sanitizer removes any <script> content from the HTML docs. But the sanitizer badly breaks HTML docs that Aspose created from PDF files: it also removes all of the styles, so the HTML file looks terrible and nothing like the original PDF.

We are thinking that sanitization to protect against cross-site scripting attack shouldn’t be needed for an HTML doc that was generated from PDF by Aspose.

Is that true?
(i.e., we don’t need to sanitize an HTML doc that was generated from PDF by Aspose)

Or, could cross-site scripting attacks be present in the HTML created by Aspose, from PDF?

Thanks in advance for your answer.
Kind regards,
Becky Mc

The second paragraph, second sentence of my question apparently got sanitized before being posted on your forum!

It should read as (remove the occurrences of ‘=’):
For example, the sanitizer removes any <=S=C=R=I=P=T=> content from the HTML docs.

@beckymc

Thank you for contacting support.

We have logged a ticket with ID PDFNET-46111 to investigate your concerns thoroughly. We will let you know as soon as any significant update is available.

Thank you, Farhan!

Please note, we are using Aspose.PDF for Java (not .NET).

Thanks.

@beckymc

Thank you for the information.

We have recorded your comments and will let you know as soon as any significant update is available.

Hello –

I’m interested in knowing whether or not the whole HTML document generated by Aspose (PDF for Java) has been cross-site-scripting-sanitized – but in the meantime, if it makes it easier:

Can someone please indicate whether the styles, at least, in the generated PDF-to-HTML document are XSS-safe? That is, everything between the <STYLE> and </STYLE> tags: is that part XSS-safe?

(And whatever holds true for conversion by Aspose.PDF for Java regarding HTML XSS sanitization: is the same true for Aspose.Words for Java?)

Thanks very much for your time!
Kind regards,
Becky

@beckymc

We have recorded your concerns and will let you know as soon as any significant update is available regarding Aspose.PDF for Java and Aspose.Words for Java. Please be patient and allow us a little time.

@beckymc,

We will investigate whether the HTML generated by Aspose.Words for Java is XSS safe. We have logged your requirement in our issue tracking system under ticket number WORDSNET-18295. We will look further into the details and keep you updated on the status of the linked issue.

Thank you, Awais.

Please note, our top priority is to know if Aspose.PDF-for-JAVA generates safe HTML when converting PDF to HTML using Java - and at least: is the internal CSS embedded within the HTML doc safe from cross-site scripting attack?

Thank you!

@beckymc

We understand your concerns. The issue is being investigated in detail, and we will share our findings with you soon.

That’s great, thank you Farhan!

Hello -

Is there any update or information on this?
(At least for this question: does Aspose.PDF-for-JAVA provide safe CSS in the header element, upon converting from PDF to HTML?)

It’s an urgent issue, for us.
We greatly appreciate any insight you can give us!

Many thanks in advance –
Becky

@beckymc

The investigation is in progress and we will let you know as soon as any update is available. Please be patient and allow us a little more time.

OK, will do, Farhan.
I’m sure you’re all busy.

Thanks,
Becky

@beckymc

We appreciate your patience and understanding in this regard.

Hello -

We have been waiting 5 weeks for an answer.

Can someone please tell us if Aspose.PDF-for-JAVA generates an HTML document that is safe from cross-site scripting, when Aspose.PDF-for-JAVA converts PDF docs to HTML?

Thank you.
Becky Mc

@beckymc

We are afraid that the ticket logged for the Aspose.PDF API, PDFNET-46111, has not been investigated yet. We have recorded your concerns and will try to update you as soon as we can. Please allow us a little more time. We are thankful for your patience.

@beckymc

We are pleased to inform you that the HTML files generated by Aspose.PDF for .NET as well as Aspose.PDF for Java API are XSS safe because Aspose.PDF API creates static HTML pages. When converting PDF to HTML, special characters are replaced with escaped ones, so the resulting HTML is XSS safe.
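
To illustrate what "special characters are replaced with escaped ones" means in general, here is a minimal sketch in plain Java. The class name `HtmlEscapeSketch` and the method `escapeHtml` are hypothetical names chosen for this example; this is not the Aspose.PDF implementation, only an assumption about the kind of escaping being described.

```java
public class HtmlEscapeSketch {

    // Hypothetical helper: replaces the characters that are significant in
    // HTML markup with their entity equivalents, so that text taken from a
    // document is rendered as text and never parsed as tags or attributes.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Text that might appear inside a PDF page.
        String pdfText = "<script>alert('xss')</script>";
        // prints: &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;
        System.out.println(escapeHtml(pdfText));
    }
}
```

With escaping like this, text from the PDF that looks like markup is displayed literally by the browser instead of being executed.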

Farhan and team,

Thank you very much!
For doing the analysis and providing the answer we were hoping for.
We really appreciate your confirmation on this.

We would also like to know if the same is true for Aspose.Words for Java, when it is convenient, but it can be a lower priority.

Thanks again for your time and help.
Kind regards,
Becky

@beckymc

Thank you for your kind feedback and patience.

We will let you know once any information is available regarding the Aspose.Words API.