How to output HTML but prevent XSS attacks

I wrote a php script to fetch the email content.

These contents are HTML format.

I’d like to display the content, as below

<?php 
$email_content = '
    <html>
        <script>alert("XSS");</script>
        <body>
            <div>Line1</div>
            <div>Line2</div>
        </body>
    </html>
';
echo $email_content;
?>

As you can see, it will cause XSS attacks. But if I use htmlspecialchars function, it will not show the correct HTML format, how should I do in this case? Thanks.

Here is Solutions:

We have many solutions to this problem, But we recommend you to use the first solution because it is tested & true solution that will 100% work for you.

Solution 1

HTMLPurifer can do that:

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);

It takes dirty HTML (ie possibly containing Javascript) and removes any script.

PHP doesn’t have anything native or built in that can remove Javacript like HTMLPurifier. You could use DOMDocument but this would be a lengthy task because Javascript can execute in some attributes (onerror, onclick) and is not just limited to <script></script>.

Solution 2

You should use strip_tags() function and allow only tags that you want user to add.

echo strip_tags($text, '<p><a>');

This line allows <p> and <a> tags every other tag will be removed.

htmlspecialchars() works totally different.

From manual:

The translations performed are:

 '&' (ampersand) becomes '&amp;'
 '"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
 "'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
 '<' (less than) becomes '&lt;'
 '>' (greater than) becomes '&gt;'

There is very nice article about XSS prevention and CSRF prenvetion read it.

Note: Use and implement solution 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply