Home > JavaScript, jQuery, Regex > JavaScript: How to prevent execution of JavaScript within a html being added to the DOM

JavaScript: How to prevent execution of JavaScript within a html being added to the DOM

February 5th, 2013 Leave a comment Go to comments

    Getting pieces of html from an external web service, I display them “as is” on a page. To insert the htmls into the document I use handy jQuery methods: append() and html(). The only problem is the htmls contain inclusions of JavaScript that is executed whenever I add the htmls to the DOM (Document Object Model). Having looked for an acceptable solution to prevent untrusted scripts from running I found out that almost all solutions come to removing script-tags from html using the Regular Expressions. I tried a couple of regexes, and they really worked. However, every time I could find such a combination of script-tag, JavaScript inside it and wrapping html when the regexes either didn’t recognize a script block or cut out more than it was needed. And I don’t even mention that parsing an arbitrary HTML with the Regular Expressions is a bad idea in all respects :). Trying to find another solution, I recalled that if we replace the type=”text/javascript” of a script-tag with the type=”text/xml”, the JavaScript inside will not execute. This is quite known fact, and a few JavaScript libraries use this behavior for their purposes. However, two things impeded me to implement the direct replacing of one type with another: the type=”text/javascript” sub-string may encounter outside the <script>, somewhere in the content (like in this article 🙂 ); the script-tag may be without the type attribute at all, and, in this case, the code inside will be run as if the type=”text/javascript” is specified. Taking into account these conditions, I developed a few tricky regexes, that seemed to be fairly complicated and not reliable enough though. After all, this led me to a thought to examine what happens if the script-tag has two type attributes. Something like this:

<script type="text/xml" type="text/javascript">
    alert('Hello!');
</script>

All browsers I tested this in didn’t execute JavaScript. On the contrary, if I change the type attributes over, the script is executed. Only the first type attribute seems to be efficient. So, my resultant solution is based on the following assumptions:

  • JavaScript inside the <script type=”text/xml”> doesn’t execute;
  • JavaScript inside the <script type=”text/xml” type=”text/javascript”> doesn’t execute as well because only the first type is considered;

So, below is a very simple JavaScript function, which prevents scripts from running and seems to avoid most of drawbacks:

function preventJS(html) {
    return html.replace(/<script(?=(\s|>))/i, '<script type="text/xml" ');
}

The solution still relies on the Regular Expressions, but the regex itself is quite simple and unambiguous. The script-tags are preserved in the html, so you can treat them later in a manner you want.

Below is an example of use

<html>
  <head>
    <title>Prevent scripts from running</title>
    <script src="jquery-1.8.3.js" type="text/javascript"></script>
  </head>
  <body>
    <div id="content"></div>
    <script type="text/javascript">
      $(function () {
        var html1 = "<span>Hello 1<script type='text/javascript'>alert('Hello 1!');<\/script></span>"
        var html2 = "<span>Hello 2 &lt;script&gt; alert('Hi') &lt;/script&gt; <script   type='text/javascript'>alert('Hello 2!');<\/script></span>";
        var html3 = "<script src=\"someJs.js\" type='text/javascript'><\/script>";
        var html4 = "<script>alert('Hello 4!');<\/script>";
        var html5 = "<scriptsomeAttr >alert('Hello 5!');";

        $("#content").html(preventJS(html1));
        $("#content").append(preventJS(html2));
        $("#content").append(preventJS(html3));
        $("#content").append(preventJS(html4));
        $("#content").append(preventJS(html5));
      });

      function preventJS(html) {
        return html.replace(/<script(?=(\s|>))/i, '<script type="text/xml" ');
      }
    </script>        
  </body>
</html>

I tested this solution in Google Chrome v. 24.0.1312.56, FireFox v. 18.0.1 and Internet Explorer 9 v.9.0.8112.16421, and it works as directed. However, I still have doubts whether it’s applicable for all browsers and their versions. So, if you have tested it, please don’t hesitate to post a comment here with the result, browser’s name and version you use.

For test purposes you can download a html page and a couple of js-files here. If the preventing works correctly, having opened the page in a browser, you shouldn’t get any alerts.

Related posts:
 
Categories: JavaScript, jQuery, Regex Tags: , ,
  1. No comments yet.
  1. No trackbacks yet.