JavaScript: How to prevent execution of JavaScript within HTML being added to the DOM
I get pieces of HTML from an external web service and display them “as is” on a page. To insert the HTML into the document I use the handy jQuery methods append() and html(). The only problem is that the HTML contains JavaScript, and that JavaScript is executed whenever I add the HTML to the DOM (Document Object Model).
While looking for an acceptable way to prevent untrusted scripts from running, I found that almost all solutions come down to removing script tags from the HTML with regular expressions. I tried a couple of regexes, and they did work. However, every time I could find a combination of script tag, JavaScript inside it and surrounding HTML for which the regex either didn't recognize a script block or cut out more than it should have. And that's without mentioning that parsing arbitrary HTML with regular expressions is a bad idea in every respect :).
Looking for another solution, I recalled that if the type="text/javascript" of a script tag is replaced with type="text/xml", the JavaScript inside will not execute. This is a well-known fact, and a few JavaScript libraries rely on this behavior for their own purposes. However, two things kept me from simply replacing one type with the other:
- the type="text/javascript" substring may occur outside a <script>, somewhere in the content (like in this article 🙂);
- the script tag may have no type attribute at all, in which case the code inside runs as if type="text/javascript" were specified.
With these conditions in mind, I wrote a few tricky regexes, but they turned out fairly complicated and still not reliable enough. Eventually, this led me to examine what happens if a script tag has two type attributes, something like this:
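To make the two failure modes concrete, here is a minimal sketch of the kind of "strip every script block" regex such solutions use. The helper name stripScriptsNaively is hypothetical, and this pattern is an illustration, not one of the regexes tried above. It relies on one parser fact: `</script >`, with a space before the `>`, is still a valid closing tag for the browser's HTML parser, but the regex does not recognize it.

```javascript
// Hypothetical helper: a typical regex-based script stripper, shown only to
// illustrate how such regexes go wrong.
function stripScriptsNaively(html) {
  // Lazily match from an opening <script ...> to the next literal "</script>".
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, "");
}

// 1. A block the regex does not recognize: the browser accepts "</script >"
//    as the end tag, the regex does not, so nothing is removed.
const missed = "<p>before</p><script>alert(1)</script ><p>after</p>";
console.log(stripScriptsNaively(missed)); // printed unchanged

// 2. Cutting out too much: with the first end tag unrecognized, the lazy match
//    runs on to the next "</script>", taking the <p> in between with it.
const overCut = "<script>a()</script ><p>keep me</p><script>b()</script>";
console.log(JSON.stringify(stripScriptsNaively(overCut))); // an empty string
```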
<script type="text/xml" type="text/javascript"> alert('Hello!'); </script>
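None of the browsers I tested executed the JavaScript. Conversely, if I swap the two type attributes, the script is executed. Only the first type attribute takes effect, and this is no accident: the HTML parsing rules say that when a start tag repeats an attribute name, the first occurrence is kept and the later ones are dropped. So my final solution is based on the following assumptions: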
- JavaScript inside <script type="text/xml"> doesn't execute;
- JavaScript inside <script type="text/xml" type="text/javascript"> doesn't execute either, because only the first type attribute is taken into account.
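Both assumptions can be modelled in a few lines. The sketch below is a simplified, regex-based approximation, not a browser API: the names firstAttribute, wouldExecute and JS_MIME_TYPES are hypothetical, and the MIME list is only a subset of the types browsers accept. It reads the first type attribute of an opening tag and treats a missing or empty type, or a JavaScript MIME type, as executable.

```javascript
// Subset of the MIME types browsers treat as JavaScript (assumption: not exhaustive).
const JS_MIME_TYPES = new Set([
  "text/javascript", "application/javascript", "text/ecmascript",
  "application/ecmascript", "application/x-javascript", "text/jscript",
]);

// Return the value of the FIRST occurrence of attribute `name` in an opening
// tag, mirroring the parser rule that later duplicates are dropped.
// Returns null if the attribute is absent. Simplified: good for plain tags only.
function firstAttribute(tag, name) {
  const attrRe = /\s([^\s=>\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g;
  let m;
  while ((m = attrRe.exec(tag)) !== null) {
    if (m[1].toLowerCase() === name) {
      return m[2] ?? m[3] ?? m[4] ?? ""; // double-, single- or unquoted value
    }
  }
  return null;
}

// Would a browser run a script element opened by this tag? (simplified model)
function wouldExecute(scriptTag) {
  const type = firstAttribute(scriptTag, "type");
  if (type === null || type.trim() === "") return true; // no type: JavaScript
  return JS_MIME_TYPES.has(type.trim().toLowerCase());
}

console.log(wouldExecute('<script type="text/xml" type="text/javascript">')); // false
console.log(wouldExecute('<script type="text/javascript" type="text/xml">')); // true
console.log(wouldExecute("<script>")); // true
```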
Below is a very simple JavaScript function that prevents scripts from running and seems to avoid most of the drawbacks:
function preventJS(html) {
    // "g": rewrite every <script> tag in the fragment, not only the first one
    return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}
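The solution still relies on a regular expression, but the regex itself is simple and unambiguous: it only has to find the start of each opening script tag. The script tags are preserved in the HTML, so you can handle them later however you want. Keep in mind that it only neutralizes script elements: inline event-handler attributes such as onclick or onerror, and javascript: URLs, are left untouched and can still run code.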
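A possible hardening of preventJS, offered as a sketch rather than a tested replacement, with two changes that are assumptions beyond the original code. First, with a non-global regex, String.prototype.replace rewrites only the first match, so the g flag is needed for fragments containing several script tags. Second, adding / to the lookahead also catches <script/src=...>, which the HTML parser treats as a script element with a src attribute.

```javascript
// Sketch of a hardened preventJS (assumption: not the original author's code).
//  - "g": every <script> tag is rewritten, not just the first one;
//  - "i": <SCRIPT> and other casings are matched too;
//  - "/" in the lookahead: "<script/src=...>" is parsed as a script element.
function preventJS(html) {
  return html.replace(/<script(?=[\s>\/])/gi, '<script type="text/xml" ');
}

// Two script tags in one fragment: both get type="text/xml" as their first type.
const fragment =
  "<span><script>alert('Hi')</script>" +
  "<script type='text/javascript'>alert('Hello!');</script></span>";
console.log(preventJS(fragment));

// Slash instead of whitespace after the tag name.
console.log(preventJS('<SCRIPT/src="a.js"></SCRIPT>'));
```

Because the inserted type="text/xml" always comes first, any type attribute already present is ignored by the parser, exactly as with the original function.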
Here is an example of its use:
<html>
<head>
    <title>Prevent scripts from running</title>
    <script src="jquery-1.8.3.js" type="text/javascript"></script>
</head>
<body>
    <div id="content"></div>
    <script type="text/javascript">
        $(function () {
            // Closing script tags inside these strings are escaped as <\/script>;
            // unescaped, they would end this inline script element early.
            var html1 = "<span>Hello 1<script type='text/javascript'>alert('Hello 1!');<\/script></span>";
            var html2 = "<span>Hello 2 <script> alert('Hi') <\/script> <script type='text/javascript'>alert('Hello 2!');<\/script></span>";
            var html3 = "<script src=\"someJs.js\" type='text/javascript'><\/script>";
            var html4 = "<script>alert('Hello 4!');<\/script>";
            var html5 = "<scriptsomeAttr >alert('Hello 5!');"; // not a script tag

            $("#content").html(preventJS(html1));
            $("#content").append(preventJS(html2));
            $("#content").append(preventJS(html3));
            $("#content").append(preventJS(html4));
            $("#content").append(preventJS(html5));
        });

        function preventJS(html) {
            return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
        }
    </script>
</body>
</html>
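Before opening the page in a browser, the five sample fragments can also be checked at the string level, for example in Node.js. This is a rough sketch, not a substitute for the browser test: unneutralizedScriptTags is a hypothetical helper, and its tag matching is regex-based. It lists every opening script tag that, after preventJS, does not start with type="text/xml".

```javascript
// preventJS as in the example page, with the "g" flag so every tag is rewritten.
function preventJS(html) {
  return html.replace(/<script(?=(\s|>))/gi, '<script type="text/xml" ');
}

// Hypothetical helper: opening script tags ("<script" followed by whitespace,
// "/" or ">") whose first attribute is not type="text/xml".
function unneutralizedScriptTags(html) {
  const tags = html.match(/<script(?=[\s>\/])[^>]*>/gi) || [];
  return tags.filter((tag) => !/^<script type="text\/xml" /i.test(tag));
}

// The sample fragments from the example page (plain JS strings here, so the
// closing tags need no escaping).
const samples = [
  "<span>Hello 1<script type='text/javascript'>alert('Hello 1!');</script></span>",
  "<span>Hello 2 <script> alert('Hi') </script> <script type='text/javascript'>alert('Hello 2!');</script></span>",
  "<script src=\"someJs.js\" type='text/javascript'></script>",
  "<script>alert('Hello 4!');</script>",
  "<scriptsomeAttr >alert('Hello 5!');",
];

for (const sample of samples) {
  const leftovers = unneutralizedScriptTags(preventJS(sample));
  console.log(leftovers.length === 0 ? "OK  " : "FAIL", sample);
}
```

Removing the g flag from preventJS makes the second sample, which contains two script tags, report FAIL.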
I tested this solution in Google Chrome v. 24.0.1312.56, Firefox v. 18.0.1 and Internet Explorer 9 (v. 9.0.8112.16421), and it works as intended. However, I still have doubts whether it works in all browsers and versions. So, if you have tested it, please don't hesitate to post a comment here with the result and the name and version of the browser you used.
For testing purposes, you can download an HTML page and a couple of JS files here. If the prevention works correctly, you shouldn't see any alerts when you open the page in a browser.