JavaScript strip_tags
Strips HTML and PHP tags from a string
function strip_tags (input, allowed) {
    // Strips HTML and PHP tags from a string
    //
    // version: 1109.2015
    // discuss at: http://phpjs.org/functions/strip_tags
    // +   original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   improved by: Luke Godfrey
    // +      input by: Pul
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Onno Marsman
    // +      input by: Alex
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +      input by: Marc Palau
    // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +      input by: Brett Zamir (http://brett-zamir.me)
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Eric Nagel
    // +      input by: Bobby Drake
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Tomasz Wesolowski
    // +      input by: Evertjan Garretsen
    // +    revised by: Rafał Kukawski (http://blog.kukawski.pl/)
    // *     example 1: strip_tags('<p>Kevin</p> <b>van</b> <i>Zonneveld</i>', '<i><b>');
    // *     returns 1: 'Kevin <b>van</b> <i>Zonneveld</i>'
    // *     example 2: strip_tags('<p>Kevin <img src="someimage.png" onmouseover="someFunction()">van <i>Zonneveld</i></p>', '<p>');
    // *     returns 2: '<p>Kevin van Zonneveld</p>'
    // *     example 3: strip_tags("<a href='http://kevin.vanzonneveld.net'>Kevin van Zonneveld</a>", "<a>");
    // *     returns 3: "<a href='http://kevin.vanzonneveld.net'>Kevin van Zonneveld</a>"
    // *     example 4: strip_tags('1 < 5 5 > 1');
    // *     returns 4: '1 < 5 5 > 1'
    // *     example 5: strip_tags('1 <br/> 1');
    // *     returns 5: '1  1'
    // *     example 6: strip_tags('1 <br/> 1', '<br>');
    // *     returns 6: '1 <br/> 1' (PHP itself strips the tag here unless <br/> is listed explicitly)
    // *     example 7: strip_tags('1 <br/> 1', '<br><br/>');
    // *     returns 7: '1 <br/> 1'

    // Make sure the allowed argument is a string containing only tags in lowercase (<a><b><c>)
    allowed = (((allowed || "") + "").toLowerCase().match(/<[a-z][a-z0-9]*>/g) || []).join('');
    var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
        commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
    // Remove comments and PHP blocks first, then every tag whose name is not in the allow list
    return input.replace(commentsAndPhpTags, '').replace(tags, function ($0, $1) {
        return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
    });
}
Examples
» Example 1
Running
strip_tags('<p>Kevin</p> <b>van</b> <i>Zonneveld</i>', '<i><b>');
Should return
'Kevin <b>van</b> <i>Zonneveld</i>'
» Example 2
Running
strip_tags('<p>Kevin <img src="someimage.png" onmouseover="someFunction()">van <i>Zonneveld</i></p>', '<p>');
Should return
'<p>Kevin van Zonneveld</p>'
Dependencies
No dependencies, you can use this function standalone.
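Because the function is standalone, it can be pasted into a browser console or Node.js and checked against the documented examples. The sketch below repeats the function from the listing and compares a few results; `checks` and `allPassed` are illustrative names, not part of php.js.

```javascript
// Copy of the strip_tags listing, reproduced so this snippet runs on its own.
function strip_tags(input, allowed) {
  // Normalize the allow list to a lowercase string such as '<a><b>'.
  allowed = (((allowed || '') + '').toLowerCase().match(/<[a-z][a-z0-9]*>/g) || []).join('');
  var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;
  var commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
  return input.replace(commentsAndPhpTags, '').replace(tags, function ($0, $1) {
    return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
  });
}

// Each entry pairs an actual result with the value documented in the header.
var checks = [
  [strip_tags('<p>Kevin</p> <b>van</b> <i>Zonneveld</i>', '<i><b>'),
   'Kevin <b>van</b> <i>Zonneveld</i>'],
  [strip_tags('<p>Kevin <img src="someimage.png" onmouseover="someFunction()">van <i>Zonneveld</i></p>', '<p>'),
   '<p>Kevin van Zonneveld</p>'],
  [strip_tags('1 < 5 5 > 1'), '1 < 5 5 > 1']
];

var allPassed = checks.every(function (pair) {
  return pair[0] === pair[1];
});

console.log(allPassed ? 'all examples match' : 'some examples differ');
```

Note that `input` is expected to be a string; the listing calls `input.replace` directly, so other types would need converting first.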
Open syntax issues
php.js uses JSLint to keep our code consistent and to prevent some common bugs.
Eventually we want all code to pass, or at least to take into account most of the fixes JSLint suggests, following the JSLint configuration we've decided on.
Authors
Thanks to the following developers, you get to have strip_tags goodness in JavaScript.
@ Chris: Sorry, if the comment system is letting you down here, could you try pasting to pastebin.org?
@ Evertjan Garretsen: Looks like the PHP version needs you to explicitly put br/ in the list of allowed tags
@ Rafał Kukawski: Sublime man. In fact your creation's so good that it's better than PHP's version. Have a look at example 6 and you will see that PHP (5.3.2) will require you to explicitly name br/ in the allow list.
I'm still including your version in php.js, though, as I don't think this will cause bad bugs for people (seems like if you're whitelisting br you intend to whitelist br/ as well) so we can fix it later on.
https://github.com/kvz/phpjs/commit/526ac02243899b12cd0929c0a25133304525c0e8
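The br versus br/ difference discussed here seems to follow from how the revised function matches the allow list: only the tag name is compared, so a trailing slash or attributes never enter the lookup. A minimal sketch of that check, where `isAllowed` is a hypothetical helper written for illustration and not part of php.js:

```javascript
// Hypothetical helper: mirrors the allow-list lookup used inside strip_tags.
function isAllowed(tag, allowed) {
  // Capture only the tag name; '/', whitespace and attributes are skipped.
  var match = /<\/?([a-z][a-z0-9]*)\b[^>]*>/i.exec(tag);
  return match !== null && allowed.indexOf('<' + match[1].toLowerCase() + '>') > -1;
}

console.log(isAllowed('<br>', '<br>'));   // true
console.log(isAllowed('<br/>', '<br>'));  // true: the name is still 'br'
console.log(isAllowed('<br />', '<br>')); // true
console.log(isAllowed('<b>', '<br>'));    // false: '<b>' is not in the allow list
```

So allowing br also allows the self-closed form, which is the behaviour the comment above describes as more lenient than PHP 5.3.2.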
I discovered that when I allow br, this will not allow the XHTML self-closed br, like <br/>. Maybe the following line should be added?
if (i != 0) { i = html.toLowerCase().indexOf('<'+allowed_tag+'/>');}
I extended my previous solution with removing comments and php tags. May not be perfect, but should work for most cases
function strip_tags(input, allowed){
allowed = (((allowed || "") + "")
.toLowerCase()
.match(/<[a-z][a-z0-9]*>/g) || [])
.join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi,
commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
return input.replace(commentsAndPhpTags, '').replace(tags, function($0, $1){
return allowed.indexOf('<' + $1.toLowerCase() + '>') > -1 ? $0 : '';
});
}
Maybe something like this?
function strip_tags(input, allowed){
allowed = (((allowed || "") + "")
.toLowerCase()
.match(/<[a-z][a-z0-9]*>/g) || [])
.join(''); // making sure the allowed arg is a string containing only tags in lowercase (<a><b><c>)
var reg = /(<\/?([a-z][a-z0-9]*)\b[^>]*>)/gi;
return input.replace(reg, function($0, $1, $2){
return allowed.indexOf('<' + $2.toLowerCase() + '>') > -1 ? $0 : '';
});
}
Hey,
I have a slight problem with html comments. See this example:
Ooops, das hätte nicht passieren dürfen!
Die angegebene Adresse ist mit Ihren Benutzerrechten nicht erreichbar.
JS result:
Ooops, das hätte nicht passieren dürfen! Die angegebene Adresse ist mit Ihren Benutzerrechten nicht erreichbar. Sekunden zur Startseite weitergeleitet...-->
See missing "Sie werden in" and additional "-->" in the JS result.
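The original markup from this report did not survive on the page, so the input below is a made-up stand-in. It sketches why comments need to be removed as a whole before tags are stripped: a tag-only pass leaves the comment delimiters and their text behind. The two regular expressions are the ones from the current listing; `sample`, `tagsOnly` and `commentsFirst` are illustrative names.

```javascript
var commentsAndPhpTags = /<!--[\s\S]*?-->|<\?(?:php)?[\s\S]*?\?>/gi;
var tags = /<\/?([a-z][a-z0-9]*)\b[^>]*>/gi;

// Hypothetical input: a comment that itself contains markup, plus a PHP block.
var sample = 'before <!-- hidden <b>note</b> --> after <?php echo 1; ?>done';

// Stripping tags alone leaves the comment and PHP delimiters in the output.
var tagsOnly = sample.replace(tags, '');
console.log(tagsOnly); // 'before <!-- hidden note --> after <?php echo 1; ?>done'

// Removing comments and PHP blocks first, then tags, gives clean text.
var commentsFirst = sample.replace(commentsAndPhpTags, '').replace(tags, '');
console.log(commentsFirst); // 'before  after done'
```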
@ Brett Zamir: Ok provided an additional fix. After comment caches clear we should be able to review the results.
@Kevin: Thanks for the security fix, and sorry I'm too busy to look into it myself at the moment, but now the code snippets are showing less-than signs, etc. in entity form...
@ Tomasz Wesolowski: Very kind of you to provide the fix! I've added it to SVN along with the credits.
PS: oops indeed! fixed the comment issue
Oops, no HTML escaping in posts? Here's a cleaner repost:
---
That's some useful code. :)
Unfortunately it seems to fail on header tags h1..h6. I have probably fixed that by changing line 42:
// Build allowed tags associative array
if (allowed_tags) {
allowed_array = allowed_tags.match(/([a-zA-Z]+)/gi);
}
to
allowed_array = allowed_tags.match(/([a-zA-Z0-9]+)/gi);
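The effect of that change can be seen by running both patterns on an allow list that contains a heading tag. This is only a sketch of the two `match` calls quoted above; `allowedTags`, `lettersOnly` and `lettersAndDigits` are illustrative names.

```javascript
var allowedTags = '<h1><p>';

// Letters only: the digit is dropped, so 'h1' is recorded as 'h'.
var lettersOnly = allowedTags.match(/([a-zA-Z]+)/gi);
console.log(lettersOnly); // [ 'h', 'p' ]

// Letters and digits: the full tag name is kept.
var lettersAndDigits = allowedTags.match(/([a-zA-Z0-9]+)/gi);
console.log(lettersAndDigits); // [ 'h1', 'p' ]
```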
@ Bobby Drake: Thanks for pointing that out. I fixed the bug and added your testcase to prevent future bugs. Thanks!
What does !! do here? Validate? Convert an int to a bool?
array_unique uses this function internally, but array_unique is not working for me (it returns undefined), and I'm trying to figure out why.
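The code being asked about is not shown on this page, but in general `!!` is two logical NOT operators in a row: the first converts the value to a boolean and inverts it, the second inverts it back. A short illustration with made-up sample values:

```javascript
// Values chosen for illustration only.
var samples = [0, 1, '', 'text', null, undefined, [], NaN];

// !!value yields the same result as Boolean(value).
var asBooleans = samples.map(function (value) {
  return !!value;
});

console.log(asBooleans);
// [ false, true, false, true, false, false, true, false ]
```

So it is a conversion to a boolean rather than a validation; note that an empty array counts as true.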
Thanks for the function. I added:
[CODE="Javascript"]
var k = '', i = 0;
[/CODE]
in your variable declarations, as I was using k and i outside the function, which put things into a nasty loop. Hope this helps someone.
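The loop problem described here is what tends to happen when a function uses a counter it never declares and the caller uses a variable with the same name. A minimal sketch of that situation, with hypothetical function names (`countItemsLeaky`, `countItemsSafe`) standing in for the real code:

```javascript
var i; // shared counter, standing in for the variable used outside the function

function countItemsLeaky(items) {
  // No local declaration: this loop overwrites the outer i.
  for (i = 0; i < items.length; i++) {}
  return i;
}

function countItemsSafe(items) {
  var i; // local declaration, so the outer counter is left alone
  for (i = 0; i < items.length; i++) {}
  return i;
}

var leakyPasses = 0;
for (i = 0; i < 3; i++) {
  countItemsLeaky([1, 2, 3, 4, 5]); // leaves the outer i at 5
  leakyPasses++;
}
console.log(leakyPasses); // 1: the outer loop ends after a single pass

var safePasses = 0;
for (i = 0; i < 3; i++) {
  countItemsSafe([1, 2, 3, 4, 5]);
  safePasses++;
}
console.log(safePasses); // 3
```

With a shorter inner array the outer counter would keep being reset below its limit, and the outer loop would never finish.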
You have a great collection of PHP-equivalent JavaScript functions. This is really helpful to developers. Thanks for sharing.
@ Alex: I wasn't aware of this implementation. And, you're right: it is our objective to mimic php as much as reasonably possible. Thanks for sharing, I've updated the function and credited you accordingly.
It looks like there's a small difference in your JS implementation of strip_tags from PHP's implementation:
PHP declares multiple allowable tags like this: strip_tags('<p><b>text</b></p>', '<p><b>')
The JS version is like this:
strip_tags('<p><b>text</b></p>', '<p>,<b>')
Note the comma separation in the JS version between the allowable tags. It's not a big deal, but I thought I'd point it out, as it tripped me up for a while (and I thought you'd want to know since you're attempting to make these functions work syntactically the same as their PHP equivalents). Thanks!
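In the current listing the allow list is normalized with a regular expression that picks out the tags and ignores whatever sits between them, so the comma-separated and PHP-style forms appear to behave the same. A sketch of just that normalization step, pulled out into a hypothetical `normalizeAllowed` helper:

```javascript
// Hypothetical helper: the first statement of the current strip_tags, on its own.
function normalizeAllowed(allowed) {
  return (((allowed || '') + '').toLowerCase().match(/<[a-z][a-z0-9]*>/g) || []).join('');
}

console.log(normalizeAllowed('<p><b>'));  // '<p><b>'
console.log(normalizeAllowed('<p>,<b>')); // '<p><b>' (the comma is ignored)
console.log(normalizeAllowed('<P> <B>')); // '<p><b>' (case and spaces are ignored)
console.log(normalizeAllowed());          // ''       (no allow list)
```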
@ Pul: Thank you for pointing that out. I've fixed the code and added your usage example so it will be tested in the future as well.
try
[CODE="Javascript"]
strip_tags("<a href='index.html'>test</a>", "<a>");
[/CODE]
please fix.. :P
The strip_tags() function appears to be broken in IE7. Upon detecting an opening tag, it completely removes ALL output. The same behavior appears on the test page on this site. It appears that in IE, the match() function returns a copy of the input string and a couple other extraneous values on a successful match, causing the entire string to be replaced by the first matched key (the original input).
To fix, I added this ugly piece of work inside the key loop:
[CODE="Javascript"]
if (key == '0' || Number(key.toString()))
{
// replacement
}
[/CODE]
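This does not reproduce IE7, but a similar effect can be seen in current engines: the array returned by a non-global `match()` carries extra properties such as `index` and `input`, and a `for...in` loop visits them along with the numeric keys. A sketch, with illustrative variable names, of why iterating by numeric index is the safer choice:

```javascript
// Non-global match: the result array has extra properties besides the matches.
var result = 'a <b>c</b>'.match(/<(\w+)>/);

var forInKeys = [];
for (var key in result) {
  forInKeys.push(key); // collects '0' and '1', but also 'index', 'input', ...
}

var matches = [];
for (var n = 0; n < result.length; n++) {
  matches.push(result[n]); // only the real match entries
}

console.log(forInKeys.indexOf('input') > -1); // true
console.log(matches); // [ '<b>', 'b' ]
```

The `Number(key)` guard in the comment above filters the same extras; a plain indexed loop avoids the need for it.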

