JavaScript htmlentities
Convert all applicable characters to HTML entities
function htmlentities (string, quote_style, charset, double_encode) {
    // Convert all applicable characters to HTML entities
    //
    // version: 1109.2015
    // discuss at: http://phpjs.org/functions/htmlentities
    // +   original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   improved by: nobbler
    // +    tweaked by: Jack
    // +   bugfixed by: Onno Marsman
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +    bugfixed by: Brett Zamir (http://brett-zamir.me)
    // +      input by: Ratheous
    // +   improved by: Rafał Kukawski (http://blog.kukawski.pl)
    // +   improved by: Dj (http://phpjs.org/functions/htmlentities:425#comment_134018)
    // -    depends on: get_html_translation_table
    // *     example 1: htmlentities('Kevin & van Zonneveld');
    // *     returns 1: 'Kevin &amp; van Zonneveld'
    // *     example 2: htmlentities("foo'bar","ENT_QUOTES");
    // *     returns 2: 'foo&#039;bar'
    var hash_map = this.get_html_translation_table('HTML_ENTITIES', quote_style),
        symbol = '';
    string = string == null ? '' : string + '';

    if (!hash_map) {
        return false;
    }

    if (quote_style && quote_style === 'ENT_QUOTES') {
        hash_map["'"] = '&#039;';
    }

    if (!!double_encode || double_encode == null) {
        for (symbol in hash_map) {
            if (hash_map.hasOwnProperty(symbol)) {
                string = string.split(symbol).join(hash_map[symbol]);
            }
        }
    } else {
        string = string.replace(/([\s\S]*?)(&(?:#\d+|#x[\da-f]+|[a-zA-Z][\da-z]*);|$)/g, function (ignore, text, entity) {
            for (symbol in hash_map) {
                if (hash_map.hasOwnProperty(symbol)) {
                    text = text.split(symbol).join(hash_map[symbol]);
                }
            }

            return text + entity;
        });
    }

    return string;
}
Examples
» Example 1
Running
htmlentities('Kevin & van Zonneveld');
Should return
'Kevin &amp; van Zonneveld'
» Example 2
Running
htmlentities("foo'bar","ENT_QUOTES");
Should return
'foo&#039;bar'
Dependencies
In order to use this function, you also need: get_html_translation_table
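To try the encoding logic without pulling in the full dependency, a minimal self-contained sketch can stand in for it. The names `miniTranslationTable`, `encodeAll` and `miniHtmlentities` are hypothetical, and the table holds only a handful of entries, whereas the real get_html_translation_table covers the whole entity set. The sketch mirrors the two branches of the function above: encode everything, or leave existing entities alone when double_encode is false.

```javascript
// Hypothetical stand-in for get_html_translation_table: a few entries only.
// '&' is listed first so the '&' introduced by later replacements is not re-encoded.
function miniTranslationTable(quoteStyle) {
  var table = { '&': '&amp;', '<': '&lt;', '>': '&gt;' };
  if (quoteStyle !== 'ENT_NOQUOTES') {
    table['"'] = '&quot;';
  }
  if (quoteStyle === 'ENT_QUOTES') {
    table["'"] = '&#039;';
  }
  return table;
}

// Replace every occurrence of every symbol in the table.
function encodeAll(text, table) {
  for (var symbol in table) {
    if (Object.prototype.hasOwnProperty.call(table, symbol)) {
      text = text.split(symbol).join(table[symbol]);
    }
  }
  return text;
}

function miniHtmlentities(string, quoteStyle, doubleEncode) {
  var table = miniTranslationTable(quoteStyle);
  string = string == null ? '' : String(string);

  if (doubleEncode || doubleEncode == null) {
    return encodeAll(string, table);
  }
  // double_encode is false: keep existing entities, encode only the text between them.
  return string.replace(
    /([\s\S]*?)(&(?:#\d+|#x[\da-f]+|[a-zA-Z][\da-z]*);|$)/g,
    function (ignore, text, entity) {
      return encodeAll(text, table) + entity;
    }
  );
}

console.log(miniHtmlentities('Kevin & van Zonneveld'));
console.log(miniHtmlentities("foo'bar", 'ENT_QUOTES'));
console.log(miniHtmlentities('a &amp; b < c', 'ENT_COMPAT', false));
```

With double_encode left at its default, an input that already contains `&amp;` is encoded again; passing false keeps the existing entity and encodes only the remaining text.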
Open syntax issues
php.js uses JsLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass or at least take into consideration most fixes suggested by JsLint, following this JsLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have htmlentities goodness in JavaScript.
@Komal: try using new get_html_translation_table and htmlentities (depends on the first one) functions from git https://github.com/kvz/phpjs/commit/f9a42874e652d096245797c155f65a25a667b528
Hi,
I am using your code in one of my features, but when I use the word "extend" it shows me some code as output instead of "extend"
e.g.,
var str = htmlentities("test extend","ENT_QUOTES");
// The output I am getting is
test function(object) { return Object.extend.apply(this, [this, object]); }
It means the word "extend" is replaced by "function(object) { return Object.extend.apply(this, [this, object]); }"
Can someone please check on this?
Thanks in advance.
- Komal
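The output above looks like what happens when some other script adds an enumerable `extend` method to Object.prototype: a bare for...in loop over the translation table then visits "extend" as if it were a symbol to replace. The sketch below reproduces that under this assumption (the prototype method and the two-entry table are made up for illustration) and shows that a hasOwnProperty guard avoids it.

```javascript
// Assumption: a third-party script has added an enumerable method to Object.prototype.
Object.prototype.extend = function (object) { return object; };

var table = { '&': '&amp;', '<': '&lt;' };

// No guard: the inherited 'extend' key is visited too, and the function
// stored under it is stringified by join().
function encodeUnguarded(text) {
  for (var symbol in table) {
    text = text.split(symbol).join(table[symbol]);
  }
  return text;
}

// Guarded: only the table's own keys are used.
function encodeGuarded(text) {
  for (var symbol in table) {
    if (Object.prototype.hasOwnProperty.call(table, symbol)) {
      text = text.split(symbol).join(table[symbol]);
    }
  }
  return text;
}

var unguarded = encodeUnguarded('test extend');
var guarded = encodeGuarded('test extend');

// Clean up the simulated pollution.
delete Object.prototype.extend;

console.log(unguarded); // the word "extend" has been replaced by function source text
console.log(guarded);   // the input comes back unchanged
```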
@Dj: thanks for your feedback. Changed the function according to your suggestions. You can see the changes on github.
Note that the regex for double encode is not correct because it does not include HTML entities that start with an uppercase character, like &Ntilde; (Ñ).
Replace [a-z][\da-z] with [a-zA-Z][\da-z]
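A quick check of the two entity patterns, using only the entity part of the regex from the function. The variable names are illustrative. The last line is an observation beyond the comment above: even the widened pattern only allows an uppercase letter in the first position.

```javascript
// Entity part of the double_encode regex, before and after the suggested change.
var lowerOnly = /&(?:#\d+|#x[\da-f]+|[a-z][\da-z]*);/;
var withUpper = /&(?:#\d+|#x[\da-f]+|[a-zA-Z][\da-z]*);/;

var results = {
  lowerMatchesLower: lowerOnly.test('&ntilde;'),   // lowercase entity name
  lowerMatchesUpper: lowerOnly.test('&Ntilde;'),   // uppercase first letter
  upperMatchesUpper: withUpper.test('&Ntilde;'),
  upperMatchesAElig: withUpper.test('&AElig;')     // uppercase in the second position
};

console.log(results);
```

So `&Ntilde;` is recognized after the change, while a name such as `&AElig;` would still be treated as plain text and have its ampersand encoded again.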
Here is an optimized version.
Using recursion by calling self.htmlentities() loads the table again and re-checks the source values, which makes no sense because the same table is being used.
So instead of recursion, use a simple loop working in the same scope.
function htmlentities (string, quote_style, charset, double_encode) {
string = string !== undefined ? string + '' : '';
var hash_map = this.get_html_translation_table('HTML_ENTITIES', quote_style),
char;
if (hash_map === false) {
return false;
}
if (quote_style && quote_style === 'ENT_QUOTES') {
hash_map["'"] = '&#039;';
}
if (!!double_encode || double_encode == null) {
for (char in hash_map) {
string = string.split(char).join(hash_map[char]);
}
return string;
} else {
return string.replace(/([\s\S]*?)(&(?:#\d+|#x[\da-f]+|[a-z][\da-z]*);|$)/g, function (ignore, text, entity) {
for (char in hash_map) {
text = text.split(char).join(hash_map[char]);
}
return text + entity;
});
}
}
Note that you have a bug.
hash_map["'"] = '&#039;'; should only be added when quote_style is ENT_QUOTES, otherwise the single quote will always be converted, regardless of the quote style specified
Minified
function htmlentities (s, qS, cS, dE)
{
var h = {}, c = '', e = '', se=this;
s += '';
if (false === (h = se.get_html_translation_table('HTML_ENTITIES', qS)))
{
return false;
}
if (!!dE || dE == null)
{
h["'"] = '&#039;';
for (c in h) s = s.split(c).join(h[c]);
}
else
{
s = s.replace(/([\s\S]*?)(&(?:#\d+|#x[\da-f]+|[a-z][\da-z]*);|$)/g, function (i,t,e) {
return se.htmlentities(t, qS, cS) + e;
});
}
return s;
}
@Aikar: Are you absolutely sure your fix is correct? Simply changing from .split() to .replace() won't work correctly, unless you pass a regular expression with the 'g' flag to .replace. When doing str.replace('foo', 'bar'), JavaScript replaces only the first occurrence of 'foo' (http://jsfiddle.net/8ydqr/). Creating a new RegExp object for every character (and escaping the character if required) would also take longer to execute. But yes, this function needs some optimizations.
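The difference is easy to see in isolation. In this sketch, `escapeRegExp` is a helper written for the example, not part of php.js; it is needed because symbols such as `$` or `.` have a special meaning inside a regular expression.

```javascript
var input = 'a & b & c';

// String pattern: only the first occurrence is replaced.
var firstOnly = input.replace('&', '&amp;');

// split/join: every occurrence is replaced.
var viaSplitJoin = input.split('&').join('&amp;');

// Escape characters that are special in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Global regex built from the (escaped) symbol: every occurrence is replaced.
var viaGlobalRegex = input.replace(new RegExp(escapeRegExp('&'), 'g'), '&amp;');

console.log(firstOnly);      // a &amp; b & c
console.log(viaSplitJoin);   // a &amp; b &amp; c
console.log(viaGlobalRegex); // a &amp; b &amp; c
```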
Grr, I realize I messed up variable names when I was cutting out that code and it was essentially running nothing... But after fixing it, it still got down to 9000ms, which is an 80% performance gain, so it needs to be modified.
Follow-up: changing tmp_str.split to tmp_str.replace(symbol, entity); sped it up by 22% (~10s)
Then I further fixed the entire thing and did it properly, so can someone who knows how to update these functions implement this:
https://gist.github.com/794497
Got my benchmark back down to where it was supposed to be with the above pasted versions:
>>> node benchmark.js
rendered 10000 times in 3200ms!
This code is extremely slow...
I did a benchmark before and after adding this to my strings
BEFORE:
>>> node benchmark.js
rendered 10000 times in 3192ms!
AFTER:
>>> node benchmark.js
rendered 10000 times in 42797ms!
Use with caution.
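One way to avoid a separate split/join pass for every table entry is to build a single global regular expression from the table keys once and look each match up in the table. This is only a sketch of that idea, not the code from the gist mentioned in this thread; the five-entry table and the names `pattern` and `encodeSinglePass` are made up for illustration, and no timing is claimed for it.

```javascript
// Small illustrative table; the real one comes from get_html_translation_table.
var table = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  '\u00e9': '&eacute;'
};

// Escape characters that are special in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Built once and reused: one alternation that matches any symbol in the table.
var pattern = new RegExp(Object.keys(table).map(escapeRegExp).join('|'), 'g');

function encodeSinglePass(text) {
  return String(text).replace(pattern, function (symbol) {
    return table[symbol];
  });
}

console.log(encodeSinglePass('<a href="x">caf\u00e9 & co</a>'));
```

Because each character of the input is visited once, the result no longer depends on the order in which the table is iterated.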
Hi, very useful function, however, it seems to be missing double_encode optional argument. htmlspecialchars has it. This argument is from PHP original functions.
I had an issue with the current version of the htmlentities function in the Chrome dev browser (but not in FF or IE 8):
when I run htmlentities('"') it doesn't return &quot; but &amp;quot;, which is obviously broken. htmlspecialchars() returns &quot; as expected.
PHP htmlentities('"') also returns &quot;
IE and FF return what is expected. I don't know where the problem is, but it does look as if double encoding of the ampersand and the quote takes place. I don't know why one browser interprets this code differently from the others...
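A plausible explanation, offered as an assumption rather than something confirmed in this thread: the result of chained replacements depends on the order in which the table entries are visited, and for...in order has not always been the same across JavaScript engines. If the quote is replaced before the ampersand, the ampersand inside the freshly produced entity is encoded again. The sketch uses explicit arrays (hypothetical names) to make the order visible.

```javascript
// Apply the replacements in exactly the order given.
function encodeInOrder(text, pairs) {
  pairs.forEach(function (pair) {
    text = text.split(pair[0]).join(pair[1]);
  });
  return text;
}

var ampersandFirst = [['&', '&amp;'], ['"', '&quot;']];
var ampersandLast = [['"', '&quot;'], ['&', '&amp;']];

var correct = encodeInOrder('"', ampersandFirst);
var doubleEncoded = encodeInOrder('"', ampersandLast);

console.log(correct);       // &quot;
console.log(doubleEncoded); // &amp;quot;
```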
hi
I feel this function has a bug. It cannot preserve the single quote (') when ENT_NOQUOTES is used.
So, I need to use strtr() for that:
var tmpVal = html_entity_decode (txtAreaVal);
txtArea.value = htmlentities(tmpVal, "ENT_NOQUOTES");
// restore single quote
$trans = {'&#039;' : "'" };
txtArea.value = strtr(txtArea.value, $trans);
Brett: I think it's more for data coming FROM the server.
In my case it's an email client and I want to send them HTML that might be in an email, but I want to convert it to source by encoding all the tags, while allowing a function to output it as actual HTML if the user agrees to it. Since I can't know which email IS and ISN'T safe for them to view, and since it's a browser-based email client, it's even more dangerous as it runs in the context of that page. I would just strip ALL HTML, but some of it I need (such as the HTML reports made by svnnotify; they just don't look the same when you strip the tags).
But you are correct, this should NOT be used for client-side sanitization. Nothing from a client should be considered secure.
@vikal: 1) Why do you want to convert it to an entity? If you are trying to filter user input on the client side, this is not a safe way to do it, since people can get around it. You should use your database's own escape mechanisms instead (e.g., mysql_real_escape_string for MySQL). 2) If you really do want the entity form, you can use &#92; or &#x5C;, but there is no need to escape it in HTML or XML like this since a backslash is not reserved there.
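The numeric character reference for a backslash, or any other single character in the Basic Multilingual Plane, can be derived from its character code. A small sketch; the helper names are made up for this example and are not part of php.js.

```javascript
// Decimal numeric character reference for the first character of ch.
function toDecimalEntity(ch) {
  return '&#' + ch.charCodeAt(0) + ';';
}

// Hexadecimal numeric character reference for the first character of ch.
function toHexEntity(ch) {
  return '&#x' + ch.charCodeAt(0).toString(16) + ';';
}

console.log(toDecimalEntity('\\')); // &#92;
console.log(toHexEntity('\\'));     // &#x5c;
```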
Hi,
Your function htmlentities() is good, but now we are having a problem with this symbol: \
Do you have any idea how to convert it to an HTML entity? Is there any solution so that I can change \ to its HTML entity?
Hoping for the best,
regards
vikal
@vikal: Does that mean you figured out the problem with the function? If you are still having trouble, please give a precise example where you see the problem. Thanks...
hi
Really good work that you people accomplished.
so useful and i am happy to use it.
thanks
best regards
vikal acharya
Hi,
I have used your code to convert <!----> into HTML entities,
but it does not work, nor does it return what I need.
I have used your code as described in the examples,
like this:
function htmlentities (string) {
// Convert all applicable characters to HTML entities
//
// version: 907.503
// discuss at: http://phpjs.org/functions/htmlentities
// + original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// + revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// + improved by: nobbler
// + tweaked by: Jack
// + bugfixed by: Onno Marsman
// + revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// + bugfixed by: Brett Zamir (http://brett-zamir.me)
// + input by: Ratheous
// - depends on: get_html_translation_table
// * example 1: htmlentities('Kevin & van Zonneveld');
// * returns 1: 'Kevin &amp; van Zonneveld'
// * example 2: htmlentities("foo'bar","ENT_QUOTES");
// * returns 2: 'foo&#039;bar'
var hash_map = {}, symbol = '', tmp_str = '', entity = '';
tmp_str = string.toString();
if (false === (hash_map = this.get_html_translation_table('HTML_ENTITIES', 'ENT_COMPAT'))) {
return false;
}
for (symbol in hash_map) {
entity = hash_map[symbol];
tmp_str = tmp_str.split(symbol).join(entity);
}
return tmp_str;
}
including the function get_html_translation_table() as it is.
So would you mind telling me how it works?
Waiting for your response,
best regards
vikal acharya
@ Bjorn Roesbeke: I've added your testcase, but it succeeds. Are you sure you're running the latest version?
For example, a single quote with entity &#039; isn't converted correctly.
htmlentities("foo'bar","ENT_QUOTES");
will return foo&amp;#039;
Using 'var i' instead of only 'i' in the for loop would prevent overwriting a global 'i', even though no one should use one. But well, I did, and found another error that way, so it kind of helped me :)


Eu
20 Dec '11