JavaScript htmlspecialchars_decode
Convert special HTML entities back to characters
function htmlspecialchars_decode (string, quote_style) {
    // Convert special HTML entities back to characters
    //
    // version: 1008.1718
    // discuss at: http://phpjs.org/functions/htmlspecialchars_decode
    // +   original by: Mirek Slugen
    // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Mateusz "loonquawl" Zalega
    // +      input by: ReverseSyntax
    // +      input by: Slawomir Kaniecki
    // +      input by: Scott Cariss
    // +      input by: Francois
    // +   bugfixed by: Onno Marsman
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Brett Zamir (http://brett-zamir.me)
    // +      input by: Ratheous
    // +      input by: Mailfaker (http://www.weedem.fr/)
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // +   bugfixed by: Brett Zamir (http://brett-zamir.me)
    // *     example 1: htmlspecialchars_decode("<p>this -&gt; &quot;</p>", 'ENT_NOQUOTES');
    // *     returns 1: '<p>this -> &quot;</p>'
    // *     example 2: htmlspecialchars_decode("&amp;quot;");
    // *     returns 2: '&quot;'
    var optTemp = 0, i = 0, noquotes = false;
    if (typeof quote_style === 'undefined') {
        quote_style = 2;
    }
    string = string.toString().replace(/&lt;/g, '<').replace(/&gt;/g, '>');
    var OPTS = {
        'ENT_NOQUOTES': 0,
        'ENT_HTML_QUOTE_SINGLE': 1,
        'ENT_HTML_QUOTE_DOUBLE': 2,
        'ENT_COMPAT': 2,
        'ENT_QUOTES': 3,
        'ENT_IGNORE': 4
    };
    if (quote_style === 0) {
        noquotes = true;
    }
    if (typeof quote_style !== 'number') { // Allow for a single string or an array of string flags
        quote_style = [].concat(quote_style);
        for (i = 0; i < quote_style.length; i++) {
            // Resolve string input to bitwise e.g. 'ENT_IGNORE' becomes 4
            if (OPTS[quote_style[i]] === 0) {
                noquotes = true;
            } else if (OPTS[quote_style[i]]) {
                optTemp = optTemp | OPTS[quote_style[i]];
            }
        }
        quote_style = optTemp;
    }
    if (quote_style & OPTS.ENT_HTML_QUOTE_SINGLE) {
        string = string.replace(/&#0*39;/g, "'"); // PHP doesn't currently escape if more than one 0, but it should
        // string = string.replace(/&apos;|&#x0*27;/g, "'"); // This would also be useful here, but not a part of PHP
    }
    if (!noquotes) {
        string = string.replace(/&quot;/g, '"');
    }
    // Put this in last place to avoid escape being double-decoded
    string = string.replace(/&amp;/g, '&');

    return string;
}
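The quote_style handling above folds flag names into a bitmask through the OPTS table. As a sketch of just that step, the hypothetical helper resolveQuoteStyle below (not part of php.js) copies the OPTS table and the ENT_NOQUOTES special case from the function above and reports which quote entities would be decoded.

```javascript
// Hypothetical helper (not part of php.js): mirrors how
// htmlspecialchars_decode resolves its quote_style argument.
var OPTS = {
    'ENT_NOQUOTES': 0,
    'ENT_HTML_QUOTE_SINGLE': 1,
    'ENT_HTML_QUOTE_DOUBLE': 2,
    'ENT_COMPAT': 2,
    'ENT_QUOTES': 3,
    'ENT_IGNORE': 4
};

function resolveQuoteStyle(quote_style) {
    var mask = 0, noquotes = false, flags, i;
    if (typeof quote_style === 'undefined') {
        quote_style = 2; // default, same value as ENT_COMPAT
    }
    if (quote_style === 0) {
        noquotes = true;
    }
    if (typeof quote_style !== 'number') {
        // Accept a single flag name or an array of flag names
        flags = [].concat(quote_style);
        for (i = 0; i < flags.length; i++) {
            if (OPTS[flags[i]] === 0) {
                noquotes = true;
            } else if (OPTS[flags[i]]) {
                mask = mask | OPTS[flags[i]];
            }
        }
        quote_style = mask;
    }
    return {
        decodeSingle: (quote_style & OPTS.ENT_HTML_QUOTE_SINGLE) !== 0,
        decodeDouble: !noquotes
    };
}
```

As in the original, double quotes are decoded whenever ENT_NOQUOTES (or 0) is absent, whether or not the ENT_HTML_QUOTE_DOUBLE bit is set.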
Examples
» Example 1
Running
htmlspecialchars_decode("<p>this -&gt; &quot;</p>", 'ENT_NOQUOTES');
Should return
'<p>this -> &quot;</p>'
» Example 2
Running
htmlspecialchars_decode("&amp;quot;");
Should return
'&quot;'
Dependencies
No dependencies; you can use this function standalone.
Open syntax issues
php.js uses JSLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass JSLint, or at least to take into account most of the fixes it suggests, following the JSLint configuration we've decided on.
Authors
Thanks to the following developers, you get to have htmlspecialchars_decode goodness in JavaScript.
Very nice - I think I will use your modification, as it's much tidier.
Don't forget the 'g' flag on the last pattern.
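To illustrate the point about the 'g' flag: without it, String.prototype.replace() replaces only the first match. The sketch below uses a hypothetical three-entry entity table and hypothetical function names, purely for demonstration.

```javascript
// Illustrative subset of an entity lookup table
var table = { '&amp;': '&', '&lt;': '<', '&gt;': '>' };

function lookup(match) {
    return table[match];
}

// No 'g' flag: only the first entity found is decoded
function decodeFirstOnly(s) {
    return s.replace(/&amp;|&lt;|&gt;/, lookup);
}

// 'g' flag: every entity is decoded
function decodeAll(s) {
    return s.replace(/&amp;|&lt;|&gt;/g, lookup);
}

console.log(decodeFirstOnly('&lt;b&gt;')); // <b&gt;
console.log(decodeAll('&lt;b&gt;'));       // <b>
```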
Sorry for the double comment, but now the code should be more readable
function htmlspecialchars_decode(input, quote_style) {
    var c = {
        '&amp;': '&',
        '&lt;': '<',
        '&gt;': '>',
        '&quot;': '"',
        '&#039;': '\''
    };
    return ('' + input).replace(
        quote_style === 'ENT_QUOTES' ? /&amp;|&lt;|&gt;|&quot;|&#039;/g :
        quote_style === 'ENT_NOQUOTES' ? /&amp;|&lt;|&gt;/g :
        /&amp;|&lt;|&gt;|&quot;/g, // 'g' needed here too, or only the first entity is decoded
        function (a) {
            return c[a];
        }
    );
}
@Jerry: Very short and clean solution. I managed to replace the three .replace() calls with a single one by choosing the regexp with a conditional expression.
function htmlspecialchars_decode(input, quote_style){
var c = {
'&amp;': '&',
'&lt;': '<',
'&gt;': '>',
'&quot;': '"',
'&#039;': '\''
};
return ('' + input).replace(quote_style === 'ENT_QUOTES' ? /&amp;|&lt;|&gt;|&quot;|&#039;/g : quote_style === 'ENT_NOQUOTES' ? /&amp;|&lt;|&gt;/g : /&amp;|&lt;|&gt;|&quot;/g, function(a){return c[a]; });
}
The performance of both solutions should be comparable.
I also added casting of the input to a string.
Here is my simple implementation of htmlspecialchars_decode.
I use just one replace, and I have not come across a situation where an HTML entity is double-decoded. Comments are welcome.
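function htmlspecialchars_decode(a, b) {
    var c = {
        '&amp;': '&',
        '&lt;': '<',
        '&gt;': '>',
        '&quot;': '"',
        '&#039;': '\''
    };
    // Each branch runs exactly one replace(), so decoded text is never rescanned
    if (b === 'ENT_QUOTES') {
        return a.replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function (m) { return c[m]; });
    }
    else if (b === 'ENT_NOQUOTES') {
        return a.replace(/&amp;|&lt;|&gt;/g, function (m) { return c[m]; });
    }
    else {
        // Default (ENT_COMPAT): decode double quotes but not single quotes
        return a.replace(/&amp;|&lt;|&gt;|&quot;/g, function (m) { return c[m]; });
    }
}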
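Why a single replace() cannot double-decode: with a global regex, replace() walks the original input once from left to right, and the text returned by the callback is not searched again. A minimal sketch, assuming the same five-entry table as the snippet above (decodeSinglePass is a hypothetical name):

```javascript
// Entity table in the style of the snippet above
var entities = {
    '&amp;': '&',
    '&lt;': '<',
    '&gt;': '>',
    '&quot;': '"',
    '&#039;': '\''
};

// One regex, one pass over the input; replacement text is not rescanned
function decodeSinglePass(s) {
    return String(s).replace(/&amp;|&lt;|&gt;|&quot;|&#039;/g, function (m) {
        return entities[m];
    });
}

console.log(decodeSinglePass('&amp;quot;'));         // &quot;
console.log(decodeSinglePass('&amp;lt;b&amp;gt;')); // &lt;b&gt;
console.log(decodeSinglePass('&lt;b&gt;'));         // <b>
```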
PHP's htmlspecialchars_decode() doesn't decode recursively,
but this function does.
so "&#9787;" will not be converted by this function as "☻"
however, it will be converted as "☻"
on the other hand,
the function in php will convert it as "☻"
@Mailfaker: Thanks. I've completely redone the two htmlspecialchars functions in Git, also to handle flags and arguments: http://github.com/kvz/phpjs/commit/881de8748cf986d025ecfad5f448fbbb8ba7710e . Btw, using replace was much faster for me (and easier) than using split and join.
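For a fixed search string, split(x).join(y) and replace() with a global regex produce the same result, so the choice comes down to speed and convenience, which may vary between JavaScript engines. A small sketch with hypothetical function names:

```javascript
// Replace every "&lt;" using split/join
function viaSplitJoin(s) {
    return s.split('&lt;').join('<');
}

// Replace every "&lt;" using a global regex
function viaReplace(s) {
    return s.replace(/&lt;/g, '<');
}

var input = '&lt;a&gt;&lt;b&gt;';
console.log(viaSplitJoin(input)); // <a&gt;<b&gt;
console.log(viaReplace(input));   // <a&gt;<b&gt;
```

One practical difference: split() with a string argument matches the text literally, while a regex pattern needs characters such as . or ? escaped.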
Hi everyone,
this code wasn't working for me, so I made some changes and now it runs.
The problem is that, for decoding, the hash_map table must be read in descending order. Or, more simply, you can do this:
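function htmlspecialchars_decode (string) {
    var tmp_str = string.toString(); // 'var' keeps tmp_str from leaking into the global scope
    tmp_str = tmp_str.split('&quot;').join('"');
    tmp_str = tmp_str.split('&lt;').join('<');
    tmp_str = tmp_str.split('&gt;').join('>');
    tmp_str = tmp_str.split('&amp;').join('&'); // &amp; last, so its output is not decoded again
    return tmp_str;
}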
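The ordering point can be checked directly: decoding &amp; last turns an escaped entity such as &amp;quot; into the literal text &quot;, while decoding &amp; first lets the later passes turn it into a quote character. A small sketch with hypothetical function names:

```javascript
// Decode &amp; last, as in the snippet above
function decodeAmpLast(s) {
    return String(s)
        .split('&quot;').join('"')
        .split('&lt;').join('<')
        .split('&gt;').join('>')
        .split('&amp;').join('&');
}

// Decode &amp; first: its output is fed into the later passes
function decodeAmpFirst(s) {
    return String(s)
        .split('&amp;').join('&')
        .split('&quot;').join('"')
        .split('&lt;').join('<')
        .split('&gt;').join('>');
}

console.log(decodeAmpLast('&amp;quot;'));  // &quot;
console.log(decodeAmpFirst('&amp;quot;')); // "
```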
@Liviu Mirea: I added your example as a test case, but I was unable to reproduce the problem.
What version and browser are you using?
I'm sorry, but the messaging system seems to be messed up and I can't post my message. What I'm trying to say is that the above function is incorrect. If you try to decode "& amp; quot;" (remove the spaces), it outputs a double quotation mark instead of "& quot;" (remove the spaces). I hope this message will be posted properly. :/
Erm, ignore my message below; the characters are messed up.
Here:
htmlspecialchars_decode('&amp;quot;');
In PHP it returns: &quot;
The JavaScript function above returns: "
Basically, it first decodes "&amp;" to "&", thus producing "&quot;". It then decodes that further to a double quotation mark when it shouldn't.
htmlspecialchars_decode('&amp;quot;');
In PHP it returns: &quot;
The JavaScript function above returns: "
Basically, it first decodes "&amp;" to "&", thus producing "&quot;". Afterwards, it decodes "&quot;" too, but it shouldn't.
There is a serious parse error in this function:
[CODE="Javascript"]
string = string.replace(/&gt;/g '>');
[/CODE]
should be (added a comma):
[CODE="Javascript"]
string = string.replace(/&gt;/g, '>');
[/CODE]
There is an error in htmlspecialchars_decode(): there are single quotes around the regex in every replace() call except the one for &gt;, which is the only one that works. This is in php.min.js.
[CODE="php"]
<?php
echo html_entity_decode("&#56;")."\n";
?>
[/CODE]
returns 8.
This behavior is not documented in the PHP manual, though. Do you know which table is used here?
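The &#56; case looks like a decimal numeric character reference: the digits give the Unicode code point, and code point 56 is the digit "8". This concerns html_entity_decode(), not necessarily htmlspecialchars_decode(). A minimal sketch of decoding decimal and hexadecimal references in JavaScript; decodeNumericRefs is a hypothetical helper, not a php.js function, and it skips the HTML specification's special rules for values such as 0 or surrogates.

```javascript
// Hypothetical helper: decode &#NNN; (decimal) and &#xHH; (hex) references
function decodeNumericRefs(s) {
    return String(s).replace(/&#([xX][0-9a-fA-F]+|[0-9]+);/g, function (m, ref) {
        var code = ref.charAt(0).toLowerCase() === 'x'
            ? parseInt(ref.slice(1), 16)
            : parseInt(ref, 10);
        // Leave references outside the Unicode range untouched
        if (code > 0x10FFFF) {
            return m;
        }
        return String.fromCodePoint(code);
    });
}

console.log(decodeNumericRefs('&#56;'));   // 8
console.log(decodeNumericRefs('&#x38;'));  // 8
console.log(decodeNumericRefs('&#9787;')); // ☻
```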
@Bob Palin: Thank you for noticing. It is possible to declare global constants in JavaScript, but that would increase the number of dependencies throughout this project.
We have deliberately chosen to implement this a bit differently from the original PHP documentation so that more functions can be included separately.
The function description says that 'quote_style' is an int and lists constants; in fact, the argument is a string, as shown in the code and example.
No problem :)
There's another bug in this function. The first argument passed to string.replace() is a string, '/&amp;/g'. That won't work unless it's a regular expression object (it should be /&amp;/g, without the apostrophes).
Here's the correct code:
[CODE="Javascript"]
string = string.toString();
// Always decode
string = string.replace(/&lt;/g, '<');
string = string.replace(/&gt;/g, '>');
// Decode depending on quote_style
if (quote_style === 'ENT_QUOTES') {
    string = string.replace(/&quot;/g, '"');
    string = string.replace(/&#039;/g, '\'');
} else if (quote_style !== 'ENT_NOQUOTES') {
    // All other cases (ENT_COMPAT, default, but not ENT_NOQUOTES)
    string = string.replace(/&quot;/g, '"');
}
// Decode &amp; last so that e.g. "&amp;quot;" is not decoded twice
string = string.replace(/&amp;/g, '&');
return string;
[/CODE]
This is explained here:
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Global_Objects:String:replace
http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Reference:Objects:RegExp
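The underlying behaviour: when the first argument to replace() is a string, it is matched as literal text and only its first occurrence is replaced, so '/&amp;/g' is searched for including the slashes and the g. A short sketch on a made-up sample string:

```javascript
var s = 'Tom &amp; Jerry &amp; Spike';

// String pattern with slashes: searched for literally, nothing matches
console.log(s.replace('/&amp;/g', '&')); // Tom &amp; Jerry &amp; Spike

// Plain string pattern: only the first occurrence is replaced
console.log(s.replace('&amp;', '&'));    // Tom & Jerry &amp; Spike

// RegExp with the g flag: every occurrence is replaced
console.log(s.replace(/&amp;/g, '&'));   // Tom & Jerry & Spike
```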
Btw, most people involved in the php2js project have their full names in the credits. So, my name's Mateusz Zalega. Just saying :)
Shouldn't it be
[CODE="Javascript"]
string = string.replace(/&amp;/g, '&');
string = string.replace(/&lt;/g, '<');
string = string.replace(/&gt;/g, '>');
[/CODE]
rather than
[CODE="Javascript"]
string.replace('/&amp;/g', '&');
string.replace('/&lt;/g', '<');
string.replace(/&gt;/g, '>');
[/CODE]
?
String.prototype.replace() doesn't modify the string it is called on. It returns a new string with the replacements applied.
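Because JavaScript strings are immutable, the result of replace() has to be assigned back to keep the change. A minimal sketch on a made-up string:

```javascript
var text = 'a &lt; b';

// Calling replace() without using its result changes nothing
text.replace(/&lt;/g, '<');
console.log(text); // a &lt; b

// Assign the returned string back to keep the change
text = text.replace(/&lt;/g, '<');
console.log(text); // a < b
```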


Robert Sidlauskas
Jul 10th