Use PHP functions in JavaScript

JavaScript html_entity_decode

Convert all HTML entities to their applicable characters

function html_entity_decode (string, quote_style) {
    // Convert all HTML entities to their applicable characters
    //
    // version: 1008.1718
    // discuss at: http://phpjs.org/functions/html_entity_decode
    // +   original by: john (http://www.jd-tech.net)
    // +      input by: ger
    // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Onno Marsman
    // +   improved by: marc andreu
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +      input by: Ratheous
    // +   bugfixed by: Brett Zamir (http://brett-zamir.me)
    // +      input by: Nick Kolosov (http://sammy.ru)
    // +   bugfixed by: Fox
    // -    depends on: get_html_translation_table
    // *     example 1: html_entity_decode('Kevin &amp; van Zonneveld');
    // *     returns 1: 'Kevin & van Zonneveld'
    // *     example 2: html_entity_decode('&lt;');
    // *     returns 2: '<'
    var hash_map = {}, symbol = '', tmp_str = '', entity = '';
    tmp_str = string.toString();

    if (false === (hash_map = this.get_html_translation_table('HTML_ENTITIES', quote_style))) {
        return false;
    }

    // fix &amp; problem
    // http://phpjs.org/functions/get_html_translation_table:416#comment_97660
    delete(hash_map['&']);
    hash_map['&'] = '&amp;';

    for (symbol in hash_map) {
        entity = hash_map[symbol];
        tmp_str = tmp_str.split(entity).join(symbol);
    }
    tmp_str = tmp_str.split('&#039;').join("'");

    return tmp_str;
}

Examples

» Example 1

Running

html_entity_decode('Kevin &amp; van Zonneveld');

Should return

'Kevin & van Zonneveld'

» Example 2

Running

html_entity_decode('&lt;');

Should return

'<'

Dependencies

In order to use this function, you also need: get_html_translation_table
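
Because the function calls this.get_html_translation_table(...), it has to be invoked as a method of an object that also carries that dependency. The sketch below shows one way to wire that up. The php namespace object and the tiny translation table are assumptions made purely for illustration; the real get_html_translation_table covers far more entities.

```javascript
// Hypothetical namespace object; not part of the original page.
var php = {
    // Stand-in for the real dependency. It ignores the table argument and
    // carries only a handful of entries, which is enough for a demonstration.
    get_html_translation_table: function (table, quote_style) {
        var map = { '<': '&lt;', '>': '&gt;' };
        if (quote_style !== 'ENT_NOQUOTES') {
            map['"'] = '&quot;';
        }
        if (quote_style === 'ENT_QUOTES') {
            map["'"] = '&#39;';
        }
        map['&'] = '&amp;';
        return map;
    },

    // Same decoding loop as the listing above, condensed.
    html_entity_decode: function (string, quote_style) {
        var hash_map = this.get_html_translation_table('HTML_ENTITIES', quote_style);
        var tmp_str = string.toString();
        var symbol;

        // Move '&' to the end of the map so that '&amp;' is decoded last.
        delete hash_map['&'];
        hash_map['&'] = '&amp;';

        for (symbol in hash_map) {
            tmp_str = tmp_str.split(hash_map[symbol]).join(symbol);
        }
        return tmp_str.split('&#039;').join("'");
    }
};

console.log(php.html_entity_decode('Kevin &amp; van Zonneveld'));
```

Calling the function detached from its object (for example after assigning it to a plain variable) would leave this without the dependency, so keeping both functions on one object is the simplest arrangement.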

Open syntax issues

php.js uses JSLint to help us keep our code consistent and prevent some common bugs.

Eventually we want all code to pass, or at least to take into consideration most fixes suggested by JSLint, following the JSLint configuration we've decided on.


Authors

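Thanks to the following developers, you get to have html_entity_decode goodness in JavaScript: john, ger, Kevin van Zonneveld, Onno Marsman, marc andreu, Ratheous, Brett Zamir, Nick Kolosov, Fox.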

Comments

By submitting code here you are allowing us to use it in php.js hence dual licensing it under the MIT and GPL licenses

pedro
20 Aug '09

Thank you very much for this function; it is what I needed to resolve my problem. Thanks.

Kevin van Zonneveld
18 Jun '09

@ Brett Zamir: Yeah, I already have:

class DATABASE_CONFIG {
    var $default = array(
        'driver' => 'mysql',
        '....',
        'encoding' => 'utf8',
    );
}

in my Cake datasource, which should execute that statement every time. I'm kind of puzzled about what else I need to make UTF-8 aware to avoid these question marks.

Brett Zamir
10 Jun '09

Nope, still not working, as indicated by my test characters...

Brett Zamir (test: ????? )
10 Jun '09

@Kevin, do you have the "SET NAMES 'UTF8'" going too? (trying a few characters out) ?????

Kevin van Zonneveld
10 Jun '09

@ Brett Zamir: Good job man! I'm thinking the only place left that could screw us with Unicode is MySQL. I've changed the table collation to utf8_unicode_ci. Let's see if things improve.

Brett Zamir
4 Jun '09

Hello ?ukasz (Kevin, a Unicode bug?--otherwise, I can't credit this person for "input by"),

I did modify get_html_translation_table() to keep the order of what PHP returns for that function (and as a result removed the hack within this and other functions for adding &amp; at the end). One catch is that although get_html_translation_table() returns &#39;, the functions we use, like htmlspecialchars(), return &#039;. But we cannot modify get_html_translation_table() to add &#039;, since that histogram is (correctly) keyed by the apostrophe, which necessarily leads to only one value (&#39;).

So, we have to modify the functions to work with &#039; as well (which is not really a problem, since this is the only numeric character reference in the list; &apos; is XML-only, so it couldn't be used).

So, I've fixed htmlspecialchars_decode() and html_entity_decode() to work with both &#39; and &#039;, and also "fixed" htmlspecialchars() and htmlentities() to use &#039; for output as they do in PHP (without modifying get_html_translation_table(), which uses &#39;).

I think that should address all the issues.
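
As a rough illustration of the fix described in this comment, the sketch below decodes both spellings of the apostrophe reference. The helper names are hypothetical and not part of php.js; the second variant is an assumption about how the same idea could be generalised to any number of leading zeros.

```javascript
// Hypothetical helper: decode the two apostrophe references discussed above.
function decode_apostrophe(str) {
    // &#39;  is the form carried by the translation table;
    // &#039; is the zero-padded form that htmlspecialchars() emits.
    return String(str).split('&#039;').join("'").split('&#39;').join("'");
}

// Hypothetical variant: one regular expression that tolerates any number
// of leading zeros in the decimal reference.
function decode_apostrophe_regex(str) {
    return String(str).replace(/&#0*39;/g, "'");
}

console.log(decode_apostrophe('It&#39;s and it&#039;s'));
```

Neither helper touches &apos;, in line with the remark that it is XML-only.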

?ukasz Czerwi?ski
3 Jun '09

I have noticed that &#39; is decoded by html_entity_decode() as ' (apostrophe), but &#039; isn't! (When using 'ENT_QUOTES', of course.) The same problem exists in htmlspecialchars_decode(). I have checked that PHP decodes both &#39; and &#039;. I tried to find the code in the PHP sources, but it seems to be very complicated. I have only found a structure that stores several entities, the ones decoded by htmlspecialchars_decode():
php-5.2.9.tar.bz2/ext/standard/html.c, lines 454-466

static const struct {
	unsigned short charcode;
	char *entity;
	int entitylen;
	int flags;
} basic_entities[] = {
	{ '"',	"&quot;",	6,	ENT_HTML_QUOTE_DOUBLE },
	{ '\'',	"&#039;",	6,	ENT_HTML_QUOTE_SINGLE },
	{ '\'',	"&#39;",	5,	ENT_HTML_QUOTE_SINGLE },
	{ '<',	"&lt;",		4,	0 },
	{ '>',	"&gt;",		4,	0 },
	{ 0, NULL, 0, 0 }
};



As you can see, both &#039; and &#39; are listed.
In the case of the JS code of these two functions (in fact, I think we should modify get_html_translation_table), the modification is quite complicated...

Kevin van Zonneveld
31 Dec '08

@ Azriel Fasten: Yes, but that would also make it harder for people to just copy 1 function:
http://trac.plutonia.nl/projects/phpjs/wiki/DeveloperGuidelines#DependencyvsRedundancy

The fewer dependencies the better, but of course we are not about to duplicate the histogram from get_html_translation_table 4 times, so dependencies are already made in this function family.

I think we should probably first come up with the fastest str_replace possible, and base our decision (Dependency vs Redundancy) on the final algorithm used.

Azriel Fasten
30 Dec '08

I think that perhaps the replace should be relegated to str_replace, and that function should be highly optimized. Many other parts of the library all use different ways of replacing. These should all use str_replace.
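
To make the suggestion concrete, here is a sketch of what a single shared replace helper could look like, in two flavours: the split/join approach that html_entity_decode already uses, and a regular-expression alternative. Both function names are hypothetical, neither is the php.js str_replace, and no claim is made here about which one is faster.

```javascript
// Hypothetical shared helper, covering only the single search/replace case.
function replace_all(search, replacement, subject) {
    if (search === '') {
        return String(subject); // splitting on '' would break the string into characters
    }
    return String(subject).split(search).join(replacement);
}

// Regex-based alternative: the search string has to be escaped first so that
// characters such as '.' or '$' are matched literally.
function replace_all_regex(search, replacement, subject) {
    if (search === '') {
        return String(subject);
    }
    var escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return String(subject).replace(new RegExp(escaped, 'g'), function () {
        return replacement; // function form keeps '$' in the replacement literal
    });
}

console.log(replace_all('&lt;', '<', 'a &lt; b &lt; c'));
```

The split/join version needs no escaping at all, which is one reason it is attractive for entity tables whose keys contain characters like & and #.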

Kevin van Zonneveld
30 Dec '08

@ Azriel Fasten: You reported a bug by mail that is exactly the same as one the real PHP encountered at one point: http://bugs.php.net/bug.php?id=25707

I've read the bug report more thoroughly, and applied the same fix as was proposed there.
I put the &amp; entity at the bottom of the histogram.

Faster ways to replace (without using regex) can still be explored.
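
The reason for putting &amp; at the bottom of the histogram can be seen in a small experiment. This is only a sketch: decode_in_order and the two-entry tables are hypothetical and exist just to show the effect of ordering.

```javascript
// Hypothetical helper: apply [entity, character] pairs from first to last.
function decode_in_order(str, pairs) {
    for (var i = 0; i < pairs.length; i++) {
        str = str.split(pairs[i][0]).join(pairs[i][1]);
    }
    return str;
}

var input = '&amp;lt;'; // the text "&lt;" after one round of encoding

// Decoding &amp; first produces "&lt;", which the next step decodes again.
var ampFirst = decode_in_order(input, [['&amp;', '&'], ['&lt;', '<']]);

// Decoding &amp; last leaves exactly one level of decoding.
var ampLast = decode_in_order(input, [['&lt;', '<'], ['&amp;', '&']]);

console.log(ampFirst); // "<"
console.log(ampLast);  // "&lt;"
```

With &amp; handled last, a single call removes a single level of encoding, which is what the fix described above is meant to guarantee.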

Kevin van Zonneveld
20 Oct '08

@ marc andreu: I've revised all of the functions like get_html_translation_table, htmlentities and htmlspecialchars, and their decoding counterparts; they now also support your second argument. Thank you!

marc andreu
15 Oct '08

Hi, I needed to deal with the second parameter of the html_entity_decode() function, and I added it as follows. I hope I got it right; in any case, it's a suggestion. That's all folks.

// {{{ html_entity_decode
function html_entity_decode(string, quote_style ) {
// Convert all HTML entities to their applicable characters
//
// + discuss at: http://kevin.vanzonneveld.net/techblog/article/javascript_equivalent_for_phps_html_entity_decode/
// + version: 810.621
// + original by: john (http://www.jd-tech.net)
// + input by: ger
// + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// + revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
// + bugfixed by: Onno Marsman
// % note: table from http://www.the-art-of-web.com/html/character-codes/
// * example 1: html_entity_decode('Kevin &amp; van Zonneveld');
// * returns 1: 'Kevin & van Zonneveld'

var histogram = {}, histogram_r = {}, code = 0;
var entity = chr = '';

histogram['34'] = 'quot';
histogram['38'] = 'amp';
histogram['60'] = 'lt';
histogram['62'] = 'gt';
histogram['160'] = 'nbsp';
histogram['161'] = 'iexcl';
histogram['162'] = 'cent';
histogram['163'] = 'pound';
histogram['164'] = 'curren';
histogram['165'] = 'yen';
histogram['166'] = 'brvbar';
histogram['167'] = 'sect';
histogram['168'] = 'uml';
histogram['169'] = 'copy';
histogram['170'] = 'ordf';
histogram['171'] = 'laquo';
histogram['172'] = 'not';
histogram['173'] = 'shy';
histogram['174'] = 'reg';
histogram['175'] = 'macr';
histogram['176'] = 'deg';
histogram['177'] = 'plusmn';
histogram['178'] = 'sup2';
histogram['179'] = 'sup3';
histogram['180'] = 'acute';
histogram['181'] = 'micro';
histogram['182'] = 'para';
histogram['183'] = 'middot';
histogram['184'] = 'cedil';
histogram['185'] = 'sup1';
histogram['186'] = 'ordm';
histogram['187'] = 'raquo';
histogram['188'] = 'frac14';
histogram['189'] = 'frac12';
histogram['190'] = 'frac34';
histogram['191'] = 'iquest';
histogram['192'] = 'Agrave';
histogram['193'] = 'Aacute';
histogram['194'] = 'Acirc';
histogram['195'] = 'Atilde';
histogram['196'] = 'Auml';
histogram['197'] = 'Aring';
histogram['198'] = 'AElig';
histogram['199'] = 'Ccedil';
histogram['200'] = 'Egrave';
histogram['201'] = 'Eacute';
histogram['202'] = 'Ecirc';
histogram['203'] = 'Euml';
histogram['204'] = 'Igrave';
histogram['205'] = 'Iacute';
histogram['206'] = 'Icirc';
histogram['207'] = 'Iuml';
histogram['208'] = 'ETH';
histogram['209'] = 'Ntilde';
histogram['210'] = 'Ograve';
histogram['211'] = 'Oacute';
histogram['212'] = 'Ocirc';
histogram['213'] = 'Otilde';
histogram['214'] = 'Ouml';
histogram['215'] = 'times';
histogram['216'] = 'Oslash';
histogram['217'] = 'Ugrave';
histogram['218'] = 'Uacute';
histogram['219'] = 'Ucirc';
histogram['220'] = 'Uuml';
histogram['221'] = 'Yacute';
histogram['222'] = 'THORN';
histogram['223'] = 'szlig';
histogram['224'] = 'agrave';
histogram['225'] = 'aacute';
histogram['226'] = 'acirc';
histogram['227'] = 'atilde';
histogram['228'] = 'auml';
histogram['229'] = 'aring';
histogram['230'] = 'aelig';
histogram['231'] = 'ccedil';
histogram['232'] = 'egrave';
histogram['233'] = 'eacute';
histogram['234'] = 'ecirc';
histogram['235'] = 'euml';
histogram['236'] = 'igrave';
histogram['237'] = 'iacute';
histogram['238'] = 'icirc';
histogram['239'] = 'iuml';
histogram['240'] = 'eth';
histogram['241'] = 'ntilde';
histogram['242'] = 'ograve';
histogram['243'] = 'oacute';
histogram['244'] = 'ocirc';
histogram['245'] = 'otilde';
histogram['246'] = 'ouml';
histogram['247'] = 'divide';
histogram['248'] = 'oslash';
histogram['249'] = 'ugrave';
histogram['250'] = 'uacute';
histogram['251'] = 'ucirc';
histogram['252'] = 'uuml';
histogram['253'] = 'yacute';
histogram['254'] = 'thorn';
histogram['255'] = 'yuml';

// Reverse table. Cause for maintainability purposes, the histogram is
// identical to the one in htmlentities.
for (code in histogram) {
entity = histogram[code];
histogram_r[entity] = code;
}

var retTemp = (string+'').replace(/(\&([a-zA-Z]+)\;)/g, function(full, m1, m2){
if (m2 in histogram_r) {
return String.fromCharCode(histogram_r[m2]);
} else {
return m2;
}
});

//Add for Marc Andreu Fernadnez. To decode quotes.
// Encode depending on quote_style
if (quote_style == 'ENT_QUOTES') {
retTemp = retTemp.replace('&quot;','"');
retTemp = retTemp.replace('&#039;',"'");
} else if (quote_style != 'ENT_NOQUOTES') {
// All other cases (ENT_COMPAT, default, but not ENT_NOQUOTES)
retTemp = retTemp.replace('&quot;','"');
}

return retTemp;
}// }}}
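
One detail worth keeping in mind when reading the snippet above: String.prototype.replace with a string pattern replaces only the first occurrence. A minimal sketch of the same quote_style branching with global regular expressions follows. decode_quotes is a hypothetical helper, and passing quote_style as a string such as 'ENT_QUOTES' simply mirrors the comment's own convention.

```javascript
// Hypothetical helper showing only the quote handling.
function decode_quotes(str, quote_style) {
    str = String(str);
    if (quote_style === 'ENT_NOQUOTES') {
        return str; // leave both kinds of quote entity untouched
    }
    // ENT_COMPAT (the default) and ENT_QUOTES both decode double quotes.
    str = str.replace(/&quot;/g, '"');
    if (quote_style === 'ENT_QUOTES') {
        // Accept &#39; as well as the zero-padded &#039;.
        str = str.replace(/&#0*39;/g, "'");
    }
    return str;
}

console.log(decode_quotes('&quot;a&quot; and &#039;b&#039;', 'ENT_QUOTES'));
```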

rekcor
23 Jun '08

Thanks for the code!

But shouldn't you destroy

tarea



(otherwise we will end up with n numbers of textareas floating around in the DOM's hyperspace)

Kevin van Zonneveld
20 Mar '08

@lubber: You sure did! And as I said, as soon as php.js supports optional components, I will include them. Thanks again!

lubber
20 Mar '08

@Kevin: I use these functions to shrink my GET parameters in cases where POST isn't possible. Imagine an img tag that generates a custom picture, with parameters that exceed the 2048-character URL limit in IE (that was the case for me). Anyway, I just wanted to contribute my 2 cents to this project :)

Kevin van Zonneveld
19 Mar '08

@ lubber: Wow, that is some awesome code and I will definitely save the links. However, the 2 functions are probably rarely used in JavaScript. That hasn't stopped me before, but in this case the 2 functions alone (72kB) will increase the total project size by 52%. That's a bit too much for now.

However, when php.js gets a page for component customization, I will include the functions and just leave them unchecked by default. Sounds good?

lubber
19 Mar '08

You can find the JavaScript equivalents for gz_inflate and gz_deflate here:

http://www.onicos.com/staff/iz/amuse/javascript/expert/inflate.txt
http://www.onicos.com/staff/iz/amuse/javascript/expert/deflate.txt

john
18 Mar '08

ha, sry about that!

Kevin van Zonneveld
15 Mar '08

@ ger: Aha, that was ugly. Thanks for helping us!

ger
15 Mar '08

heh... I'm almost sure I can see some JS code after the return...; in the function source listed on this page.

