JavaScript get_html_translation_table
Returns the internal translation table used by htmlspecialchars and htmlentities
function get_html_translation_table (table, quote_style) {
    // Returns the internal translation table used by htmlspecialchars and htmlentities
    //
    // version: 909.322
    // discuss at: http://phpjs.org/functions/get_html_translation_table
    // +   original by: Philip Peterson
    // +    revised by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: noname
    // +   bugfixed by: Alex
    // +   bugfixed by: Marco
    // +   bugfixed by: madipta
    // +   improved by: KELAN
    // +   improved by: Brett Zamir (http://brett-zamir.me)
    // +   bugfixed by: Brett Zamir (http://brett-zamir.me)
    // +      input by: Frank Forte
    // +   bugfixed by: T.Wild
    // +      input by: Ratheous
    // %          note: It has been decided that we're not going to add global
    // %          note: dependencies to php.js, meaning the constants are not
    // %          note: real constants, but strings instead. Integers are also supported if someone
    // %          note: chooses to create the constants themselves.
    // *     example 1: get_html_translation_table('HTML_SPECIALCHARS');
    // *     returns 1: {'"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;'}
    var entities = {}, hash_map = {}, decimal = 0, symbol = '';
    var constMappingTable = {}, constMappingQuoteStyle = {};
    var useTable = {}, useQuoteStyle = {};

    // Translate arguments
    constMappingTable[0] = 'HTML_SPECIALCHARS';
    constMappingTable[1] = 'HTML_ENTITIES';
    constMappingQuoteStyle[0] = 'ENT_NOQUOTES';
    constMappingQuoteStyle[2] = 'ENT_COMPAT';
    constMappingQuoteStyle[3] = 'ENT_QUOTES';

    useTable = !isNaN(table) ? constMappingTable[table] : table ? table.toUpperCase() : 'HTML_SPECIALCHARS';
    useQuoteStyle = !isNaN(quote_style) ? constMappingQuoteStyle[quote_style] : quote_style ? quote_style.toUpperCase() : 'ENT_COMPAT';

    if (useTable !== 'HTML_SPECIALCHARS' && useTable !== 'HTML_ENTITIES') {
        throw new Error("Table: "+useTable+' not supported');
        // return false;
    }

    entities['38'] = '&amp;';
    if (useTable === 'HTML_ENTITIES') {
        entities['160'] = '&nbsp;';
        entities['161'] = '&iexcl;';
        entities['162'] = '&cent;';
        entities['163'] = '&pound;';
        entities['164'] = '&curren;';
        entities['165'] = '&yen;';
        entities['166'] = '&brvbar;';
        entities['167'] = '&sect;';
        entities['168'] = '&uml;';
        entities['169'] = '&copy;';
        entities['170'] = '&ordf;';
        entities['171'] = '&laquo;';
        entities['172'] = '&not;';
        entities['173'] = '&shy;';
        entities['174'] = '&reg;';
        entities['175'] = '&macr;';
        entities['176'] = '&deg;';
        entities['177'] = '&plusmn;';
        entities['178'] = '&sup2;';
        entities['179'] = '&sup3;';
        entities['180'] = '&acute;';
        entities['181'] = '&micro;';
        entities['182'] = '&para;';
        entities['183'] = '&middot;';
        entities['184'] = '&cedil;';
        entities['185'] = '&sup1;';
        entities['186'] = '&ordm;';
        entities['187'] = '&raquo;';
        entities['188'] = '&frac14;';
        entities['189'] = '&frac12;';
        entities['190'] = '&frac34;';
        entities['191'] = '&iquest;';
        entities['192'] = '&Agrave;';
        entities['193'] = '&Aacute;';
        entities['194'] = '&Acirc;';
        entities['195'] = '&Atilde;';
        entities['196'] = '&Auml;';
        entities['197'] = '&Aring;';
        entities['198'] = '&AElig;';
        entities['199'] = '&Ccedil;';
        entities['200'] = '&Egrave;';
        entities['201'] = '&Eacute;';
        entities['202'] = '&Ecirc;';
        entities['203'] = '&Euml;';
        entities['204'] = '&Igrave;';
        entities['205'] = '&Iacute;';
        entities['206'] = '&Icirc;';
        entities['207'] = '&Iuml;';
        entities['208'] = '&ETH;';
        entities['209'] = '&Ntilde;';
        entities['210'] = '&Ograve;';
        entities['211'] = '&Oacute;';
        entities['212'] = '&Ocirc;';
        entities['213'] = '&Otilde;';
        entities['214'] = '&Ouml;';
        entities['215'] = '&times;';
        entities['216'] = '&Oslash;';
        entities['217'] = '&Ugrave;';
        entities['218'] = '&Uacute;';
        entities['219'] = '&Ucirc;';
        entities['220'] = '&Uuml;';
        entities['221'] = '&Yacute;';
        entities['222'] = '&THORN;';
        entities['223'] = '&szlig;';
        entities['224'] = '&agrave;';
        entities['225'] = '&aacute;';
        entities['226'] = '&acirc;';
        entities['227'] = '&atilde;';
        entities['228'] = '&auml;';
        entities['229'] = '&aring;';
        entities['230'] = '&aelig;';
        entities['231'] = '&ccedil;';
        entities['232'] = '&egrave;';
        entities['233'] = '&eacute;';
        entities['234'] = '&ecirc;';
        entities['235'] = '&euml;';
        entities['236'] = '&igrave;';
        entities['237'] = '&iacute;';
        entities['238'] = '&icirc;';
        entities['239'] = '&iuml;';
        entities['240'] = '&eth;';
        entities['241'] = '&ntilde;';
        entities['242'] = '&ograve;';
        entities['243'] = '&oacute;';
        entities['244'] = '&ocirc;';
        entities['245'] = '&otilde;';
        entities['246'] = '&ouml;';
        entities['247'] = '&divide;';
        entities['248'] = '&oslash;';
        entities['249'] = '&ugrave;';
        entities['250'] = '&uacute;';
        entities['251'] = '&ucirc;';
        entities['252'] = '&uuml;';
        entities['253'] = '&yacute;';
        entities['254'] = '&thorn;';
        entities['255'] = '&yuml;';
    }

    if (useQuoteStyle !== 'ENT_NOQUOTES') {
        entities['34'] = '&quot;';
    }
    if (useQuoteStyle === 'ENT_QUOTES') {
        entities['39'] = '&#39;';
    }
    entities['60'] = '&lt;';
    entities['62'] = '&gt;';

    // ascii decimals to real symbols
    for (decimal in entities) {
        symbol = String.fromCharCode(decimal);
        hash_map[symbol] = entities[decimal];
    }

    return hash_map;
}
Examples
Running
get_html_translation_table('HTML_SPECIALCHARS');
Should return
{'"': '&quot;', '&': '&amp;', '<': '&lt;', '>': '&gt;'}
Dependencies
No dependencies, you can use this function standalone.
Open syntax issues
php.js uses JsLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass or at least take into consideration most fixes suggested by JsLint, following this JsLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have get_html_translation_table goodness in JavaScript.
Admittedly it's not a great solution, but in answer to Nick Kolosov: I had the same problem when using:
- htmlspecialchars => encode
- html_entity_decode => decode
My aim was to avoid adding parameters to the functions (keeping them like their PHP counterparts), so I decided to add a small piece of code to the decode functions to fix the problem.
In the html_entity_decode function, before:
for (symbol in hash_map) {
add the following lines:
// BOF : fix & problem
delete(hash_map['&']);
hash_map['&'] = '&amp;';
// EOF : fix & problem
Oops, the blog ate the HTML tags. Error example:
html_entity_decode('&amp;nbsp;') = ' ' instead of '&nbsp;'
The order of the entities must depend on the direction of translation.
With the conversion <a> => &lt;a&gt;, entities['38'] must be the first one.
With the conversion &lt;a&gt; => <a>, entities['38'] must be the last one.
The current version of html_entity_decode converts &amp;nbsp; to a space instead of &nbsp;. That's wrong.
Maybe html_entity_decode must be corrected; JS is not my strong side, and I don't know how to reverse the hash order.
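To make the decoding-order point concrete, here is a minimal sketch. The helper `decodeWithOrder` and the two pair lists are hypothetical and not part of php.js; they only imitate the split/join replacement style discussed in this thread, using an array so the order is explicit.

```javascript
// Hypothetical helper (not php.js code): apply entity -> character
// replacements in exactly the order given by the pairs array.
function decodeWithOrder(str, pairs) {
    var out = str, i;
    for (i = 0; i < pairs.length; i++) {
        // split/join replaces every occurrence of the entity
        out = out.split(pairs[i][0]).join(pairs[i][1]);
    }
    return out;
}

// '&amp;' handled first: the freshly produced '&' is decoded a second time.
var ampFirst = [['&amp;', '&'], ['&nbsp;', '\u00A0'], ['&lt;', '<'], ['&gt;', '>']];
// '&amp;' handled last: every entity is decoded exactly once.
var ampLast = [['&nbsp;', '\u00A0'], ['&lt;', '<'], ['&gt;', '>'], ['&amp;', '&']];

console.log(decodeWithOrder('&amp;nbsp;', ampFirst) === '\u00A0'); // true (decoded twice)
console.log(decodeWithOrder('&amp;nbsp;', ampLast));               // &nbsp;
```

With `&amp;` decoded last, the text `&amp;nbsp;` comes back as the literal string `&nbsp;`, which is what PHP's html_entity_decode would produce.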
@Roger: Yes, things can be that easy, if that's what you are trying to do. However, your function creating numeric character references has no relation to substituting for get_html_translation_table() for those who need it (nor for htmlentities() or htmlspecialchars() which depend on it).
Things can be so easy:
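function toHTMLEntity(str) {
    var s = str.split("");
    var ret = "";
    var i; // declared locally so the loop counter does not leak into the global scope
    for (i = 0; i < s.length; i++) {
        var c = s[i].charCodeAt(0);
        // Anything outside 7-bit ASCII becomes a numeric character reference
        if (c > 127) ret += ("&#" + c + ";");
        else ret += s[i];
    }
    return ret;
}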
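One caveat with the charCodeAt approach: characters outside the Basic Multilingual Plane are stored as two UTF-16 code units, so each half would get its own reference. A possible variant is sketched below, assuming an ES2015+ engine. The name `toNumericEntities` is made up for this illustration.

```javascript
// Hypothetical variant: iterate by code point rather than by UTF-16 code unit.
function toNumericEntities(str) {
    var ret = '';
    // for...of walks the string one code point at a time (ES2015+)
    for (var ch of str) {
        var cp = ch.codePointAt(0);
        ret += cp > 127 ? '&#' + cp + ';' : ch;
    }
    return ret;
}

console.log(toNumericEntities('caf\u00E9'));     // caf&#233;
console.log(toNumericEntities('\uD83D\uDE00'));  // &#128512;
```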
It was fixed recently in subversion (SVN). It just needed some time to be made available.
Yes, it's true that ECMAScript doesn't guarantee the order of iteration over an object's properties, but I understand that all major browsers maintain the order (and PHP.JS in general depends on this, being as we rely on objects for associative array-like behavior).
Good point about "histogram". Maybe someone copied it from count_chars(), where the word does look to be used correctly. Anyways, I fixed it for the other functions (entity ones) where it was indeed not correct.
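A small check of the ordering assumption, offered as a sketch rather than a statement about every engine: in engines that follow the ES2015+ property ordering rules (Node.js and current browsers, as far as I know), keys that look like non-negative integers are visited in ascending numeric order, not in assignment order. The variable names below are only for this illustration.

```javascript
var entities = {};
entities['38'] = '&amp;';  // assigned first
entities['60'] = '&lt;';
entities['34'] = '&quot;'; // assigned last

var seen = [];
for (var decimal in entities) {
    seen.push(decimal);
}
// Integer-like keys come out sorted numerically, so '34' precedes '38'.
console.log(seen.join(','));

// An array of pairs keeps whatever order it was written in.
var orderedPairs = [['&', '&amp;'], ['"', '&quot;'], ['<', '&lt;']];
console.log(orderedPairs[0][0]); // &
```

So assigning entities['38'] first does not by itself make the ampersand the first key visited; an array is the safer structure when order matters.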
Because the ampersand is used in all entities, and htmlspecialchars etc. washes the string through the split and join repeatedly, entities['38'] should be the first item in the array and the first character replaced. Thus any ampersand already in the string will be correctly replaced but those introduced by the replacement of other characters will remain intact.
Someone may have commented on this previously; it's hard to tell because the comments are a bit hard to follow, but regardless, it hasn't been fixed.
Moving it up to line 40 solves the problem in my code, but if I remember correctly the use of for...in doesn't guarantee iterators in a particular order so it might be better to take it out of the entities table and replace it separately (though in my experience they come out in the order they were assigned).
P.S. Just as an observation, you use 'histogram' as a variable name in a number of functions for what is actually a hash table...
his·to·gram /ˈhɪstəˌgræm/
–noun Statistics.
a graph of a frequency distribution in which rectangles with bases on the horizontal axis are given widths equal to the class intervals and heights equal to the corresponding frequencies.
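The ampersand-first rule for encoding can be illustrated with an explicit list of pairs, which also avoids relying on for...in order. This is a sketch only; `encodeWithPairs` and both lists are hypothetical names, not php.js code.

```javascript
// Hypothetical helper: apply character -> entity replacements in array order.
function encodeWithPairs(str, pairs) {
    var out = str, i;
    for (i = 0; i < pairs.length; i++) {
        out = out.split(pairs[i][0]).join(pairs[i][1]);
    }
    return out;
}

// Ampersand first: only ampersands already in the input are escaped.
var ampersandFirst = [['&', '&amp;'], ['<', '&lt;'], ['>', '&gt;']];
// Ampersand last: ampersands introduced by '&lt;' and '&gt;' get escaped again.
var ampersandLast = [['<', '&lt;'], ['>', '&gt;'], ['&', '&amp;']];

console.log(encodeWithPairs('<a> & b', ampersandFirst)); // &lt;a&gt; &amp; b
console.log(encodeWithPairs('<a> & b', ampersandLast));  // &amp;lt;a&amp;gt; &amp; b
```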
useQuoteStyle = !isNaN(quote_style) ? constMappingQuoteStyle[quote_style] : quote_style ? quote_style.toUpperCase() : 'ENT_COMPAT';
useTable = (table ? table.toUpperCase() : 'HTML_SPECIALCHARS');
useQuoteStyle = (quote_style ? quote_style.toUpperCase() : 'ENT_COMPAT');

// Translate arguments
constMappingTable[0] = 'HTML_SPECIALCHARS';
constMappingTable[1] = 'HTML_ENTITIES';
constMappingQuoteStyle[0] = 'ENT_NOQUOTES';
constMappingQuoteStyle[2] = 'ENT_COMPAT';
constMappingQuoteStyle[3] = 'ENT_QUOTES';

// Map numbers to strings for compatibility with PHP constants
if (!isNaN(useTable)) {
    useTable = constMappingTable[useTable];
}
if (!isNaN(useQuoteStyle)) {
    useQuoteStyle = constMappingQuoteStyle[useQuoteStyle];
}
==> get_html_translation_table(0,2);
constMappingTable[0] = 'HTML_SPECIALCHARS';
constMappingTable[1] = 'HTML_ENTITIES';
constMappingQuoteStyle[0] = 'ENT_NOQUOTES';
constMappingQuoteStyle[2] = 'ENT_COMPAT';
constMappingQuoteStyle[3] = 'ENT_QUOTES';

useTable = !isNaN(table) ? constMappingTable[table] : table ? table.toUpperCase() : 'HTML_SPECIALCHARS';
useQuoteStyle = !isNaN(quote_style) ? constMappingQuoteStyle[table] : quote_style ? quote_style.toUpperCase() : 'ENT_COMPAT';
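For readers following the argument-handling discussion, here is one possible way to resolve either a PHP-style integer or a string name to the string constant, looking each argument up in its own mapping. It is a sketch under assumptions: `resolveConstant`, `TABLES` and `QUOTES` are names invented here, and the integer values are the ones used in the listings above.

```javascript
// Hypothetical resolver: number (or numeric string) -> mapped name,
// other strings -> upper-cased name, missing value -> fallback.
function resolveConstant(value, mapping, fallback) {
    if (value === undefined || value === null || value === '') {
        return fallback;
    }
    if (typeof value === 'number' || /^\d+$/.test(String(value))) {
        return mapping[Number(value)]; // undefined when the number is unknown
    }
    return String(value).toUpperCase();
}

var TABLES = {0: 'HTML_SPECIALCHARS', 1: 'HTML_ENTITIES'};
var QUOTES = {0: 'ENT_NOQUOTES', 2: 'ENT_COMPAT', 3: 'ENT_QUOTES'};

console.log(resolveConstant(0, TABLES, 'HTML_SPECIALCHARS'));     // HTML_SPECIALCHARS
console.log(resolveConstant(2, QUOTES, 'ENT_COMPAT'));            // ENT_COMPAT
console.log(resolveConstant('ent_quotes', QUOTES, 'ENT_COMPAT')); // ENT_QUOTES
console.log(resolveConstant(undefined, QUOTES, 'ENT_COMPAT'));    // ENT_COMPAT
```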
I think you need to move entities['38'] to the top:
entities['38'] = '&amp;';

if (useQuoteStyle != 'ENT_NOQUOTES') {
    entities['34'] = '&quot;';
}

if (useQuoteStyle == 'ENT_QUOTES') {
    entities['39'] = '&#39;';
}
I suggest that you add a ; after the following code so that the script can be packed to one line (for example with: http://dean.edwards.name/packer/)
symbol = String.fromCharCode(decimal)
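Why the missing semicolon matters once line breaks are removed can be checked directly. The sketch below only parses the snippets (via the Function constructor) and never runs them; `parses` and the three source strings are made up for this demonstration.

```javascript
// Returns true when the source text parses as a function body.
function parses(source) {
    try {
        new Function(source); // compiles only; the body is never executed
        return true;
    } catch (e) {
        return false;
    }
}

// With a line break, automatic semicolon insertion rescues the statement.
var multiLine = "symbol = String.fromCharCode(decimal)\nhash_map[symbol] = entities[decimal];";
// Packed onto one line without the semicolon: a syntax error.
var packedNoSemicolon = "symbol = String.fromCharCode(decimal) hash_map[symbol] = entities[decimal];";
// Packed with the semicolon: fine.
var packedWithSemicolon = "symbol = String.fromCharCode(decimal); hash_map[symbol] = entities[decimal];";

console.log(parses(multiLine));           // true
console.log(parses(packedNoSemicolon));   // false
console.log(parses(packedWithSemicolon)); // true
```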
@ GreLI: It was easier developing (read: copy & pasting ;) that way. We might want to switch back to reduce its size though, that's a good point, thanks.
Instead of this:
entities['38'] = '&amp;';
entities['60'] = '&lt;';
entities['62'] = '&gt;';
You can write
entities = {
    '38': '&amp;',
    '60': '&lt;',
    '62': '&gt;'
};
to reduce size and increase readability.
You need to change position for some lines.
From:
entities['60'] = '&lt;';
entities['62'] = '&gt;';
entities['38'] = '&amp;';
To:
entities['38'] = '&amp;';
entities['60'] = '&lt;';
entities['62'] = '&gt;';
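Otherwise the output is encoded incorrectly, because the ampersands introduced by the earlier replacements get encoded again. Example:
<a> => &amp;lt;a&amp;gt;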
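A different way around the ordering question, shown here only as a sketch and not as what php.js does, is to escape in a single pass, so that text produced by one replacement is never scanned again. `SPECIAL` and `escapeSinglePass` are hypothetical names.

```javascript
// Character -> entity map for the four characters handled with ENT_COMPAT.
var SPECIAL = {'&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;'};

// Each character of the input is looked at exactly once, so the order of
// the map entries is irrelevant and nothing can be encoded twice.
function escapeSinglePass(str) {
    return str.replace(/[&<>"]/g, function (ch) {
        return SPECIAL[ch];
    });
}

console.log(escapeSinglePass('<a href="x">&</a>'));
// &lt;a href=&quot;x&quot;&gt;&amp;&lt;/a&gt;
```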


Kevin van Zonneveld
14 Dec '09
@ Fox: Thanks for fixing : )
Will be online shortly folks.