JavaScript urlencode
URL-encodes string
```javascript
function urlencode (str) {
    // URL-encodes string
    //
    // version: 911.718
    // discuss at: http://phpjs.org/functions/urlencode
    // + original by: Philip Peterson
    // + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + input by: AJ
    // + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + improved by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + input by: travc
    // + input by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + improved by: Lars Fischer
    // + input by: Ratheous
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Joris
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // % note 1: This reflects PHP 5.3/6.0+ behavior
    // % note 2: Please be aware that this function expects to encode into UTF-8 encoded strings, as found on
    // % note 2: pages served as UTF-8
    // * example 1: urlencode('Kevin van Zonneveld!');
    // * returns 1: 'Kevin+van+Zonneveld%21'
    // * example 2: urlencode('http://kevin.vanzonneveld.net/');
    // * returns 2: 'http%3A%2F%2Fkevin.vanzonneveld.net%2F'
    // * example 3: urlencode('http://www.google.nl/search?q=php.js&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a');
    // * returns 3: 'http%3A%2F%2Fwww.google.nl%2Fsearch%3Fq%3Dphp.js%26ie%3Dutf-8%26oe%3Dutf-8%26aq%3Dt%26rls%3Dcom.ubuntu%3Aen-US%3Aunofficial%26client%3Dfirefox-a'
    str = (str + '').toString();

    // Tilde should be allowed unescaped in future versions of PHP (as reflected below), but if you want to reflect current
    // PHP behavior, you would need to add ".replace(/~/g, '%7E');" to the following.
    return encodeURIComponent(str)
        .replace(/!/g, '%21')
        .replace(/'/g, '%27')
        .replace(/\(/g, '%28')
        .replace(/\)/g, '%29')
        .replace(/\*/g, '%2A')
        .replace(/%20/g, '+');
}
```
Examples
» Example 1
Running
```javascript
urlencode('Kevin van Zonneveld!');
```
Should return
```javascript
'Kevin+van+Zonneveld%21'
```
» Example 2
Running
```javascript
urlencode('http://kevin.vanzonneveld.net/');
```
Should return
```javascript
'http%3A%2F%2Fkevin.vanzonneveld.net%2F'
```
Dependencies
No dependencies; you can use this function standalone.
Open syntax issues
php.js uses JSLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass, or at least take into consideration most fixes suggested by JSLint, following the JSLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have urlencode goodness in JavaScript.
Ok,
I've updated the code in git (at http://github.com/kvz/phpjs/commit/2691be636ea1d3f8d035bfbe11fb2e05657b48da) to a simpler (and faster) implementation based on encodeURIComponent (for all the urlencode/decode functions), fully adjusted to how PHP is SUPPOSED to behave as of PHP 5.3/6.0 (though I haven't seen news of it yet). If you want PHP's behavior as it is now, you should add the following to the encode functions (since encodeURIComponent() doesn't do it):
```javascript
.replace(/~/g, '%7E');
```
...since PHP at present still encodes the tilde, which is outdated: later RFCs allow it to be left unencoded.
(PHP's decode functions can already decode the tilde, so there is no need to "correct" them.)
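A minimal sketch of that variant, assuming you want PHP's current behavior of encoding the tilde. The name `urlencodeWithTilde` is hypothetical and not part of php.js; it is simply the encodeURIComponent-based implementation with the extra `.replace(/~/g, '%7E')` step added to the chain.

```javascript
// Hypothetical variant (not part of php.js): the encodeURIComponent-based
// urlencode plus the extra tilde replacement described above.
function urlencodeWithTilde(str) {
  str = String(str);
  return encodeURIComponent(str)
    .replace(/!/g, '%21')
    .replace(/'/g, '%27')
    .replace(/\(/g, '%28')
    .replace(/\)/g, '%29')
    .replace(/\*/g, '%2A')
    .replace(/~/g, '%7E')  // encodeURIComponent leaves "~" as-is; escape it
    .replace(/%20/g, '+'); // PHP's urlencode() uses "+" for spaces
}

console.log(urlencodeWithTilde('~user/a b')); // %7Euser%2Fa+b
```

Because `%7E` contains no `%20`, the order of the last two replacements does not matter.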
Two other lessons learned (I hope) from RFC 3986 (at http://labs.apache.org/webarch/uri/rfc/rfc3986.html):
1) The reason "!", "'", "(", ")", and "*" are now reserved, even though they have no special official URI-delimiting function, is that, as characters normally usable for other purposes, they help indicate that the other members of the group to which they belong (e.g., "&", "=", etc.) are generally not safe to use as-is without escaping. I guess it also allows them to be used for unofficial purposes. (They were not yet reserved when encodeURIComponent was added to JavaScript, so its handling of them is outdated and has to be corrected.)
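Point 1 can be checked directly. This sketch lists which RFC 3986 sub-delims (`! $ & ' ( ) * + , ; =`) encodeURIComponent passes through unchanged, then escapes exactly those. The names `unescapedSubDelims` and `strictEncodeURIComponent` are illustrative, not php.js functions; the regex callback is the same idea as the replace chain in urlencode, written once.

```javascript
// The RFC 3986 "sub-delims" set.
const SUB_DELIMS = "!$&'()*+,;=";

// Returns the sub-delims that encodeURIComponent leaves unescaped.
function unescapedSubDelims() {
  return SUB_DELIMS.split('').filter((ch) => encodeURIComponent(ch) === ch);
}

// Percent-encodes those leftovers as well, so every sub-delim is escaped.
// All five characters are >= 0x10, so two hex digits need no padding.
function strictEncodeURIComponent(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (ch) => '%' + ch.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(unescapedSubDelims().join(' ')); // ! ' ( ) *
console.log(strictEncodeURIComponent("a!b'c(d)e*f")); // a%21b%27c%28d%29e%2Af
```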
2) PHP has no analogue to JavaScript's encodeURI() (urlencode() and rawurlencode() are pretty much the analogues of encodeURIComponent), so we don't have to worry about it for PHP.JS. Still, encodeURI() is another place where JavaScript is a little behind the times: it should stop escaping square brackets, since they have been made reserved so that they can be used with IPv6 (as delimiters for an IP literal in the 'host'). One might therefore "fix" encodeURI like this (but NOT encodeURIComponent, which is SUPPOSED to escape delimiters like '/' and now '['):
```javascript
function fixedEncodeURI (str) {
    // Un-escape "[" and "]" so IPv6 literals in the host survive encodeURI()
    return encodeURI(str).replace(/%5B/g, '[').replace(/%5D/g, ']');
}
```
For the record, we can do all of the straight replaces above because UTF-8 uses bytes 0x00 to 0x7F only for single-byte ASCII characters. These bytes can therefore be safely replaced back and forth between their escaped and unescaped forms without fear that they are part of a multi-byte sequence.
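The UTF-8 property this relies on can be verified by brute force. A minimal sketch, assuming a runtime with the WHATWG `TextEncoder` (modern browsers, Node 11+): it encodes every non-ASCII code point in the Basic Multilingual Plane, plus one character outside it, and confirms that no byte of a multi-byte sequence falls in 0x00-0x7F. The function names are illustrative.

```javascript
const encoder = new TextEncoder(); // always encodes to UTF-8

// True if code point cp encodes to 2+ UTF-8 bytes, all in 0x80-0xFF.
function hasOnlyHighBytes(cp) {
  const bytes = encoder.encode(String.fromCodePoint(cp));
  return bytes.length > 1 && bytes.every((b) => b >= 0x80);
}

// Check U+0080..U+FFFF, skipping surrogate code points (not characters on
// their own), then one astral character, U+1F600.
function checkUtf8AsciiSafety() {
  for (let cp = 0x80; cp <= 0xFFFF; cp++) {
    if (cp >= 0xD800 && cp <= 0xDFFF) continue;
    if (!hasOnlyHighBytes(cp)) return false;
  }
  return hasOnlyHighBytes(0x1F600);
}

console.log(checkUtf8AsciiSafety()); // true
// The pound sign (U+00A3): two bytes, both >= 0x80.
console.log(Array.from(encoder.encode('£'), (b) => b.toString(16)).join(' ')); // c2 a3
```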
Again, folks, before submitting patches, please be sure you realize that our encoding/decoding here assumes UTF-8; you have to serve your PHP pages with a UTF-8 header (as you should) if you want comparable behavior on the PHP side.
Below is the old version, just for easy reference (e.g., if you happen to want to know how to make your own UTF-8 octets):
```javascript
function urlencode (str) {
    // http://kevin.vanzonneveld.net
    // + original by: Philip Peterson
    // + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + input by: AJ
    // + improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + improved by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + input by: travc
    // + input by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + improved by: Lars Fischer
    // + input by: Ratheous
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Joris
    // % note 1: This reflects PHP 5.3/6.0+ behavior
    // * example 1: urlencode('Kevin van Zonneveld!');
    // * returns 1: 'Kevin+van+Zonneveld%21'
    // * example 2: urlencode('http://kevin.vanzonneveld.net/');
    // * returns 2: 'http%3A%2F%2Fkevin.vanzonneveld.net%2F'
    // * example 3: urlencode('http://www.google.nl/search?q=php.js&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a');
    // * returns 3: 'http%3A%2F%2Fwww.google.nl%2Fsearch%3Fq%3Dphp.js%26ie%3Dutf-8%26oe%3Dutf-8%26aq%3Dt%26rls%3Dcom.ubuntu%3Aen-US%3Aunofficial%26client%3Dfirefox-a'
    var hexStr = function (dec) {
        return '%' + (dec < 16 ? '0' : '') + dec.toString(16).toUpperCase();
    };

    var ret = '',
        unreserved = /[\w.-]/; // A-Za-z0-9_.-
    // Tilde is not here for historical reasons; to preserve it, use rawurlencode instead
    str = (str + '').toString();

    for (var i = 0, dl = str.length; i < dl; i++) {
        var ch = str.charAt(i);
        if (unreserved.test(ch)) {
            ret += ch;
        }
        else {
            var code = str.charCodeAt(i);
            if (0xD800 <= code && code <= 0xDBFF) { // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters); https://developer.mozilla.org/index.php?title=en/Core_JavaScript_1.5_Reference/Global_Objects/String/charCodeAt
                // Combine with the following low surrogate into one code point and emit its 4 UTF-8 bytes
                var cp = ((code - 0xD800) * 0x400) + (str.charCodeAt(i + 1) - 0xDC00) + 0x10000;
                i++; // skip the next one as we just retrieved it as a low surrogate
                ret += hexStr((cp >> 18) | 0xF0);
                ret += hexStr(((cp >> 12) & 0x3F) | 0x80);
                ret += hexStr(((cp >> 6) & 0x3F) | 0x80);
                ret += hexStr((cp & 0x3F) | 0x80);
            }
            // We never come across a low surrogate because we skip them, unless invalid
            // Reserved assumed to be in UTF-8, as in PHP
            else if (code === 32) {
                ret += '+'; // %20 in rawurlencode
            }
            else if (code < 128) { // 1 byte
                ret += hexStr(code);
            }
            else if (code >= 128 && code < 2048) { // 2 bytes
                ret += hexStr((code >> 6) | 0xC0);
                ret += hexStr((code & 0x3F) | 0x80);
            }
            else if (code >= 2048) { // 3 bytes (code < 65536)
                ret += hexStr((code >> 12) | 0xE0);
                ret += hexStr(((code >> 6) & 0x3F) | 0x80);
                ret += hexStr((code & 0x3F) | 0x80);
            }
        }
    }
    return ret;
}
```
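For comparison, a modern runtime can produce the same UTF-8 octets without the hand-written bit arithmetic. This is a sketch assuming the WHATWG `TextEncoder` is available; `urlencodeViaBytes` is a hypothetical name. Like the old version, it keeps only A-Z, a-z, 0-9, `-`, `.` and `_`, turns spaces into `+`, and percent-encodes every other byte (so the tilde becomes `%7E`).

```javascript
// Hypothetical helper (not part of php.js): PHP-style urlencode built on
// TextEncoder's UTF-8 output instead of manual octet construction.
function urlencodeViaBytes(str) {
  const bytes = new TextEncoder().encode(String(str));
  let out = '';
  for (const b of bytes) {
    const isAlnum = (b >= 0x30 && b <= 0x39) || // 0-9
                    (b >= 0x41 && b <= 0x5A) || // A-Z
                    (b >= 0x61 && b <= 0x7A);   // a-z
    if (isAlnum || b === 0x2D || b === 0x2E || b === 0x5F) { // - . _
      out += String.fromCharCode(b);
    } else if (b === 0x20) {
      out += '+'; // space, as in PHP's urlencode()
    } else {
      out += '%' + b.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(urlencodeViaBytes('Kevin van Zonneveld!')); // Kevin+van+Zonneveld%21
```

Because TextEncoder combines surrogate pairs itself, characters outside the BMP come out as four octets (U+1F600 becomes `%F0%9F%98%80`).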
@Martin, to add to my comment just now: I see that escape() would do the trick, but it is deprecated, again because it assumes Latin-1.
Thanks, Martin. Can you explain why we don't want UTF-8, though? When I test this with PHP, I get the same results if the file is encoded in UTF-8. Given the tide turning toward UTF-8, not to mention its ability to represent all languages, I think it's best to aim for that, no?
I guess we could add a custom "phpjs." configuration option (set through our ini_set()) that allowed for other character sets, but we'd probably want to use some generic algorithm to translate assuming Latin-1 input (or whatever) rather than adding character conversions case by case. What do you think?
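A generic Latin-1 algorithm of the kind suggested above could look like the following. It is a hypothetical sketch: the function name is made up and it is not part of php.js. It assumes the input contains only code points U+0000-U+00FF, which map one-to-one onto ISO-8859-1 bytes, and it throws for anything else rather than guessing.

```javascript
// Hypothetical Latin-1 (ISO-8859-1) flavour of urlencode: one byte per
// character, so "£" (U+00A3) becomes "%A3" instead of UTF-8's "%C2%A3".
function urlencodeLatin1(str) {
  str = String(str);
  let out = '';
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0xFF) {
      throw new RangeError('Not representable in Latin-1: U+' +
        code.toString(16).toUpperCase().padStart(4, '0'));
    }
    const ch = str.charAt(i);
    if (/[A-Za-z0-9_.-]/.test(ch)) {
      out += ch; // characters PHP's urlencode() leaves alone
    } else if (code === 0x20) {
      out += '+';
    } else {
      out += '%' + code.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(urlencodeLatin1('£5 off!')); // %A35+off%21
```

Handling the whole 0x80-0xFF range in one rule avoids the case-by-case table entries discussed in these comments.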
The UK pound sign (£) encodes to multiple escape sequences, giving:
```
%C2%A3
```
rather than
```
%A3
```
This is due to conversion into UTF-8. I suggest adding the following into the histogram array as a simple fix:
```javascript
histogram['%C2%A3'] = '%A3';
```
@ bukura: Unfortunately, we have had some bad experiences with escape, as it does not produce PHP-compatible output. Please also see the link we refer to in the script.
```javascript
function urlencode (str) {
    var res = "";
    for (var i = 0; i < str.length; i++) {
        if (str[i] == ' ') {
            res += '+';
        } else {
            res += escape(str[i]);
        }
    }
    return res;
}
```
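The incompatibility with escape() mentioned above is easy to demonstrate. The sketch compares escape() with what PHP's urlencode() returns for a few characters. The `PHP_URLENCODE` table is an assumption derived from PHP's documented rule (everything except letters, digits, `-`, `_` and `.` is percent-encoded, spaces become `+`) applied to UTF-8 input; it is not output captured from PHP.

```javascript
// Assumed PHP urlencode() results for UTF-8 input (see lead-in).
const PHP_URLENCODE = {
  '+': '%2B',
  '/': '%2F',
  '@': '%40',
  '*': '%2A',
  '£': '%C2%A3',
  '€': '%E2%82%AC',
};

// Lists the characters for which escape() disagrees with PHP.
function escapeMismatches() {
  return Object.keys(PHP_URLENCODE).filter(
    (ch) => escape(ch) !== PHP_URLENCODE[ch]
  );
}

for (const ch of Object.keys(PHP_URLENCODE)) {
  console.log(ch, escape(ch), PHP_URLENCODE[ch]);
}
// + + %2B
// / / %2F
// @ @ %40
// * * %2A
// £ %A3 %C2%A3
// € %u20AC %E2%82%AC
```

The `+` row matters most: the function above would leave a literal plus sign in place, and PHP's urldecode() would then read it as a space.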
@ AJ: I've rewritten the urlencoding functions; they should be a great improvement! Thanks for your input.
It's a good function, but it also needs to encode the forward slash character. I'd recommend adding the following line before the return statement:
```javascript
ret = ret.replace(/\//g, '%2F');
```
Short of going into the PHP source, this seems to work reasonably similarly.
@ johnrembo: Hi John, thanks again for your input. We discussed this earlier: it doesn't mimic PHP's behaviour closely enough. The differences between JavaScript's encoding functions are described here: http://xkr.us/articles/javascript/encode-compare/
Yeah, I did it because Michael concluded that encodeURIComponent had better PHP compatibility.
I guess the tester doesn't work because in its current form it fails to handle \n characters, and maybe the exclamation mark gets translated twice; I have to double-check that.
Discussion on encodeURIComponent vs escape can be found here:
http://kevin.vanzonneveld.net/techblog/article/javascript_equivalent_for_phps_http_build_query/#comment_1071
If you reach a different conclusion, please let me know, OK?


Mohsen Haeri
19 Nov '09