JavaScript rawurlencode
URL-encodes string
function rawurlencode (str) {
    // URL-encodes string
    //
    // version: 1109.2015
    // discuss at: http://phpjs.org/functions/rawurlencode
    // + original by: Brett Zamir (http://brett-zamir.me)
    // + input by: travc
    // + input by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // + input by: Michael Grier
    // + bugfixed by: Brett Zamir (http://brett-zamir.me)
    // + input by: Ratheous
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // + bugfixed by: Joris
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // % note 1: This reflects PHP 5.3/6.0+ behavior
    // % note 2: Please be aware that this function expects to encode into UTF-8 encoded strings, as found on
    // % note 2: pages served as UTF-8
    // * example 1: rawurlencode('Kevin van Zonneveld!');
    // * returns 1: 'Kevin%20van%20Zonneveld%21'
    // * example 2: rawurlencode('http://kevin.vanzonneveld.net/');
    // * returns 2: 'http%3A%2F%2Fkevin.vanzonneveld.net%2F'
    // * example 3: rawurlencode('http://www.google.nl/search?q=php.js&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a');
    // * returns 3: 'http%3A%2F%2Fwww.google.nl%2Fsearch%3Fq%3Dphp.js%26ie%3Dutf-8%26oe%3Dutf-8%26aq%3Dt%26rls%3Dcom.ubuntu%3Aen-US%3Aunofficial%26client%3Dfirefox-a'
    str = (str + '').toString();

    // Tilde should be allowed unescaped in future versions of PHP (as reflected below), but if you want to reflect current
    // PHP behavior, you would need to add ".replace(/~/g, '%7E');" to the following.
    return encodeURIComponent(str).replace(/!/g, '%21').replace(/'/g, '%27').replace(/\(/g, '%28').
    replace(/\)/g, '%29').replace(/\*/g, '%2A');
}
Examples
» Example 1
Running
rawurlencode('Kevin van Zonneveld!');
Should return
'Kevin%20van%20Zonneveld%21'
» Example 2
Running
rawurlencode('http://kevin.vanzonneveld.net/');
Should return
'http%3A%2F%2Fkevin.vanzonneveld.net%2F'
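The comment inside the function says a tilde is left unescaped on purpose, and that the PHP behavior current when the comment was written can be matched by appending `.replace(/~/g, '%7E')`. A minimal sketch of that variant follows; the name `rawurlencodeLegacy` is hypothetical and not part of php.js.

```javascript
// Hypothetical variant (the name is an assumption, not a php.js function):
// identical to rawurlencode, plus the tilde replacement that the source
// comment describes for matching the older PHP behavior.
function rawurlencodeLegacy(str) {
  str = (str + '').toString();
  return encodeURIComponent(str)
    .replace(/!/g, '%21')
    .replace(/'/g, '%27')
    .replace(/\(/g, '%28')
    .replace(/\)/g, '%29')
    .replace(/\*/g, '%2A')
    .replace(/~/g, '%7E'); // the only difference from rawurlencode
}

console.log(rawurlencodeLegacy('~user/a b')); // '%7Euser%2Fa%20b'
```

Without the last replacement the same input would keep its leading `~`, since encodeURIComponent never escapes that character.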
Dependencies
No dependencies, you can use this function standalone.
Open syntax issues
php.js uses JsLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass or at least take into consideration most fixes suggested by JsLint, following this JsLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have rawurlencode goodness in JavaScript.
@Joris: Good catch about the non-BMP code points; ironic you caught me making the mistake, since I was the one who edited the article you cited for the correction to point this problem out! :) That's what I get for adapting someone else's pattern without thinking... Anyways, your addition is good, except that it should not assign to "code" but instead to "ret" and then do a "continue" after the "i++" or ensure we are in a continuous else/else-if block (I chose the latter). Also, thanks for the catch on the hex needing two chars min... Fixed in git...
This function does not work properly for 4-byte Unicode characters. Browsers use UTF-16 for strings. That means any Unicode character at or above 65536 (0x10000) is split up into two surrogate values, and charCodeAt only ever returns one of them.
So "code >= 65536" is NEVER true.
Oh, and PHP always makes sure a percent-encoded value is composed of two hex digits.
Here is a version that does urlencode as if the string were really UTF-8:
function rawurlencode (str) {
    // Format one byte as a percent escape, always with two hex digits
    var hexStr = function (dec) {
        return '%' + (dec < 16 ? '0' : '') + dec.toString(16).toUpperCase();
    };
    var ret = '',
        unreserved = /[\w.~-]/; // A-Za-z0-9_.~-
    str = (str + '').toString();
    for (var i = 0, dl = str.length; i < dl; i++) {
        var ch = str.charAt(i);
        if (unreserved.test(ch)) {
            ret += ch;
        }
        else {
            var code = str.charCodeAt(i);
            // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters);
            // https://developer.mozilla.org/index.php?title=en/Core_JavaScript_1.5_Reference/Global_Objects/String/charCodeAt&revision=39
            if (0xD800 <= code && code <= 0xDBFF) {
                // Combine the surrogate pair into one code point
                code = ((code - 0xD800) * 0x400) + (str.charCodeAt(i + 1) - 0xDC00) + 0x10000;
                i++; // skip the next one
            }
            // We never come across a low surrogate because we skip them
            // Reserved assumed to be in UTF-8, as in PHP
            if (code < 128) { // 1 byte
                ret += hexStr(code);
            }
            else if (code >= 128 && code < 2048) { // 2 bytes
                ret += hexStr((code >> 6) | 0xC0);
                ret += hexStr((code & 0x3F) | 0x80);
            }
            else if (code >= 2048 && code < 65536) { // 3 bytes
                ret += hexStr((code >> 12) | 0xE0);
                ret += hexStr(((code >> 6) & 0x3F) | 0x80);
                ret += hexStr((code & 0x3F) | 0x80);
            }
            else if (code >= 65536) { // 4 bytes
                ret += hexStr((code >> 18) | 0xF0);
                ret += hexStr(((code >> 12) & 0x3F) | 0x80);
                ret += hexStr(((code >> 6) & 0x3F) | 0x80);
                ret += hexStr((code & 0x3F) | 0x80);
            }
        }
    }
    return ret;
}
Gr. Joris
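To make the surrogate arithmetic above concrete, here is a worked example for a single non-BMP character, U+1F600. It is only a sketch that reuses the same formula, shifts and masks as the code in this comment; the variable names are chosen for illustration.

```javascript
// One character outside the BMP, stored as two UTF-16 code units.
var s = '\uD83D\uDE00';
var hi = s.charCodeAt(0); // 0xD83D, high surrogate
var lo = s.charCodeAt(1); // 0xDE00, low surrogate

// Same surrogate-pair formula as in the loop above.
var code = ((hi - 0xD800) * 0x400) + (lo - 0xDC00) + 0x10000;

// Four UTF-8 bytes, using the same shifts and masks as the 4-byte branch.
var bytes = [
  (code >> 18) | 0xF0,
  ((code >> 12) & 0x3F) | 0x80,
  ((code >> 6) & 0x3F) | 0x80,
  (code & 0x3F) | 0x80
];

// Percent-encode each byte with two uppercase hex digits.
var encoded = bytes.map(function (b) {
  return '%' + (b < 16 ? '0' : '') + b.toString(16).toUpperCase();
}).join('');

console.log(code.toString(16));                 // '1f600'
console.log(encoded);                           // '%F0%9F%98%80'
console.log(encoded === encodeURIComponent(s)); // true
```

The last line shows that, for this input, the manual UTF-8 encoding agrees with what encodeURIComponent produces.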
The list of characters left unescaped is not exactly the same in escape and rawurlencode... .. .
The escape and unescape functions do not work properly for non-ASCII characters and have been deprecated. In JavaScript 1.5 and later, use encodeURI or encodeURIComponent... .. . ;o)
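A short comparison illustrates the point about non-ASCII characters. This is a sketch only; the sample strings are arbitrary, and it assumes an environment where the deprecated global escape is still available (as it is in browsers and Node.js).

```javascript
// 'é' followed by '!'; written with an escape so the file encoding does not matter.
var accented = '\u00E9!';
console.log(escape(accented));             // '%E9%21'  (Latin-1 style, not UTF-8)
console.log(encodeURIComponent(accented)); // '%C3%A9!' (UTF-8 bytes, '!' left alone)

// A character above 0xFF: escape falls back to a non-standard %uXXXX form.
var euro = '\u20AC';
console.log(escape(euro));                 // '%u20AC'
console.log(encodeURIComponent(euro));     // '%E2%82%AC'
```

Note that the two functions also disagree on `!`, which is why rawurlencode adds its own replacements on top of encodeURIComponent.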
@ tchaOo°
Good catch! I'm not sure how that happened, but it is now fixed in SVN. I've actually been meaning to review these functions, as I'm not 100% sure now that the recent changes to the histogram have all been correct, at least for all functions...
Not encoding spaces is not the behavior of rawurlencode or urlencode, for that matter.
urlencode and rawurlencode both encode anything that is not "A to Z", "a to z", "0 to 9", "-", "_" or "." ... the only difference between them is how spaces are encoded... urlencode encodes spaces as "+" and rawurlencode encodes spaces as "%20".
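The space difference described above can be sketched in a few lines. The helper name `urlencodeSketch` is hypothetical; it is not the php.js urlencode implementation and only illustrates the "+" versus "%20" distinction.

```javascript
// rawurlencode as defined on this page.
function rawurlencode(str) {
  str = (str + '').toString();
  return encodeURIComponent(str)
    .replace(/!/g, '%21').replace(/'/g, '%27')
    .replace(/\(/g, '%28').replace(/\)/g, '%29').replace(/\*/g, '%2A');
}

// Hypothetical helper: turn every encoded space into '+'.
// A literal '%' in the input becomes '%25' first, so '%20' in the
// output can only come from a real space.
function urlencodeSketch(str) {
  return rawurlencode(str).replace(/%20/g, '+');
}

console.log(rawurlencode('a b&c'));    // 'a%20b%26c'
console.log(urlencodeSketch('a b&c')); // 'a+b%26c'
```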


Joris van der Wel
29 Sep '09
Well, if a high surrogate is found, the i++; is just there so we do not loop over the low surrogate the next time.
It then goes all the way to
if (code >= 65536) { // 4 bytes
to turn it into UTF-8.
That was just me accounting for the remote possibility that the specification changes (i.e. charCodeAt returning something bigger than 65535).
Funny thing is, I actually wrote my own rawurlencode function before finding this one and it was nearly identical.
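The claim that charCodeAt never returns more than 65535 is easy to check. The sketch below scans a sample string (chosen arbitrarily) and also shows codePointAt, which was added to the language in ES2015, well after this comment was written, and does return the full code point.

```javascript
// 'a' followed by U+1F600, which takes two UTF-16 code units.
var s = 'a\uD83D\uDE00';

// Find the largest value charCodeAt reports anywhere in the string.
var max = 0;
for (var i = 0; i < s.length; i++) {
  max = Math.max(max, s.charCodeAt(i));
}

console.log(s.length);                      // 3
console.log(max.toString(16));              // 'de00'
console.log(max < 65536);                   // true
console.log(s.codePointAt(1).toString(16)); // '1f600'
```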