JavaScript rawurldecode
Decodes a URL-encoded string
function rawurldecode (str) {
    // Decodes URL-encodes string
    //
    // version: 911.718
    // discuss at: http://phpjs.org/functions/rawurldecode
    // +   original by: Brett Zamir (http://brett-zamir.me)
    // +      input by: travc
    // +      input by: Brett Zamir (http://brett-zamir.me)
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +      input by: Ratheous
    // + reimplemented by: Brett Zamir (http://brett-zamir.me)
    // %        note 1: Please be aware that this function expects to decode from UTF-8 encoded strings, as found on
    // %        note 1: pages served as UTF-8
    // *     example 1: rawurldecode('Kevin+van+Zonneveld%21');
    // *     returns 1: 'Kevin+van+Zonneveld!'
    // *     example 2: rawurldecode('http%3A%2F%2Fkevin.vanzonneveld.net%2F');
    // *     returns 2: 'http://kevin.vanzonneveld.net/'
    // *     example 3: rawurldecode('http%3A%2F%2Fwww.google.nl%2Fsearch%3Fq%3Dphp.js%26ie%3Dutf-8%26oe%3Dutf-8%26aq%3Dt%26rls%3Dcom.ubuntu%3Aen-US%3Aunofficial%26client%3Dfirefox-a');
    // *     returns 3: 'http://www.google.nl/search?q=php.js&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a'
    // *     example 4: rawurldecode('-22%97bc%2Fbc');
    // *     returns 4: '-22—bc/bc'
    return decodeURIComponent(str);
}
Examples
» Example 1
Running
rawurldecode('Kevin+van+Zonneveld%21');
Should return
'Kevin+van+Zonneveld!'
» Example 2
Running
rawurldecode('http%3A%2F%2Fkevin.vanzonneveld.net%2F');
Should return
'http://kevin.vanzonneveld.net/'
Dependencies
No dependencies, you can use this function standalone.
Open syntax issues
php.js uses JSLint to help keep our code consistent and to prevent some common bugs.
Eventually we want all code to pass, or at least to take into account most of the fixes JSLint suggests, following the JSLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have rawurldecode goodness in JavaScript.
@Joris: Sorry I haven't gotten to your post yet; that looks great! I'd like to test it out a little first and then commit, if you can bear with me a little...
In case anyone is interested, here is a version with full UTF-8 support, written without decodeURIComponent or any maps.
function rawurldecode(url) {
    // This function mimics PHP's rawurldecode under UTF-8
    // Any percentage notation is converted to its UTF-16 character.
    // Only tested on Mozilla browsers (Firefox 3.5)
    // Does NOT use any of decodeURIComponent, decodeURI, unescape, etc
    // Supports 4 byte characters (so unicode characters 0x0000 through 0x10FFFF)
    //
    // Original by Joris van der Wel

    var chr, a, len, ret, c, c2, c3, c4, hi, low;

    ret = '';
    for (a = 0, len = url.length; a < len; a++) {
        chr = url.charAt(a);
        if (chr != '%') {
            ret += chr;
            continue;
        }

        c = parseInt(url.charAt(a+1) + url.charAt(a+2), 16);
        if (isNaN(c)) {
            ret += '%'; // If php comes across something invalid, it just shows it without parsing
            continue;
        }

        a += 2; // skip 2
        ret += String.fromCharCode(c);
    }

    // second pass, convert UTF-8 to UTF-16 (Strings in javascript (ECMA-262 to be exact) are UTF-16)
    url = ret;
    ret = '';
    for (a = 0, len = url.length; a < len; a++) {
        c = url.charCodeAt(a);

        // c & 1000 0000 === 0000 0000
        if( (c & 0x80) === 0 ) // 0xxxxxxx
        {
            ret += url.charAt(a);
        }
        // c & 1110 0000 === 1100 0000
        else if ((c & 0xE0) === 0xC0) // 110y yyxx 10xx xxxx
        {
            a++; c2 = url.charCodeAt(a);
            ret += String.fromCharCode( ((c & 0x1F) << 6) | ((c2 & 0x3F) << 0) );
        }
        // c & 1111 0000 === 1110 0000
        else if ((c & 0xF0) === 0xE0) // 1110 yyyy 10yy yyxx 10xx xxxx
        {
            a++; c2 = url.charCodeAt(a);
            a++; c3 = url.charCodeAt(a);
            ret += String.fromCharCode( ((c & 0x0F) << 12) | ((c2 & 0x3F) << 6 ) | ((c3 & 0x3F) << 0 ) );
        }
        // c & 1111 1000 === 1111 0000
        else if ((c & 0xF8) === 0xF0) // 1111 0zzz 10zz yyyy 10yy yyxx 10xx xxxx
        {
            a++; c2 = url.charCodeAt(a);
            a++; c3 = url.charCodeAt(a);
            a++; c4 = url.charCodeAt(a);
            c = ((c & 0x07) << 18) | ((c2 & 0x3F) << 12) | ((c3 & 0x3F) << 6 ) | ((c4 & 0x3F) << 0 ) ;

            if (c >= 0x10000) // split it up using surrogates
            {
                c -= 0x10000;
                hi  = (c & 0xFFC00) >> 10; // first 10 bits
                low = c & 0x003FF; // last 10 bits
                hi  += 0xD800; // high surrogate range
                low += 0xDC00; // low surrogate range
                ret += String.fromCharCode(hi, low);
            }
            else
            {
                ret += String.fromCharCode(c);
            }
        }
    }

    return ret;
}
You could probably rewrite it to use only one loop, but that would turn into spaghetti code very fast.
Gr.
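The surrogate split in the four-byte branch above can be checked in isolation. This is a small sketch rather than part of the posted function: `toSurrogates` is a hypothetical name, and the arithmetic is copied from the branch that handles code points at or above 0x10000.

```javascript
// Hypothetical helper: split a code point above 0xFFFF into a UTF-16
// surrogate pair, using the same arithmetic as the four-byte branch.
function toSurrogates(codePoint) {
    var c = codePoint - 0x10000;              // 20 significant bits remain
    var hi = ((c & 0xFFC00) >> 10) + 0xD800;  // top 10 bits, high surrogate range
    var low = (c & 0x003FF) + 0xDC00;         // bottom 10 bits, low surrogate range
    return String.fromCharCode(hi, low);
}

// U+1F600 is encoded in UTF-8 as %F0%9F%98%80 and in UTF-16 as D83D DE00.
console.log(toSurrogates(0x1F600) === '\uD83D\uDE00'); // true
```

On engines that provide `String.fromCodePoint` (ES2015 and later), the result can also be compared against `String.fromCodePoint(0x1F600)`.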
Sorry, it looks like to fully reflect PHP's behavior, you have to add this to the histogram (all of our other related functions should be converted accordingly as well):
histogram['\u20AC'] = '%80';
histogram['\u0081'] = '%81';
histogram['\u201A'] = '%82';
histogram['\u0192'] = '%83';
histogram['\u201E'] = '%84';
histogram['\u2026'] = '%85';
histogram['\u2020'] = '%86';
histogram['\u2021'] = '%87';
histogram['\u02C6'] = '%88';
histogram['\u2030'] = '%89';
histogram['\u0160'] = '%8A';
histogram['\u2039'] = '%8B';
histogram['\u0152'] = '%8C';
histogram['\u008D'] = '%8D';
histogram['\u017D'] = '%8E';
histogram['\u008F'] = '%8F';
histogram['\u0090'] = '%90';
histogram['\u2018'] = '%91';
histogram['\u2019'] = '%92';
histogram['\u201C'] = '%93';
histogram['\u201D'] = '%94';
histogram['\u2022'] = '%95';
histogram['\u2013'] = '%96';
histogram['\u2014'] = '%97';
histogram['\u02DC'] = '%98';
histogram['\u2122'] = '%99';
histogram['\u0161'] = '%9A';
histogram['\u203A'] = '%9B';
histogram['\u0153'] = '%9C';
histogram['\u009D'] = '%9D';
histogram['\u017E'] = '%9E';
histogram['\u0178'] = '%9F';
and then add this line right before the call to decodeURIComponent():
ret = ret.replace(/%([a-fA-F][0-9a-fA-F])/g, function (all, hex) {return String.fromCharCode('0x'+hex);}); // These Latin-B have the same values in Unicode, so we can convert them like this
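For readers who want to try the idea without the rest of the php.js function, here is a condensed sketch. It is not the php.js code: `CP1252_HIGH` and `decodeAsCp1252` are hypothetical names, and the sketch assumes every `%XX` escape stands for a single Windows-1252 byte, which is how I read the histogram entries above. It folds the histogram lookup and the replace line into one pass.

```javascript
// Characters that the histogram entries above pair with %80..%9F, in byte order.
var CP1252_HIGH =
    '\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021' + // %80..%87
    '\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F' + // %88..%8F
    '\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014' + // %90..%97
    '\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178';  // %98..%9F

// Hypothetical helper: decode every %XX escape as one Windows-1252 byte.
function decodeAsCp1252(str) {
    return String(str).replace(/%([0-9a-fA-F]{2})/g, function (all, hex) {
        var byte = parseInt(hex, 16);
        if (byte >= 0x80 && byte <= 0x9F) {
            // This range differs from Unicode, so look the character up.
            return CP1252_HIGH.charAt(byte - 0x80);
        }
        // 0x00..0x7F and 0xA0..0xFF have the same value in Unicode.
        return String.fromCharCode(byte);
    });
}

console.log(decodeAsCp1252('-22%97bc%2Fbc') === '-22\u2014bc/bc'); // true
```

The trade-off is that UTF-8 input is no longer understood: `%E2%82%AC` comes out as three separate characters (U+00E2, U+201A, U+00AC) rather than the euro sign, so this only suits strings that were encoded from a single-byte page.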
Found an apparent bug... I'll try to track it down, but I'm a javascript noob.
<? print rawurldecode('-22%97bc%2Fbc'); ?>
<script type="text/javascript">
var foo = rawurldecode('-22%97bc%2Fbc');
alert(foo);
</script>
php part works fine, js breaks.
firebug reports:
malformed URI sequence
rawurldecode("-22%97bc%2Fbc")
And, yes, this string comes from encoding in php with rawurlencode (from a big nasty db response).
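The error can be reproduced without PHP or Firebug: `%97` on its own is a UTF-8 continuation byte with no lead byte, so `decodeURIComponent` throws a `URIError` instead of returning a string. Below is a minimal sketch of one way to avoid the exception, assuming it is acceptable to leave undecodable escapes in place. Note that this is not what PHP does (PHP emits the raw byte 0x97), and `safeRawurldecode` is a hypothetical name, not a php.js function.

```javascript
// Hypothetical helper, not part of php.js: decode what is valid UTF-8 and
// leave any other percent escape untouched instead of throwing.
function safeRawurldecode(str) {
    // Work on runs of consecutive %XX escapes so multi-byte sequences stay together.
    return String(str).replace(/(?:%[0-9a-fA-F]{2})+/g, function (run) {
        try {
            return decodeURIComponent(run);
        } catch (e) {
            // Something in the run is not valid UTF-8; retry one escape at a time.
            // Multi-byte sequences in such a run are kept literally as well.
            return run.replace(/%[0-9a-fA-F]{2}/g, function (esc) {
                try {
                    return decodeURIComponent(esc);
                } catch (e2) {
                    return esc; // not decodable on its own, keep it as-is
                }
            });
        }
    });
}

try {
    decodeURIComponent('-22%97bc%2Fbc');
} catch (e) {
    console.log(e instanceof URIError); // true
}
console.log(safeRawurldecode('-22%97bc%2Fbc') === '-22%97bc/bc'); // true
```

Whether keeping `%97` literally is the right fallback depends on where the string came from; if it was encoded from a single-byte page, mapping the byte to a character is closer to what the PHP output looks like in the browser.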


Brett Zamir
28 Oct '09