JavaScript rawurldecode
Decodes URL-encodes string
1 2 3 4 56 7 8 9 1011 12 13 14 1516 17 18 19 2021 22 23 | function rawurldecode (str) { // Decodes URL-encodes string // // version: 1008.1718 // discuss at: http://phpjs.org/functions/rawurldecode // + original by: Brett Zamir (http://brett-zamir.me) // + input by: travc // + input by: Brett Zamir (http://brett-zamir.me) // + bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net) // + input by: Ratheous // + reimplemented by: Brett Zamir (http://brett-zamir.me) // % note 1: Please be aware that this function expects to decode from UTF-8 encoded strings, as found on // % note 1: pages served as UTF-8 // * example 1: rawurldecode('Kevin+van+Zonneveld%21'); // * returns 1: 'Kevin+van+Zonneveld!' // * example 2: rawurldecode('http%3A%2F%2Fkevin.vanzonneveld.net%2F'); // * returns 2: 'http://kevin.vanzonneveld.net/' // * example 3: rawurldecode('http%3A%2F%2Fwww.google.nl%2Fsearch%3Fq%3Dphp.js%26ie%3Dutf-8%26oe%3Dutf-8%26aq%3Dt%26rls%3Dcom.ubuntu%3Aen-US%3Aunofficial%26client%3Dfirefox-a'); // * returns 3: 'http://www.google.nl/search?q=php.js&ie=utf-8&oe=utf-8&aq=t&rls=com.ubuntu:en-US:unofficial&client=firefox-a' // * example 4: rawurldecode('-22%97bc%2Fbc'); // * returns 4: '-22—bc/bc' return decodeURIComponent(str); } |
Examples
» Example 1
Running
1 | rawurldecode('Kevin+van+Zonneveld%21'); |
Should return
1 | 'Kevin+van+Zonneveld!' |
» Example 2
Running
1 | rawurldecode('http%3A%2F%2Fkevin.vanzonneveld.net%2F'); |
Should return
1 | 'http://kevin.vanzonneveld.net/' |
Dependencies
No dependencies, you can use this function standalone.
Open syntax issues
php.js uses JsLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass or at least take into consideration most fixes suggested by JsLint, following this JsLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have rawurldecode goodness in JavaScript.
@Joris: Sorry I haven't gotten to your post yet; that looks great! I'd like to test it out a little first and then commit, if you can bear with me a little...
Incase anyone is interested, here is a version with full UTF-8 support written without decodeURIComponent or any maps.
function rawurldecode(url)
{
// This function mimmicks PHP's rawurldecode under UTF-8
// Any percentage notation is converted to its UTF-16 character.
// Only tested on Mozilla browsers (Firefox 3.5)
// Does NOT use any of decodeURIComponent, decodeURI, unescape, etc
// Supports 4 byte characters (so unicode characters 0x0000 through 0x10FFFF)
//
// Original by Joris van der Wel
var chr, a, len, ret, c, c2, c3, c4, hi, low;
ret = '';
for (a = 0, len = url.length; a < len; a++)
{
chr = url.charAt(a);
if (chr != '%')
{
ret += chr;
continue;
}
c = parseInt(url.charAt(a+1) + url.charAt(a+2), 16);
if (isNaN(c))
{
ret += '%'; // If php comes across something invalid, it just shows it without parsing
continue;
}
a += 2; // skip 2
ret += String.fromCharCode(c);
}
// second pass, convert UTF-8 to UTF-16 (Strings in javascript (ECMA-262 to be exact) are UTF-16)
url = ret;
ret = '';
for (a = 0, len = url.length; a < len; a++)
{
c = url.charCodeAt(a);
// c & 1000 0000 === 0000 0000
if( (c & 0x80) === 0 ) // 0xxxxxxx
{
ret += url.charAt(a);
}
// c & 1110 0000 === 1100 0000
else if ((c & 0xE0) === 0xC0) // 110y yyxx 10xx xxxx
{
a++;
c2 = url.charCodeAt(a);
ret += String.fromCharCode(
((c & 0x1F) << 6) |
((c2 & 0x3F) << 0)
);
}
// c & 1111 0000 === 1110 0000
else if ((c & 0xF0) === 0xE0) // 1110 yyyy 10yy yyxx 10xx xxxx
{
a++;
c2 = url.charCodeAt(a);
a++;
c3 = url.charCodeAt(a);
ret += String.fromCharCode(
((c & 0x0F) << 12) |
((c2 & 0x3F) << 6 ) |
((c3 & 0x3F) << 0 )
);
}
// c & 1111 1000 === 1111 0000
else if ((c & 0xF8) === 0xF0) // 1111 0zzz 10zz yyyy 10yy yyxx 10xx xxxx
{
a++;
c2 = url.charCodeAt(a);
a++;
c3 = url.charCodeAt(a);
a++;
c4 = url.charCodeAt(a);
c = ((c & 0x07) << 18) |
((c2 & 0x3F) << 12) |
((c3 & 0x3F) << 6 ) |
((c4 & 0x3F) << 0 ) ;
if (c >= 0x10000) // split it up using surrogates
{
c -= 0x10000;
hi = (c & 0xFFC00) >> 10; // first 10 bits
low = c & 0x003FF; // last 10 bits
hi += 0xD800; // high surrogate range
low += 0xDC00; // low surrogate range
ret += String.fromCharCode(hi, low);
}
else
{
ret += String.fromCharCode(c);
}
}
}
return ret;
}
You could probably rewrite it to use only one loop, but that would turn into spaghetti code very fast
Gr.
Sorry, it looks like to fully reflect PHP's behavior, you have to add this to the histogram (all of our other related functions should be converted accordingly as well):
histogram['\u20AC'] = '%80'; histogram['\u0081'] = '%81'; histogram['\u201A'] = '%82'; histogram['\u0192'] = '%83'; histogram['\u201E'] = '%84'; histogram['\u2026'] = '%85'; histogram['\u2020'] = '%86'; histogram['\u2021'] = '%87'; histogram['\u02C6'] = '%88'; histogram['\u2030'] = '%89'; histogram['\u0160'] = '%8A'; histogram['\u2039'] = '%8B'; histogram['\u0152'] = '%8C'; histogram['\u008D'] = '%8D'; histogram['\u017D'] = '%8E'; histogram['\u008F'] = '%8F'; histogram['\u0090'] = '%90'; histogram['\u2018'] = '%91'; histogram['\u2019'] = '%92'; histogram['\u201C'] = '%93'; histogram['\u201D'] = '%94'; histogram['\u2022'] = '%95'; histogram['\u2013'] = '%96'; histogram['\u2014'] = '%97'; histogram['\u02DC'] = '%98'; histogram['\u2122'] = '%99'; histogram['\u0161'] = '%9A'; histogram['\u203A'] = '%9B'; histogram['\u0153'] = '%9C'; histogram['\u009D'] = '%9D'; histogram['\u017E'] = '%9E'; histogram['\u0178'] = '%9F';
and then add this line right before the call to decodeURIComponent():
ret = ret.replace(/%([a-fA-F][0-9a-fA-F])/g, function (all, hex) {return String.fromCharCode('0x'+hex);}); // These Latin-B have the same values in Unicode, so we can convert them like this
Found an apparent bug... I'll try to track it down, but I'm a javascript noob.
<? print rawurldecode('-22%97bc%2Fbc'); ?>
<script type="text/javascript">
var foo = rawurldecode('-22%97bc%2Fbc');
alert(foo);
</script>
php part works fine, js breaks.
firebug reports:
malformed URI sequence
rawurldecode("-22%97bc%2Fbc")
And, yes, this string comes from encoding in php with rawurlencode (from a big nasty db response).


Brett Zamir
28 Oct '09