JavaScript strlen
Get string length
    function strlen (string) {
        // Get string length
        //
        // version: 909.322
        // discuss at: http://phpjs.org/functions/strlen
        // +   original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // +   improved by: Sakimori
        // +      input by: Kirk Strobeck
        // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
        // +   bugfixed by: Onno Marsman
        // +    revised by: Brett Zamir (http://brett-zamir.me)
        // %        note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
        // %        note 1: characters and to this function in PHP which does not count the number of bytes
        // %        note 1: but counts the number of characters, something like this is really necessary.
        // *     example 1: strlen('Kevin van Zonneveld');
        // *     returns 1: 19
        // *     example 2: strlen('A\ud87e\udc04Z');
        // *     returns 2: 3
        var str = string+'';
        var i = 0, chr = '', lgth = 0;

        var getWholeChar = function (str, i) {
            var code = str.charCodeAt(i);
            var next = '', prev = '';
            if (0xD800 <= code && code <= 0xDBFF) {
                // High surrogate (could change last hex to 0xDB7F to treat
                // high private surrogates as single characters)
                if (str.length <= (i+1)) {
                    throw 'High surrogate without following low surrogate';
                }
                next = str.charCodeAt(i+1);
                if (0xDC00 > next || next > 0xDFFF) {
                    throw 'High surrogate without following low surrogate';
                }
                return str.charAt(i)+str.charAt(i+1);
            } else if (0xDC00 <= code && code <= 0xDFFF) {
                // Low surrogate
                if (i === 0) {
                    throw 'Low surrogate without preceding high surrogate';
                }
                prev = str.charCodeAt(i-1);
                if (0xD800 > prev || prev > 0xDBFF) {
                    // (could change last hex to 0xDB7F to treat high private
                    // surrogates as single characters)
                    throw 'Low surrogate without preceding high surrogate';
                }
                // We can pass over low surrogates now as the second component
                // in a pair which we have already processed
                return false;
            }
            return str.charAt(i);
        };

        for (i=0, lgth=0; i < str.length; i++) {
            // Adapt this line at the top of any loop, passing in the whole string
            // and the current iteration and returning a variable to represent the
            // individual character; purpose is to treat the first part of a
            // surrogate pair as the whole character and then ignore the second part
            if ((chr = getWholeChar(str, i)) === false) {
                continue;
            }
            lgth++;
        }

        return lgth;
    }
Examples
» Example 1
Running
    strlen('Kevin van Zonneveld');
Should return
    19
» Example 2
Running
    strlen('A\ud87e\udc04Z');
Should return
    3
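For comparison, here is a minimal sketch of why Example 2 yields 3 rather than 4. It assumes an ES2015-capable runtime (for `Array.from` and code-point string iteration), which is newer than this function, and the variable names are illustrative only.

```javascript
// 'A', one supplementary character written as a surrogate pair, then 'Z'.
var sample = 'A\ud87e\udc04Z';

// .length counts UTF-16 code units, so the surrogate pair counts twice.
var codeUnitCount = sample.length;

// ES2015 string iteration walks code points, so the pair counts once.
var codePointCount = Array.from(sample).length;

console.log(codeUnitCount, codePointCount); // prints "4 3"
```

The second count matches what strlen() returns for the same input.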
Dependencies
No dependencies, you can use this function standalone.
Open syntax issues
php.js uses JsLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass or at least take into consideration most fixes suggested by JsLint, following this JsLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have strlen goodness in JavaScript.
Oh sorry, in order to convert to string, you can add the line
    str = str+'';
as the very first line in strlen() (before getWholeChar()).
By the way, I do see that your blogging software does not convert the character in my 2nd example into entities, so you can try that example too. Best, Brett
While the following may look like overkill, in order to be truly faithful to handling all Unicode characters and to this function in PHP which does not count the number of bytes but counts the number of characters, something like this is really necessary:
    // Form a string with a form of the Han character for "you" surrounded by the letters A and Z
    // Including two "surrogates" which are used to form a single character in Unicode
    // (so the count of this should be 3, not 4 as str.length will give)
    var str = 'A\ud87e\udc04Z';
    // If your blogging software won't mess with the Unicode, you can try this equivalent
    // example as well (should be 3, not 4 as str.length will give)
    // var str = 'A你Z';
    alert( strlen(str) );

    // Note that the exceptions will only be thrown if the string is poorly formed Unicode
    // (unlikely unless deliberate; e.g., try removing one of the two surrogates above).
    // Also note that although it will indeed be rare, especially for Western scripts, that
    // str.length would not handle the situation correctly, in order to support handling of
    // all languages that can be expressed in Unicode, the following is necessary.
    function strlen (str) {
        function getWholeChar (str, i) {
            var code = str.charCodeAt(i);
            if (0xD800 <= code && code <= 0xDBFF) {
                // High surrogate (could change last hex to 0xDB7F to treat
                // high private surrogates as single characters)
                if (str.length <= (i+1)) {
                    throw 'High surrogate without following low surrogate';
                }
                var next = str.charCodeAt(i+1);
                if (0xDC00 > next || next > 0xDFFF) {
                    throw 'High surrogate without following low surrogate';
                }
                return str[i]+str[i+1];
            } else if (0xDC00 <= code && code <= 0xDFFF) {
                // Low surrogate
                if (i === 0) {
                    throw 'Low surrogate without preceding high surrogate';
                }
                var prev = str.charCodeAt(i-1);
                if (0xD800 > prev || prev > 0xDBFF) {
                    // (could change last hex to 0xDB7F to treat high private
                    // surrogates as single characters)
                    throw 'Low surrogate without preceding high surrogate';
                }
                // We can pass over low surrogates now as the second component
                // in a pair which we have already processed
                return false;
            }
            return str[i];
        }

        var chr; // declared here so the loop below does not create an implicit global
        for (var i=0, lgth=0; i < str.length; i++) {
            // Adapt this line at the top of any loop, passing in the whole string
            // and the current iteration and returning a variable to represent the
            // individual character; purpose is to treat the first part of a
            // surrogate pair as the whole character and then ignore the second part
            if ((chr = getWholeChar(str, i)) === false) {continue;}
            lgth++;
        }
        return lgth;
    }
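The remark about adapting the check "at the top of any loop" can be sketched as a character-splitting helper. The name `wholeChars` is hypothetical and not part of php.js. Unlike the code above, this sketch does not throw on malformed input; it assumes unpaired surrogates should simply be kept as single items.

```javascript
// Hypothetical helper, not part of php.js: splits a string into whole
// characters, keeping each well-formed surrogate pair together.
function wholeChars(str) {
  str = str + ''; // coerce to string, as suggested for strlen()
  var chars = [];
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    var isHigh = 0xD800 <= code && code <= 0xDBFF;
    if (isHigh && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (0xDC00 <= next && next <= 0xDFFF) {
        chars.push(str.charAt(i) + str.charAt(i + 1));
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    // BMP character, or an unpaired surrogate kept as-is (assumption)
    chars.push(str.charAt(i));
  }
  return chars;
}

console.log(wholeChars('A\ud87e\udc04Z').length); // prints 3
```

Counting the items of the returned array gives the same result as strlen() for well-formed strings; the difference lies only in how malformed input is treated.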
(By the way on an unrelated note, I see shuffle() and possibly some other array functions also need to be made to work with associative arrays (just correcting myself about only a few needing it).)
This was already covered.
(string+'') is always a string, so (string+'').length is always an integer and can never be false.
So the || 0 can be removed.
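A quick check of that claim, as a sketch only: the two function names below are hypothetical stand-ins for the variants with and without the fallback, and the input list is just a sample of values one might pass.

```javascript
// Hypothetical names, used only to compare the two variants.
function lengthWithFallback(value) {
  return (value + '').length || 0;
}

function lengthPlain(value) {
  return (value + '').length;
}

// A sample of inputs, including falsy ones.
var inputs = ['', 'abc', 0, false, null, undefined];
var allEqual = true;
for (var i = 0; i < inputs.length; i++) {
  if (lengthWithFallback(inputs[i]) !== lengthPlain(inputs[i])) {
    allEqual = false;
  }
}

console.log(allEqual); // prints true
```

The only falsy value .length can produce is 0, and 0 || 0 is still 0, so the fallback never changes the result.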
One behavior doesn't correspond to PHP: applying strlen to an array or object. But I don't think we need to check for this, and JavaScript's behavior (calling .toString() on the object when the concatenation occurs) can be considered better.
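To illustrate the JavaScript side of this, here is a small sketch using a hypothetical `simpleStrlen` that does nothing but concatenate and read .length. It shows only what the concatenation produces and makes no claim about PHP's result for the same inputs.

```javascript
// Hypothetical one-liner, equivalent to the simplified strlen discussed here.
function simpleStrlen(value) {
  return (value + '').length;
}

var arrayLength = simpleStrlen([1, 2, 3]); // array becomes "1,2,3"
var objectLength = simpleStrlen({});       // plain object becomes "[object Object]"
var customLength = simpleStrlen({
  toString: function () { return 'abc'; }  // custom toString() is used
});

console.log(arrayLength, objectLength, customLength); // prints "5 15 3"
```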
@ Onno Marsman: I guess sakimori's change slipped through. He did make it into the comments.
I believe the reason for the if statement is that you want to have strlen return 0, even if it returns false. What do you think about this implementation?
The l variable doesn't seem to do much. As far as I can see this function is exactly the same as the following:
    function strlen (string) {
        return (string+'').length;
    }
This is already suggested by Sakimori but for some reason his code didn't make it into the function. I think it should.
@ Kirk Strobeck: I've added some code that I think would make it better. But if you could provide the code that breaks it, that would help greatly; we can then also add it to the examples so it will be tested thoroughly.
Thank you!


Kevin van Zonneveld
15 Jan '09