Use PHP functions in JavaScript

JavaScript strlen

Get string length

function strlen (string) {
    // Get string length
    //
    // version: 909.322
    // discuss at: http://phpjs.org/functions/strlen
    // +   original by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   improved by: Sakimori
    // +      input by: Kirk Strobeck
    // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Onno Marsman
    // +    revised by: Brett Zamir (http://brett-zamir.me)
    // %        note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
    // %        note 1: characters and to this function in PHP which does not count the number of bytes
    // %        note 1: but counts the number of characters, something like this is really necessary.
    // *     example 1: strlen('Kevin van Zonneveld');
    // *     returns 1: 19
    // *     example 2: strlen('A\ud87e\udc04Z');
    // *     returns 2: 3
    var str = string+'';
    var i = 0, chr = '', lgth = 0;
    var getWholeChar = function (str, i) {
        var code = str.charCodeAt(i);
        var next = '', prev = '';
        if (0xD800 <= code && code <= 0xDBFF) { // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters)
            if (str.length <= (i+1))  {
                throw 'High surrogate without following low surrogate';
            }
            next = str.charCodeAt(i+1);
            if (0xDC00 > next || next > 0xDFFF) {
                throw 'High surrogate without following low surrogate';
            }
            return str.charAt(i)+str.charAt(i+1);
        } else if (0xDC00 <= code && code <= 0xDFFF) { // Low surrogate
            if (i === 0) {
                throw 'Low surrogate without preceding high surrogate';
            }
            prev = str.charCodeAt(i-1);
            if (0xD800 > prev || prev > 0xDBFF) { // (could change last hex to 0xDB7F to treat high private surrogates as single characters)
                throw 'Low surrogate without preceding high surrogate';
            }
            return false; // We can pass over low surrogates now as the second component in a pair which we have already processed
        }
        return str.charAt(i);
    };
    for (i=0, lgth=0; i < str.length; i++) {
        if ((chr = getWholeChar(str, i)) === false) {
            continue;
        } // Adapt this line at the top of any loop, passing in the whole string and the current iteration and returning a variable to represent the individual character; purpose is to treat the first part of a surrogate pair as the whole character and then ignore the second part
        lgth++;
    }
    return lgth;
}

Examples

» Example 1

Running

strlen('Kevin van Zonneveld');

Should return

19

» Example 2

Running

strlen('A\ud87e\udc04Z');

Should return

3
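
Example 2 is the case where counting characters and reading the native length property disagree. The following is a minimal sketch of that difference, assuming an ES2015 or later engine in which iterating a string yields whole code points. The name countCodePoints is hypothetical and is not part of php.js. Unlike strlen above, it does not throw on an unpaired surrogate; it simply counts it as one character.

```javascript
// Hypothetical helper (not part of php.js): counts Unicode code points.
// Array.from uses the string iterator, which returns a surrogate pair as one item.
function countCodePoints(value) {
    var str = value + ''; // same string coercion that strlen uses
    return Array.from(str).length;
}

var sample = 'A\ud87e\udc04Z';

console.log(sample.length);                          // 4 (UTF-16 code units)
console.log(countCodePoints(sample));                // 3 (code points)
console.log(countCodePoints('Kevin van Zonneveld')); // 19
```

For comparison, PHP's own strlen() returns a byte count; mb_strlen() is the PHP function that counts characters.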

Dependencies

No dependencies, you can use this function standalone.

Open syntax issues

php.js uses JSLint to help us keep our code consistent and prevent some common bugs.

Eventually we want all code to pass JSLint, or at least to take most of its suggested fixes into consideration, following the JSLint configuration we have decided on.


Authors

Thanks to the following developers, you get to have strlen goodness in JavaScript.

Comments


Kevin van Zonneveld
15 Jan '09

@ Brett Zamir: Wow. I don't usually have to deal with these things. More kudos to you, man.

Brett Zamir
15 Jan '09

Oh sorry, in order to convert to string, you can add the line

str = str+'';

as the very first line in strlen() (before getWholeChar()).

By the way, I do see that your blogging software does not convert the character in my 2nd example into entities, so you can try that example too. Best, Brett

Brett Zamir
15 Jan '09

While the following may look like overkill, in order to be truly faithful to handling all Unicode characters and to this function in PHP which does not count the number of bytes but counts the number of characters, something like this is really necessary:

// Form a string with a form of the Han character for "you" surrounded by the letters A and Z
var str = 'A\ud87e\udc04Z'; // Including two "surrogates" which are used to form a single character in Unicode (so the count of this should be 3, not 4 as str.length will give)
// var str = 'A你Z'; // If your blogging software won't mess with the Unicode, you can try this equivalent example as well (should be 3, not 4 as str.length will give)

alert(strlen(str));

// Note that the exceptions will only be thrown if the string is poorly formed Unicode (something unlikely unless it was deliberate--e.g., try taking out one of the surrogate pairs above).
// Also note that although it will indeed be rare, especially for Western scripts, that str.length would not handle the situation correctly, in order to support handling of all languages that can be expressed in Unicode, the following is necessary.
function strlen (str) {
    function getWholeChar (str, i) {
        var code = str.charCodeAt(i);
        if (0xD800 <= code && code <= 0xDBFF) { // High surrogate (could change last hex to 0xDB7F to treat high private surrogates as single characters)
            if (str.length <= (i+1))  {
                throw 'High surrogate without following low surrogate';
            }
            var next = str.charCodeAt(i+1);
            if (0xDC00 > next || next > 0xDFFF) {
                throw 'High surrogate without following low surrogate';
            }
            return str[i]+str[i+1];
        }
        else if (0xDC00 <= code && code <= 0xDFFF) { // Low surrogate
            if (i === 0) {
                throw 'Low surrogate without preceding high surrogate';
            }
            var prev = str.charCodeAt(i-1);
            if (0xD800 > prev || prev > 0xDBFF) { // (could change last hex to 0xDB7F to treat high private surrogates as single characters)
                throw 'Low surrogate without preceding high surrogate';
            }
            return false; // We can pass over low surrogates now as the second component in a pair which we have already processed
        }
        return str[i];
    }
    // chr is declared here so that it does not leak as an implicit global
    for (var i=0, lgth=0, chr; i < str.length; i++) {
        if ((chr = getWholeChar(str, i)) === false) {continue;} // Adapt this line at the top of any loop, passing in the whole string and the current iteration and returning a variable to represent the individual character; purpose is to treat the first part of a surrogate pair as the whole character and then ignore the second part
        lgth++;
    }
    return lgth;
}


(By the way on an unrelated note, I see shuffle() and possibly some other array functions also need to be made to work with associative arrays (just correcting myself about only a few needing it).)

Kevin van Zonneveld
6 Oct '08

@ Onno Marsman: Very well, I will leave it at

return (string+'').length;

then.

Onno Marsman
6 Oct '08

This was already covered.
(string+'') is always a string, so (string+'').length is always an integer and can never result in false.
So the || 0 can be removed.

There is some behavior that doesn't correspond to PHP behavior, and that is when you apply strlen to an array or object. But I don't think there will be a need to check this, and JavaScript's behavior can be considered to be better (calling .toString() on an object when the concatenation occurs).
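
The array and object behavior mentioned here can be seen with a short sketch. It uses the one-line variant discussed in this thread under the hypothetical name strlenSimple, so that it does not clash with the published function.

```javascript
// One-line variant from this thread, renamed for illustration only.
function strlenSimple(string) {
    return (string + '').length;
}

// Concatenation with '' converts the value to a string first.
console.log(strlenSimple([1, 2, 3])); // 5, the length of "1,2,3"
console.log(strlenSimple({}));        // 15, the length of "[object Object]"

// A custom toString() is honored during the concatenation.
var named = { toString: function () { return 'abc'; } };
console.log(strlenSimple(named));     // 3
```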

Kevin van Zonneveld
6 Oct '08

@ Onno Marsman: I guess sakimori's change slipped through. He did make it into the comments.
I believe the reason for the if statement is that you want to have strlen return 0, even if it returns false. What do you think about this implementation?

Onno Marsman
4 Oct '08

The l variable doesn't seem to do much. As far as I can see this function is exactly the same as the following:

function strlen (string) {
    return (string+'').length;
}

This is already suggested by Sakimori but for some reason his code didn't make it into the function. I think it should.

Kevin van Zonneveld
8 Sep '08

@ Kirk Strobeck: I've added some code that I think would make it better. But if you could provide the code that breaks it, that would help greatly; we can then also add it to the examples so it will be tested thoroughly.

Thank you!

Kirk Strobeck
6 Sep '08

There is one problem: this returns NULL if empty. It should return 0, so you can test it in an if statement without an error.
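
The implementation that returned NULL is not shown on this page, so the following is only a sketch of the behavior requested here. The name strlenOrZero is hypothetical and is not part of php.js. It assumes that null and undefined should count as empty; without that guard, string concatenation turns them into the words "null" and "undefined".

```javascript
// Hypothetical guard (not part of php.js): treat null and undefined as empty.
function strlenOrZero(value) {
    if (value === null || value === undefined) {
        return 0;
    }
    return (value + '').length;
}

console.log(strlenOrZero(''));        // 0
console.log(strlenOrZero(null));      // 0
console.log((null + '').length);      // 4, the length of "null" without the guard
console.log((undefined + '').length); // 9, the length of "undefined" without the guard

// The result is always a number, so it is safe to test in an if statement.
if (!strlenOrZero('')) {
    console.log('empty');
}
```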

Kevin van Zonneveld
16 May '08

@ Sakimori: Cool, thank you for improving our project!

Sakimori
16 May '08

In PHP, strlen(45) returns 2. With the above JS implementation, strlen(45) returns undefined (numbers have no "length" property).

You might consider changing it to:

return String(string).length;

... or maybe even:

return ("" + string).length;
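
The two suggestions agree for numbers, as this sketch shows. The function names lengthViaString and lengthViaConcat are hypothetical and exist only to compare the two forms side by side. One difference that appears in ES2015 and later engines is that String() accepts a Symbol while concatenation throws a TypeError.

```javascript
// Hypothetical wrappers around the two suggested one-liners.
function lengthViaString(value) {
    return String(value).length;
}

function lengthViaConcat(value) {
    return ('' + value).length;
}

console.log(lengthViaString(45)); // 2
console.log(lengthViaConcat(45)); // 2

// ES2015+ only: the two conversions differ for Symbol values.
console.log(lengthViaString(Symbol('x'))); // 9, the length of "Symbol(x)"
try {
    lengthViaConcat(Symbol('x'));
} catch (e) {
    console.log(e instanceof TypeError);   // true
}
```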

