Use PHP functions in JavaScript

JavaScript str_word_count

Counts the number of words inside a string. If format is omitted (or 0), the function returns the word count as an integer. If format is 1, it returns an array containing all the words found in the string. If format is 2, it returns an associative array in which the key is the position of the word in the string and the value is the word itself. For the purpose of this function, a 'word' is a locale-dependent string of alphabetic characters, which may also contain, but not start with, the "'" and "-" characters.

function str_word_count (str, format, charlist) {
    // Counts the number of words inside a string. If format of 1 is specified,
    // then the function will return an array containing all the words
    // found inside the string. If format of 2 is specified, then the function
    // will return an associative array where the position of the word is the key
    // and the word itself is the value.
    // For the purpose of this function, 'word' is defined as a locale dependent
    // string containing alphabetic characters, which also may contain, but not start
    // with "'" and "-" characters.
    // 
    // version: 909.322
    // discuss at: http://phpjs.org/functions/str_word_count
    // +   original by: Ole Vrijenhoek
    // +   bugfixed by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
    // +   bugfixed by: Brett Zamir (http://brett-zamir.me)
    // %          note 1: Original author stated that "charlist parameter works correct but the last word in the given string will not be counted", but seems to work
    // *     example 1: str_word_count('Hello fri3nd, youre   looking          good today!', 1, 'àáãç3');
    // *     returns 1: ['Hello', 'fri3nd', 'youre', 'looking', 'good', 'today']
    // A word is valid when it contains a-z A-Z ' - Ole Vrijenhoek
    var l = str.length, tmpStr = "";
    var i = 0;
    var c = '';
    var wArr = [], wC = 0;
    var assArr = {}, aC = 0, reg = "";
    
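    if (charlist) {
        // Build an alternation of the character codes allowed by charlist
        for (i = 0; i<=charlist.length - 1; i++) {
            if (i != charlist.length - 1) {
                reg = reg + charlist.charCodeAt(i) + "|";
            } else {
                reg = reg + charlist.charCodeAt(i);
            }
        }
        // Anchor the pattern so that, e.g., code 51 does not also match 151
        reg = new RegExp("^(?:" + reg + ")$");
    }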
 
    for (i = 0; i <= l-1; i++) {
        c = str.charCodeAt(i);
        if ((c<91&&c>64)||(c<123&&c>96)||c == 45||c == 39||(reg && reg.test(c))) {
            if (tmpStr == "" && format == 2) {
                aC = i;
            }
            tmpStr = tmpStr + String.fromCharCode(c);
        } else if (tmpStr != "") {
            if (format != 2) {
                wArr[wArr.length] = tmpStr;
            } else {
                assArr[aC] = tmpStr;
            }
            tmpStr = "";
            wC++;
        }
    }
    // Store a word that runs to the end of the string
    if (tmpStr != "") {
        if (format != 2) {
            wArr[wArr.length] = tmpStr;
        } else {
            assArr[aC] = tmpStr;
        }
        wC++;
    }
    
    if (!format) {
        return wC;
    } else if (format == 1) {
        return wArr;
    } else if (format == 2) {
        return assArr;
    }
    throw 'You have supplied an incorrect format';
}

Examples

Running

str_word_count('Hello fri3nd, youre   looking          good today!', 1, 'àáãç3');

Should return

['Hello', 'fri3nd', 'youre', 'looking', 'good', 'today']
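
The example above only exercises format 1. To make the three return formats and the effect of charlist easier to compare, here is a minimal sketch. It is not the php.js implementation: wordCountSketch and its regular-expression approach are assumptions made for illustration, it only recognizes ASCII letters plus "'" and "-", and it ignores locale.

```javascript
// Hypothetical helper, not part of php.js. It mirrors the three return
// formats with a regular expression instead of a character loop.
function wordCountSketch(str, format, charlist) {
    // Escape characters that are special inside a character class
    var extra = (charlist || '').replace(/[\\\]^-]/g, '\\$&');
    // Word characters: ASCII letters, "'", "-" and anything in charlist
    var re = new RegExp("[A-Za-z'\\-" + extra + "]+", 'g');
    var words = [], positions = {}, match;

    while ((match = re.exec(str)) !== null) {
        words.push(match[0]);               // used by format 1
        positions[match.index] = match[0];  // used by format 2
    }

    if (!format) {
        return words.length;
    } else if (format === 1) {
        return words;
    } else if (format === 2) {
        return positions;
    }
    throw new Error('You have supplied an incorrect format');
}

console.log(wordCountSketch("it's a well-known fact"));
// 4
console.log(wordCountSketch("it's a well-known fact", 1));
// ["it's", 'a', 'well-known', 'fact']
console.log(wordCountSketch("it's a well-known fact", 2));
// { '0': "it's", '5': 'a', '7': 'well-known', '18': 'fact' }
console.log(wordCountSketch('Hello fri3nd', 1));
// ['Hello', 'fri', 'nd']
console.log(wordCountSketch('Hello fri3nd', 1, '3'));
// ['Hello', 'fri3nd']
```

The last two calls show why the example passes a charlist: without '3' in it, the digit splits "fri3nd" into two words.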

Dependencies

No dependencies, you can use this function standalone.

Open syntax issues

php.js uses JSLint to help us keep our code consistent and prevent some common bugs.

Eventually we want all code to pass JSLint, or at least to take into consideration most of the fixes it suggests, following the JSLint configuration we've decided on.


Authors

Thanks to the following developers, you get to have str_word_count goodness in JavaScript: Ole Vrijenhoek (original author), Kevin van Zonneveld and Brett Zamir (bug fixes).

Comments


Brett Zamir
Feb 13th

@Bug?: Yes, you are correct. Thanks for the feedback. I have now fixed it in Git: http://github.com/kvz/phpjs/raw/master/functions/strings/str_word_count.js . Note that the new version now requires ctype_alpha (which I also had to update, along with many other functions that depend on RegExp.test()), and ctype_alpha in turn depends on setlocale(), because this function should check in a way that can support what other locales consider a word. I also added support for the very rare non-BMP characters and, as in PHP, allowed hyphens in the middle of a word and apostrophes in the middle or at the end (and anywhere if the charlist includes these characters).

Bug?
Feb 3rd

The JavaScript function returns 5 words for this string:

Lorem ipsum dolor asdf asdf asdf

And 6 words for this one:

Lorem ipsum dolor asdf asdf asdf.


The PHP function returns 6 for both.

Cheers,

Chris

Brett Zamir
18 Jun '09

I believe that, to do this correctly not only for Chinese but for other languages as well, we'll need to take a good look at the PHP source code, specifically for PHP 6, since that is where full Unicode support is being added.

We might be able to use XRegExp (see http://stevenlevithan.com/regex/xregexp/ ) and its Unicode plug-in (at http://blog.stevenlevithan.com/archives/xregexp-unicode-plugin ) for our preg_ functions and then make str_word_count() dependent on it. That won't really help determine what a "word" is (in Chinese, a character is technically only a graphical morpheme, and not necessarily also an independent word), but at least it will tell us definitively what a "letter" is. Of course, we can just go back to the source to see how PHP interprets a "word", since we're aiming for that anyway, but again, that will take some work, especially if we wish to make it work for other languages as well as Chinese. I'm pretty busy for now, but feel free to take a shot at it if you like.

FYI, as you can see from your Chinese characters getting mangled, the site is currently having some problems with Unicode characters, so if you need to refer to any in the future, maybe you could try using entities or JavaScript Unicode escape sequences instead (e.g., \u0020). But I think the issue goes beyond Chinese. Chinese does, however, raise the particular need to handle characters beyond the Basic Multilingual Plane (BMP), since some Chinese characters fall outside this plane. In JavaScript, each such character must be represented by two code units called surrogates (which are not used outside of such pairs), so we can't rely on the length of the string; see http://phpjs.org/functions/strlen for a solution.
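
To illustrate the surrogate-pair point, here is a minimal sketch. It is not the php.js strlen implementation linked above; countCodePoints is a hypothetical helper, and the sample character U+20BB7 (a CJK ideograph outside the BMP) is written with escape sequences so that it survives the site's encoding problems.

```javascript
// Hypothetical helper, not part of php.js: counts characters (code points)
// rather than UTF-16 code units.
function countCodePoints(str) {
    var count = 0, i, code, next;
    for (i = 0; i < str.length; i++) {
        code = str.charCodeAt(i);
        // A high surrogate followed by a low surrogate is a single character
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
            next = str.charCodeAt(i + 1);
            if (next >= 0xDC00 && next <= 0xDFFF) {
                i++; // skip the low surrogate
            }
        }
        count++;
    }
    return count;
}

var ideograph = '\uD842\uDFB7'; // U+20BB7 as a surrogate pair

console.log(ideograph.length);                      // 2
console.log(countCodePoints(ideograph));            // 1
console.log(countCodePoints('a' + ideograph + 'b')); // 3
```

A word counter that walks the string one code unit at a time would see this character as two separate units, which is why any per-character test has to look at the pair together.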

Chris
13 Jun '09

This function doesn't work quite like PHP's in that it fails to count each single Chinese character as an entire word, e.g.:

?? hello ?

Is four words...

