JavaScript get_meta_tags
Extracts all meta tag content attributes from a file and returns an array
function get_meta_tags (file) {
    // Extracts all meta tag content attributes from a file and returns an array
    //
    // version: 905.3122
    // discuss at: http://phpjs.org/functions/get_meta_tags
    // +   original by: Brett Zamir (http://brett-zamir.me)
    // %        note 1: This function uses XmlHttpRequest and cannot retrieve resource from different domain.
    // %        note 2: Synchronous so may lock up browser, mainly here for study purposes.
    // -    depends on: file_get_contents
    // *     example 1: get_meta_tags('http://kevin.vanzonneveld.net/pj_test_supportfile_2.htm');
    // *     returns 1: {description: 'a php manual', author: 'name', keywords: 'php documentation', 'geo_position': '49.33;-86.59'}
    var fulltxt = '';

    if (false) {
        // Change the condition to true to test against this fixed string instead of fetching the file:
        fulltxt = '<meta name="author" content="name">'+
        '<meta name="keywords" content="php documentation">'+
        '<meta name="DESCRIPTION" content="a php manual">'+
        '<meta name="geo.position" content="49.33;-86.59">'+
        '</head>';
    } else {
        fulltxt = this.file_get_contents(file).match(/^[\s\S]*<\/head>/i); // [\s\S] matches any character, newlines included
    }

    var patt = /<meta[^>]*?>/gim; // each whole <meta ...> tag
    var patt1 = /<meta\s+.*?name\s*=\s*(['"]?)(.*?)\1\s+.*?content\s*=\s*(['"]?)(.*?)\3/gim; // name before content
    var patt2 = /<meta\s+.*?content\s*=\s*(['"]?)(.*?)\1\s+.*?name\s*=\s*(['"]?)(.*?)\3/gim; // content before name
    var txt, match, name, arr={};

    while ((txt = patt.exec(fulltxt)) !== null) {
        while ((match = patt1.exec(txt[0])) !== null) {
            name = match[2].replace(/\W/g, '_').toLowerCase();
            arr[name] = match[4];
        }
        while ((match = patt2.exec(txt[0])) !== null) {
            name = match[4].replace(/\W/g, '_').toLowerCase();
            arr[name] = match[2];
        }
    }
    return arr;
}
Examples
Running
get_meta_tags('http://kevin.vanzonneveld.net/pj_test_supportfile_2.htm');
Should return
{description: 'a php manual', author: 'name', keywords: 'php documentation', 'geo_position': '49.33;-86.59'}
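The example above depends on a live URL and on file_get_contents, so a minimal offline sketch may be easier to experiment with. It assumes the markup is already available as a string. Note that parse_meta_tags and normalize_name are hypothetical names, not part of php.js, and that the sketch applies each attribute pattern once per tag instead of looping.

```javascript
// Hypothetical helper (not part of php.js): same idea as get_meta_tags,
// but it takes the markup as a string instead of fetching a file.
function parse_meta_tags (html) {
    // Non-word characters become underscores and keys are lower-cased,
    // which is why 'geo.position' ends up as 'geo_position'
    function normalize_name (name) {
        return name.replace(/\W/g, '_').toLowerCase();
    }

    // Keep only the text up to and including </head>
    var head = html.match(/^[\s\S]*<\/head>/i);
    var fulltxt = head ? head[0] : '';

    var tagPatt = /<meta[^>]*?>/gi;
    // One pattern for name-before-content, one for content-before-name
    var nameFirst = /<meta\s+.*?name\s*=\s*(['"]?)(.*?)\1\s+.*?content\s*=\s*(['"]?)(.*?)\3/i;
    var contentFirst = /<meta\s+.*?content\s*=\s*(['"]?)(.*?)\1\s+.*?name\s*=\s*(['"]?)(.*?)\3/i;

    var tag, match, result = {};
    while ((tag = tagPatt.exec(fulltxt)) !== null) {
        if ((match = nameFirst.exec(tag[0])) !== null) {
            result[normalize_name(match[2])] = match[4];
        } else if ((match = contentFirst.exec(tag[0])) !== null) {
            result[normalize_name(match[4])] = match[2];
        }
    }
    return result;
}

// Sample markup invented for this sketch
var sample = '<head>' +
    '<meta name="author" content="name">' +
    '<meta name="keywords" content="php documentation">' +
    '<meta name="DESCRIPTION" content="a php manual">' +
    '<meta content="49.33;-86.59" name="geo.position">' +
    '</head><body></body>';
var tags = parse_meta_tags(sample);
```

With this sample, tags.description is 'a php manual' and tags.geo_position is '49.33;-86.59', whichever attribute comes first in the tag.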
Dependencies
In order to use this function, you also need: file_get_contents
Open syntax issues
php.js uses JSLint to help us keep our code consistent and prevent some common bugs.
Eventually we want all code to pass JSLint, or at least to take into consideration most of the fixes it suggests, following the JSLint configuration we’ve decided on.
Authors
Thanks to the following developers, you get to have get_meta_tags goodness in JavaScript.
Brett Zamir (http://brett-zamir.me)
OK, I came across a better trick than the already-pretty-safe negation I was using: just use [\s\S], which allows a single character to be either whitespace (including a newline) or non-whitespace, in other words, anything.
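A small sketch of why this matters for grabbing everything up to </head>. The markup string is made up for illustration.

```javascript
// Made-up markup with the head spread over several lines
var markup = '<html>\n<head>\n<meta name="author" content="name">\n</head>\n<body></body></html>';

// "." stops at the first newline, so the pattern never reaches </head>
var dotMatch = markup.match(/^.*<\/head>/i);

// [\s\S] accepts whitespace or non-whitespace, i.e. any character at all
var classMatch = markup.match(/^[\s\S]*<\/head>/i);
```

Here dotMatch is null, while classMatch[0] holds the text from the start of the string through </head>.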
Sorry, there had been several bugs in the latest versions of get_meta_tags() and file_get_contents(), on which it depends. Those should all be fixed now, so please use the latest copies of both. However, your own suggested fix will not work, because the pattern is no longer a regular expression and has instead become a string (that text will no doubt never be found).
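The suggested fix being answered is not shown here, so the quoted pattern below is only an assumed stand-in. It sketches the general pitfall: once a regular expression literal is wrapped in quotes, its slashes and flag letters become ordinary text to search for.

```javascript
// Made-up markup for illustration
var page = '<head>\n<meta name="author" content="name">\n</head><body>';

// A regular expression literal: matches everything through </head>
var asRegex = page.match(/^[\s\S]*<\/head>/i);

// The same characters inside quotes (an assumed example of the mistake):
// match() builds a pattern from the string, so the leading "/" and the
// trailing "/i" are now literal characters that the page does not contain
var asString = page.match('/^[\\s\\S]*<\\/head>/i');
```

Here asRegex is a match array and asString is null.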
Internet Explorer apparently has a problem if a negated character class in a regular expression is empty (e.g., [^]). We use a negated character class because 1) we want something equivalent to "." (any character) until we reach the text after it that we do want, but 2) we want to reach across multiple lines (and the 'm' flag does not, as is frequently supposed, do this). Although it doesn't look like any character is explicitly forbidden in HTML (only in XHTML), we have to exclude some character, so I added the null control character \u0000. If someone knows a better unlikely character or a better approach, let us know, but I think that should be a safe bet for now.
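A quick sketch of both points, using a made-up two-line string: what the 'm' flag actually changes, and how a negated class that excludes only \u0000 reaches across lines.

```javascript
// Made-up two-line input
var lines = 'first line\nsecond line END';

// The m flag only lets ^ and $ match at line breaks
var caretWithM = /^second/m.test(lines);
var caretWithoutM = /^second/.test(lines);

// It does not let "." cross a newline
var dotWithM = /first.*END/m.test(lines);

// A negated class excluding only \u0000 accepts newlines too
var negatedClass = /first[^\u0000]*END/.test(lines);
```

Here caretWithM and negatedClass are true, while caretWithoutM and dotWithM are false.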
Thanks for reporting the issue.


Raphael (Ao) RUDLER
17 Jun '09