Regular expression to find all methods in a piece of code

I am trying to write a regular expression to match all the JavaScript method definitions in a constructor string.
//These two should match
this.myMethod_1 = function(test){ return "foo" }; //Standard
this.myMethod_2 = function(test, test2){ return "foo" }; //Spaces before
//All of these should not
//this.myMethod_3 = function(test){ return "foo" }; //Comment shouldn't match
/**
*this.myMethod_4 = function(test){ return "foo" }; //Block comment shouldn't match
*/
// this.myMethod_5 = function(test){ return "foo" }; //Comment them spaces shouldn't match
/*
* this.myMethod_6 = function(test){ return "foo" }; //Block comment + spaces shouldn't match
*/
this.closure = (function(){ alert("test") })(); //closures shouldn't match
The regular expression should match ['myMethod_1', 'myMethod_2']. The regular expression should not match ['myMethod_3', 'myMethod_5', 'myMethod_6', 'closure'].
Here's what I have so far, but I am having problems with the ones that appear in comments:
/(?<=this\.)\w*(?=\s*=\s*function\()/g
I've been using this cool site to test it.
How do I solve this?

This sounds complicated to do correctly. You will need a parser for this; a simple regular expression will most likely not be enough.
A very good starting point is Narcissus, which is a JavaScript parser written in ... JavaScript.
It is just 1000 lines of code. It should be possible to extract just the method-matching parts of it.

Adding ^\s* to the beginning (together with the m flag, so that ^ matches at the start of every line) might help. It's not perfect, but it will work for your test cases.
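A minimal sketch of the ^\s* suggestion, assuming the constructor source is available as one string. The helper name findMethodNames is made up for illustration; it uses a capture group instead of the lookbehind, plus the m flag so that ^ matches at the start of every line rather than only at the start of the string.

```javascript
// Hypothetical helper: names of `this.x = function(...)` definitions whose
// line starts with optional whitespace followed by "this.".
// With the m flag, ^ matches after every newline, so lines starting with
// //, /* or * never match. Not a parser: an unmarked line inside a block
// comment would still be reported.
function findMethodNames(source) {
  const re = /^\s*this\.(\w+)\s*=\s*function\s*\(/gm;
  return [...source.matchAll(re)].map((m) => m[1]);
}

const sample = `this.myMethod_1 = function(test){ return "foo" }; //Standard
    this.myMethod_2 = function(test, test2){ return "foo" }; //Spaces before
//this.myMethod_3 = function(test){ return "foo" };
/**
*this.myMethod_4 = function(test){ return "foo" };
*/
// this.myMethod_5 = function(test){ return "foo" };
/*
* this.myMethod_6 = function(test){ return "foo" };
*/
this.closure = (function(){ alert("test") })();`;

console.log(JSON.stringify(findMethodNames(sample))); // ["myMethod_1","myMethod_2"]
```

The closure line is skipped because `=` is followed by `(function` rather than `function`.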

One regular expression might be difficult to write and debug. Consider writing several regular expressions instead, each of which either confirms or rejects a line of code.
For example,
/(?<=this\.)\w*(?=\s*=\s*function\()/g // Matches a simple method definition.
/^\/\// // If it matches then this line starts with a comment.
and so on.
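A sketch of the line-by-line idea, with illustrative patterns that are assumptions rather than part of the answer: one regex rejects comment lines, and a second one (the lookbehind version, which needs an ES2018-capable engine) confirms method definitions. The helper name methodNamesByLine is hypothetical.

```javascript
// Two small regexes with one job each; both patterns are illustrative.
const startsWithComment = /^\s*(\/\/|\/\*|\*)/; // //, /* or * at line start
// Lookbehind (?<=...) needs an ES2018+ engine.
const methodDefinition = /(?<=this\.)\w+(?=\s*=\s*function\s*\()/;

function methodNamesByLine(source) {
  const names = [];
  for (const line of source.split("\n")) {
    if (startsWithComment.test(line)) continue; // rejected: comment line
    const match = line.match(methodDefinition);
    if (match) names.push(match[0]); // confirmed: method definition
  }
  return names;
}

const methodLines = [
  'this.myMethod_1 = function(test){ return "foo" };',
  '// this.myMethod_5 = function(test){ return "foo" };',
  ' * this.myMethod_6 = function(test){ return "foo" };',
  'this.closure = (function(){ alert("test") })();',
].join("\n");

console.log(JSON.stringify(methodNamesByLine(methodLines))); // ["myMethod_1"]
```

Keeping the reject and confirm patterns separate means each one can be tested and adjusted on its own.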

Related

Javascript regex - getting function name from string

I am trying to get a function name from a string in javascript.
Let's say I have this string:
function multiply($number) {
return ($number * 2);
}
I am using the following regex in javascript:
/([a-zA-Z_{1}][a-zA-Z0-9_]+)\(/g
However, what is selected is multiply(. This is wrong. What I want is the word multiply without the (, though the regex should keep in mind that the function name must be followed by a (.
I can't get this done. How can I make the proper regex for this? I know that this is not something I really should do and that it is quite error sensitive, but I still wanna try to make this work.

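Just replace the final \( with the lookahead (?=\(), which requires a ( to follow without including it in the match. (The {1} inside the character class would be matched literally, so it is dropped here.)
`function multiply($number) {
return ($number * 2);
}`.match(/([a-zA-Z_][a-zA-Z0-9_]+)(?=\()/g) // ["multiply"]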
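To see why the lookahead helps, compare a pattern that consumes the ( with one that only asserts it. The simplified identifier class [A-Za-z_][A-Za-z0-9_]* is an assumption for this sketch; it ignores $ and non-ASCII names.

```javascript
const multiplySource = `function multiply($number) {
  return ($number * 2);
}`;

// "\(" consumes the parenthesis, so it becomes part of the match.
const consuming = multiplySource.match(/[A-Za-z_][A-Za-z0-9_]*\(/g);
// "(?=\()" only checks that a parenthesis follows; it is not matched.
const lookahead = multiplySource.match(/[A-Za-z_][A-Za-z0-9_]*(?=\()/g);

console.log(JSON.stringify(consuming)); // ["multiply("]
console.log(JSON.stringify(lookahead)); // ["multiply"]
```

`return (` is not picked up by either pattern because of the space before the parenthesis.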

You can use:
var name = functionString.match(/function(.*?)\(/)[1].trim();
Get anything between function and the first ( (using a non-greedy quantifier *?), then get the value of group [1]. Finally, trim to remove surrounding spaces.
Example:
var functionString = "function dollar$$$AreAllowedToo () { }";
var name = functionString.match(/function(.*?)\(/)[1].trim();
console.log(name);
Notes:
The characters allowed in JavaScript variable names are far too many to express in a character set. My answer takes care of that. More about this here
You may want to consider the possibility of a comment between function and (, or a newline too. But this really depends on how you intend to use this regex.
Take, for example:
function /*this is a tricky comment*/ functionName // another one
(param1, param2) {
}
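One way to handle the tricky case above is to strip comments before matching. This is a sketch under stated assumptions: the helper name functionName is hypothetical, and the comment stripping is naive, so // or /* inside a string literal would be removed too.

```javascript
// Hypothetical helper: remove comments, then take the text between
// "function" and the first "(". Naive: comment markers inside string
// literals are not recognised as strings.
function functionName(functionString) {
  const withoutComments = functionString
    .replace(/\/\*[\s\S]*?\*\//g, "") // block comments
    .replace(/\/\/.*$/gm, "");        // line comments (m: $ = end of line)
  const match = withoutComments.match(/function\s*([^\s(]*)\s*\(/);
  return match ? match[1] : null; // null if there is no "function ... ("
}

const tricky = `function /*this is a tricky comment*/ functionName // another one
(param1, param2) {
}`;

console.log(functionName(tricky)); // functionName
console.log(functionName("function dollar$$$AreAllowedToo () { }")); // dollar$$$AreAllowedToo
```

Like the answer, the name is taken as "everything up to the (" rather than a fixed character set, so names with $ still work.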

Return inline comment to a new line

I am trying to parse a code block line-by line. Is there a way to grab in-line comments and return it to the next line? I would imagine using regex, but I am having trouble coming up with the expression.
Example:
if(foo) { //Executes bar function
bar();
}
will be
if(foo) {
//Executes bar function
bar();
}

Using JavaScript, you could turn all your code into a string, use /(\/\/.+$)/gm to capture inline comments (the m flag makes $ match at the end of every line), and then use replace like:
stringVar.replace(/(\/\/.+$)/gm, '\n\t $1 \n')
If you have a text editor or IDE with regex support, you could use the above .replace pattern for the find and replace options respectively.

To match all single-line comments that are not on an otherwise empty line, you could use the following regex:
/^.*\S+.*(\/\/.*$)/mg
https://regex101.com/r/fU5lO4/1
Example:
console.log("hello"); // this comment will be matched
// this comment won't be matched
// this comment won't be matched
You could replace the found comment with a newline plus itself. (And maybe add some whitespace?)
Example:
yourText.replace(/^(.*\S+.*)(\/\/.*$)/mg, '$1\n $2' );
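A runnable sketch based on the replace above. It is a variation, not the answer's exact pattern: [ \t]* is used instead of \s* so a match cannot run across lines, leading indentation is captured so the moved comment lines up with its code, and the whitespace before // is dropped. The helper name moveInlineComments is hypothetical, and a // inside a string or URL would still be treated as a comment.

```javascript
// Variation on the replace above: [ \t]* (not \s*) keeps a match on one
// line, $1 re-applies the original indentation to the moved comment, and
// the whitespace before "//" is dropped. "//" inside strings is not handled.
function moveInlineComments(code) {
  return code.replace(/^([ \t]*)(.*\S)[ \t]*(\/\/.*)$/gm, "$1$2\n$1$3");
}

const before = "if(foo) { //Executes bar function\nbar();\n}";
console.log(JSON.stringify(moveInlineComments(before)));
// "if(foo) {\n//Executes bar function\nbar();\n}"
```

Comment-only lines are left alone, because the pattern requires a non-blank character before the //.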

JavaScript RegEx to get test spec

I have a need to get the current test spec my caret is in when using Jasmine. So if I have a spec like:
it("should do something", function() {
var foo = 'bar';
expect(foo).toEqual('bar');
});
and I have my caret on the blank line and click a button in the UI, it should search backwards from the caret to find the spec. So it goes to the var foo = 'bar'; line, detects that it's not a match, and moves up to the next line, which has it() in it, and therefore identifies that line as the spec. Going back line by line I can do; detecting whether a line is the one with it() in it is what I need help with.
My end case will be detecting if the function() being passed as the 2nd arg has an argument in it or not. If it doesn't then I need to add one in there. So since the above snippet doesn't have an argument in the function() then I need to add one so that it looks like:
it("should do something", function(done) {
var foo = 'bar';
expect(foo).toEqual('bar');
});
Notice the done now in the function(). Also, the "should do something" can be double quotes or single quotes and can contain any legal JavaScript character in it.
As a test, I used this RegEx:
/^\s*it\((?:"|')[\w\s]+(?:"|'), function\((?:\w+)?\) {/
And it works for my simple tests but it feels incomplete especially in the "should do something" detection part.

I think you can safely use a regex that is based on the unroll-the-loop method:
^\s*it\([^,]*(?:,(?!\s*function\()[^,]*)*,\s*function\(\w*\)\s*{
It matches it( at the beginning of a string, followed by anything that is not , function(, and then , function(...) {. It is equivalent to ^\s*it\([\s\S]*?,\s*function\(\w*\)\s*{, but a much more efficient expression.
See the regex demo
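As a quick sanity check of the claimed equivalence, the sketch below runs the unrolled pattern and the lazy [\s\S]*? pattern on a few made-up sample lines. The unrolled version should be faster because it only tests the lookahead at commas, instead of retrying the rest of the pattern after every character.

```javascript
// The unrolled pattern from the answer and its lazy equivalent.
const unrolled = /^\s*it\([^,]*(?:,(?!\s*function\()[^,]*)*,\s*function\(\w*\)\s*{/;
const lazy = /^\s*it\([\s\S]*?,\s*function\(\w*\)\s*{/;

const specSamples = [
  'it("should do something", function() {',
  "it('a, b and c', function(done) {",
  'it("no callback here");',
];

for (const line of specSamples) {
  // Both patterns should agree on every sample.
  console.log(unrolled.test(line), lazy.test(line));
}
// true true
// true true
// false false
```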
Now, if you need to match such signatures without any text inside function(), you can use capturing groups around the subpatterns you want to keep and that you can later reference as $1 and $2:
var re = /^(\s*it\([^,]*(?:,(?!\s*function\()[^,]*)*,\s*function\()(\)\s*{)/;
var str = 'it("should do something", function() {\n var foo = \'bar\';\n\n expect(foo).toEqual(\'bar\');\n});';
var subst = '$1done$2';
var result = str.replace(re, subst);
console.log(result);
If you really can have such weird strings as Oriol suggests, use
^\s*it\((?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'),\s*function\(\w*\)\s*{
See another regex demo
It will match
it(",function(){", function() {
it(',function(){', function() {
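Putting the pieces together for the workflow in the question, here is a sketch with hypothetical helper names findSpecLine and addDoneParam. It walks back from the caret line to the nearest it(...) line using the quoted-string pattern above, and then inserts done only when the callback's parameter list is empty. Representing the editor content as an array of lines is an assumption.

```javascript
// Hypothetical helpers. The title alternative matches a "..." or '...'
// string with escaped quotes, as in the answer above.
const SPEC_LINE =
  /^\s*it\((?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'),\s*function\(\w*\)\s*\{/;
const EMPTY_CALLBACK =
  /^(\s*it\((?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'),\s*function\()(\)\s*\{)/;

// Walk upwards from the caret line to the nearest it(...) line.
function findSpecLine(lines, caretLine) {
  for (let i = caretLine; i >= 0; i--) {
    if (SPEC_LINE.test(lines[i])) return i;
  }
  return -1; // no spec above the caret
}

// Insert "done" only when the callback has no parameter yet.
function addDoneParam(line) {
  return line.replace(EMPTY_CALLBACK, "$1done$2");
}

const specLines = [
  'it("should do something", function() {',
  "  var foo = 'bar';",
  "",
  "  expect(foo).toEqual('bar');",
  "});",
];
const specIndex = findSpecLine(specLines, 2); // caret on the blank line
console.log(specIndex); // 0
console.log(addDoneParam(specLines[specIndex])); // it("should do something", function(done) {
```

A line that already has a parameter, such as function(done), is returned unchanged because EMPTY_CALLBACK requires ) right after function(.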

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
Or, if you want to avoid repeating the .source property, you can use the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
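A quick check of the array-and-map version; the sample URL is made up. The user:pass piece uses @, the separator URLs use before the host. Because the sources are joined in order, capture groups are numbered across the pieces exactly as if the regex had been written in one line.

```javascript
// Array + map version of the URL regex (user:pass piece written with @).
const urlRegex = new RegExp([
  /(?:(?:(https?|ftp):)?\/\/)/,    // protocol
  /(?:([^:\n\r]+):([^@\n\r]+)@)?/, // user:pass
  /(?:(?:www\.)?([^\/\n\r]+))/,    // domain
  /(\/[^?\n\r]+)?/,                // request
  /(\?[^#\n\r]*)?/,                // query
  /(#?[^\n\r]*)?/,                 // anchor
].map((r) => r.source).join(""));

// Groups 1-7 come from the different pieces, numbered left to right.
const urlMatch = "https://user:pass@www.example.com/path?x=1#top".match(urlRegex);
console.log(JSON.stringify(urlMatch.slice(1)));
// ["https","user","pass","example.com","/path","?x=1","#top"]
```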

[Edit 2022/08] I created a small GitHub repository for creating regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)',
'|(\\".+\\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
The disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n, etc.).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();

Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
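A small check of this approach: the joined regex has the same source as the original literal. In this example, \bar is a word boundary (\b) followed by the letters ar, so "foo bar" does not match while "foo ar" does. Since .source does not carry flags, they go in the second argument.

```javascript
// Same helper as in the answer, with an arrow function; .source drops the
// flags of each piece, so flags are passed once through `options`.
function multilineRegExp(regs, options) {
  return new RegExp(regs.map((reg) => reg.source).join(""), options);
}

const split = multilineRegExp([
  /^foo/, // comments are allowed between the pieces
  /(.*)/,
  /\bar$/, // \b is a word boundary, then the letters "ar"
]);

console.log(split.source === /^foo(.*)\bar$/.source); // true
console.log(multilineRegExp([/a/, /b/], "gi").flags); // gi
```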

Thanks to the wondrous world of template literals, you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
const r = regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If you're curious, you can check out what I was doing, and its demonstration.
That version only works in Chrome, though, because Firefox didn't support named groups or named backreferences at the time of writing. So the example given in this answer is actually a neutered version and might easily be tricked into accepting invalid strings.

There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g

The regex above is missing some backslashes, so it isn't working properly. I edited the regex; please consider this version, which works 99.99% of the time for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));

To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@\"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|(\".+\"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');

You can simply use string concatenation.
var patternString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);

I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets, you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as a dummy prefix if the first character could be interpreted as a quantifier. E.g. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);

Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
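If the double escaping is the main pain point, one possible variation (not part of this answer) is String.raw, which keeps backslashes exactly as typed, so each sub-pattern can be copied straight from a regex literal. The sketch below rebuilds the same email regex from the same four sub-patterns.

```javascript
// Variation using String.raw: backslashes are kept exactly as typed, so the
// pieces read like the original regex literal (no doubling needed).
const box1 = String.raw`([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)`;
const box2 = String.raw`(".+")`;
const host1 = String.raw`(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])`;
const host2 = String.raw`(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})`;
const emailRegex = new RegExp(`^(${box1}|${box2})@(${host1}|${host2})$`);

console.log(emailRegex.test("user@example.com"));   // true
console.log(emailRegex.test("user@[192.168.0.1]")); // true
console.log(emailRegex.test("not an email"));       // false
```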

@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

regular expression to extract function body

I have this regexp
var bodyRegExp = /function[\\s]+[(].+[)][\\s]+{(.+)}/;
bodyRegExp.exec("module.exports = function () { /* any content */ }");
It doesn't work. Why is it broken?
It's meant to pull the body of the function statement out of the source code.
Edit:
I'm being stupid. Trying to parse javascript with a regexp is stupid.

Don't escape your backslashes. Do escape your curly braces. Your character set square bracket expressions are unnecessary. Use this instead:
var bodyRegExp = /function\s+\(.*\)\s+\{(.+)\}/;
Still, this is not a very robust expression - it won't work with multi-line functions and will give unexpected results when your function has more than one set of parens or curly braces - which seems extremely likely. But it should at least address the issues you are having.
Edit: If you are always dealing with a string that contains a function with no preceding or following statements, the solution is quite simple: just get everything after the first opening curly brace and before the last closing curly brace.
var fnBody = fn.substring(fn.indexOf("{") + 1, fn.lastIndexOf("}"));
If you are trying to extract a single function out of a string that contains more than just that one function, you'll need to write a whole parsing algorithm to do it. Or, if it is safe to do so, you could execute the JavaScript, get the function definition string with var fn = module.exports.toString(), and then apply the above code to that string.
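A minimal sketch of the indexOf/lastIndexOf approach, wrapped in a hypothetical helper functionBody. It relies on a function converting to its source text, and on the first { opening the body, so default parameters like (o = {}) or arrow functions without braces will break it.

```javascript
// Hypothetical helper: everything between the first "{" and the last "}".
// Accepts either a function (converted to its source text) or a string.
function functionBody(fnOrSource) {
  const src = String(fnOrSource);
  return src.substring(src.indexOf("{") + 1, src.lastIndexOf("}"));
}

function add(a, b) { return a + b; }

console.log(JSON.stringify(functionBody(add)));
// " return a + b; "
console.log(JSON.stringify(functionBody("module.exports = function () { /* any content */ }")));
// " /* any content */ "
```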

Function.prototype.body = function () {
var src = this.toString();
this._body = src.substring(src.indexOf("{") + 1, src.lastIndexOf("}"));
return this._body;
};
Then:
myFn.body()
Just make this call once; after that, you can access the body through the _body property:
myFn._body

/function[\\s]+[(].+[)][\\s]+{(.+)}/
Your function has nothing between its parentheses, function (), but [(].+[)] needs at least one character there, because .+ means one or more. You need zero or more: /function +\(.*\) +{(.+)}/

The regex you need.
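This one extracts the arguments and the body of the function separately. It returns an array of 6 items; the items at indices 3 and 5 are the arguments and the function body. (That holds for anonymous functions such as function (a, b) {...}; for a named function, index 3 captures the rest of the name instead, and the arguments end up in index 4 together with the parentheses.) It can be used to extract methods from objects.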
You can call func.toString() then use this regex on it.
var matcharray = funcstring.match(/(function\s?)([^\.])([\w|,|\s|-|_|\$]*)(.+?\{)([^\.][\s|\S]*(?=\}))/);

var bodyRegExp = /\{(.+?)\}+$/;
console.log("module.exports = function () { /* any content */ }".match(bodyRegExp))
Do you have to use exec?
This returns ["{ /* any content */ }", " /* any content */ "]

I'm surprised no one did the obvious regex-wise:
var fn = function whatever1() {
function whatever2() {
}
};
var body = (''+fn).match(/{([^]*)}[^}]*/)[1];
Output:
"
function whatever2() {
}
"
Perfectly suited for multiple lines; however, personally, I like @gilly3's answer the best, using indexOf and lastIndexOf rather than a regex for this simple case (regex may be overkill). ;)
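For reference, [^] is an empty negated character class, so it matches any character including newlines; it is specific to JavaScript. Since ES2018, the s (dotAll) flag lets . do the same, which some readers may find clearer. A sketch comparing the two on the function from this answer:

```javascript
const outerFn = function whatever1() {
  function whatever2() {
  }
};

const outerSource = String(outerFn);
// [^] matches any character, including "\n".
const viaEmptyClass = outerSource.match(/{([^]*)}[^}]*/)[1];
// With the s (dotAll) flag, "." also matches "\n".
const viaDotAll = outerSource.match(/{(.*)}[^}]*/s)[1];

console.log(viaEmptyClass === viaDotAll); // true
```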

You cannot use regular expressions to parse JavaScript syntax in general: its grammar (with arbitrarily nested braces, strings, comments and regex literals) is too complex for what a regular expression can express.
