Find all function signatures with more than 2 arguments in JavaScript

I need to find all function signatures accepting more than X arguments (say 2).
I tried something like function\s*\((\w*,){3,10} (which should catch all signatures with 3-10 args), but it did not work. Variations on it yield unexpected results. I guess I'm just not that good at regex, but any help is appreciated.
Update: I should point out that I am writing a sort of code inspection tool. Among other things, I want to spot functions that accept more than 2 arguments (as I promote the usage of functions with few arguments, and 1 argument in the case of constructors). So I cannot call arguments.length etc.

Just think "easy":
A method typically has (...): \(\)
A method with 3 parameters has 2 commas inside the brackets: \(,{2,2}\)
Each comma NEEDS to be preceded AND followed by a name: \((?:\w+,\w+){2,2}\)
But a name cannot be matched twice (in a,b,c the b would have to follow the first comma and also precede the second), so this does not work. Let's make the leading name mandatory and the following one optional, and require that the list ends with a name:
\((?:\w+,\w*){2,2}\w+\)
Usually a method declaration starts with function and a name: function\s+\w+\s*\((?:\w+,\w*){2,2}\w+\)
Finally, there could be whitespace around the parameters: function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){2,2}\w+\s*\)
There you go. This should cover all "common" method declarations, except nameless lambda-expressions:
function\s+\w+\s*\((?:\s*\w+\s*,\s*\w*\s*){2,2}\w+\s*\)
Matching two to two commas ({2,2}) will find signatures with exactly 3 parameters.
Matching two to five commas ({2,5}) will find signatures with 3 up to 6 parameters.
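To try the comma-count idea on real text, here is a minimal sketch. The helper names buildSignatureRegex and findWideSignatures are made up for illustration; the pattern itself is the one above with the quantifier filled in from the requested parameter range.

```javascript
// Hypothetical helper: builds the pattern above for a parameter-count range.
// N parameters means N - 1 commas, so the quantifier is {min - 1, max - 1}.
function buildSignatureRegex(minParams, maxParams) {
  const minCommas = minParams - 1;
  const maxCommas = maxParams === Infinity ? "" : maxParams - 1;
  return new RegExp(
    "function\\s+\\w+\\s*\\((?:\\s*\\w+\\s*,\\s*\\w*\\s*){" +
      minCommas + "," + maxCommas + "}\\w+\\s*\\)",
    "g"
  );
}

// Hypothetical helper: returns every matching signature found in the source text.
function findWideSignatures(source, minParams, maxParams) {
  return source.match(buildSignatureRegex(minParams, maxParams)) || [];
}

const sample = [
  "function one(a) {}",
  "function two(a, b) {}",
  "function three(a, b, c) {}",
  "function four( a,b , c, d ) {}",
].join("\n");

// Only the declarations with 3 to 6 parameters are reported.
const found = findWideSignatures(sample, 3, 6);
console.log(found);
```

As noted above, this only sees named declarations with plain identifier parameters; anonymous functions, default values and destructuring are not covered.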

First of all, JavaScript is not a regular language. As a result, a regex cannot fully parse it, so you will have to accept either false positives or false negatives.
A regex that probably comes close is:
function(?:\s+\w+)*\s*\(([^),]*)(\s*,\s*[^),]*){2,}\)
The regex works as follows:
function searches for the function keyword.
Next there is an optional group \s+\w+ that matches the name of the function: a function can be anonymous, so the group must be optional.
Next, \s*\( matches an arbitrary amount of whitespace and the bracket that opens the parameter list.
Now between the brackets, we start looking for the parameters. To cover (most) cases, we will define a parameter as [^),]* (a sequence of characters containing neither a comma nor the closing bracket).
For each further parameter we need to skip a comma; this is enforced by the \s*,\s* pattern (the \s* is actually unnecessary, since [^),]* already matches whitespace). Then comes another parameter, and the {2,} quantifier requires at least two such commas, i.e. at least three parameters.
Finally, the closing bracket \).
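As a rough sketch of how an inspection tool could use this pattern: the function name listSignatures and the sample source are invented for this example, and the parameter count is simply the number of comma-separated pieces between the brackets.

```javascript
// The pattern from above, with the global flag so that all matches are found.
const signaturePattern = /function(?:\s+\w+)*\s*\(([^),]*)(\s*,\s*[^),]*){2,}\)/g;

// Hypothetical helper: reports each signature with more than 2 parameters.
function listSignatures(source) {
  const results = [];
  for (const match of source.matchAll(signaturePattern)) {
    // Text between the opening bracket and the final closing bracket.
    const inside = match[0].slice(match[0].indexOf("(") + 1, -1);
    results.push({
      signature: match[0],
      paramCount: inside.split(",").length,
      index: match.index,
    });
  }
  return results;
}

const code = [
  "var f = function (a, b, c) {};",
  "function g(x, y) {}",
  "function h(p, q = 1, r, s) {}",
].join("\n");

const report = listSignatures(code);
console.log(report);
```

Because a parameter is just "anything without a comma or closing bracket", a default value that itself contains a comma or a bracket, such as q = f(1, 2), will be miscounted or missed. That is the kind of false positive or negative mentioned at the start.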

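You'd want to use something like function\s*\w+\s*\(\s*(?:\w+\s*,\s*){2,9}\w+ to match non-anonymous (named) functions with 3 to 10 parameters, and remove the \w+\s* after function to get function\s*\(\s*(?:\w+\s*,\s*){2,9}\w+ for anonymous functions. The comma has to be mandatory: with (\w+,?){3,10} the engine can split a single name such as abc into three repetitions, so a one-parameter function would match too.
These can be combined to get function\s*(?:\w+\s*)?\(\s*(?:\w+\s*,\s*){2,9}\w+ (the ?: marks a non-capturing group).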

Related

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice that in this string, after the phrase "[SM_g]word[SM_h]", there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Zero to all three of these components will always appear after "[SM_g]word[SM_h]". However, they can also appear in any order after "[SM_g]word[SM_h]". For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_l].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you consider that the period can also be a comma) that can appear after "[SM_g]word[SM_h]", where these three components can be in any order and sometimes one, two, or all three of the components will be missing.
Not only that, sometimes there will be up to one space between " and whatever precedes it (the previous component or [SM_g]word[SM_h]).
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside the core string (and preserving the space, in case there is a space between " and the previous component or [SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
You need to put each component in an alternation inside a group, and allow that group to repeat up to 3 times:
(\[SM_g]word)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or specific keyword.
Then the replacement string should be:
$1$3$2
// [.,] accepts either a period or a comma, as described in the question
var re = /(\[SM_g]word)(\[SM_h])((?:[.,]|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
// Logs: [SM_g]word[SM_l] ".[SM_h]
console.log(str.replace(re, `$1$3$2`));
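To check the approach against the examples from the question, here is a small sketch. The function name moveMarker is invented, and it assumes the word between [SM_g] and [SM_h] never contains a "[" character, which lets [^\[]* stand in for the literal word.

```javascript
// Group 1: [SM_g] plus the word, group 2: [SM_h],
// group 3: up to three trailing components (period or comma, [SM_l], optional space + quote).
const trailing = /(\[SM_g\][^\[]*)(\[SM_h\])((?:[.,]|\[SM_l\]| ?"){0,3})/g;

// Hypothetical helper: moves [SM_h] behind the trailing components.
function moveMarker(text) {
  return text.replace(trailing, "$1$3$2");
}

console.log(moveMarker('[SM_g]word[SM_h].[SM_l]"')); // [SM_g]word.[SM_l]"[SM_h]
console.log(moveMarker('[SM_g]word[SM_h]"[SM_l].')); // [SM_g]word"[SM_l].[SM_h]
console.log(moveMarker('[SM_g]word[SM_h] ".'));      // [SM_g]word ".[SM_h]
console.log(moveMarker('[SM_g]word[SM_h]'));         // [SM_g]word[SM_h]
```

Note that {0,3} only limits how many components are consumed; it does not stop the same component from appearing twice, so an input such as [SM_h].. would also be rearranged.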
This seems applicable to your task, in other words, changing the position of sub-strings:
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
Demo, in which every sub-string is captured into its own group for your post-processing.
[SM_g] is captured in group 1, word in group 2, [SM_h] in group 3, and the whole trailing part in group 4; within it, [,\.] goes to group 5, [SM_l] to group 6, and the optional space plus quote entity ( ?&\\?quot;) to group 7.
Thus, groups 1~3 are the core part, group 4 is the trailing part (useful for checking whether a trailing part exists at all), and groups 5~7 are sub-parts of group 4 for your post-processing.
Therefore, you can easily produce an output string with the matched parts rearranged in whatever order you want by replacing with captured groups, as follows.
\1\2\7\3 or $1$2$7$3 etc..
For replacing in Javascript, please refer to this post. JS Regex, how to replace the captured groups only?
But the above regex is not sufficiently precise, because it allows any repetition of the sub-parts of the trailing string, for example, \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5, etc.
To avoid this, the regex needs a condition that accepts only the possible combinations of the sub-parts of the trailing string. Please refer to this example, https://regex101.com/r/6aM4Pv/1/, for the possible combinations in order.
But a regex that allows only the possible combinations is more complicated, so I leave the simplified regex above to help you understand the idea. Thank you :-)

Custom Backbonejs route parameters

Consider this URL:
domains.google.com/registrar#t=b
note:
#t=b
In this example, the variable "t" stores the current tab on the page where "b" is for billing.
How can I achieve query like parameters in backbone as shown above?
I understand Backbone has routes that support parameters in urls, but this is limited to when the data is in a hierarchy, for example: item/:id
But what about application settings that would not work well in a directory like structure?
The only solution I can think of is a custom parser and break up the key/values myself.
Any ideas?
Expanding on @try-catch-finally's comment, I'm going to show you how to create your own route with a simple RegEx pattern that will match your conditions.
Here's the regex we'll use:
^\w+? # match one word (or at least a character) at the
# beginning of the route
[=] # until we parse a single 'equals' sign
( # then start saving the match inside the parenthesis
[a-zA-Z0-9]* # which is any combination of letters and numbers
) # and stop saving
Putting it all together the regex looks like: /^\w+?[=]([a-zA-Z0-9]*)/.
Now we set up our router,
var MyRouter = Backbone.Router.extend({
    initialize: function (options) {
        // Matches t=b, passing "b" to this.testFn
        this.route(/^\w+?[=]([a-zA-Z0-9]*)/, "testFn");
    },
    routes: {
        // Optional additional routes
    },
    testFn: function (id) {
        console.log('id: ' + id);
    }
});
var router = new MyRouter();
Backbone.history.start();
The TL;DR is that in MyRouter.initialize we added a route that takes the regex above and invokes the MyRouter.testFn function. Any call to http://yourdomain.com#word1=word2 will call MyRouter.testFn with the word after the equals sign as a parameter. Of course, each word could be a single character, as in t=b from your question.
Expanding your parameters
Say you want to pull multiple parameters, not just the one. The key to understanding how Backbone pulls your parameters is the capturing group (). A capturing group allows you to "save" the match defined within the parentheses into variables local to the regex expression. Backbone uses these saved matches as the parameters it applies to the route callback.
So if you want to capture two parameters in your route you'd use this regex:
^\w+?[=]([a-zA-Z0-9]*)[,]\w+?[=]([a-zA-Z0-9]*)
which simply says to expect a comma delimiter between the two parameter placeholders. It would match,
t=b,some=thing
More general route patterns
You can repeat the [,]\w+?[=]([a-zA-Z0-9]*) pattern as many times as you need. If you want to generalize the pattern, you could use the non-capturing token (?: ... ) and do something like,
^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))?(?:[,]\w+?[=]([a-zA-Z0-9]*))?
The regex above will look for at least one match and will optionally take two more matches. By placing a ? token at the end of the (?: ... ) group, we say the pattern in the parenthesis may be found zero or one times (i.e. it may or may not be there). This allows you to set a route when you know you can expect up to 3 parameters, but sometimes you may want only one or two.
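To see what the three capturing groups hold for different fragments, here is a small sketch in plain JavaScript. It does not call Backbone at all; extractParams is a made-up helper that just returns the captured groups, which are the values a route callback would be working with.

```javascript
// The "up to three parameters" pattern from above.
const upToThree =
  /^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))?(?:[,]\w+?[=]([a-zA-Z0-9]*))?/;

// Hypothetical helper: returns the captured groups, or null if the fragment does not match.
function extractParams(fragment) {
  const match = upToThree.exec(fragment);
  return match ? match.slice(1) : null;
}

console.log(extractParams("t=b"));               // [ 'b', undefined, undefined ]
console.log(extractParams("t=b,some=thing"));    // [ 'b', 'thing', undefined ]
console.log(extractParams("t=b,some=thing,x=1")); // [ 'b', 'thing', '1' ]
console.log(extractParams("nothing"));           // null
```

Groups that did not take part in the match come back as undefined, so a callback written for three parameters has to cope with missing ones.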
Is there a truly general route?
You may be asking yourself, why not simply use one greedy (?: ... ) group that will allow an unlimited number of matches. Something like,
^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))*
With this regex pattern you must supply one parameter, but you can take an unlimited number of subsequent matches. Unfortunately, while the regex will work fine, you won't get the desired result. (See, for example, this Question.)
That's a limitation of JavaScript. With repeating capturing groups (i.e. the second ([a-zA-Z0-9]*) capturing group repeats with every repetition of the (?: ... ) non-capturing group), JavaScript only saves the last match. So if you pass a route like t=b,z=d,g=f,w=1ee, the first group saves b and the repeated group saves only 1ee; d and f are lost. So, unfortunately, you have to have an idea of the maximum number of parameters your route should take, and manually code them into your regex pattern like we did above.
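The "last match only" behaviour is easy to reproduce outside Backbone. The sketch below also shows one possible way around it, along the lines of the custom parser mentioned in the question: split the fragment yourself. parseFragment is a hypothetical helper, not part of Backbone.

```javascript
// The repeating pattern from above: the second capturing group is overwritten on every repetition.
const repeating = /^\w+?[=]([a-zA-Z0-9]*)(?:[,]\w+?[=]([a-zA-Z0-9]*))*/;
const groups = repeating.exec("t=b,z=d,g=f,w=1ee").slice(1);
console.log(groups); // [ 'b', '1ee' ]

// Hypothetical alternative: split the fragment into key/value pairs by hand.
function parseFragment(fragment) {
  const params = {};
  for (const pair of fragment.split(",")) {
    const eq = pair.indexOf("=");
    // Skip pieces without a key or without an equals sign.
    if (eq > 0) {
      params[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  }
  return params;
}

const parsed = parseFragment("t=b,z=d,g=f,w=1ee");
console.log(parsed); // { t: 'b', z: 'd', g: 'f', w: '1ee' }
```

With this approach a single catch-all route could hand the whole fragment to the parser, at the cost of doing the validation yourself.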

Capturing optional part of URL with RegExp

While writing an API service for my site, I realized that String.split() won't do it much longer, and decided to try my luck with regular expressions. I have almost done it but I can't find the last bit. Here is what I want to do:
The URL represents a function call:
/api/SECTION/FUNCTION/[PARAMS]
This last part, including the slash, is optional. Some functions display a JSON reply without having to receive any arguments. Example: /api/sounds/getAllSoundpacks prints a list of available sound packs. Though, /api/sounds/getPack/8Bit prints the detailed information.
Here is the expression I have tried:
req.url.match(/\/(.*)\/(.*)\/?(.*)/);
What am I missing to make the last part optional - or capture it in whole?
This will capture everything after FUNCTION/ in your URL, independent of the appearance of any further / after FUNCTION/:
FUNCTION\/(.+)$
The RegExp will not match if there is no part after FUNCTION.
This regex should work; it makes the last slash and the part after it optional:
/^\/[^/]*\/[^/]*(?:\/.*)?$/
This matches all of these strings:
/api/SECTION/FUNCTION/abc
/api/SECTION
/api/SECTION/
/api/SECTION/FUNCTION
Your pattern /(.*)/(.*)/?(.*) was almost correct, it's just a bit too short - it allows 2 or 3 slashes, but you want to accept anything with 3 or 4 slashes. And if you want to capture the last (optional) slash AND any text behind it as a whole, you simply need to create a group around that section and make it optional:
/.*/.*/.*(?:/.+)?
should do the trick.
Demo. (The pattern looks different because multiline mode is enabled, but it still works. It's also a little "better" because it won't match garbage like "///".)
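Putting the pieces together, here is a sketch of how the URL could be taken apart with capture groups. It assumes the URL always starts with /api/ and that SECTION and FUNCTION never contain a slash; parseApiUrl and the shape of its return value are invented for this example.

```javascript
// Group 1: SECTION, group 2: FUNCTION, group 3: everything after the optional third slash.
const apiRoute = /^\/api\/([^\/]+)\/([^\/]+)(?:\/(.*))?$/;

// Hypothetical helper: returns the parts of the call, or null if the URL has a different shape.
function parseApiUrl(url) {
  const match = apiRoute.exec(url);
  if (!match) return null;
  const [, section, fn, params] = match;
  return {
    section: section,
    fn: fn,
    // A missing or empty PARAMS part becomes an empty list.
    params: params ? params.split("/") : [],
  };
}

console.log(parseApiUrl("/api/sounds/getAllSoundpacks"));
// { section: 'sounds', fn: 'getAllSoundpacks', params: [] }
console.log(parseApiUrl("/api/sounds/getPack/8Bit"));
// { section: 'sounds', fn: 'getPack', params: [ '8Bit' ] }
console.log(parseApiUrl("/api/sounds"));
// null
```

Using [^\/]+ instead of .* for the first two groups keeps each group inside its own path segment, so the optional last group is the only one that can contain further slashes.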

Syntax in Mozilla docs/ECMAScript specs

You often see this kind of syntax for describing methods:
Math.max([value1[,value2, ...]])
Function.prototype.call (thisArg [ , arg1 [ , arg2, … ] ] )
Why are parameters denoted like this using brackets and leading commas?
Brackets are used for argument specifications to indicate that the argument is optional.
This likely comes from the format used in UNIX/Linux man pages (although they may have borrowed that syntax from some other earlier source for all I know). The man page on man-pages has a description of how arguments should be represented (emphasis mine):
SYNOPSIS briefly describes the command or function's interface.
For commands, this shows the syntax of the command and
its arguments (including options); boldface is used for
as-is text and italics are used to indicate replaceable
arguments. Brackets ([]) surround optional arguments,
vertical bars (|) separate choices, and ellipses (...)
can be repeated.
JavaScript doesn't have strict requirements for function parameters. Functions can be called with as many arguments as you want to pass them. You can call a function that only declares two parameters with 3 arguments, and JavaScript will ignore the one that isn't declared.
However, the order of the parameters is still important. In other words, you can't pass just the first and third parameters; if you want to use the third parameter, you have to specify something for the second.
The square brackets mean the parameter is not required. The comma is just to tell you that you need a comma if you are going to specify a parameter there.
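A short illustration of what that notation means in practice. The function describe is made up for this example, in the notation above it would be written describe(required[, optional1[, optional2]]).

```javascript
// Hypothetical function with one required and two optional parameters.
function describe(required, optional1, optional2) {
  return [required, optional1, optional2];
}

console.log(describe("a"));                 // [ 'a', undefined, undefined ]
// To use the third parameter, something has to be passed for the second.
console.log(describe("a", undefined, "c")); // [ 'a', undefined, 'c' ]
// An extra fourth argument is not bound to any declared parameter.
console.log(describe("a", "b", "c", "d"));  // [ 'a', 'b', 'c' ]

// Math.max([value1[, value2, ...]]): every argument is optional.
console.log(Math.max());        // -Infinity
console.log(Math.max(1));       // 1
console.log(Math.max(1, 5, 3)); // 5
```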

The space character as a punctuator in JavaScript

In chapter 7.7 (Punctuators) of the ECMAScript spec ( http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-262.pdf ) the grid of punctuators appears to have a gap in row 3 of the last column. This is in fact the space character punctuator, correct?
I understand that space characters may be inserted optionally between tokens in the JavaScript code (in order to improve readability), however, I was wondering where they are actually required...
In order to find this out, I searched for space characters in the minified version of the jQuery library. These are my results:
A space is required... (see Update below)
... between a keyword and an identifier:
function x(){}
var x;
return x;
typeof x;
new X();
... between two keywords:
return false;
if(x){}else if(y){}else{}
These are the two cases that I identified. Are there any other cases?
Note: Space characters inside string literals are not regarded as punctuator tokens (obviously).
Update: As it turns out, a space character is not required in those cases. For example, a keyword token and an identifier token have to be separated by something, but that something does not have to be a space character. It could be any input element which is not a token (WhiteSpace, LineTerminator or Comment).
Also... It seems that the space character is regarded as a WhiteSpace input element, and not a token at all, which would mean that it's not a punctuator.
Update (2021): The spec is much clearer now, and space is definitely not in the list of punctuators. Space is whitespace, which is covered in the White Space section.
Answer from 2010:
I don't think that gap is meant to be a space, no, I think it's just a gap (an unfortunate one). If they really meant to be listing a space, I expect they'd use "Whitespace" as they have elsewhere in the document. But whitespace as a punctuator doesn't really make sense.
I believe spaces (and other forms of whitespace) are delimiters. The spec sort of defines them by omission rather than explicitly. The space is required between function and x because otherwise you have the token functionx, which is not of course a keyword (though it could be a name token — e.g., a variable, property, or function name).
You need delimiters around some tokens (Identifiers and ReservedWords), because that's how we recognize where those tokens begin and end — an IdentifierName starts with an IdentifierStart followed by zero or more IdentifierParts, a class which doesn't include whitespace or any of the characters used for punctuators. Other tokens (Punctuators for instance) we can recognize without delimiters. I think that's about it, and so your two rules are pretty much just two examples of the same rule: IdentifierNames must be delimited (by whitespace, by punctuators, by beginning or end of file, ...).
Somewhat off-topic, but of course not all delimiters are equal. Line-breaking delimiters are sometimes treated specially by the grammar for the horror that is "semicolon insertion".
Whitespaces are not required in any of these cases. You just have to write a syntax that is understandable for the parser. In other words: the machine has to know whether you're using a keyword like function or new or just defining another variable like newFunction.
Each keyword has to be delimited somehow - whitespaces are the most sensible and readable, however they can be replaced:
return/**/false;
return(false);
This is just a guess, but I would say that spaces aren't actually required anywhere. They are used just as one of many alternatives to generate word boundaries between keywords. This means you could just as well replace them with other characters.
If what you want is to remove the unnecessary spaces from some code I would say that spaces (white-space to be more exact, tabs will work just as well) are mandatory only where there are no other means of separating keywords and/or variable identifiers. I.e. where by removing the white-space you no longer have the same keywords and identifiers in the resulting code.
What follows is not exactly relevant to your needs, but you may find it interesting. You can write your examples so that they no longer have those spaces. I hope none of the examples are wrong.
x=function(){} instead of function x(){}
this.x=null; instead of var x;
return(x); instead of return x;
typeof(x); instead of typeof x;
y=X(); instead of y = new X();
return(false) instead of return false
if(x){}else{if(y){}else{}} instead of if(x){}else if(y){}else{}
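The claim that other delimiters can stand in for a space can be checked by letting the engine parse such code. In this sketch, run is a made-up helper that compiles a function body with the Function constructor and calls it.

```javascript
// Hypothetical helper: compiles a function body from source text and runs it.
function run(body) {
  return new Function(body)();
}

console.log(run("return/**/false;"));           // false: a comment separates the tokens
console.log(run("return(false);"));             // false: a punctuator separates them
console.log(run("var\nx=1;return typeof(x);")); // number: a line terminator works too

// Without any delimiter, "returnfalse" is read as a single identifier.
let failure = null;
try {
  run("returnfalse;");
} catch (e) {
  failure = e.name;
}
console.log(failure); // ReferenceError
```

The last case shows why some delimiter is needed: the code still parses, but as a reference to an undefined variable rather than as a return statement.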
