regex capture bind parameters following a colon

I'm trying to match substrings that follow colons : without matching the colon as well. It should be really simple. Given
select * from table where name=:name, id = :id order by :order_by limit :limit
it should match
name
id
order_by
limit
However, it's matching
:name
:id
:order_by
:limit
The regex I'm using is
:([a-zA-Z0-9_]+)
but I've also tried
(?::)([a-zA-Z0-9_]+)
according to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-non-capturing-parentheses.
Can someone help me?

Your regexes should work. Assuming you are using JavaScript, you can collect your findings in the matches array like this:
var myRe = /:(\w+)/g;
var str = "select * from table where name=:name, id = :id order by :order_by limit :limit";
var matches = [];
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  matches.push(myArray[1]);
}
See: http://jsfiddle.net/6CB5Y/1/
myArray is an array containing the whole match (e.g. ':name') and all its parenthesized substring matches, if any (e.g. 'name'). So use myArray[1] to just collect the parenthesized match.
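As a possible alternative to the exec loop, newer engines (ES2020 and later) offer String.prototype.matchAll, which yields one match array per hit. A minimal sketch, assuming such an engine; the variable names are only illustrative:

```javascript
// Assumes an ES2020+ engine, where String.prototype.matchAll is available.
const query = "select * from table where name=:name, id = :id order by :order_by limit :limit";

// matchAll requires the g flag; each result is a match array where
// index 0 is the whole match (':name') and index 1 is the first group ('name').
const paramNames = Array.from(query.matchAll(/:(\w+)/g), (m) => m[1]);

console.log(paramNames); // [ 'name', 'id', 'order_by', 'limit' ]
```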

The non-capturing parentheses are still used to form the $0, or full, match, i.e.
:name
+---+ 0
 +--+ 1
You probably want to perform a replacement on those placeholders, so I would work around the lack of look-behind by using a replacement function:
var bound = {
    name: 'test',
    id: 'world',
    order_by: 'col',
    limit: 123
  },
  sql = 'select * from table where name=:name, id = :id order by :order_by limit :limit';
sql.replace(/:(\w+)/g, function($0, $1) {
  // TODO apply escaping
  return bound[$1]; // perform lookup using 'name', 'id', etc.
});
// "select * from table where name=test, id = world order by col limit 123"
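Where the database driver supports positional placeholders, one way to sidestep manual escaping is to rewrite the named parameters to ? and pass the values separately. The following is only a sketch: the function name toPositional is hypothetical, the ? placeholder syntax is an assumption about the driver, and the pattern does not skip colons inside SQL string literals or casts such as ::int.

```javascript
// Hypothetical helper: rewrite ':name' style parameters to '?' placeholders
// and collect the bound values in order of appearance.
function toPositional(sql, bound) {
  const values = [];
  const text = sql.replace(/:(\w+)/g, (whole, name) => {
    if (!Object.prototype.hasOwnProperty.call(bound, name)) {
      throw new Error('No value bound for parameter ' + whole);
    }
    values.push(bound[name]);
    return '?';
  });
  return { text, values };
}

const prepared = toPositional(
  'select * from table where name=:name and id = :id limit :limit',
  { name: 'test', id: 'world', limit: 123 }
);
console.log(prepared.text);   // select * from table where name=? and id = ? limit ?
console.log(prepared.values); // [ 'test', 'world', 123 ]
```

Keeping the values out of the SQL text leaves quoting to the driver, which is usually safer than escaping by hand.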

You just need a "look behind":
(?<=:)\w+
A look-behind asserts, but does not capture, what comes before the match.
Note that the whole match (not group 1 as in your regex) is your target. In JavaScript, look-behind is only available in engines that implement ES2018 or later.
Also notice the simplification: \w means exactly [a-zA-Z0-9_].
See this regex running on rubular
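In JavaScript the look-behind version could be used roughly as follows. This is a sketch that assumes an engine with look-behind support; the variable names are illustrative.

```javascript
// Assumes an engine with look-behind support (ES2018 or later).
const sqlText = 'select * from table where name=:name, id = :id order by :order_by limit :limit';

// With the g flag, match() returns only the full matches, and thanks to the
// look-behind the colon is not part of them.
const names = sqlText.match(/(?<=:)\w+/g);

console.log(names); // [ 'name', 'id', 'order_by', 'limit' ]
```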

Related

How do I make my code concise and short using Regex Expressions

I'm trying to make the code a lot cleaner and more concise. My main goal is to change the string to meet my requirements.
Requirements
I want to remove any empty lines (like the one in the middle of the two sentences down below)
I want to remove the * in front of each sentence, if there is one.
I want to make the first letter of each word capital and the rest lowercase (except words that have $ in front of them)
This is what I've done so far:
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`
const nostar = string.replace(/\*/g, ''); // gets rid of the * of each line
const noemptylines = nostar.replace(/^\s*[\r\n]/gm, ''); //gets rid of empty blank lines
const lowercasestring = noemptylines.toLowerCase(); //turns it to lower case
const tweets = lowercasestring.replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()); //makes first letter of each word capital
console.log(tweets)
I've done most of the code; however, I want to keep words that have $ in front of them in capitals, which I don't know how to do.
Furthermore, I was wondering if it's possible to combine the regex expressions, so it's even shorter and more concise.

You could make use of capture groups and the callback function of replace.
^(\*|[\r\n]+)|\$\S*|(\S+)
^ Start of the string (or of each line, since the m flag is used)
(\*|[\r\n]+) Capture group 1, match either * or 1 or more newlines
| Or
\$\S* Match $ followed by zero or more non-whitespace chars (which will be returned unmodified in the code)
| Or
(\S+) Capture group 2, match 1+ non-whitespace chars
const regex = /^(\*|[\r\n]+)|\$\S*|(\S+)/gm;
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`;
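const res = string.replace(regex, (m, g1, g2) => {
  // Group 1: a leading * or newlines, which are removed
  if (g1) return "";
  // Group 2: an ordinary word, which is title-cased
  if (g2) {
    g2 = g2.toLowerCase();
    return g2.charAt(0).toUpperCase() + g2.slice(1);
  }
  // Anything else is a $-prefixed word, returned unmodified
  return m;
});
console.log(res);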

Making it readable is more important than making it short.
const tweets = string
  .replace(/\*/g, '') // gets rid of the * of each line
  .replace(/^\s*[\r\n]/gm, '') // gets rid of empty blank lines
  .toLowerCase() // turns it to lower case
  .replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()) // makes first letter of each word capital
  .replace(/\B\$(\w+)\b/g, match => match.toUpperCase()); // keeps words that have $ in front of them capital
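To make the chain reusable and easy to check, it could be wrapped in a function. The sketch below uses a hypothetical name, formatTweets, and a shortened ASCII-only input so that the effect of each step is easy to follow.

```javascript
// Hypothetical wrapper around the chained replacements shown above.
function formatTweets(text) {
  return text
    .replace(/\*/g, '')                 // drop every *
    .replace(/^\s*[\r\n]/gm, '')        // drop empty lines
    .toLowerCase()                      // lower-case everything
    .replace(/(^\w{1})|(\s{1}\w{1})/g, (match) => match.toUpperCase()) // capitalize each word
    .replace(/\B\$(\w+)\b/g, (match) => match.toUpperCase());          // restore $TICKER words
}

const sample = '*HELLO WORLD\n\n$SQ\n*$SQ UPGRADED, PT $185';
const formatted = formatTweets(sample);
console.log(formatted);
// Hello World
// $SQ
// $SQ Upgraded, Pt $185
```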

Extract a part of a regex name

Examples of filenames
FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_fr-fr-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
FDIP_de-de-nn_Text_v1_YYYYMMDD_SequenceNumber.txt
REGEX is FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
The only part I need is the translation code, which is 'en-gb', 'fr-fr', or 'de-de'.
How do I extract just that part of the filename?

I modified the regex a little bit so that it matches both the numbers and the text.
Explanation
To capture a group, wrap that part of the regex in (); it is then captured as a group.
For named capturing, use (?<name_of_group>...) and then access the group by name.
Here is how the matching works:
[a-z]{2} matches 2 characters from a-z
[a-zA-Z0-9] matches any character in a-z, A-Z, or 0-9
g is the global flag, i.e. match all.
i means ignore case.
var r = /FDIP_([a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi;
let t = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
let dd = r.exec(t);
console.log(dd[1]);
This is an example of named group capturing.
Note that the group name in the regex matches the name used in the object destructuring.
const { groups: { language } } = /FDIP_(?<language>[a-z]{2}-[A-Z]{2})-[a-z]{2}_Text_v1_[0-9A-Z]{8}_[A-Z0-9]{14}.txt/gi.exec('FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt');
console.log(language);

To solve your problem, you should:
Fix your regex:
FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
// to
FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt
Get the value of the first group by using the regex.exec function:
const fileNames = [
  'FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt',
  'FDIP_fr-fr-nn_Text_v1_20200202_12345678901234.txt',
  'FDIP_de-de-nn_Text_v1_20180808_12345678901234.txt'
]
const cultureNames = fileNames.map(name => {
  const matched = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}.txt/.exec(name)
  return matched && matched[1]
})
console.log(cultureNames)
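A slightly stricter variant might anchor the pattern and escape the dot before txt, so that only complete, well-formed names match. This is a sketch: the names FILE_PATTERN and extractLanguage are hypothetical, and the digits-only date and sequence parts are taken from the question's own regex.

```javascript
// Hypothetical helper. ^ and $ anchor the whole name, and \. matches a
// literal dot (an unescaped . would match any character).
const FILE_PATTERN = /^FDIP_(?<language>[a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[0-9]{8}_[0-9]{14}\.txt$/;

function extractLanguage(fileName) {
  const matched = FILE_PATTERN.exec(fileName);
  // Named groups (ES2018+) are exposed on matched.groups.
  return matched ? matched.groups.language : null;
}

console.log(extractLanguage('FDIP_en-gb-nn_Text_v1_20190101_12345678901234.txt')); // en-gb
console.log(extractLanguage('FDIP_en-gb-nn_Text_v1_20190101_12345678901234Xtxt')); // null
```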

Change FDIP_([a-z]{2}-[A-Z]{2}-[a-z]{2})_Text_v1_[0-9]{8}_[0-9]{14}.txt
to
let pattern = /FDIP_([a-z]{2}-[a-z]{2})-[a-z]{2}_Text_v1_[\w]{8}_[\w]{14}.txt/;
var str = 'FDIP_en-gb-nn_Text_v1_YYYYMMDD_SequenceNumber.txt';
console.log(str.match(pattern)[1]);

getting values from a string using regular expression

Could anyone help me with this regular expression issue?
expr = /\(\(([^)]+)\)\)/;
input = ((111111111111))
the one I would need to be working is = ((111111111111),(222222222),(333333333333333))
That expression works fine to get 111111 from the first input, but not when the groups 2222... and 3333... are also present. The input varies: it could be ((111111111111)), the longer one above, or something different, but it always follows the same parenthesis pattern.
Is there any reg expression to extract the values for both cases to an array?
The result I would like to come to is:
[0] = "111111"
[1] = "222222"
[2] = "333333"
Thanks

If you are trying to validate the format while extracting the desired parts, you could use the sticky y flag. With this flag, the first match must start at the beginning of the input, and each following match must start exactly where the previous one ended. This approach needs one input string at a time.
Regex:
/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg
Breakdown:
^\(\( Match beginning of input, immediately followed by ((
( Start of capturing group #1
[^)]+ Match anything but )
)\) End of CG #1, match ) immediately
| Or
(?!^) Next patterns shouldn't start at beginning
(?: Start of non-capturing group
,\(([^)]+)\) Match a separated group (capture value in CG #2, same pattern as above)
| Or
\)$ Match ) and end of input
) End of group
JS code:
var str = '((111111111111),(222222222),(333333333333333))';
console.log(
  str.replace(/^\(\(([^)]+)\)|(?!^)(?:,\(([^)]+)\)|\)$)/yg, '$1$2\n')
    .split(/\n/).filter(Boolean)
);
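The same idea of validating while extracting can also be written as an explicit loop over a sticky regex. The following is a sketch, not the pattern from above: the function name parseGroups is hypothetical, and it returns null as soon as the input deviates from the ((a),(b),...) shape.

```javascript
// Hypothetical helper: returns the values, or null if the format is invalid.
function parseGroups(input) {
  if (input[0] !== '(') return null;      // outer opening parenthesis
  // Sticky: every exec() must match exactly at token.lastIndex.
  const token = /\(([^()]+)\)(,|\)$)/y;
  token.lastIndex = 1;                    // continue right after the outer (
  const values = [];
  let m;
  while ((m = token.exec(input)) !== null) {
    values.push(m[1]);
    // A ')' separator only matches at the very end of the input.
    if (m[2] === ')') return values;
  }
  return null;                            // something unexpected was in the way
}

console.log(parseGroups('((111111111111),(222222222),(333333333333333))'));
// [ '111111111111', '222222222', '333333333333333' ]
console.log(parseGroups('((111111111111))')); // [ '111111111111' ]
console.log(parseGroups('((1),x(2))'));       // null
```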

You can remove the brackets, split the result on , and then use substring to get the required number of characters from each piece.
input.replace(/\(/g, '').replace(/\)/g, '')
This removes every ( and ) and returns a string like
111111111111,222222222,333333333333333
Splitting this string on , then gives an array of the values you want:
var input = "((111111111111),(222222222),(333333333333333))";
var numbers = input.replace(/\(/g, '').replace(/\)/g, '');
numbers.split(",").forEach(o => console.log(o.substring(0, 6)));

If the level of nesting is fixed, you can just leave out the outer () from the pattern, and add the left parenthesis to the [^)] character class:
var expr = /\(([^()]+)\)/g;
var input = '((111111111111),(222222222),(333333333333333))';
var match = null;
while (match = expr.exec(input)) {
  console.log(match[1]);
}

RegExp doesn't work fine

I'm working on a template engine and trying to catch all strings inside <% %>, but when I run it on the <%object.property%> pattern, everything fails.
My code:
var render = function(input, data){
  var re = /<%([^%>]+)?%>/g;
  var templateVarArray;
  // var step = "";
  while((templateVarArray = re.exec(input))!=null){
    var strArray = templateVarArray[1].split(".");
    // step+= templateVarArray[1]+" ";
    if(strArray.length==1)
      input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
    if(strArray.length==2){
      input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
    }
  }
  // return step;
  return input;
}
var input = "<%test.child%><%more%><%name%><%age%>";
document.write(render(input,{
  test: { child: "abc"},
  more: "MORE",
  name:"ivan",
  age: 22
}));
My result:
abc<%more%><%name%>22
what I want is: abc MORE ivan 22
Also, I found the RegExp /<%([^%>]+)?%>/g online. I searched for its meaning, but I'm still not quite sure what it does, especially why it needs "+" and "?". Thanks a lot!

If you add a console.log() statement it will show where the next search is going to take place:
while((templateVarArray = re.exec(input))!=null){
  console.log(re.lastIndex); // <-- insert this
  var strArray = templateVarArray[1].split(".");
  // step+= templateVarArray[1]+" ";
  if(strArray.length==1)
    input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
  if(strArray.length==2){
    input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
  }
}
You will see something like:
14
26
This means that the next time you run re.exec(...) it will start at index 14 and 26 respectively. Because each substitution changes the length of the string, those positions no longer line up with the remaining tags, and you miss some of the matches.
As @Alexander points out, take the 'g' off the end of the regex. Now you will see something like this:
0
0
This means the search will start each time from the beginning of the string, and you should now get what you were looking for:
abcMOREivan22
Regarding your questions on the RegEx and what it is doing, let's break the pieces apart:
<% - this matches the literal '<' followed immediately by '%'
([^%>]+) - the parentheses (...) indicate we want to capture the portion of the string that matches the expression within them
[^...] - matches any single character except those listed after the '^'; without the '^' it would match any one of the characters listed within the []
[^%>] - matches a single character that is neither '%' nor '>'
[^%>]+ - '+' means one or more; in other words, match a run of one or more characters, none of which is '%' or '>'
? - placed directly after the closing parenthesis, this makes the whole captured group optional (zero or one occurrence); it is not a reluctant (lazy) quantifier here, which would be written '+?' inside the group
%> - this matches the literal '%' followed immediately by '>'
The trickiest part to understand is the '?'. Used in this context it means the tag may be empty: <%%> still matches, with nothing captured. For the tags in your input, which always contain a name, it doesn't make any difference whether you include it.
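A quick experiment shows how '?' behaves in this position, compared with '?' used as a lazy modifier after a quantifier. This is just an illustrative sketch; the variable names are made up for the example.

```javascript
// '?' after the closing parenthesis makes the group optional.
const withOptionalGroup = /<%([^%>]+)?%>/.exec('<%%>');
console.log(withOptionalGroup[0]); // <%%>
console.log(withOptionalGroup[1]); // undefined

// Without the '?', an empty tag does not match at all.
const withRequiredGroup = /<%([^%>]+)%>/.exec('<%%>');
console.log(withRequiredGroup); // null

// Lazy matching is a different use of '?': it directly follows a quantifier.
const greedy = /<%(.+)%>/.exec('<%a%><%b%>')[1];
const lazy = /<%(.+?)%>/.exec('<%a%><%b%>')[1];
console.log(greedy); // a%><%b
console.log(lazy);   // a
```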
Suggested Improvement
The current logic is limited to data that nests two levels deep. To handle arbitrary nesting depth, you could do this:
First, add a small function to do the substitution:
var substitute = function (str, data) {
  // Walk the dotted path one property at a time, starting from data
  return str.split('.').reduce(function (res, item) {
    return res[item];
  }, data);
};
Then, change your while loop to look like this:
while ((templateVarArray = re.exec(input)) != null) {
  input = input.replace(templateVarArray[0], substitute(templateVarArray[1], data));
}
Not only does it handle any number of levels, but you might also find other uses for the 'substitute()' function.
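Another option, sketched here under the assumption that a single pass over the template is enough, is to let String.prototype.replace drive the iteration with a callback. The names lookup and renderTemplate are hypothetical, and leaving unknown tags untouched is a design choice of this sketch, not something the original code does.

```javascript
// Walk a dotted path such as 'test.child' through nested objects.
function lookup(path, data) {
  return path.split('.').reduce(function (res, key) {
    return res == null ? undefined : res[key];
  }, data);
}

// Hypothetical single-pass variant: replace() visits every match itself,
// so there is no lastIndex to keep in sync with the changing string.
function renderTemplate(input, data) {
  return input.replace(/<%([^%>]+)%>/g, function (whole, path) {
    const value = lookup(path.trim(), data);
    // Leave the tag untouched when no value is found (an assumption).
    return value === undefined ? whole : String(value);
  });
}

const rendered = renderTemplate('<%test.child%><%more%><%name%><%age%>', {
  test: { child: 'abc' },
  more: 'MORE',
  name: 'ivan',
  age: 22
});
console.log(rendered); // abcMOREivan22
```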

The RegExp.prototype.exec() documentation says:
If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property).
But you are replacing each match in the original string, so the next re.exec call, with lastIndex already set to a non-zero value, does not search from the beginning and skips some matches.
So if you want to search and substitute the results in the original string, just omit the g (global) flag:
var render = function(input, data) {
  var re = /<%([^%>]+)?%>/;
  var templateVarArray;
  // var step = "";
  while (!!(templateVarArray = re.exec(input))) {
    var strArray = templateVarArray[1].split(".");
    if (strArray.length == 1)
      input = input.replace(templateVarArray[0], data[templateVarArray[1]]);
    if (strArray.length == 2) {
      input = input.replace(templateVarArray[0], data[strArray[0]][strArray[1]]);
    }
  }
  // return step;
  return input;
}
var input = "<%test.child%><%more%><%name%><%age%>";
document.write(render(input, {
  test: {
    child: "abc"
  },
  more: "MORE",
  name: "ivan",
  age: 22
}));
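The effect of a stale lastIndex can be reproduced in isolation. The snippet below is a small illustration with made-up names and a shortened template, not part of the original code:

```javascript
const tagPattern = /<%(\w+)%>/g;
let text = '<%longname%><%b%>';

const firstHit = tagPattern.exec(text);
console.log(firstHit[1], tagPattern.lastIndex); // longname 12

// Replacing the match shortens the string, but lastIndex still says 12.
text = text.replace(firstHit[0], 'x');
console.log(text); // x<%b%>

// The search resumes at index 12, past the end of the new string,
// so the remaining tag is never found.
const secondHit = tagPattern.exec(text);
console.log(secondHit); // null
```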

RegExp capturing group in capturing group

I want to capture the "1" and "2" in "http://test.com/1/2". Here is my regexp /(?:\/([0-9]+))/g.
The problem is that I only get ["/1", "/2"]. According to http://regex101.com/r/uC2bW5 I should get "1" and "2".
I'm running my RegExp in JS.

You have a couple of options:
Use a while loop over RegExp.prototype.exec:
var regex = /(?:\/([0-9]+))/g,
  string = "http://test.com/1/2",
  matches = [],
  match;
while (match = regex.exec(string)) {
  matches.push(match[1]);
}
Use replace as suggested by elclanrs:
var regex = /(?:\/([0-9]+))/g,
  string = "http://test.com/1/2",
  matches = [];
string.replace(regex, function() {
  matches.push(arguments[1]);
});

In JavaScript, your match always has an element at index 0 that contains the WHOLE pattern match. So in your case, index 0 is /1 for the first match and /2 for the second.
If you want your DEFINED first match group (the one that does not include the /), you'll find it in the match array entry with index 1.
Index 0 cannot be removed, and it has nothing to do with the outer group you defined as non-capturing by using ?:
Imagine that JavaScript wraps your whole regex in an additional set of parentheses.
E.g. the string Hello World and the regex /Hell(o) World/ will result in:
[0 => Hello World, 1 => o]
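The difference is easy to see by calling match() with and without the g flag. This is a small sketch with illustrative variable names; it assumes the question's result came from match() with the g flag, which the output ["/1", "/2"] suggests.

```javascript
const url = 'http://test.com/1/2';

// With the g flag, match() returns only the index-0 value of every match.
const allFull = url.match(/(?:\/([0-9]+))/g);
console.log(allFull); // [ '/1', '/2' ]

// Without g, you get one match array: index 0 is the whole match,
// index 1 is the first capturing group.
const single = url.match(/(?:\/([0-9]+))/);
console.log(single[0], single[1]); // /1 1
```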
