Only Tokens in Regex in Javascript

I'm using JavaScript to parse a string that looks as follows:
var myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
var regex = /\w*:(part.\w*)./gi
How can I put only the highlighted part within the parenthesis in an array?
var myArray = myString.match(regex); gives me the whole line.

In your pattern \w*:(part.\w*). the leading \w* can match zero characters, so if you want at least one word character before the colon you can use a single \w.
After the capture group there is a dot . which matches any character. Because that character is mandatory, the last item at the end of the string loses its final character from the capture group.
Escape the dot as \. if you want to match it literally.
The pattern can look like:
\w:(part\.\w+)
Then collect the capture group 1 values into an array using matchAll:
const myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
const regex = /\w:(part\.\w+)/gi;
const result = Array.from(myString.matchAll(regex), m => m[1])
console.log(result)
Or, if lookbehind is supported, without a capture group using match:
const myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
const regex = /(?<=\w:)part\.\w+/gi;
console.log(myString.match(regex))
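For runtimes that lack String.prototype.matchAll, the same capture-group extraction can be sketched with a RegExp#exec loop. The helper name collectGroup is made up for this illustration and is not part of any library.

```javascript
// Hypothetical helper: collect one capture group from every match.
// The regex must carry the g flag, otherwise exec never advances.
function collectGroup(text, regex, groupIndex) {
  if (!regex.global) {
    throw new Error("regex needs the g flag");
  }
  regex.lastIndex = 0; // always start from the beginning of the string
  const values = [];
  let m;
  while ((m = regex.exec(text)) !== null) {
    values.push(m[groupIndex]);
    // Guard against zero-length matches, which would otherwise loop forever.
    if (m[0] === "") {
      regex.lastIndex++;
    }
  }
  return values;
}

const sample = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
const parts = collectGroup(sample, /\w:(part\.\w+)/g, 1);
console.log(parts); // [ 'part.one', 'part.two', 'part.three' ]
```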

Related

Regex Capture Character and Replace with another

Trying to replace the special characters preceded by digits with dot.
const time = "17:34:12:p. m.";
const output = time.replace(/\d+(.)/g, '.');
// Expected Output "17.34.12.p. m."
console.log(output);
I wrote a regex that captures any character preceded by one or more digits, but the output replaces the digits too. Can someone please help me figure out the issue?
You can use
const time = "17:34:12:p. m.";
const output = time.replace(/(\d)[\W_]/g, '$1.');
console.log(output);
The time.replace(/(\d)[\W_]/g, '$1.') code matches and captures a digit into Group 1, then matches a single non-word or underscore char; the $1. replacement puts the digit back and replaces the matched char (here :) with a dot.
If you want to "subtract" whitespace pattern from [\W_], use (?:[^\w\s]|_).
Consider checking more special character patterns in Check for special characters in string.
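To make the difference between the two character patterns concrete, here is a small sketch. The input string is invented for the illustration; it contains a space, an underscore and a colon after digits.

```javascript
// Made-up input: each digit run is followed by a different "special" char.
const input = "12 30_45:p";

// [\W_] treats whitespace as special too, so the space becomes a dot.
const anySpecial = input.replace(/(\d)[\W_]/g, "$1.");

// (?:[^\w\s]|_) leaves whitespace alone and only replaces _ and :.
const keepWhitespace = input.replace(/(\d)(?:[^\w\s]|_)/g, "$1.");

console.log(anySpecial);     // 12.30.45.p
console.log(keepWhitespace); // 12 30.45.p
```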
You should look for characters that are neither word characters (\w) nor whitespace (\s) and replace them with a dot.
You should use some live simulator for regular expressions. For example regex101: https://regex101.com/r/xIStHH/1
const time = "17:34:12:p. m.";
const output = time.replace(/[^\w\s]/g, '.');
// Expected Output "17.34.12.p. m."
console.log(output);

Javascript - Regex - how to filter characters that are not part of regex

I want to accept words and some special characters, so if my regex
does not fully match, let's say I display an error,
var re = /^[[:alnum:]\-_.&\s]+$/;
var string = 'this contains invalid chars like ##';
var valid = re.test(string);
but now I want to "filter" a phrase, removing all characters not matching the regex.
Usually one uses replace, but how do I list all characters not matching the regex?
var validString = string.filter(re); // something similar to this
how do I do this ?
regards
Wiktor Stribiżew's solution works fine:
regex=/[^a-zA-Z\-_.&\s]+/g;
let s='some bloody-test #rfdsfds';
s = s.replace(/[^\w\s.&-]+/g, '');
console.log(s);
Rajesh's solution:
regex=/^[a-zA-Z\-_.&\s]+$/;
let s='some -test #rfdsfds';
s=s.split(' ').filter(x=> regex.test(x));
console.log(s);
JS regex engine does not support POSIX character classes like [:alnum:]. You may use [A-Za-z0-9] instead, but only to match ASCII letters and digits.
Your current regex matches a whole string that consists only of allowed chars, so it cannot be used to return the chars that are not allowed. To match those, use a negated character class such as [^a-zA-Z0-9_.&\s-].
You may remove the unwanted chars with
var s = 'this contains invalid chars like ##';
var res = s.replace(/[^\w\s.&-]+/g, '');
var notallowedchars = s.match(/[^\w\s.&-]+/g);
console.log(res);
console.log(notallowedchars);
The /[^\w\s.&-]+/g pattern matches multiple occurrences (due to /g) of any one or more (due to +) chars other than word chars (digits, letters, _, matched with \w), whitespace (\s), ., & and -.
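Since \w only covers ASCII letters, digits and _, accented letters are stripped by the pattern above. If that is not wanted, one possible variation (an assumption on my side, not something the question asked for) is to use Unicode property escapes, which need the u flag and an ES2018-capable engine. The function names below are made up for the sketch.

```javascript
// ASCII-only cleanup: anything outside \w, whitespace, ".", "&", "-" is removed.
const asciiClean = s => s.replace(/[^\w\s.&-]+/g, "");

// Unicode-aware variation: \p{L} is any letter, \p{N} any number (u flag required).
const unicodeClean = s => s.replace(/[^\p{L}\p{N}_\s.&-]+/gu, "");

const phrase = "caf\u00e9 & co #1"; // "café & co #1"
console.log(asciiClean(phrase));   // caf & co 1
console.log(unicodeClean(phrase)); // café & co 1
```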
To match all characters that are neither alphanumeric, whitespace, nor one of -_.&, put ^ at the start of the character class []:
var str = 'asd.=!_#$%^&*()564';
console.log(
str.match(/[^a-z0-9\-_.&\s]/gi),
str.replace(/[^a-z0-9\-_.&\s]/gi, '')
);

Find multiple values in square brackets (and preceding text) with JavaScript regex

Given a string like this:
"fieldNameA[fieldValueA] fieldNameB[fieldValueB] fieldNameC[fieldValueC]"
I'm looking for a regular expression (or two) that will allow me to construct a set of key value pairs e.g.
fieldNameA : "fieldValueA"
fieldNameB : "fieldValueB"
fieldNameC : "fieldValueC"
Values will always be in square brackets.
Square brackets will always be preceded by the field name
Field names could vary in length
I tried something like:
const reg = new RegExp("fieldNameA|fieldNameB|fieldNameC\[(.*?)\]", "gi");
const matches = reg.exec(query);
But this doesn't work. I just get an array with one value: "fieldNameA"
You may use the following solution:
var query = "fieldNameA[fieldValueA] fieldNameB[fieldValueB] fieldNameC[fieldValueC]";
var matches = {}, m;
var reg = /(\w+)\[(.*?)]/gi;
while(m=reg.exec(query)) {
matches[m[1]] = m[2];
}
console.log(matches);
You need to use a regex literal, or "\[" will turn into "[" and will no longer match a literal [.
Also, to match all occurrences, you need to run a RegExp#exec in a loop.
Pattern details:
(\w+) - Group 1: one or more word chars (if you want to match an identifier that cannot start with a digit, replace with [_a-zA-Z]\w*)
\[ - a literal [
(.*?) - Group 2: any 0+ chars other than line break chars (if you need to match any char but ], use [^\]]*)
] - a literal ] char.
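As a sketch of the points above: the same pattern can be written as a regex literal or, with every backslash doubled, as a string passed to the RegExp constructor. Using matchAll with Object.fromEntries instead of the exec loop is my own choice for brevity, and the name toPairs is hypothetical.

```javascript
const query = "fieldNameA[fieldValueA] fieldNameB[fieldValueB] fieldNameC[fieldValueC]";

// Regex literal: backslashes are written once.
const fromLiteral = /(\w+)\[([^\]]*)\]/g;

// String form: each backslash must be doubled, or "\[" collapses to "[".
const fromString = new RegExp("(\\w+)\\[([^\\]]*)\\]", "g");

// Hypothetical helper: build an object from group 1 (key) and group 2 (value).
const toPairs = re =>
  Object.fromEntries(Array.from(query.matchAll(re), m => [m[1], m[2]]));

console.log(toPairs(fromLiteral));
console.log(fromLiteral.source === fromString.source); // true
```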

How do I convert a PHP regex with a lookbehind to Javascript?

Javascript doesn't support lookbehinds in regexes. How do I convert the following PHP regex to Javascript?
regPattern="(?<!\\)\\x"
Here is the test case (in Node.js):
var str = '{"key":"abc \\x123 \xe2\x80\x93 xyz"}'
var newStr = str.replace(/regPattern/g, '\\u')
console.log(newStr); // output: '{"key":"abc \\x123 \ue2\u80\u93 xyz"}'
\\x123 doesn't match because it contains \\x, but \x matches.
Try this:
var newStr = str.replace(/([^\\]|^)\\x/g, '$1\\u');
In other words, match the ^ (start of string) or any non-\ character, followed by \x, capturing the first character in capture group 1.
Then replace the whole 3-character matched group with capture group 1, followed by \u.
For example, in abc?\x, the string ?\x will be matched, and capture group 1 will be ?. So we replace the match (?\x) with $1\u, which evaluates to ?\u. So abc?\x -> abc?\u.
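Here is a runnable sketch of the capture-group workaround, next to a lookbehind version for comparison. Lookbehind is available in ES2018-capable engines, so the second form is only an option where that can be assumed. One caveat of the workaround: because the character before \x is consumed by the match, two directly adjacent \x sequences are not both replaced.

```javascript
// The sample contains literal backslashes; String.raw keeps them as typed.
const raw = String.raw`{"key":"abc \\x123 \xe2\x80\x93 xyz"}`;

// Workaround: capture the preceding non-backslash char (or start) and put it back.
const withGroup = raw.replace(/([^\\]|^)\\x/g, "$1\\u");

// Lookbehind form (assumes an ES2018+ engine): nothing is consumed before \x.
const withLookbehind = raw.replace(/(?<!\\)\\x/g, "\\u");

console.log(withGroup);
console.log(withGroup === withLookbehind); // true for this sample

// Adjacent sequences show the difference between the two approaches.
const adjacent = "\\x\\x"; // the four characters \x\x
console.log(adjacent.replace(/([^\\]|^)\\x/g, "$1\\u")); // \u\x
console.log(adjacent.replace(/(?<!\\)\\x/g, "\\u"));     // \u\u
```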

Javascript reg exp not right

Here is a string str = '.js("aaa").js("bbb").js("ccc")', I want to write a regular expression to return an Array like this:
[aaa, bbb, ccc];
My regular expression is:
var jsReg = /.js\(['"](.*)['"]\)/g;
var jsAssets = [];
var js;
while ((js = jsReg.exec(find)) !== null) {
jsAssets.push(js[1]);
}
But the jsAssets result is
[""aaa").js("bbb").js("ccc""]
What's wrong with this regular expression?
Use the lazy version of .*:
/\.js\(['"](.*?)['"]\)/g
              ^
And it would be better if you escape the first dot.
This will match the least number of characters until the next quote.
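A quick side-by-side sketch of the greedy and lazy quantifiers on the string from the question (matchAll is used here only to keep the comparison short):

```javascript
const str = '.js("aaa").js("bbb").js("ccc")';

// Greedy .* runs to the last quote in the string, giving one big match.
const greedy = Array.from(str.matchAll(/\.js\(['"](.*)['"]\)/g), m => m[1]);

// Lazy .*? stops at the first quote that is followed by ")".
const lazy = Array.from(str.matchAll(/\.js\(['"](.*?)['"]\)/g), m => m[1]);

console.log(greedy); // [ 'aaa").js("bbb").js("ccc' ]
console.log(lazy);   // [ 'aaa', 'bbb', 'ccc' ]
```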
If you want to allow escaped quotes, use something like this:
/\.js\(['"]((?:\\['"]|[^"])+)['"]\)/g
I believe it can be done in one-liner with replace and match method calls:
var str = '.js("aaa").js("bbb").js("ccc")';
str.replace(/[^(]*\("([^"]*)"\)[^(]*/g, '$1,').match(/[^,]+/g);
//=> ["aaa", "bbb", "ccc"]
The problem is that you are using .*. That will match any character. You'll have to be a bit more specific with what you are trying to capture.
If it will only ever be word characters you could use \w which matches any word character. This includes [a-zA-Z0-9_]: uppercase, lowercase, numbers and an underscore.
So your regex would look something like this :
var jsReg = /js\(['"](\w*)['"]\)/g;
In
/.js\(['"](.*)['"]\)/g
the (.*) is greedy: it matches as much as possible, so given your example input it runs from the first quote to the last one:
"aaa").js("bbb").js("ccc"
Try
/\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/
To break this down,
\. matches a literal dot
\.js\( matches the literal string ".js("
( starts to capture the string.
[^\\']|\\. matches a character other than quote or backslash or an escaped non-line terminator.
(?:[^\\']|\\.)* matches the body of a string
'(?:[^\\']|\\.)*' matches a single quoted string
(...|...) captures a single quoted or double quoted string
)\) closes the capturing group and matches a literal close parenthesis
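The breakdown can be checked with a short sketch. It assumes the form of the pattern in which each branch excludes the backslash and its own quote character, and it adds the g flag so that matchAll can walk over every call. Note that group 1 includes the surrounding quotes. The input is invented for the illustration.

```javascript
// Single- or double-quoted string literal inside .js( ... ), escapes allowed.
const quoted = /\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/g;

// Made-up input with an escaped quote in each style plus a plain string.
const source = String.raw`.js('a\'b').js("c\"d").js("plain")`;

const literals = Array.from(source.matchAll(quoted), m => m[1]);
console.log(literals); // three string literals, quotes and escapes preserved
```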
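The second thing to watch is your loop.
RegExp#exec only advances through the string when the regex has the g flag; without it, every call returns the first match again and the while loop never ends.
So keep the g modifier on whichever pattern you run in that loop.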
Try this one - http://jsfiddle.net/UDYAq/
var str = '.js("aaa").js("bbb").js("ccc")';
var regex = /\.js\("(.*?)"\)/gi;
// match() with the g flag returns whole matches, so pull the quoted part out of each one
var result = str.match(regex);
for (var i = 0; i < result.length; i++) {
    result[i] = result[i].match(/"(.*?)"/)[1];
}
console.log(result);
To be sure that matched characters are surrounded by the same quotes:
/\.js\((['"])(.*?)\1\)/g
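A short sketch of the backreference pattern in use. Group 1 now holds the opening quote and \1 requires the same character to close the string, so the wanted text moves to group 2. The mixed-quote input is invented for the illustration.

```javascript
// Made-up input mixing both quote styles, including an apostrophe inside double quotes.
const mixed = `.js("aaa").js('bbb').js("it's")`;

const sameQuote = /\.js\((['"])(.*?)\1\)/g;

// m[1] is the quote character, m[2] is the content between the matching quotes.
const assets = Array.from(mixed.matchAll(sameQuote), m => m[2]);
console.log(assets); // [ 'aaa', 'bbb', "it's" ]
```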
