Match words and full stops, but not the trailing full stop - javascript

I am trying to match a string of the type $word1.word2.word3, which contains dots but should not end with a dot.
In other words:
$context.abc.value, $context.abc.value.random() - should match full string
$context.abc.value. - should match everything except the last character (dot).
My regex for now is:
(?:^|\s)\$(?!\d)[\w.\[\]\(\)]+
Here's a fiddle to play with: https://regex101.com/r/PxCtUv/1
How can I avoid matching the trailing dot character?

You may "decompose" the last [a.]+ pattern into [a.]*[a]:
(?:^|\s)\$(?!\d)[\w.[\]()]*[\w[\]()]
^^^^^^^^^^^^^^^^^^^^
See the regex demo.
Details
(?:^|\s) - a non-capturing group matching either start of string (^) or (|) a whitespace (\s)
\$ - a $ char
(?!\d) - a negative lookahead that fails the match if there is a digit right after the $ char
[\w.[\]()]* - zero or more word, ., [, ], ( or ) chars
[\w[\]()] - a word, [, ], ( or ) char (no dot here, so the match cannot end with one).
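As a quick check of the decomposed pattern, here is a minimal sketch that runs it against the strings from the question. The helper name extractVariable is hypothetical, and trimming the whitespace that (?:^|\s) may consume is an assumption about what the caller wants.

```javascript
// Decomposed pattern from the answer above: the final character class
// omits the dot, so a match can never end with ".".
const variablePattern = /(?:^|\s)\$(?!\d)[\w.[\]()]*[\w[\]()]/;

// Hypothetical helper: returns the matched variable reference, or null.
function extractVariable(input) {
  const match = variablePattern.exec(input);
  // (?:^|\s) may consume one leading whitespace char, so trim it off.
  return match ? match[0].trim() : null;
}

console.log(extractVariable('$context.abc.value'));          // $context.abc.value
console.log(extractVariable('$context.abc.value.random()')); // $context.abc.value.random()
console.log(extractVariable('$context.abc.value.'));         // $context.abc.value
console.log(extractVariable('$1abc'));                       // null
```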

Use the following one:
\$(?!\d)[\w]+([.]{0,1}[\w()]+)+
You can try it the following way: https://regex101.com/r/2ONUHj/1
This will not match the whitespace before the $ sign.
It allows only a single . between segments, so consecutive dots are not matched.
There could be a lot of other edge cases not defined here.
For further details use the explanation on https://regexr.com/.

You could do it without using regex, if you wanted to:
const data = [
'context.abc.value',
'$context.abc.value.random()',
'$context.abc.value.'
];
const filtered = data.filter(item => Array.from(item).pop() !== '.');
// [ 'context.abc.value', '$context.abc.value.random()' ]

Related

What is the regex to match alphanumeric 6 character words, separated by space or comma

I am newbie in RegEx and trying to design a RegEx which could match the String like below:
pattern 1 separated by comma and a space: KEPM39, JEMGH5, HEPM21 ... (repeat)
pattern 2 separated only by a space: KEPM39 JEMGH5 HEPM21 ... (repeat)
pattern 3 separated only by a comma: KEPM39,JEMGH5,HEPM21 ... (repeat)
this is my concept: "^[a-zA-Z0-9]{6,}[,\s]+$" but it seems wrong.
#I want to validate the whole string, and I use javascript & html to validate user input. (textarea)
#duplicate change to repeat to be more suitable.
function validate(){
var term = "JEPM34, KEPM11 ";
var re = new RegExp("^[a-zA-Z0-9]{6,}[,\s]+$");
if (re.test(term)) {
return true
} else {
return false
}
}
Thank you in advance!
A very loose way to validate could be:
^[A-Z\d]{6}(?:[ ,]+[A-Z\d]{6})*$
See the online demo. By loose, I mean that [ ,]+ does not check that every delimiter in your string is the same. Therefore even "KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ" would be valid.
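To make the loose behaviour concrete, here is a small sketch that runs the pattern above on a few inputs. The function name isLooselyValid and the sample strings other than those from the question are only illustrative.

```javascript
// Loose pattern from above: six uppercase letters or digits per word,
// with any mix of spaces and commas in between.
const loosePattern = /^[A-Z\d]{6}(?:[ ,]+[A-Z\d]{6})*$/;

// Hypothetical helper wrapping RegExp#test.
function isLooselyValid(term) {
  return loosePattern.test(term);
}

console.log(isLooselyValid('KEPM39, JEMGH5, HEPM21'));               // true
console.log(isLooselyValid('KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ')); // true (mixed delimiters)
console.log(isLooselyValid('KEPM3, JEMGH5'));                        // false (first word has 5 chars)
console.log(isLooselyValid('JEPM34, KEPM11 '));                      // false (trailing space)
```

The last case shows that this loose version rejects trailing spaces, which matters for the sample value in the question.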
If you want consistent delimiters, and there can be trailing spaces (as there is in your example data) you can use a capture group with a backreference \1 to keep consistent delimiters and match optional spaces at the end.
Note that you can also use \s but that could also match a newline.
Using test returns a boolean, so you don't have to return true or false explicitly; you can return the result of test directly.
^[A-Z\d]{6}(?:(, ?| )(?:[A-Z\d]{6}\1)*[A-Z\d]{6} *)?$
The pattern matches:
^ Start of string
[A-Z\d]{6} Match 6 occurrences of a char A-Z or a digit
(?: Non capture group to match as a whole
(, ?| ) Capture group 1, match either a comma and optional space, or a space to be used as a backreference
(?:[A-Z\d]{6}\1)* Optionally repeat any of the listed followed by a backreference \1 to group 1 which will match the same delimiter
[A-Z\d]{6} * Match any of the listed and optional spaces at the end
)? Close the group and make it optional to also match an instance without delimiters
$ End of string
Regex demo
const regex = /^[A-Z\d]{6}(?:(, ?| )(?:[A-Z\d]{6}\1)*[A-Z\d]{6} *)?$/;
const validate = term => regex.test(term);
[
"KEPM39, JEMGH5, HEPM21",
"KEPM39 JEMGH5 HEPM21",
"KEPM39,JEMGH5,HEPM21",
"JEPM34, KEPM11 ",
"JEPM34, KEPM11",
"JEPM34",
"KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ",
"KEPM39, JEMGH5 HEPM21"
].forEach(s =>
console.log(`${s} ==> ${validate(s)}`)
);

Regular expression capture with optional trailing underscore and number

I'm trying to find a regular expression that will match the base string without the optional trailing number (_123). e.g.:
lorem_ipsum_test1_123 -> capture lorem_ipsum_test1
lorem_ipsum_test2 -> capture lorem_ipsum_test2
I tried using the following expression, but it would only work when there is a trailing _number.
/(.+)(?>_[0-9]+)/
/(.+)(?>_[0-9]+)?/
Similarly, adding the ? (zero or one) quantifier only worked when there is no trailing _number; otherwise, the trailing _number would just be part of the first capture.
Any suggestions?
You may use the following expression:
^(?:[^_]+_)+(?!\d+$)[^_]+
^ Anchor beginning of string.
(?:[^_]+_)+ Repeated non capturing group. Negated character set for anything other than a _, followed by a _.
(?!\d+$) Negative lookahead for digits at the end of the string.
[^_]+ Negated character set for anything other than a _.
Regex demo here.
Please note that the \n in the character sets in the Regex demo are only for demonstration purposes, and should by all means be removed when using as a pattern in Javascript.
Javascript demo:
var myString = "lorem_ipsum_test1_123";
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);
var myString = "lorem_ipsum_test2"
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);
You might match any character and use a negative lookahead that asserts that what follows is not an underscore, one or more digits and the end of the string:
^(?:(?!_\d+$).)*
Explanation
^ Assert start of the string
(?: Non capturing group
(?! Negative lookahead to assert what is on the right side is not
_\d+$ Match an underscore, one or more digits and assert end of the string
) Close negative lookahead
. Match any character except a line break
)* Close non capturing group and repeat zero or more times
Regex demo
const strings = [
"lorem_ipsum_test1_123",
"lorem_ipsum_test2"
];
let pattern = /^(?:(?!_\d+$).)*/;
strings.forEach((s) => {
console.log(s + " ==> " + s.match(pattern)[0]);
});
You are asking for
/^(.*?)(?:_\d+)?$/
See the regex demo. The point here is that the first dot pattern must be non-greedy and the _\d+ should be wrapped with an optional non-capturing group and the whole pattern (especially the end) must be enclosed with anchors.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible due to the non-greedy ("lazy") quantifier *?
(?:_\d+)? - an optional non-capturing group matching 1 or 0 occurrences of _ and then 1+ digits
$ - end of string.
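A minimal sketch of reading the captured base string out of Group 1 with this pattern; the helper name baseName is hypothetical.

```javascript
// Pattern from the answer above: lazy capture, optional _digits suffix, anchored.
const trailingNumber = /^(.*?)(?:_\d+)?$/;

// Hypothetical helper: returns the base string without a trailing _number.
function baseName(input) {
  const match = trailingNumber.exec(input);
  // Group 1 holds everything before the optional suffix.
  return match ? match[1] : null;
}

console.log(baseName('lorem_ipsum_test1_123')); // lorem_ipsum_test1
console.log(baseName('lorem_ipsum_test2'));     // lorem_ipsum_test2
```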
However, it seems easier to use a mere replacing approach,
s = s.replace(/_\d+$/, '')
If the string ends with _ and 1+ digits, the substring will get removed, else, the string will not change.
See this regex demo.
Try to check if the string contains the trailing number. If it does you get only the other part. Otherwise you get the whole string.
var str = "lorem_ipsum_test1_123"
if(/_[0-9]+$/.test(str)) {
console.log(str.match(/(.+)(?=_[0-9]+)/g))
} else {
console.log(str)
}
Or, a lot more concise:
str = str.replace(/_[0-9]+$/g, "")

javascript regex to check if first and last character are similar?

Is there any simple way to check if first and last character of a string are the same or not, only with regex?
I know you can check with charAt
var firstChar = str.charAt(0);
var lastChar = str.charAt(str.length - 1);
console.log(firstChar === lastChar);
I'm not asking for this : Regular Expression to match first and last character
You can use a regex with a capturing group and a backreference to it to assert that the starting and ending characters are the same, by capturing the first character. To test the regex match, use the RegExp#test method.
var regex = /^(.).*\1$/;
console.log(
regex.test('abcdsa')
)
console.log(
regex.test('abcdsaasaw')
)
Regex explanation here :
^ asserts position at start of the string
1st Capturing Group (.)
.* matches any character (except newline) - between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\1 matches the same text as most recently matched by the 1st capturing group
$ asserts position at the end of the string
The . does not match newline characters; to include them, update the regex:
var regex = /^([\s\S])[\s\S]*\1$/;
console.log(
regex.test(`abcd
sa`)
)
console.log(
regex.test(`ab
c
dsaasaw`)
)
Refer : How to use JavaScript regex over multiple lines?
Regex explanation here :
[...] - matches a single character present in the list
\s - matches any whitespace character (equal to [\r\n\t\f\v ])
\S - matches any non-whitespace character (equal to [^\r\n\t\f\v ])
Together, [\s\S] matches any character.
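In environments that support the ES2018 s (dotAll) flag, the dot itself can be made to match line breaks, which is an alternative to [\s\S]. A minimal sketch, assuming such an environment; the function name sameFirstAndLast is hypothetical.

```javascript
// Same idea as /^(.).*\1$/, but the s (dotAll) flag lets "." match newlines.
const sameEnds = /^(.).*\1$/s;

// Hypothetical helper wrapping RegExp#test.
function sameFirstAndLast(str) {
  return sameEnds.test(str);
}

console.log(sameFirstAndLast('abcd\nsa'));        // true
console.log(sameFirstAndLast('ab\nc\ndsaasaw'));  // false
// Without the s flag the dot stops at the line break, so the match fails:
console.log(/^(.).*\1$/.test('abcd\nsa'));        // false
```

Note that, like the original pattern, this needs at least two characters, so a single-character string does not match.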
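You can try this (because of the +, the group may capture more than one character, so it tests whether the string starts and ends with the same substring, not only the same character):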
const rg = /^([\w\W]+)[\w\W]*\1$/;
console.log(
rg.test(`abcda`)
)
console.log(
rg.test(`aebcdae`)
)
console.log(
rg.test(`aebcdac`)
)
var rg = /^([ab])([ab]*)\1$|^[ab]$/;
console.log(rg.test('aabbaa'))
console.log(rg.test('a'))
console.log(rg.test('b'))
console.log(rg.test('bab'))
console.log(rg.test('baba'))
This will make sure that characters are none other than a and b which have the same start and end.
It will also match single characters because they too start and end with same character.

Match pattern not preceded by character

I want to make my regex match a pattern only if it is not preceded by a character, the ^ (circumflex) in my case.
My regex:
/[^\^]\w+/g
Text to test it on:
Test: ^Anotherword
Matches: "Test" and " Anotherword", even though the latter is preceded by a circumflex. Which I was trying to prevent by inserting the [^\^] at the start. So I'm not only trying to not match the circumflex, but also the word that comes after it. " Anotherword" should not be matched.
[^\^] - This is what should stop the regex from matching if a circumflex is in front of it.
\w+ - Match any word that is not preceded by a circumflex.
I cannot use lookbehind because of JavaScript limitations.
Use ([^^\w]|^)\w+
(see http://regexr.com/3e85b)
It basically injects a word boundary while excluding the ^ as well.
[^\w] = \W\b\w
Otherwise [^^] will match a '^T'
and \w+ will match est.
You can see it if you put capture groups around it.
If matching the ^-prefixed word is not strictly forbidden, you can match both kinds of word and capture only the ones you want.
(?:\^\w+)|(\w+): matches both expressions, but no group is generated for ^Anotherword.
(?:\^\w+): matches a word preceded by ^ (such as ^Anotherword), but no group is generated.
(\w+): everything else, captured in a group.
In case you want ^Anotherword to have a group, simply remove the ?:.
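A rough sketch of how the (?:\^\w+)|(\w+) idea could be used to collect only the wanted words. The helper name wordsNotAfterCaret is hypothetical, and String.prototype.matchAll (ES2020) is assumed to be available.

```javascript
// First alternative swallows ^-prefixed words without capturing them;
// second alternative captures every other word in group 1.
const wordPattern = /(?:\^\w+)|(\w+)/g;

// Hypothetical helper: collects group 1 wherever it participated in the match.
function wordsNotAfterCaret(text) {
  const words = [];
  for (const match of text.matchAll(wordPattern)) {
    if (match[1] !== undefined) {
      words.push(match[1]);
    }
  }
  return words;
}

console.log(wordsNotAfterCaret('Test: ^Anotherword')); // [ 'Test' ]
```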
With the growing adoption of the ECMAScript 2018 standard, it makes sense to also consider the lookbehind approach:
const text = "One Test: ^Anotherword";
// Extracting words not preceded with ^:
console.log(text.match(/\b(?<!\^)\w+/g)); // => [ "One", "Test" ]
// Replacing words not preceded with ^ with some other text:
console.log(text.replace(/\b(?<!\^)\w+/g, '<SPAN>$&</SPAN>'));
// => <SPAN>One</SPAN> <SPAN>Test</SPAN>: ^Anotherword
The \b(?<!\^)\w+ regex matches one or more word chars (\w+) that have no word char (letter, digit or _) immediately on the left (achieved with a word boundary, \b) and no ^ char immediately on the left (achieved with the negative lookbehind (?<!\^)). Note that ^ is a special regex metacharacter that needs to be escaped to match a literal caret char.
For older JavaScript environments, it is still necessary to use a workaround:
var text = "One Test: ^Anotherword";
// Extracting words not preceded with ^:
var regex = /(?:[^\w^]|^)(\w+)/g, result = [], m;
while (m = regex.exec(text)) {
result.push(m[1]);
}
console.log(result); // => [ "One", "Test" ]
// Replacing words not preceded with ^ with some other text:
var regex = /([^\w^]|^)(\w+)/g;
console.log(text.replace(regex, '$1<SPAN>$2</SPAN>'));
// => <SPAN>One</SPAN> <SPAN>Test</SPAN>: ^Anotherword
The extraction and replacement regexps differ in the number of capturing groups: when extracting, we only need one group, and when replacing we need both groups. If you decide to use a regex with two capturing groups for extraction, you would need to collect m[2] values.
Extraction pattern means
(?:[^\w^]|^) - a non-capturing group matching
[^\w^] - any char other than a word and ^ char
| - or
^ - start of string
(\w+) - Group 1: one or more word chars.

regexp to quote only string matches (not numbers)

I'm struggling with string:
"some text [2string] some another[test] and another [4]";
trying to quote every value but number within [], so it could be converted into
"some text ['2string'] some another['test'] and another [4]"
Thanks.
You need a regex that
matches content between [], i.e. a [, any number of characters except ], then a ]
asserts that there is at least one other character besides digits here.
You can solve this using character classes and negative lookahead assertions:
result = subject.replace(/\[(?!\d+\])([^\]]*)\]/g, "['$1']");
Explanation:
\[ # Match [
(?! # Assert that it's impossible to match...
\d+ # one or more digits
\] # followed by ]
) # End of lookahead assertion
( # Match and capture in group number 1:
[^\]]* # any number of characters except ]
) # End of capturing group
\] # Match ]
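Putting the pattern explained above to work on the string from the question, as a minimal sketch (the function name quoteNonNumeric is hypothetical):

```javascript
// Lookahead-based pattern from the answer: skip brackets holding only digits.
const bracketPattern = /\[(?!\d+\])([^\]]*)\]/g;

// Hypothetical helper: wraps non-numeric bracket contents in single quotes.
function quoteNonNumeric(subject) {
  return subject.replace(bracketPattern, "['$1']");
}

const input = "some text [2string] some another[test] and another [4]";
console.log(quoteNonNumeric(input));
// some text ['2string'] some another['test'] and another [4]
```

One thing to be aware of: an empty pair [] also passes the lookahead, so it would become [''].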
A longer, but IMO cleaner approach, if performance is not a big concern:
var string = "some text [2string] some another[test] and another [4]";
var output = string.replace(/(\[)(.*?)(\])/g, function(match, a, b, c) {
if(/^\d+$/.test(b)) {
return match;
} else {
return a + "'" + b + "'" + c;
}
});
console.log(output);
You basically match every expression inside square brackets, then test to see if it's a number. If it is, return the string as-it-is, otherwise insert quotes at the specific places.
Output:
some text ['2string'] some another['test'] and another [4]
I'd try something like \[(\d*?[a-z]\w*?)]. This should match any [...] as long as there's at least one letter inside. If underscores (_) aren't valid, replace the \w at the end with [a-z].
\[ is just a simple match for [, it has to be escaped due to the special meaning of [.
\d*? will match any amount of digits (or none), but as few as possible to fulfill the match.
[a-z] will match any character within the given range.
\w*? will match any "word" (alphanumeric) characters (letters, digits, and underscores), again as few as possible to fulfill the match.
] is another simple match, this one doesn't have to be escaped, as it's not misleading (no open [ at this level). It can be escaped, but this is usually a style preference (depends on the actual regex engine).
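The answer above only gives the pattern, so here is a sketch of how it might be plugged into replace. The g flag, the replacement string, and the function name quoteIfHasLetter are assumptions added for illustration.

```javascript
// Pattern from the answer: optional digits, then at least one lowercase
// letter, then more word chars, all inside [ ].
const letterPattern = /\[(\d*?[a-z]\w*?)]/g;

// Hypothetical helper: quotes bracket contents that match the pattern.
function quoteIfHasLetter(str) {
  return str.replace(letterPattern, "['$1']");
}

const text = "some text [2string] some another[test] and another [4]";
console.log(quoteIfHasLetter(text));
// some text ['2string'] some another['test'] and another [4]
```

As written, [a-z] only accepts lowercase letters, so values starting with an uppercase letter would need the i flag or a wider range.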
You can replace it with this regex
input.replace(/(?!\d+\])(\w+)(?=\])/g, "'$1'");
Another solution, which adds a simple regex to your attempt (the g flag makes sure every numeric value is unquoted again, not just the first):
str.split('[').join("['").split(']').join("']").replace(/\['(\d+)'\]/g, "[$1]");
