How to detect a namespace string with a JavaScript regular expression?

I need to check whether a string represents a valid namespace. A namespace consists of ids separated by dots. Each id starts with an alphabetic character and continues with zero or more alphanumeric characters.
Valid namespaces:
"com.company.package"
"com.company"
"com"
Invalid namespaces:
"1com.company.package"
"com.1company"
"com.com%any"
".com.company"
"com.company."
"com "
" com"
""
"."
"com..company"
Currently I use this simple regexp, but it doesn't reject all of those invalid namespaces:
if( /^[\w\.]$/.test( namespaceStr ) ) {
//valid namespace
} else {
//invalid namespace
}
Any better suggestion for a small and efficient way to check if a string represents a valid namespace?
Here is a little jsfiddle that you can use for testing this regular expression: http://jsfiddle.net/bA85y/

Edit: This one should work for every case:
/^(?:[a-z]\d*(?:\.[a-z])?)+$/i
If you don't mind the groups being capturing, it can be even shorter:
/^([a-z]\d*(\.[a-z])?)+$/i
A little explanation:
^ // Start
( // Open group
[a-z]\d* // A letter, optionally followed by digits (greedy)
(\.[a-z])? // Optionally a dot, but only if a letter follows it
)+ // Close group and match at least once so we get rid of empty values
$ // Ends, not allow any other characters
Demo: http://jsfiddle.net/elclanrs/5hnQV/

Try this pattern:
/^[a-z][a-z0-9]*(?:\.[a-z][a-z0-9]*)*$/i
EDIT:
This is a revision of @elclanrs's jsfiddle.

I think you are looking for this:
/^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)*$/i
EDIT:
This one is a little better (with ?: and \d inspired by @HashemQolami and @elclanrs):
/^[a-z][a-z\d]*(?:\.[a-z][a-z\d]*)*$/i
And this one is shorter but does the same job:
/^[a-z](?:[a-z\d]*(?:\.[a-z])?)*$/i
And this one too, using lookahead to test that it doesn't end with a .:
/^(?!.*\.$)(?:[a-z][a-z\d]*\.?)+$/i
Please note that the selected answer doesn't match "a.b.c", and in some cases it fails on namespaces with more than two levels.
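The difference can be checked directly. The following is a minimal sketch; the names selected, revised and check are only illustrative and do not come from any of the answers:

```javascript
// Pattern from the selected answer and the revised pattern from this answer.
const selected = /^(?:[a-z]\d*(?:\.[a-z])?)+$/i;
const revised = /^[a-z][a-z\d]*(?:\.[a-z][a-z\d]*)*$/i;

// Maps each input to [result of selected, result of revised].
function check(inputs) {
  const out = {};
  for (const s of inputs) {
    out[s] = [selected.test(s), revised.test(s)];
  }
  return out;
}

console.log(check(["com.company", "a.b.c", "com..company", "com.company."]));
```

Both patterns accept "com.company" and reject the two malformed inputs, but only the revised pattern accepts "a.b.c".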
UPDATE:
I've made a little (very basic) test:
var valid = [
"com.company.package",
"com.company",
"com.company1",
"com1.company1",
"a.b.c",
"a1.b.c3.d",
"a1.b2.c3.d4"];
var invalid = [
"1com.company.package",
"com.1company",
"com.com%any",
".com.company",
"com.company.",
"com ",
" com",
"",
".",
"com..company"];
function testRegex(regex, list) {
  var res = [];
  for (var i = 0; i < list.length; i++) {
    if (regex.test(list[i])) {
      res.push(list[i] + " ==> matched");
    } else {
      res.push(list[i] + " ==> NOT matched");
    }
  }
  return res.join('<br>');
}
var regex = /^[a-z][a-z0-9]*(\.[a-z][a-z0-9]*)*$/i;
var html = "<p>VALID</p>";
html += testRegex(regex, valid);
html += "<p>INVALID</p>";
html += testRegex(regex, invalid);
document.write("<div>" + html + "</div>");

Based on @dionyziz's answer, this works:
/^[a-z]+(\.[a-z]+)*[^.\s]$/
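A quick check of this pattern, as a sketch with illustrative variable names: the final [^.\s] consumes one extra character of any kind other than a dot or whitespace, so it is worth testing a one-letter namespace and a trailing symbol as well as the trailing-dot case.

```javascript
const pattern = /^[a-z]+(\.[a-z]+)*[^.\s]$/;

// Pair each sample with the result of the test.
const samples = ["com.company", "com.company.", "com ", "c", "com%"];
const results = samples.map((s) => [s, pattern.test(s)]);

console.log(results);
```

With these inputs the trailing dot and trailing space are rejected as intended, but "c" is also rejected and "com%" is accepted.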

The following regular expression will do what you need. It checks for an alphabetic string and then allows multiple other alphabetic strings separated by a dot.
/^[a-z]+(\.[a-z]+)*$/

Excluding matcher [duplicate]

After coming to the shocking realization that regular expressions in JavaScript are somewhat different from the ones in PCRE, I am stuck with the following.
In PHP I extract a number after x:
(?x)[0-9]+
In JavaScript the same regex doesn't work, due to invalid group resulting from the capturing parenthesis difference.
So I am trying to achieve the same trivial functionality, but I keep getting both the x and the number:
(?:x)([0-9]+)
How do I capture the number after x without including x?
This works too:
/(?:x)([0-9]+)/.test('YOUR_STRING');
Then, the value you want is:
RegExp.$1 // group 1
You can try the following regex: (?!x)[0-9]+
fiddle here: https://jsfiddle.net/xy6x938e/1/
This assumes that you are looking for an x followed by a number; it uses a capture group to capture just the digits.
var myString = "x12345";
var myRegexp = /x([0-9]+)/; // no g flag: a global regex keeps lastIndex between exec() calls
var match = myRegexp.exec(myString);
var myString2 = "z12345";
var match2 = myRegexp.exec(myString2);
if(match != null && match.length > 1){
alert('match1:' + match[1]);
}
else{
alert('no match 1');
}
if(match2 != null && match2.length > 1){
alert('match2:' + match2[1]);
}
else{
alert('no match 2');
}
Try (\d+).
I have tested it with this tool, using x12345:
http://www.regular-expressions.info/javascriptexample.html
How do I capture the number after x without including x?
In fact, you just want to extract a sequence of digits after a fixed string/known pattern.
Your PCRE (PHP) regex, (?x)[0-9]+, is wrong because (?x) is the inline form of the PCRE_EXTENDED (verbose/comments) flag (see "Pattern Modifiers"). It does nothing meaningful in this case: (?x)[0-9]+ is equivalent to [0-9]+ or \d+.
In engines that support lookbehind (ES2018 and later), you can use
console.log("x15 x25".match(/(?<=x)\d+/g));
You can also use a capturing group and then extract Group 1 value after a match is obtained:
const match = /x(\d+)/.exec("x15");
if (match) {
console.log(match[1]); // Getting the first match
}
// All matches
const matches = Array.from("x15,x25".matchAll(/x(\d+)/g), x=>x[1]);
console.log(matches);
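You can still use a negative lookahead, (?!...).
For your example that is /(?!x)[0-9]+/. Note that the lookahead only asserts that the next character is not x, which is always true for a digit, so the pattern behaves like [0-9]+ and does not require a preceding x. Try the following: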
/(?!x)\d+/.exec('x123')
// => ["123"]
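A quick check of how the lookahead, lookbehind and capture-group variants behave when the digits are not preceded by x. This is a sketch: firstMatch is a hypothetical helper, and the lookbehind pattern assumes an engine with ES2018 support.

```javascript
const lookahead = /(?!x)\d+/;   // negative lookahead at the match position
const lookbehind = /(?<=x)\d+/; // requires an x immediately before the digits
const capture = /x(\d+)/;       // matches the x, exposes the digits as group 1

// Returns the requested group of the first match, or null when nothing matches.
function firstMatch(re, text, group = 0) {
  const m = re.exec(text);
  return m ? m[group] : null;
}

for (const text of ["x123", "y123"]) {
  console.log(text, {
    lookahead: firstMatch(lookahead, text),
    lookbehind: firstMatch(lookbehind, text),
    capture: firstMatch(capture, text, 1),
  });
}
```

For "y123" the lookahead variant still returns the digits, while the other two return null.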

How to ban words with diacritics using a blacklist array and regex?

I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against words with diacritics from the array:
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new RegExp('\\b' + bannedWords.join("\\b|\\b") + '\\b', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
alert(valid);
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check'>
Now on the word băţ it returns true instead of false for example.
Chiu's comment is right: 'aaáaa'.match(/\b.+?\b/g) yields the rather counter-intuitive [ "aa", "á", "aa" ], because the "word character" class (\w) in JavaScript regular expressions is just shorthand for [A-Za-z0-9_] (case-insensitive alphanumerics plus underscore), so a word boundary (\b) matches at any position between a run of those characters and any other character. This makes extracting "Unicode words" quite hard.
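The behaviour is easy to reproduce. In this small sketch the helper names are illustrative, and the \p{L} check assumes an engine with ES2018 Unicode property escapes:

```javascript
// "\u00e1" is á; it is written as an escape so the example does not depend on file encoding.
const chunks = "aa\u00e1aa".match(/\b.+?\b/g);

// \w only covers [A-Za-z0-9_], so á is not a "word character" ...
const isAsciiWordChar = (ch) => /^\w$/.test(ch);
// ... although it is a letter according to its Unicode properties.
const isUnicodeLetter = (ch) => /^\p{L}$/u.test(ch);

console.log(chunks, isAsciiWordChar("\u00e1"), isUnicodeLetter("\u00e1"));
```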
For non-unicase writing systems it is possible to identify "word character" by its dual nature: ch.toUpperCase() != ch.toLowerCase(), so your altered snippet could look like this:
var bannedWords = ["bad", "mad", "testing", "băţ", "bať"];
var bannedWordsRegex = new RegExp('-' + bannedWords.join("-|-") + '-', 'i');
$(function() {
$("input").on("input", function() {
var invalid = bannedWordsRegex.test(dashPaddedWords(this.value));
$('#log').html(invalid ? 'bad' : 'good');
});
$("input").trigger("input").focus();
function dashPaddedWords(str) {
return '-' + str.replace(/./g, wordCharOrDash) + '-';
};
function wordCharOrDash(ch) {
return isWordChar(ch) ? ch : '-'
};
function isWordChar(ch) {
return ch.toUpperCase() != ch.toLowerCase();
};
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check' value="ba">
<p id="log"></p>
Let's see what's going on:
alert("băţ".match(/\w\b/));
This is [ "b" ] because word boundary \b doesn't recognize word characters beyond ASCII. JavaScript's "word characters" are strictly [0-9A-Z_a-z], so aä, pπ, and zƶ match \w\b\W since they contain a word character, a word boundary, and a non-word character.
I think the best you can do is something like this:
var bound = '[^\\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]';
var regex = new RegExp('(?:^|' + bound + ')(?:'
+ bannedWords.join('|')
+ ')(?=' + bound + '|$)', 'i');
where bound is a negated class of all ASCII word characters plus most Latin-esque letters, used with start/end-of-string markers to approximate an internationalized \b. (The second boundary is a zero-width lookahead, which mimics \b more closely and therefore works well with the g regex flag.)
Given ["bad", "mad", "testing", "băţ"], this becomes:
/(?:^|[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe])(?:bad|mad|testing|băţ)(?=[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]|$)/i
This doesn't need anything like ….join('\\b|\\b')… because there are parentheses around the list (and that would create things like \b(?:hey\b|\byou)\b, which is akin to \bhey\b\b|\b\byou\b, including the nonsensical \b\b – which JavaScript interprets as merely \b).
You can also use var bound = '[\\s!-/:-@[-`{-~]' for a simpler ASCII-only list of acceptable non-word characters. Be careful about the order! The dashes indicate ranges between characters, and each range must run from a lower to a higher code point.
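The Unicode-range boundary from this answer can be exercised without a page or jQuery. This is a sketch under the answer's own assumptions about which code point ranges count as letters; buildBannedRegex is a hypothetical helper name:

```javascript
// Negated class: anything that is not an ASCII word character or in the listed letter ranges.
const bound = "[^\\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]";

// Wraps the alternation of banned words in the approximate word boundaries.
function buildBannedRegex(words) {
  return new RegExp(
    "(?:^|" + bound + ")(?:" + words.join("|") + ")(?=" + bound + "|$)",
    "i"
  );
}

// "b\u0103\u0163" is băţ, written with escapes to avoid encoding surprises.
const bannedWords = ["bad", "mad", "testing", "b\u0103\u0163"];
const bannedRegex = buildBannedRegex(bannedWords);

const samples = ["b\u0103\u0163", "b\u0103\u0163y", "a bad day", "badge"];
console.log(samples.map((s) => [s, bannedRegex.test(s)]));
```

The regex has no g flag, so repeated test() calls do not carry state between inputs.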
You need a Unicode aware word boundary. The easiest way is to use XRegExp package.
Although its \b is still ASCII based, there is a \p{L} (or a shorter \pL version) construct that matches any Unicode letter from the BMP plane. Building a custom word boundary with this construct is easy:
\b word \b
corresponds to
([^\pL0-9_]|^) word (?=[^\pL0-9_]|$)
The leading word boundary can be represented with a (non)capturing group ([^\pL0-9_]|^) that matches (and consumes) either a character other than a Unicode letter from the BMP plane, a digit and _ or a start of the string before the word.
The trailing word boundary can be represented with a positive lookahead (?=[^\pL0-9_]|$) that requires a character other than a Unicode letter from the BMP plane, a digit and _ or the end of string after the word.
See the snippet below that will detect băţ as a banned word, and băţy as an allowed word.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new XRegExp('(?:^|[^\\pL0-9_])(?:' + bannedWords.join("|") + ')(?=$|[^\\pL0-9_])', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
//alert(valid);
console.log("The word is", valid ? "allowed" : "banned");
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
<input type='text' name='word_to_check'>
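In engines that support ES2018 Unicode property escapes and lookbehind, a similar boundary can be sketched with the built-in RegExp instead of a library. This is an alternative under that assumption, not the XRegExp approach itself; escapeRegExp and buildBannedRegex are hypothetical helper names.

```javascript
// Escapes regex syntax characters so banned words are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// A banned word must not be preceded or followed by a letter, a number or "_".
function buildBannedRegex(words) {
  const wordChar = "[\\p{L}\\p{N}_]";
  const alternation = words.map(escapeRegExp).join("|");
  return new RegExp(
    "(?<!" + wordChar + ")(?:" + alternation + ")(?!" + wordChar + ")",
    "iu"
  );
}

// "b\u0103\u0163" is băţ, written with escapes to avoid encoding surprises.
const banned = buildBannedRegex(["bad", "mad", "testing", "b\u0103\u0163"]);

for (const text of ["b\u0103\u0163", "b\u0103\u0163y", "a bad day", "badge"]) {
  console.log(text, banned.test(text) ? "banned" : "allowed");
}
```

Because both boundaries are zero-width here, neither one consumes a neighbouring character.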
Instead of using a word boundary, you could do it with
(?:[^\w\u0080-\u02af]+|^)
to check for start of word, and
(?=[^\w\u0080-\u02af]|$)
to check for the end of it.
The [^\w\u0080-\u02af] class matches any character that is not (^) a basic Latin word character (\w) or in the Unicode Latin-1 Supplement, Latin Extended-A, Latin Extended-B and IPA Extensions blocks. This range includes some punctuation, but a class matching only letters would get very long. It may also have to be extended if other character sets have to be included. See for example Wikipedia.
Since JavaScript doesn't support lookbehind (before ES2018), the start-of-word test consumes any of the aforementioned non-word characters, but I don't think that should be a problem. The important thing is that the end-of-word test doesn't.
Also, putting these tests outside a non-capturing group that alternates the words makes it significantly more efficient.
var bannedWords = ["bad", "mad", "testing", "băţ", "båt", "süß"],
regex = new RegExp('(?:[^\\w\\u00c0-\\u02af]+|^)(?:' + bannedWords.join("|") + ')(?=[^\\w\\u00c0-\\u02af]|$)', 'i');
function myFunction() {
document.getElementById('result').innerHTML = 'Banned = ' + regex.test(document.getElementById('word_to_check').value);
}
<!DOCTYPE html>
<html>
<body>
Enter word: <input type='text' id='word_to_check'>
<button onclick='myFunction()'>Test</button>
<p id='result'></p>
</body>
</html>
When dealing with characters outside my base set (which can show up at any time), I convert them to an appropriate base equivalent (8bit, 16bit, 32bit) before running any character matching over them.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var bannedWordsBits = {};
bannedWords.forEach(function(word){
  // Start with a separator too, so that a code such as 62 cannot match
  // inside a longer code such as 162.
  bannedWordsBits[word] = "-";
  for (var i = 0; i < word.length; i++){
    bannedWordsBits[word] += word.charCodeAt(i).toString(16) + "-";
  }
});
var bannedWordsJoin = [];
var keys = Object.keys(bannedWordsBits);
keys.forEach(function(key){
  bannedWordsJoin.push(bannedWordsBits[key]);
});
var regex = new RegExp(bannedWordsJoin.join("|"), 'i');
function checkword(word) {
  var wordBits = "-"; // same leading separator as in the banned list
  for (var i = 0; i < word.length; i++){
    wordBits += word.charCodeAt(i).toString(16) + "-";
  }
  return !regex.test(wordBits);
};
The separator "-" is there to make sure that unique characters don't bleed together creating undesired matches.
This is very useful, as it brings all the characters down to a common base that everything can interact with. It can also be re-encoded back to its original form without having to ship it in a key/value pair.
For me the best thing about it is that I don't have to know all of the rules for all of the character sets that I might intersect with, because I can pull them all into a common playing field.
As a side note:
To speed things up, rather than running the one large regex that you probably have, which takes longer to evaluate as the number and length of the banned words grow, I would pass each word of the sentence through the filter separately, and break the filter up into length-based segments, like:
checkword3Chars();
checkword4Chars();
checkword5chars();
whose functions you can generate systematically and even create on the fly as and when they become required.
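One way to read this suggestion, as a sketch rather than the answer's own code: split the input into words and look each word up in a set of banned words of the same length. The per-length sets stand in for the generated checkwordNChars functions; buildBuckets and containsBanned are hypothetical names, and the \p{L} split assumes ES2018 support.

```javascript
// Groups banned words by length: Map<length, Set<lowercased word>>.
function buildBuckets(words) {
  const buckets = new Map();
  for (const word of words) {
    const lower = word.toLowerCase();
    if (!buckets.has(lower.length)) {
      buckets.set(lower.length, new Set());
    }
    buckets.get(lower.length).add(lower);
  }
  return buckets;
}

// Splits the text on anything that is not a letter, number or "_",
// then checks each token only against banned words of the same length.
function containsBanned(text, buckets) {
  const tokens = text.toLowerCase().split(/[^\p{L}\p{N}_]+/u);
  return tokens.some((token) => {
    const bucket = buckets.get(token.length);
    return bucket !== undefined && bucket.has(token);
  });
}

// "b\u0103\u0163" is băţ, written with escapes to avoid encoding surprises.
const buckets = buildBuckets(["bad", "mad", "testing", "b\u0103\u0163"]);
console.log(containsBanned("This is BAD.", buckets));
console.log(containsBanned("A badge of honour", buckets));
```

Each lookup is a set membership test, so the cost per word does not depend on how many banned words there are.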

JS Regex for matching specific array increment ignoring string and separate increment

I have the following input fields with name attributes of:
carousels['components'][0][0][title]
carousels['components'][0][1][title]
carousels['components'][0][2][title]
carousels['components'][1][0][title]
carousels['components'][1][1][title]
carousels['components'][1][2][title]
carousels['components'][2][0][title]
carousels['components'][2][1][title]
carousels['components'][2][2][title]
I am trying to match the final [ number ] eg this part:
carousels['components'][2][THIS][title]
carousels['components'][2][THIS][title]
carousels['components'][2][THIS][title]
While ignoring the rest
Here is my regex pattern:
/(\[[^components\]])+(\[*])/
This matches both of the integers within brackets, when I only want the last one. The regex also doesn't enforce the required first array key, 'components'.
Live regex test here:
http://www.regexpal.com/?fam=94974
If you want to get the last [ + digits + ], you can use
/^.*\[(\d+)\].*$/
See the regex demo
Backtracking will help getting exactly the last occurrence of [digits]. Grab Group 1 value.
var re = /^.*\[(\d+)\].*$/;
var str = 'carousels[\'components\'][0][0][title]\ncarousels[\'components\'][0][1][title]\ncarousels[\'components\'][0][2][title]\n\ncarousels[\'components\'][1][0][title]\ncarousels[\'components\'][1][1][title]\ncarousels[\'components\'][1][2][title]\n\ncarousels[\'components\'][2][0][title]\ncarousels[\'components\'][2][1][title]\ncarousels[\'components\'][2][2][title]';
for (var s of str.split("\n")) {
  var m = re.exec(s); // declared here so it does not leak as an implicit global
  var res = m ? m[1] : "";
  if (res) {
    document.body.innerHTML += s + ": " + res + "<br/>";
  }
}
UPDATE:
To get the first [ + digits + ], you need to use lazy matching with the first dot:
/^.*?\[(\d+)\].*$/
Here, the ? after the first .* makes the matching lazy/reluctant (it matches any 0+ chars other than a newline, as few as possible).
See another regex demo.
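The greedy and lazy variants can be compared side by side outside the browser. In this sketch, extractIndex and the variable names are illustrative:

```javascript
const lastIndexRe = /^.*\[(\d+)\].*$/;   // greedy .* backtracks to the last [digits]
const firstIndexRe = /^.*?\[(\d+)\].*$/; // lazy .*? stops at the first [digits]

// Returns the captured index as a number, or null when there is no [digits] part.
function extractIndex(re, fieldName) {
  const m = re.exec(fieldName);
  return m ? Number(m[1]) : null;
}

const fieldName = "carousels['components'][2][1][title]";
console.log(extractIndex(lastIndexRe, fieldName));  // last bracketed number
console.log(extractIndex(firstIndexRe, fieldName)); // first bracketed number
```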
You can try this
^.*(\[.*?\])\[.*?\]$
The part you want is matched by the first captured group, (\[.*?\]).
Regex Demo
If you want to match ['components'] exclusively, then you can use
^.*\['components'\].*(\[.*?\])\[.*?\]$
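Note that group 1 of this pattern includes the surrounding brackets. The sketch below shows that, next to a variant that captures only the digits; the digitsOnly pattern is an adaptation for illustration, not part of the answer.

```javascript
const withBrackets = /^.*\['components'\].*(\[.*?\])\[.*?\]$/;
// Hypothetical variant: the capture group is moved inside the brackets and limited to digits.
const digitsOnly = /^.*\['components'\].*\[(\d+)\]\[.*?\]$/;

// Returns group 1 of the match, or null when the name does not match.
function groupOne(re, text) {
  const m = re.exec(text);
  return m ? m[1] : null;
}

const field = "carousels['components'][2][1][title]";
console.log(groupOne(withBrackets, field));
console.log(groupOne(digitsOnly, field));
console.log(groupOne(digitsOnly, "carousels['other'][2][1][title]"));
```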

Regular Expressions required format

I want to validate following text using regular expressions
integer(1..any)/'fs' or 'sf'/ + or - /integer(1..any)/(h) or (m) or (d)
samples :
1) 8fs+60h
2) 10sf-30m
3) 2fs+3h
4) 15sf-20m
I tried this:
function checkRegx(str,id){
var arr = strSplit(str);
var regx_FS =/\wFS\w|\d{0,9}\d[hmd]/gi;
for (var i in arr){
var str_ = arr[i];
console.log(str_);
var is_ok = str_.match(regx_FS);
var err_pos = str_.search(regx_FS);
if(is_ok){
console.log(' ID from ok ' + id);
$('#'+id).text('Format Error');
break;
}else{
console.log(' ID from fail ' + id);
$('#'+id).text('');
}
}
}
But it is not working.
Can anyone please help me make this correct?
This should do it:
/^[1-9]\d*(?:fs|sf)[-+][1-9]\d*[hmd]$/i
You were close, but you seem to be missing some basic regex comprehension.
First of all, the ^ and $ just make sure you're matching the entire string. Otherwise any junk before or after will count as valid.
The formation [1-9]\d* allows for any integer from 1 upwards (and any number of digits long).
(?:fs|sf) is an alternation (the ?: is to make the group non-capturing) to allow for both options.
[-+] and [hmd] are character classes allowing to match any one of the characters in there.
That final i allows the letters to be lowercase or uppercase.
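If the individual parts are needed after validation, the same pattern can be written with named groups. This is a sketch assuming ES2018 named capture groups; the group names and parseEntry are hypothetical:

```javascript
// Same structure as the pattern above, with each part in a named group.
const entryRe =
  /^(?<count>[1-9]\d*)(?<mode>fs|sf)(?<sign>[-+])(?<amount>[1-9]\d*)(?<unit>[hmd])$/i;

// Returns the parsed parts, or null when the text does not have the required format.
function parseEntry(text) {
  const m = entryRe.exec(text);
  if (!m) {
    return null;
  }
  const { count, mode, sign, amount, unit } = m.groups;
  return {
    count: Number(count),
    mode: mode.toLowerCase(),
    sign: sign,
    amount: Number(amount),
    unit: unit.toLowerCase(),
  };
}

console.log(parseEntry("8fs+60h"));
console.log(parseEntry("0fs+1h")); // leading integer must be at least 1
```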
I don't see how the expression you tried relates in any way to the description you gave us. What you want is
/\d+(fs|sf)[+-]\d+[hmd]/
Since you seem to know a bit about regular expressions I won't give a step-by-step explanation :-)
If you need to exclude zero from the "integer" matches, use [1-9]\d* instead. I'm not sure whether by "(1..any)" you meant the number of digits or the number itself.
Looking at the code, you
should not use for...in enumeration on arrays
will need string start and end anchors to check whether str_ matches the regex exactly (instead of only some part of it)
don't need the global flag on the regex
might rather use the RegExp test method than match, since you don't need a result string but only whether it matched
are not using the err_pos variable anywhere, and it would hardly work with search
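function checkRegx(str, id) {
  var arr = strSplit(str);
  var regx_FS = /^\d+(fs|sf)[+-]\d+[hmd]$/i;
  for (var i = 0; i < arr.length; i++) {
    var item = arr[i]; // renamed so it no longer overwrites the str parameter
    console.log(item);
    // As in the question, a match is what triggers 'Format Error';
    // negate the test if the message is meant for non-matching input.
    if (regx_FS.test(item)) {
      console.log(' ID from ok ' + id);
      $('#'+id).text('Format Error');
      break;
    } else {
      console.log(' ID from fail ' + id);
      $('#'+id).text('');
    }
  }
}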
Btw, it would be better to separate the validation (regex, array split, iteration) from the output (id, jQuery, logs) into two functions.
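A possible shape for that separation, as a sketch only: findInvalidEntry and reportResult are hypothetical names, and the code assumes that "Format Error" should be shown when an entry does not match.

```javascript
const ENTRY_RE = /^\d+(?:fs|sf)[+-]\d+[hmd]$/i;

// Validation only: returns the index of the first entry that fails, or -1 if all pass.
function findInvalidEntry(entries) {
  return entries.findIndex((entry) => !ENTRY_RE.test(entry));
}

// Output only: setText is any function that displays a message.
function reportResult(invalidIndex, setText) {
  setText(invalidIndex === -1 ? "" : "Format Error");
}

// Example wiring with console output instead of the page.
reportResult(findInvalidEntry(["8fs+60h", "10sf-30m"]), (msg) => console.log("first:", msg));
reportResult(findInvalidEntry(["8fs+60h", "oops"]), (msg) => console.log("second:", msg));
```

In the page from the question, the second argument could be a function that calls $('#' + id).text(msg), and the entries would come from the question's strSplit(str).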
Try something like this:
/^\d+(?:fs|sf)[-+]\d+[hmd]$/i
