Regex for special characters in between in JavaScript

I am trying to create a regex that checks whether a string has any special characters in between, that is, anywhere other than the first or last position. I am checking the following cases:
"BX_#PO" -- Invalid
"40-66-7" -- Invalid
"_BXTP" -- Valid
"abc123?" -- Valid
"BXTP#" -- Valid
"PO#GO_" -- Invalid
I am trying the code below, but it checks for special characters anywhere in the string, not only in between.
const hasSpecialCharacters = (str) => {
return !/[~`!#$%\^&*+=\-\[\]\\';,/{}|\\":<>\?]/g.test(str);
}

Try this regex:
^[\s\S][a-zA-Z0-9]*[\s\S]$
Explanation:
^ - asserts the start of the string
[\s\S] - matches any character
[a-zA-Z0-9]* - matching 0+ occurrences of only letters and digits in the middle. Thus, not allowing special characters in the middle
[\s\S] - matches any character
$ - asserts the end of the string

Just like T.J's answer, but going for the readable way around:
const hasSpecialCharacters = (str) => {
return !/[a-zA-Z0-9][^a-zA-Z0-9]+[a-zA-Z0-9]/g.test(str)
}
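
To compare the two patterns above against the cases from the question, a small harness like the following may help. This is only a sketch: the names cases, isValidAnchored and isValidBetween are made up for illustration, and the g flag is dropped because it is not needed for a single test() call.

```javascript
// Cases from the question: true means "valid" (no special character strictly inside).
const cases = [
  ['BX_#PO', false],
  ['40-66-7', false],
  ['_BXTP', true],
  ['abc123?', true],
  ['BXTP#', true],
  ['PO#GO_', false],
];

// Approach 1: any first character, only letters and digits in the middle, any last character.
const isValidAnchored = (str) => /^[\s\S][a-zA-Z0-9]*[\s\S]$/.test(str);

// Approach 2: reject when a run of special characters sits between two alphanumerics.
const isValidBetween = (str) => !/[a-zA-Z0-9][^a-zA-Z0-9]+[a-zA-Z0-9]/.test(str);

for (const [input, expected] of cases) {
  console.log(input, isValidAnchored(input), isValidBetween(input), expected);
}
```

Both approaches agree on the six cases above, but they differ at the edges: the anchored pattern needs at least two characters and rejects a string such as "##AB", while the second pattern accepts both a single character and "##AB" because no special character there is surrounded by alphanumerics.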

Related

Handling Dash Character in Regular Expression for Filenames

I have a string which will be used to create a filename. The original string pattern may include a dash. Recently, the pattern has changed and I need to handle the regular expression to remove the dashes near the end or middle of the string but not those at the beginning of the string.
Regular Expression Pattern Rules/Requirements:
Replace all special characters with an underscore with some exceptions
Remove dashes not located at the beginning of the string
The dashes that need to be kept are typically between numeric values [0-9] and can appear any number of times in the string (e.g. "23-564-8 Testing - The - String" -> "23-564-8_testing_the_string")
The dashes that should be converted to underscores are typically between [a-zA-Z] characters (e.g. "Testing - The - String" -> "testing_the_string")
Examples of Potential Strings:
23-564-8 Testing the String -> Expected Output: 23-564-8_testing_the_string
Testing - The String -> Expected Output: testing_the_string
23-564-8 Testing - The - String -> Expected Output: 23-564-8_testing_the_string
Opinion: Personally, I'm not a fan of including dashes in filename but it is a requirement
Current Regexp Solution:
var str = "23-564-8 Testing the String";
str.replace(/[^a-zA-Z0-9-]/g, '_').replace(/__/g, '_');
Question: What is the best way to handle this case? My current solution leaves all dashes in the string.

You may use this regex with a negative lookahead:
/[^a-zA-Z0-9-]+|-(?!\d)/g
RegEx Details:
[^a-zA-Z0-9-]+: Match 1 or more of any character that is not hyphen or alphanumeric
|: OR
-(?!\d): Match hyphen if it is NOT immediately followed by a digit
Code:
const arr = [
'23-564-8 Testing the String',
'Testing - The String',
'-23-564-8 Testing - The - String'
]
const re = /[^a-zA-Z0-9-]+|-(?!\d)/g
// Replace each match with an underscore, then collapse runs of underscores
const result = arr.map(el => el.replace(re, '_').replace(/_{2,}/g, '_'))
console.log( result )
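
The expected outputs in the question are lowercase, which the replacement alone does not produce. A minimal sketch that wraps the regex above in a helper and adds the lowercasing step could look like this (toFilename is a hypothetical name, not something from the question):

```javascript
// Hypothetical helper: builds a filename-safe string from a title.
function toFilename(str) {
  return str
    // Replace runs of non-alphanumeric, non-hyphen characters,
    // and any hyphen that is not immediately followed by a digit.
    .replace(/[^a-zA-Z0-9-]+|-(?!\d)/g, '_')
    // Collapse consecutive underscores into one.
    .replace(/_{2,}/g, '_')
    .toLowerCase();
}

console.log(toFilename('23-564-8 Testing - The - String')); // 23-564-8_testing_the_string
console.log(toFilename('Testing - The String'));            // testing_the_string
```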

The following regex pattern can be used with the replacement string $1_:
(\d+(?:-+\d+)+)?[\W\-_]+
The pattern consists of two parts:
(\d+(?:-+\d+)+)? captures numbers with allowed dashes into Group 1
[\W\-_]+ matches the special characters to be replaced
Group 1 is required to prevent the allowed dashes from being replaced. The $1 token in the replacement string ensures that the content of Group 1 is kept in the result.
This Regex pattern also handles the scenario of duplicate _ characters, so .replace(/__/g, '_') is no longer required. The code can be transformed to:
var str = "23-564-8 Testing the String";
var res = str.replace(/(\d+(?:-+\d+)+)?[\W\-_]+/g, "$1_");
console.log(res);

Regex to detect dates separated by newlines

I'm trying to validate text that's in the format of dates separated by newlines.
The date format needs to be in the form of MM-DD-YYYY.
So a sample could be
MM-DD-YYYY\n
MM-DD-YYYY\n
MM-DD-YYYY
There could be any number of dates, separated by newlines.
I've tried /^(\d{2})-(\d{2})-(\d{4})\s+$/ but that doesn't seem to fully work.
Note: I want this to allow for any leading, trailing whitespace and empty newlines as well.
Basically,
A space character
A carriage return character
A newline character
I'm not partial to using regexes. If another way is simpler or more efficient, then I'd gladly switch to that. Thanks!

To validate a string that contains multiple date-like strings, with or without leading/trailing whitespace and allowing empty/blank lines, you may use either of two approaches.
The first splits the text into lines and uses .every() to test each line against a simple pattern:
text.split("\n").every(x => /^\s*(?:\d{2}-\d{2}-\d{4}\s*)?$/.test(x))
NOTE: This will validate a blank input!
Details
^ - start of string
\s* - 0+ whitespaces
(?: - starts a non-capturing group
\d{2}-\d{2}-\d{4} - two digits, -, two digits, - and four digits
\s* - 0+ whitespaces
)? - end of the group, repeat 1 or 0 times (it is optional)
$ - end of string.
A single regex for the multiline string
/^\s*\d{2}-\d{2}-\d{4}(?:[^\S\n]*\n\s*\d{2}-\d{2}-\d{4})*\s*$/.test(text)
This will not validate blank input.
This regex is long but still efficient because backtracking is minimal. In the [^\S\n]*\n\s* part, the first [^\S\n]* matches any whitespace except a line feed, then \n matches a newline (hence, no backtracking here), and then \s* matches 0+ whitespace (again, \n is not quantified, so there is no backtracking into the pattern if \s* fails). The (?:[^\S\n]*\n\s*\d{2}-\d{2}-\d{4})* part is a *-quantified non-capturing group that matches 0 or more occurrences of the pattern sequence.
JS demos:
var matching_text = "\n01-01-2020\n 01-01-2020\n01-01-2020 \n\n\n 01-01-2020 \n";
var non_matching_text = "\n01-01-2020\n 01-01-2020\n01-01-2020 \n\n\n 01-01-2020 \n01201-01-20202020";
var regex_1 = /^\s*(?:\d{2}-\d{2}-\d{4}\s*)?$/;
var regex_2 = /^\s*\d{2}-\d{2}-\d{4}(?:[^\S\n]*\n\s*\d{2}-\d{2}-\d{4})*\s*$/;
// Test Solution 1:
console.log(matching_text.split("\n").every(x => regex_1.test(x))); // => true
console.log(non_matching_text.split("\n").every(x => regex_1.test(x))); // => false
// Test Solution 2:
console.log(regex_2.test(matching_text)); // => true
console.log(regex_2.test(non_matching_text)); // => false
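
When validation fails, it is often useful to know which lines are at fault. The following sketch reuses the per-line pattern from Solution 1; findInvalidDateLines is a hypothetical helper name, and like the patterns above it checks only the MM-DD-YYYY shape, not whether the date exists on the calendar.

```javascript
// Hypothetical helper: returns the 1-based numbers of lines that are neither
// blank nor a single MM-DD-YYYY-shaped date with optional surrounding whitespace.
function findInvalidDateLines(text) {
  const linePattern = /^\s*(?:\d{2}-\d{2}-\d{4}\s*)?$/;
  return text
    .split('\n')
    .map((line, index) => ({ line, number: index + 1 }))
    .filter(({ line }) => !linePattern.test(line))
    .map(({ number }) => number);
}

console.log(findInvalidDateLines('01-01-2020\n\n 02-02-2020 \nabc\n1-1-2020')); // [ 4, 5 ]
```

An empty result means every line passed, which is equivalent to the .every() check above returning true.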

You can use something like the code below to get all dates that satisfy the regular expression. The parentheses () are only around the date part (\d{2}-\d{2}-\d{4}), so that is what group 1 captures. Because the global flag g is set, match() would return only the whole matches and ignore capture groups, so the code uses matchAll() and reads group 1 of each match.
Edit: added support for leading and trailing whitespace.
Edit 2: added ^ and $ so the regex doesn't allow for more than 2 digits in day and more than 4 digits in year.
Run and test:
let regex = /\s*(\d{2}-\d{2}-\d{4})\s*/g;
let dates = " 12-02-2020 \n 09-10-2020\n 03-03-2020 ";
// matchAll() exposes capture groups; match() with the g flag returns only whole matches
console.log( [...dates.matchAll(regex)].map(m => m[1]) );
EDIT: In order to validate the string of dates you could use the regex.test() method like this:
let regex = /^\s*\d{2}-\d{2}-\d{4}\s*$/;
let dateString = " 12-02-2020 \n 09-10-2020\n 03-03-2020 ";
var dates = dateString.split('\n');
// every() stops at the first failing line; a return inside forEach() would not exit datesValid
var datesValid = () => dates.every((el) => regex.test(el));
console.log( datesValid() );

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything that is not alphanumeric or a letter in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumerics are left in place. The last ê is also left in place.

This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);

a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);

Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not a single regex; it relies on some helper JavaScript.
Rough explanation: we first split the string into an array of substrings, divided by spaces. We then transform each substring by stripping the leading and trailing special characters. We match those special characters with [^a-zA-Z0-9ê]*; because of the leading ^ inside the brackets, the class matches every character except those listed, so all special characters.
Between these two classes we capture all relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? after * makes the quantifier lazy, so the group stops as soon as the rest of the pattern, which matches the trailing special characters, can match through to the end. We also start the regex with ^ and end it with $ to match the entire string (they anchor the start and the end of the string, respectively).
With .exec(s)[1] we then execute the regex on the substring and return the first capturing group from our transform function. Note that this is an empty string if a substring contains no proper characters. At the end we join the substrings with spaces.
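
The character classes in the answers above either list ASCII punctuation, cover a fixed Unicode range, or hardcode ê. As an alternative sketch, Unicode property escapes (available with the u flag since ES2018) can describe "letter or digit in any script" directly. The function name trimSpecialCharacters is made up for this example, and it assumes words are separated by single spaces, as in the question.

```javascript
// Strip leading and trailing characters that are not Unicode letters or digits
// from each space-separated word.
function trimSpecialCharacters(text) {
  // Either a run of non-letter, non-digit characters at the start,
  // or such a run at the end of the word.
  const edges = /^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu;
  return text
    .split(' ')
    .map((word) => word.replace(edges, ''))
    .join(' ');
}

const input = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
console.log(trimSpecialCharacters(input));
```

For the sample string this produces the output requested in the question. A word made up entirely of special characters becomes an empty string, so it would leave two adjacent spaces in the result.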

Regular expression capture with optional trailing underscore and number

I'm trying to find a regular expression that will match the base string without the optional trailing number (_123). e.g.:
lorem_ipsum_test1_123 -> capture lorem_ipsum_test1
lorem_ipsum_test2 -> capture lorem_ipsum_test2
I tried using the following expression, but it would only work when there is a trailing _number.
/(.+)(?>_[0-9]+)/
/(.+)(?>_[0-9]+)?/
Similarly, adding the ? (zero or one) quantifier only worked when there is no trailing _number; otherwise, the trailing _number would just be part of the first capture.
Any suggestions?

You may use the following expression:
^(?:[^_]+_)+(?!\d+$)[^_]+
^ Anchor beginning of string.
(?:[^_]+_)+ Repeated non capturing group. Negated character set for anything other than a _, followed by a _.
(?!\d+$) Negative lookahead for digits at the end of the string.
[^_]+ Negated character set for anything other than a _.
Regex demo here.
Please note that the \n in the character sets in the regex demo is only there for demonstration purposes and should be removed when using the pattern in JavaScript.
JavaScript demo:
var myString = "lorem_ipsum_test1_123";
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);
var myString = "lorem_ipsum_test2"
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);

You might match any character and use a negative lookahead that asserts that what follows is not an underscore, one or more digits and the end of the string:
^(?:(?!_\d+$).)*
Explanation
^ Assert start of the string
(?: Non capturing group
(?! Negative lookahead to assert that what is on the right is not
_\d+$ An underscore, one or more digits, and the end of the string
) Close the negative lookahead
. Match any character except a line break
)* Close the non-capturing group and repeat zero or more times
const strings = [
"lorem_ipsum_test1_123",
"lorem_ipsum_test2"
];
let pattern = /^(?:(?!_\d+$).)*/;
strings.forEach((s) => {
console.log(s + " ==> " + s.match(pattern)[0]);
});

You are asking for
/^(.*?)(?:_\d+)?$/
The point here is that the first dot pattern must be non-greedy, the _\d+ should be wrapped in an optional non-capturing group, and the whole pattern (especially the end) must be enclosed in anchors.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible due to the non-greedy ("lazy") quantifier *?
(?:_\d+)? - an optional non-capturing group matching 1 or 0 occurrences of _ and then 1+ digits
$ - end of string.
However, it seems easier to use a mere replacing approach,
s = s.replace(/_\d+$/, '')
If the string ends with _ and 1+ digits, the substring will get removed, else, the string will not change.
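
As a quick comparison, both approaches from this answer can be wrapped in small functions. This is a sketch only; the names baseNameByReplace and baseNameByCapture are hypothetical, and the capture version assumes a single-line string, since . does not match line breaks.

```javascript
// Replace approach: drop a trailing "_<digits>" suffix if present.
const baseNameByReplace = (s) => s.replace(/_\d+$/, '');

// Capture approach: lazy group 1, then an optional "_<digits>" suffix before the end.
const baseNameByCapture = (s) => s.match(/^(.*?)(?:_\d+)?$/)[1];

for (const s of ['lorem_ipsum_test1_123', 'lorem_ipsum_test2', 'a_1_2']) {
  console.log(s, '=>', baseNameByReplace(s), '|', baseNameByCapture(s));
}
```

Only the final numeric suffix is removed, so "a_1_2" becomes "a_1" with either function.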

Try to check if the string contains the trailing number. If it does you get only the other part. Otherwise you get the whole string.
var str = "lorem_ipsum_test1_123"
if(/_[0-9]+$/.test(str)) {
console.log(str.match(/(.+)(?=_[0-9]+)/g))
} else {
console.log(str)
}
Or, a lot more concise:
str = str.replace(/_[0-9]+$/g, "")

Regular expression negative match

I can't seem to figure out how to compose a regular expression (used in JavaScript) that does the following:
Match all strings where the characters after the 4th character do not contain "GP".
Some example strings:
EDAR - match!
EDARGP - no match
EDARDTGPRI - no match
ECMRNL - match
I'd love some help here...

Use zero-width assertions:
if (subject.match(/^.{4}(?!.*GP)/)) {
// Successful match
}
Explanation:
^ # Assert position at the beginning of the string
. # Match any single character that is not a line break character
{4} # Exactly 4 times
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GP # Match the characters “GP” literally
)

You can use what's called a negative lookahead assertion here. It looks into the string ahead of the location and matches only if the pattern contained is /not/ found. Here is an example regular expression:
/^.{4}(?!.*GP)/
This matches only if, after the first four characters, the string GP is not found.
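
To check the lookahead against the examples from the question, something like the following sketch can be used (hasNoGpAfterFourth is a made-up name):

```javascript
// True when the string has at least four characters and "GP" does not
// occur anywhere after the fourth character.
const hasNoGpAfterFourth = (str) => /^.{4}(?!.*GP)/.test(str);

for (const s of ['EDAR', 'EDARGP', 'EDARDTGPRI', 'ECMRNL']) {
  console.log(s, hasNoGpAfterFourth(s) ? 'match' : 'no match');
}
```

Two consequences of the pattern are worth knowing: strings shorter than four characters never match, because .{4} must consume four characters first, and a "GP" that lies entirely within the first four characters (as in "GPAB") does not prevent a match.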

You could do something like this:
var str = "EDARDTGPRI";
// slice(4) keeps everything after the 4th character (substr() is deprecated)
var test = !(/GP/.test(str.slice(4)));
test is true for strings that should match and false otherwise.
