Regex pattern for my below mentioned requirement - javascript

I am struggling to write the regex pattern for my requirement.
I want a text field to reject every special character except underscore and hyphen. It should also reject a value that consists only of underscores, hyphens, or spaces.
I tried the following pattern:
/[ !@#$%^&*()+\=\[\]{};':"\\|,.<>\/?]/;
but this is also allowing underscore and hyphen, as well as space if entered alone.

Rather than matching what you do not want, match what you actually want. Since you did not say whether the string may contain letters, numbers, and spaces, I assumed it consists of words made of uppercase and lowercase letters, each optionally followed by a single hyphen, underscore, or space.
^(([A-Za-z])+([\-_ ])?)+$
I have created a regex101 if you wish to try more cases.
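The allow-list idea above can also be combined with a lookahead so that one pattern covers both rules. This is only a sketch under an assumption the question does not confirm: digits and inner spaces are treated as acceptable. The names allowedPattern and isValidField are made up for illustration.

```javascript
// Assumption: letters, digits, spaces, underscores and hyphens are allowed,
// but the value must contain at least one letter or digit.
// (?=.*[A-Za-z0-9]) requires an alphanumeric character somewhere in the value;
// [A-Za-z0-9 _-]+ then restricts every character to the allow-list.
const allowedPattern = /^(?=.*[A-Za-z0-9])[A-Za-z0-9 _-]+$/;

function isValidField(value) {
  return allowedPattern.test(value);
}

console.log(isValidField("Mathus-Mark")); // true
console.log(isValidField("Mathus Mark")); // true
console.log(isValidField("-_ "));         // false (no letter or digit)
console.log(isValidField("a@b"));         // false (special character)
```

If digits should not count, replacing [A-Za-z0-9] with [A-Za-z] in both places would tighten the rule.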

If the string must not contain special characters other than underscore and hyphen, but a value made up only of spaces, hyphens, and underscores is also invalid, you can handle that exception as a separate check. This keeps the code easier to understand and easier to extend with further exceptions.
function validateString(str) {
  // Special characters that are never allowed (underscore and hyphen are not listed)
  const special = /[!@#$%^&*()+=\[\]{};':"\\|,.<>\/?]/;
  if (special.test(str) || !/[a-zA-Z0-9]/.test(str)) {
    // Contains a special character, or has no letter or digit at all
    // (for example only spaces, underscores, or hyphens)
    console.log(str + ": Invalid input");
  } else {
    console.log(str + ": Valid string");
  }
}
let str = "-_ ";
let str1 = "Mathus-Mark";
let str2 = "Mathus Mark";
let str3 = "Mathus_Mark";
let str4 = " ";
let str5 = "-";
let str6 = "_";
validateString(str);
validateString(str1);
validateString(str2);
validateString(str3);
validateString(str4);
validateString(str5);
validateString(str6);

Related

Is there a javascript method to recognize a string even if the words are out of order? [duplicate]

I would like to find all the matches of given strings (divided by spaces) in a string
(the way the iTunes search box works, for example).
For example, both "ab de" and "de ab" should return true for "abcde" (as should "bc e a", or any other order).
If I replace the white space with a wild card, "ab*de" would return true on "abcde", but not "de*ab".
[I use * and not Regex syntax just for this explanation]
I could not find any pure Regex solution for that.
The only solution I could think of is splitting the search term and running multiple regexes.
Is there a pure regex that covers all these cases?
Returns true when all parts (divided by , or ' ') of a searchString occur in text. Otherwise false is returned.
function filter(text, searchString) {
  // One lookahead per term: (?=.*term1)(?=.*term2)...
  const regexStr = '(?=.*' + searchString.split(/,|\s/).join(')(?=.*') + ')';
  // No "g" flag: only a yes/no answer is needed
  const searchRegEx = new RegExp(regexStr, 'i');
  return searchRegEx.test(text);
}
I'm pretty sure you could come up with a regex to do what you want, but it may not be the most efficient approach.
For example, the regex pattern (?=.*bc)(?=.*e)(?=.*a) will match any string that contains bc, e, and a.
var isMatch = 'abcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals true
var isMatch = 'bcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals false
You could write a function to dynamically create an expression based on your search terms, but whether it's the best way to accomplish what you are doing is another question.
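One way such a function might look is sketched below. It builds one lookahead per term and escapes each term first, so that characters like . or * in the user's input are matched literally. The names escapeRegExp and containsAllTerms are illustrative, not taken from any of the answers here.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns true when every space- or comma-separated term occurs in text,
// in any order. An empty search string matches everything.
function containsAllTerms(text, searchString) {
  const terms = searchString.split(/[,\s]+/).filter(Boolean);
  const source = "^" + terms
    .map(term => "(?=[\\s\\S]*" + escapeRegExp(term) + ")")
    .join("");
  return new RegExp(source, "i").test(text);
}

console.log(containsAllTerms("abcde", "de ab"));  // true
console.log(containsAllTerms("abcde", "bc e a")); // true
console.log(containsAllTerms("abcde", "ab xy"));  // false
```

The leading ^ keeps the engine from retrying the lookaheads at every position, and [\s\S]* is used instead of .* so that terms are also found after a line break.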
Alternations are order insensitive:
"abcde".match(/(ab|de)/g); // => ['ab', 'de']
"abcde".match(/(de|ab)/g); // => ['ab', 'de']
So if you have a list of words to match you can build a regex with an alternation on the fly like so:
function regexForWordList(words) {
return new RegExp('(' + words.join('|') + ')', 'g');
}
'abcde'.match(regexForWordList(['a', 'e'])); // => ['a', 'e']
Try this:
var str = "your string";
str = str.split( " " );
for( var i = 0 ; i < str.length ; i++ ){
// your regexp match
}
This is the script I use; it also works with single-word search strings.
var what="test string with search cool word";
var searchString="search word";
var search = new RegExp(searchString, "gi"); // one-word searching
// multiple search words
if(searchString.indexOf(' ') != -1) {
search="";
var words=searchString.split(" ");
for(var i = 0; i < words.length; i++) {
search+="(?=.*" + words[i] + ")";
}
search = new RegExp(search + ".+", "gi");
}
if(search.test(what)) {
// found
} else {
// notfound
}
I assume you are matching words, or parts of words. You want space-separated search terms to limit search results, and it seems you intend to return only those entries which have all the words that the user supplies. And you intend a wildcard character * to stand for 0 or more characters in a matching word.
For example, if the user searches for the words term1 term2, you intend to return only those items which have both words term1 and term2. If the user searches for the word term*, it would match any word beginning with term.
There are suitable regular expressions which are equivalent to this search language and can be generated from it.
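A simple example, the word term, can be asserted in regex by converting to \bterm\b. But two or more words which must match in any order require lookahead assertions. Using extended syntax (the spaces are for readability only; JavaScript regexes have no extended mode, so remove them in real code), the equivalent regex is: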
(?= .* \b term1 \b )
(?= .* \b term2 \b )
The asterisk wildcard can be asserted in regex with a character class followed by an asterisk. The character class identifies which letters you consider to be part of a word. For example, you might find that [A-Za-z0-9]* fits the bill.
In short, you might be satisfied if you convert an expression such as:
foo ba* quux
to:
(?= .* \b foo \b )
(?= .* \b ba[A-Za-z0-9]* \b )
(?= .* \b quux \b )
That is a simple matter of search and replace. But do be careful to sanitize the input string to avoid injection attacks by removing punctuation, etc.
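The conversion described above might be sketched in JavaScript as follows. The function name searchToRegExp is made up for illustration, and the sanitizing step assumes that only ASCII letters, digits, and the * wildcard are meaningful in a search word; everything else is dropped.

```javascript
// Convert a search string such as "foo ba* quux" into a single regex made of
// lookaheads, one per word. Assumption: words consist of [A-Za-z0-9] only.
function searchToRegExp(query) {
  const lookaheads = query
    .split(/\s+/)
    .filter(Boolean)
    // Sanitize: keep only letters, digits and the * wildcard
    .map(word => word.replace(/[^A-Za-z0-9*]/g, ""))
    .filter(Boolean)
    // The wildcard stands for zero or more word characters
    .map(word => word.replace(/\*/g, "[A-Za-z0-9]*"))
    .map(pattern => "(?=.*\\b" + pattern + "\\b)");
  return new RegExp("^" + lookaheads.join(""), "i");
}

const search = searchToRegExp("foo ba* quux");
console.log(search.test("quux bar foo")); // true
console.log(search.test("quux foo"));     // false (nothing matches ba*)
```

Because the sanitizing step removes every regex metacharacter before the pattern is assembled, the user's input cannot inject its own regex syntax.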
I think you may be barking up the wrong tree with RegEx. What you might want to look at is the Levenshtein distance of two input strings.
There's a Javascript implementation here and a usage example here.
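For reference, a compact version of the Levenshtein distance is sketched below. It is not the implementation linked above, only a standard two-row dynamic-programming variant; the function name levenshtein is an assumption.

```javascript
// Number of single-character insertions, deletions and substitutions
// needed to turn string a into string b.
function levenshtein(a, b) {
  // previous[j] = distance between the first i-1 chars of a and the first j chars of b
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i]; // turning i chars of a into an empty string takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1,       // deletion
        current[j - 1] + 1,    // insertion
        previous[j - 1] + cost // substitution (or match)
      );
    }
    previous = current;
  }
  return previous[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
```

Note that edit distance measures how similar two strings are as a whole; it does not by itself check that every search term occurs in the text.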

How to get the first alphabetic word of string in javascript

I am trying to retrieve the first alphabetic word of a string, which might include tags as well.
I have tried using split(" ") but it gives me the spaces.
let letter = ' <section class="contact" id="contact">';
let firstWord = letter.split (" ");
It should just show section as the first word. Is there any way I can do this? Thank you.
Simple regex to match alphabetic (not alphanumeric) words /[a-zA-Z]+/g
let letter = ' <section class="contact" id="contact">';
let words = letter.match (/[a-zA-Z]+/g); // Match all runs of letters; null if there are none
let firstWord = words ? words[0] : '';
console.log(firstWord);
You may use several solutions based on what you really need.
For the current scenario, you may match a chunk of 1+ ASCII letters
let letter = ' <section class="contact" id="contact">';
let first_word = (letter.match(/[a-z]+/i) || [""])[0];
console.log(first_word)
You may tell the regex engine to only match it if there are no digits or underscores around it using \b, word boundaries:
/\b[a-z]+\b/i
And in case you want to match any Unicode letter word and target ECMAScript 2018 and newer, you may use
let regex = /\p{Alphabetic}+/u;
console.log("Один,два".match(regex)[0]); // => Один
Or, with Unicode word boundaries,
let regex = /(?<![\p{Alphabetic}\p{N}_])\p{Alphabetic}+(?![\p{Alphabetic}\p{N}_])/u;
// Or,
// let regex = /(?<!\p{L}\p{M}*|[\p{N}_])\p{Alphabetic}+(?![\p{L}\p{N}_])/u
console.log("1Один2,два-три".match(regex)[0]); // => два
That is, it matches 1+ alphabetic chars that are neither preceded nor followed by letters, digits, or underscores.
To get the first word, match a non-letter, then one or more letters inside a capturing group, then another non-letter:
let letter = ' <section class="contact" id="contact">';
let [, firstWord] = letter.match(/[^a-z]([a-z]+)[^a-z]/i);
console.log(firstWord);

javascript extract hashtags from strings

I have a string received from backend, and I need to extract hashtags. The tags are written in one of these two forms
type 1. #World is a #good #place to #live.
type 2. #World#place#live.
I managed to extract from first type by : str.replace(/#(\S*)/g
How can I convert the second format to space-separated tags, like format one?
Basically, I want format two to be converted from
#World#place#live.
to
#World #place #live.
You can use String.match, with regex #\w+:
var str = `
type 1. #World is a #good #place to #live.
type 2. #World#place#live.`
var matches = str.match(/#\w+/g)
console.log(matches)
\w+ matches one or more word characters [a-zA-Z0-9_], so you might want to tweak that.
Once you have the matches in an array, you can rearrange them to your liking.
The pattern #(\S*) will match a # followed by 0+ times a non whitespace character in a captured group. That would match a single # as well. The string #World#place#live. contains no whitespace character so the whole string will be matched.
You could match them instead by using a negated character class. Match #, followed by a negated character class that matches not a # or a whitespace character.
#[^#\s]+
const strings = [
"#World is a #good #place to #live.",
"#World#place#live."
];
let pattern = /#[^#\s]+/g;
strings.forEach(s => {
console.log(s.match(pattern));
});
How about using the regex /#([\w]+\b)/gm and joining the matches with a space, as below, to extract the #hashtags from your string? Alternatively, you can use str.replace(/\b#[^\s#]+/g, " $&"), as commented by @Wiktor.
function findHashTags(str) {
var regex = /#([\w]+\b)/gm;
var matches = [];
var match;
while ((match = regex.exec(str))) {
matches.push(match[0]);
}
return matches;
}
let str1 = "#World is a #good #place to #live."
let str2 = "#World#place#live";
let res1 = findHashTags(str1);
let res2 = findHashTags(str2);
console.log(res1.join(' '));
console.log(res2.join(' '));
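The str.replace variant mentioned in this answer converts the second format directly, without collecting the tags first. A minimal sketch, with spaceOutHashtags as a made-up name:

```javascript
// Insert a space before every # that directly follows a word character.
// \b before # only matches when the preceding character is a letter,
// digit or underscore, so tags that already follow a space are untouched.
function spaceOutHashtags(text) {
  return text.replace(/\b#[^\s#]+/g, " $&");
}

console.log(spaceOutHashtags("#World#place#live."));
// "#World #place #live."
console.log(spaceOutHashtags("#World is a #good #place to #live."));
// "#World is a #good #place to #live." (unchanged)
```

A tag that follows punctuation rather than a word character (for example ".#tag") would not be separated by this pattern.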

Remove leading and trailing characters from a

I have a text file which has strings separated by whitespace. The text file contains some special characters (Latin, currency, punctuation, etc.) which need to be discarded from the final output. Please note that legal characters are all characters in Unicode except these special characters.
We need to separate/split text by whitespaces and then remove only leading and trailing special characters. If special characters are in between two legal characters then we won't remove them.
I can easily do it in two phases: split the text by whitespace, then remove only the leading and trailing special characters from each string. However, I need to process the string only once. Is there any way it could be achieved in one pass? Note: We can't use RegEx.
For this question assume that these characters are special:
[: , ! . < ; ' " > [ ] { } ` ~ = + - ? / ]
Example:
:!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,
Here output would be an array of valid strings: ["BBM", "IS", "TALKING", "TO", "B!?AM"]
Make simple state machine (finite automata)
Walk in a loop through all chars
At every step check if current char is letter, space or special
Execute some operation (perhaps empty) depending on state and char kind
Change state if needed
For example, you may stay in the "special" state until a letter is met. Then remember the starting index of the word and switch to the "inside word" state. Continue until a special character or a space is met (which of the two ends a word is still not clear from your question).
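The steps above might be sketched as a single pass like this. The state is kept implicitly in two indices rather than a named state variable, the special characters are the ones listed in the question, and treating only space, tab, and line breaks as whitespace is an assumption. The names SPECIAL, isWhitespace, and splitAndTrim are illustrative.

```javascript
// Assumption: exactly the special characters listed in the question.
const SPECIAL = new Set(":,!.<;'\">[]{}`~=+-?/");

// Assumption: only these four characters count as whitespace.
function isWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
}

function splitAndTrim(text) {
  const words = [];
  let start = -1; // index of the first legal char of the current word, -1 if none yet
  let end = -1;   // index of the last legal char seen in the current word
  // Go one position past the end so the final word is flushed too.
  for (let i = 0; i <= text.length; i++) {
    const ch = text[i];
    if (i === text.length || isWhitespace(ch)) {
      // End of a chunk: emit the span between the first and last legal char
      if (start !== -1) words.push(text.slice(start, end + 1));
      start = end = -1;
    } else if (!SPECIAL.has(ch)) {
      if (start === -1) start = i;
      end = i;
    }
    // Special characters need no action: they are kept only when they
    // fall between start and end, i.e. between two legal characters.
  }
  return words;
}

const sample = ":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,";
console.log(splitAndTrim(sample)); // [ 'BBM', 'IS', 'TALKING', 'TO', 'B!?AM' ]
```

Each character is inspected once and no regular expression is involved; the only extra work is one slice per emitted word.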
I have used TypeScript and done it in a single pass.
Please note that the isSpecialCharacterCode(charCode) function simply checks whether the character code of the text character matches the code of one of the provided special characters. The same is true for the isWhitespaceCode(charCode) function.
function parseText(text: string): string[]{
let words : string[] = [];
let word = "";
let charCode = 1;
let haveSeenLegalChar = false; //set it if we have encountered legal character in text
let seenSpecialCharsToInclude = false; //set it if we have encountered special character in text
let inBetweenSpecialChars = ""; // string containing special chars which may be included in between legal word
for(let index = 0; index < text.length; index++){
charCode = text.charCodeAt(index);
let isSpecialChar = isSpecialCharacterCode(charCode);
let isWhitespace = isWhitespaceCode(charCode);
if(isSpecialChar && !isWhitespace){
//if this is a special character then two cases
//first is: It can be part of word (it is only possible if we have already seen at least one legal character)
//Since it can be part of word but we are not sure whether this will be part of word so store it for now
//second is: This is either leading or trailing special character..we should not include these in word
if(haveSeenLegalChar){
inBetweenSpecialChars += text[index];
seenSpecialCharsToInclude = true;
}else{
//since we have not seen any legal character till now so it must be either leading or trailing special chars
seenSpecialCharsToInclude = false;
inBetweenSpecialChars = "";
}
}else if(isWhitespace){
//we have encountered a whitespace.This is either beginning of word or ending of word.
//if we have encountered any legal char, push word into array
if(haveSeenLegalChar){
words.push(word);
word = "";
inBetweenSpecialChars = "";
}
haveSeenLegalChar = false;
}else if(!isSpecialChar){
//legal character case
haveSeenLegalChar = true;
if(seenSpecialCharsToInclude){
word += inBetweenSpecialChars;
seenSpecialCharsToInclude = false;
inBetweenSpecialChars = "";
}
word += text[index];
}
}
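//push the last word when the text does not end with whitespace
if(haveSeenLegalChar){
words.push(word);
}
return words;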
}

How to ignore dashes or hyphens in JS comparisons

I want to compare two input values on a page.
One input value is always entered with hyphens in place of spaces, e.g. "first-value".
The other input value is never entered with hyphens, e.g. "first value".
"first-test" == "first test"
This treats them as different. Is there an operator that would treat them as the same?
Dashes may come in more varieties than you'd expect, especially if people copy/paste their input from MS Word and the like. For example, would you consider - or ‐ or ‑ or ‒ or – or — or ― all to be dashes? (They're all different Unicode characters.)
If the parts that you care about are only alphanumeric, you're better off stripping away everything else.
Do you regard first-test and firs ttest to be equal? If yes, then simply removing all non-alphanumeric chars will do:
str1 = str1.replace(/[^a-z0-9]/gi,'');
str2 = str2.replace(/[^a-z0-9]/gi,'');
var doMatch = (str1 == str2);
If no, then replace all non-alphanumeric parts with single spaces:
str1 = str1.replace(/[^a-z0-9]+/gi,' ');
str2 = str2.replace(/[^a-z0-9]+/gi,' ');
// trim to ignore space at begin or end
str1 = str1.replace(/^\s+|\s+$/g,'');
str2 = str2.replace(/^\s+|\s+$/g,'');
var doMatch = (str1 == str2);
This also allows for people copy/pasting values with an accidental extra space at the end. Which sometimes happens but is barely noticeable, and could cause lots of headaches if you consider that different.
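If other characters must be preserved and only dashes should be treated like spaces, a Unicode property escape can cover the dash varieties mentioned above. This is a sketch that assumes an ES2018+ engine; normalizeDashes and looselyEqual are hypothetical names.

```javascript
// \p{Pd} matches every character in the Unicode "Dash_Punctuation" category,
// which includes the ASCII hyphen as well as en and em dashes.
function normalizeDashes(value) {
  return value
    .replace(/\p{Pd}/gu, " ") // any dash becomes a space
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();                  // ignore leading and trailing space
}

function looselyEqual(a, b) {
  return normalizeDashes(a) === normalizeDashes(b);
}

console.log(looselyEqual("first-test", "first test"));       // true
console.log(looselyEqual("first\u2013test ", "first test")); // true (en dash, trailing space)
console.log(looselyEqual("first-test", "firs ttest"));       // false
```

Unlike stripping all non-alphanumeric characters, this keeps accented letters and other punctuation intact, so only dashes and whitespace differences are ignored.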
var str1 = 'first-test';
var str2 = 'first test';
var doMatch = str1.replace(/-/g, ' ') === str2.replace(/-/g, ' '); // true
var variableWithHyphen = "variable-value-value";
var variable = "variable value value";
function areEqual(varOne, varTwo) {
var hyphen = new RegExp('-', 'g');
return varTwo.replace(hyphen, " ") === varOne.replace(hyphen, " ");
}
alert(areEqual(variable, variableWithHyphen));
