RegExp match word till space or character - javascript

I'm trying to match all the words starting with # as well as the text between two # characters (see example):
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Bonus: it would be nice if the results didn't include the #.
Does anyone have a solution for this?

You can use
/#\w+(?:(?: +\w+)*#)?/g
The regex matches:
# - a hash symbol
\w+ - one or more word characters (letters, digits and underscore)
(?:(?: +\w+)*#)? - one or zero occurrences of:
  (?: +\w+)* - zero or more sequences of one or more spaces followed by one or more word characters, followed by
  # - a hash symbol
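A minimal check of this pattern against the question's sample string (the variable names are illustrative):

```javascript
// Pattern from the answer: #word, optionally extended to a #...# span
const hashRe = /#\w+(?:(?: +\w+)*#)?/g;
const sample = "#The test# rain in #SPAIN stays mainly in the #plain";

// With the g flag, String.prototype.match returns every match (or null)
const hashMatches = sample.match(hashRe) || [];
console.log(hashMatches); // → ['#The test#', '#SPAIN', '#plain']
```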
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
  // Get rid of the trailing # (ES6 arrow function)
  m = m.map(s => s.replace(/#$/, ''));
  // ES5 equivalent:
  /*m = m.map(function(s) {
    return s.replace(/#$/, '');
  });*/
  document.body.innerHTML = "<pre>" + JSON.stringify(m, null, 4) + "</pre>";
}
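The code above strips only the trailing #. If the leading # should go too (the bonus in the question), one option is to put the text inside a capturing group and collect Group 1 with String.prototype.matchAll (ES2020). This is a sketch of that variant, not the answerer's code; the added (?=#) lookahead lets the multi-word part count only when a closing # follows it. Mapping the original results through s.replace(/^#|#$/g, '') would also work.

```javascript
// Same idea as /#[^ #]+(?:(?: +[^ #]+)*#)?/g, but the text sits in Group 1,
// so the leading # and the optional closing # are left out of the result
const tagRe = /#([^ #]+(?:(?: +[^ #]+)*(?=#))?)#?/g;
const input = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';

// matchAll yields one match object per hit; m[1] is the captured text
const tags = Array.from(input.matchAll(tagRe), m => m[1]);
console.log(tags);
// → ['The test-mode', 'SPAIN', 'plain', 'SPAIN', 'the test', 'here']
```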

You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non-capturing group for the alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
Use it with the g (global) flag.
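A quick sketch running this pattern with match() on the question's string. The second input is an illustrative assumption: it shows that the lazy [\s\S]*? can still stretch from a lone #word to a later closing #, because the first alternative is tried before \w+.

```javascript
// Either a #...# span bounded by word boundaries, or a single #word
const altRe = /#(?:\b[\s\S]*?\b#|\w+)/g;

const basic = "#The test# rain in #SPAIN stays mainly in the #plain".match(altRe);
console.log(basic); // → ['#The test#', '#SPAIN', '#plain']

// Here the first alternative runs from #SPAIN to the # after "test"
const runaway = "#SPAIN has #the test#".match(altRe);
console.log(runaway); // → ['#SPAIN has #the test#']
```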

Related

Replace not numbers or words to underscore but leave dash and remove spaces around it

I have this string:
'word word - word word 24/03/21'
And I would like to convert it to
'word_word-word_word_24_03_21'
I have tried this
replace(/[^aA-zZ0-9]/g, '_')
But I get this instead
word_word___word_word_24_03_21
You can use 2 .replace() calls:
const s = 'word word - word word 24/03/21'
var r = s.replace(/\s*-\s*/g, '-').replace(/[^-\w]+/g, '_')
console.log(r)
//=> "word_word-word_word_24_03_21"
Explanation:
.replace(/\s*-\s*/g, '-'): Remove surrounding spaces of a hyphen
.replace(/[^-\w]+/g, '_'): Replace every run of characters that are neither a hyphen nor a word character with a single underscore
You can use
console.log(
'word word - word word 24/03/21'.replace(/\s*(-)\s*|[^\w-]+/g, (x,y) => y || "_")
)
Here,
/\s*(-)\s*|[^\w-]+/g - matches a - enclosed in zero or more whitespaces (capturing the - into Group 1), or else matches one or more non-word chars other than -
(x, y) => y || "_" - replaces the match with Group 1 if it was matched, and otherwise with a _ char.
With a function for replace and an alternation in the pattern, you could also match:
(\s*-\s*) Match a - between optional whitespace chars (Group 1)
| Or
[^a-zA-Z0-9-]+ Match 1+ chars that are not in any of the listed ranges and are not a -
In the callback, check if group 1 exists. If it does, return only a -, else return _
Note that the notation [^aA-zZ0-9] is not the same as [^a-zA-Z0-9]: the range A-z also includes the characters [ \ ] ^ _ and ` that sit between Z and a in ASCII.
let s = "word word - word word 24/03/21";
s = s.replace(/(\s*-\s*)|[^a-zA-Z0-9-]+/g, (_, g1) => g1 ? "-" : "_");
console.log(s);
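To see what [A-z] matches, a small sketch: it tests the six ASCII characters between Z (code 90) and a (code 97) against both ranges, then runs the question's original pattern on an illustrative input, 'a_b[c'.

```javascript
// The six characters between 'Z' (90) and 'a' (97) in ASCII
const between = ['[', '\\', ']', '^', '_', '`'];

// [A-z] is one range from 'A' to 'z', so it covers all six of them
const inSloppyRange = between.filter(c => /^[A-z]$/.test(c));
const inStrictRange = between.filter(c => /^[a-zA-Z]$/.test(c));
console.log(inSloppyRange.length, inStrictRange.length); // 6 0

// Consequence for the question's attempt: '_' and '[' are never replaced
const sloppyResult = 'a_b[c'.replace(/[^aA-zZ0-9]/g, '_');
const strictResult = 'a_b[c'.replace(/[^a-zA-Z0-9]/g, '_');
console.log(sloppyResult, strictResult); // a_b[c a_b_c
```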
You can use the + regex quantifier to replace 1 or more consecutive matching characters at once.
let s = 'word word - word word 24/03/21';
let r = s
  .replace(/[^a-zA-Z0-9]*-[^a-zA-Z0-9]*/g, '-')
  .replace(/[^a-zA-Z0-9-]+/g, '_');
console.log(r);
// 'word_word-word_word_24_03_21'

Finding exact words in text, excluding quoted words

In the JavaScript code below I need to find exact words in a text, excluding words that are between quotes. This is my attempt; what's wrong with the regex? It should find all the words except word22 and "word3". If I use only \b in the regex, it selects exact words but doesn't exclude the words between quotes.
var text = 'word1, word2, word22, "word3" and word4';
var words = [ 'word1', 'word2', 'word3' , 'word4' ];
words.forEach(function(word){
var re = new RegExp('\\b^"' + word + '^"\\b', 'i');
var pos = text.search(re);
if (pos > -1)
alert(word + " found in position " + pos);
});
First, we'll use a function to escape the characters of the word, in case some of them have a special meaning in a regular expression.
// from https://stackoverflow.com/a/30851002/240443
function regExpEscape(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
Then, we construct a regular expression as an alternation of individual word regexps. For each word, we assert that it starts with a word boundary, ends with a word boundary, and is followed by an even number of quote characters up to the end of the string. (Note that after word3 there is only one quote before the end of the string, which is odd.)
let text = 'word1, word2, word22, "word3" and word4';
let words = [ 'word1', 'word2', 'word3' , 'word4' ];
let regexp = new RegExp(words.map(word =>
    '\\b' + regExpEscape(word) + '\\b(?=(?:[^"]*"[^"]*")*[^"]*$)').join('|'), 'g')
text.match(regexp)
// => ["word1", "word2", "word4"]
let m;
while ((m = regexp.exec(text))) {
  console.log(m[0], m.index);
}
// word1 0
// word2 7
// word4 34
EDIT: Actually, we can speed the regexp up a bit if we factor out the surrounding conditions:
let regexp = new RegExp(
'\\b(?:' +
words.map(regExpEscape).join('|') +
')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g')
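A self-contained sketch of the factored version, reusing regExpEscape and the sample text from above and collecting results with String.prototype.matchAll (ES2020) instead of an exec() loop:

```javascript
// Escape helper copied from the answer above
function regExpEscape(literal_string) {
  return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}

const text = 'word1, word2, word22, "word3" and word4';
const words = ['word1', 'word2', 'word3', 'word4'];

// One \b(?:...)\b around the whole alternation, then the lookahead that
// requires an even number of quotes between the word and the end of string
const factored = new RegExp(
  '\\b(?:' + words.map(regExpEscape).join('|') + ')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)',
  'g'
);

// matchAll returns each match with its index, no manual lastIndex handling
const found = Array.from(text.matchAll(factored), m => [m[0], m.index]);
console.log(found); // → [['word1', 0], ['word2', 7], ['word4', 34]]
```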
Your exclusion of the quote character is wrong: outside a character class, ^ is the start-of-string anchor rather than a negation, so '\\b^"' looks for the start of the string followed by a quote. Try this instead:
var re = new RegExp('\\b[^"]' + word + '[^"]\\b', 'i');
Also, https://regexpal.com is a great site for debugging regular expressions.
Edit: Because \b also matches between a quotation mark and a word character, this needs to be tweaked further. At the time of writing, JavaScript did not support lookbehinds (they were added in ES2018), so we have to get a little tricky.
var re = new RegExp('(?:^|[^"\\w])' + word + '(?:$|[^"\\w])','i')
So what this is doing is saying:
(?: - start a non-capturing group
^|[^"\w] - match either the start of the string or any character that is neither a word character (alphanumeric or underscore) nor a quote
) - close the group
word - match your word here
(?: - start another non-capturing group
$|[^"\w] - match either the end of the string or, again, any non-word character that isn't a quote
) - close the group
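A sketch with two side notes, assuming an ES2018+ engine: the (?:^|[^"\w]) prefix consumes the character before the word, so search() returns that character's index (6 for word2, not 7); and with lookbehind, the same neighbour check can be written as (?<![\w"])word(?![\w"]) without consuming anything. Like the answer's pattern, this only inspects the adjacent characters, so unlike the quote-counting approach it would not exclude a word inside a longer quoted phrase.

```javascript
const text = 'word1, word2, word22, "word3" and word4';
const words = ['word1', 'word2', 'word3', 'word4'];

// The answer's pattern: the neighbouring characters are part of the match
const consumed = words.map(w =>
  text.search(new RegExp('(?:^|[^"\\w])' + w + '(?:$|[^"\\w])', 'i')));

// Lookbehind/lookahead version: neighbours are checked but not consumed
const lookaround = words.map(w =>
  text.search(new RegExp('(?<![\\w"])' + w + '(?![\\w"])', 'i')));

console.log(consumed);   // → [0, 6, -1, 33]
console.log(lookaround); // → [0, 7, -1, 34]
```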

Non-capturing group matching whitespace boundaries in JavaScript regex

I have this function that finds whole words and should replace them. It identifies the surrounding spaces but should not replace them, i.e. it should not capture them.
function asd (sentence, word) {
str = sentence.replace(new RegExp('(?:^|\\s)' + word + '(?:$|\\s)'), "*****");
return str;
};
Then I have the following strings:
var sentence = "ich mag Äpfel";
var word = "Äpfel";
The result should be something like:
"ich mag *****"
and NOT:
"ich mag*****"
I'm getting the latter.
How can I make it so that it identifies the space but ignores it when replacing the word?
At first this may seem like a duplicate, but I did not find an answer to this question, which is why I'm asking it.
Thank you
You should put back the matched whitespaces by using a capturing group (rather than a non-capturing one) with a replacement backreference in the replacement pattern, and you may also leverage a lookahead for the right whitespace boundary, which is handy in case of consecutive matches:
function asd (sentence, word) {
  return sentence.replace(new RegExp('(^|\\s)' + word + '(?=$|\\s)'), "$1*****");
}
var sentence = "ich mag Äpfel";
var word = "Äpfel";
console.log(asd(sentence, word));
Details
(^|\s) - Group 1 (later referred to with the help of a $1 placeholder in the replacement pattern): a capturing group that matches either start of string or a whitespace
Äpfel - a search word
(?=$|\s) - a positive lookahead that requires the end of string or whitespace immediately to the right of the current location.
NOTE: If the word can contain special regex metacharacters, escape them:
function asd (sentence, word) {
  var escaped = word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
  return sentence.replace(new RegExp('(^|\\s)' + escaped + '(?=$|\\s)'), "$1*****");
}
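On engines with lookbehind support (ES2018+), the left boundary can also be a lookbehind, so nothing has to be put back with $1. The sketch below uses a hypothetical maskWord helper with the g flag so that every occurrence is replaced. It also shows why \b is not an option here: Ä is not a \w character, so there is no word boundary before Äpfel.

```javascript
// Hypothetical helper: masks each whitespace-delimited occurrence of `word`
function maskWord(sentence, word) {
  const escaped = word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
  // (?<=^|\s) and (?=$|\s) check the boundaries without consuming them
  return sentence.replace(new RegExp('(?<=^|\\s)' + escaped + '(?=$|\\s)', 'g'), '*****');
}

console.log(maskWord('ich mag Äpfel', 'Äpfel'));             // ich mag *****
console.log(maskWord('Äpfel Äpfel und Äpfelsaft', 'Äpfel')); // ***** ***** und Äpfelsaft

// \b fails: the space before Ä and Ä itself are both non-word characters
const hasBoundary = /\bÄpfel\b/.test('ich mag Äpfel');
console.log(hasBoundary); // false
```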

What regular expression fits the requirement below in JavaScript?

Criteria:
Any word that starts with a, ends with b and has a digit as its middle character. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
I need the regex for JavaScript.
So far I have tried the following:
var text_content =above_mention_content
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
I get this output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi matches, at the start of each line, 1 or 0 chars other than #, then a, a digit and b. It does not check for whole words, and it never matches words that come after the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
I would suggest using two regexes.
The first regex (with the g and m flags) fetches the lines that do not start with #:
^[^#][a\db\s]+
and then another regex (with the g flag) fetches the individual words from each line:
\ba\db\b
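A minimal sketch of the two-step idea with the flags spelled out (the variable names are illustrative). The m flag makes ^ and $ work per line, and the lookahead (?!#) stands in for "the line does not start with #":

```javascript
const text = 'a1b a2b a3b\n#a4b a5b a6b\na7b a8b a9b';

// Step 1: every line that does not start with '#'
const keptLines = text.match(/^(?!#).*$/gm) || [];

// Step 2: whole words made of 'a', one digit and 'b' from each kept line
const words = keptLines.flatMap(line => line.match(/\ba\db\b/g) || []);
console.log(words); // → ['a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b']
```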

Cannot get all possible overlapping regular expression matches

I have string
Started: 11.11.2014 11:19:28.376<br/>Ended: 1.1.4<br/>1:9:8.378<br/>Request took: 0:0:0.2
I need to pad single digits with a zero: when I encounter 1:1:8 it should become 01:01:08, and the same goes for the date. I tried using
/((:|\.|\s)[0-9](:|\.))/g
but it did not give all possible overlapping matches. How do I fix it?
var str = "Started: 11.11.2014 11:19:28.376<br/>Ended: 11.11.2014<br/>11:19:28.378<br/>Request took: 0:0:0.2";
var re = /((:|\.|\s)[0-9](:|\.))/g
while ((match = re.exec(str)) != null) {
//alert("match found at " + match.index);
str = [str.slice(0,match.index), '0', str.slice(match.index+1,str.length)];
}
alert(str);
This will probably do what you want:
str.replace(/\b\d\b/g, "0$&")
It searches for lone digits (\d) and pads a 0 in front of each one.
The first word boundary \b checks that there is no [a-zA-Z0-9_] in front, and the second checks there is no [a-zA-Z0-9_] behind the digit.
$& in the replacement string refers to the whole match.
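If you want to pad a 0 whenever the characters before and after the digit are not digits:
str.replace(/(^|\D)(\d)(?!\d)/g, "$10$2")
Since the pattern has only two groups, $10 is interpreted as $1 followed by a literal 0.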
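A sketch that runs both replacements on the string from the question (the helper names are hypothetical). On this input they agree, including padding the fractional .2 of 0:0:0.2 to .02. They differ when a letter touches the digit, because \b needs a non-word character next to it:

```javascript
const report = "Started: 11.11.2014 11:19:28.376<br/>Ended: 1.1.4<br/>1:9:8.378<br/>Request took: 0:0:0.2";

// First answer: a digit with a word boundary on both sides
const padWordBoundary = s => s.replace(/\b\d\b/g, "0$&");

// Second answer: a digit preceded by start/non-digit and not followed by a digit
const padNonDigit = s => s.replace(/(^|\D)(\d)(?!\d)/g, "$10$2");

console.log(padWordBoundary(report));
console.log(padNonDigit(report));
// Both print:
// Started: 11.11.2014 11:19:28.376<br/>Ended: 01.01.04<br/>01:09:08.378<br/>Request took: 00:00:00.02

// A letter directly before the digit: no \b there, but it is still a \D
console.log(padWordBoundary("v1"), padNonDigit("v1")); // v1 v01
```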
