Javascript: Replace needle in haystack that is not between two characters - javascript

I'd like to replace a needle in a haystack that is not between two characters.
Say I have the regular expression /\bneedle\b/g, which matches every needle that is not next to a word character. I want to expand on this so that it matches needles that are not next to a word character AND are not between two characters such as [ and ].
So the first three needles will match but the rest will not: needle: needle's needle [needle] [needle something] needle1 needler
Any idea how I'd go about doing this?

You can use this regex:
/(^|[^\w\[])(needle)(?![\w\]])/
Breakup:
(^|[^\w\[]) # line start OR a non-word non-[ character
(needle) # match and group our targeted text
(?![\w\]]) # negative lookahead to fail the match if next char is a word char or ]
As per comments below you can use following replace code:
var str = "needle: needle's needle [needle] [needle something] needle1 needler";
var reg = new RegExp("(^|[^\\w\\[])(needle)(?![\\w\\]])", "gi");
str = str.replace(reg, "$1*$2*");
//=> *needle*: *needle*'s *needle* [needle] [needle something] needle1 needler
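The pattern above only inspects the characters directly next to needle, so a needle in the middle of a bracketed phrase such as [a needle b] would still be replaced. A minimal sketch of one alternative, assuming brackets are balanced and never nested (an assumption, not something stated in the question): keep \b on both sides and add a negative lookahead that rejects any needle followed by a ] with no [ in between. The names bracketAware, sample and replaced are illustrative only.

```javascript
// Sketch: skip "needle" anywhere inside [...] (assumes balanced, non-nested brackets).
// (?![^\[\]]*\]) fails when a "]" is reached before any "[" or "]" is seen,
// which means the needle sits inside an open bracket pair.
const bracketAware = /\bneedle\b(?![^\[\]]*\])/g;
const sample = "needle: needle's needle [needle] [a needle b] needle1 needler";
const replaced = sample.replace(bracketAware, "*$&*");
console.log(replaced);
```

Unlike the accepted pattern, this one does not consume the character before needle, so the replacement string needs only $& rather than $1 and $2.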

Related

Finding exact words in text, excluding quoted words

In the javascript code below I need to find in a text exact words, but excluding the words that are between quotes. This is my attempt, what's wrong with the regex? It should find all the words excluding word22 and "word3". If I use only \b in the regex it selects exact words but it doesn't exclude the words between quotes.
var text = 'word1, word2, word22, "word3" and word4';
var words = [ 'word1', 'word2', 'word3' , 'word4' ];
words.forEach(function(word){
var re = new RegExp('\\b^"' + word + '^"\\b', 'i');
var pos = text.search(re);
if (pos > -1)
alert(word + " found in position " + pos);
});
First, we'll use a function to escape the characters of the word, just in case there's some that have special meaning for regexp.
// from https://stackoverflow.com/a/30851002/240443
function regExpEscape(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
Then, we construct a regular expression as an alternation between individual word regexps. For each word, we assert that it starts with a word boundary, ends with a word boundary, and has an even number of quote characters between its end, and the end of string. (Note that from the end of word3, there is only one quote till the end of string, which is odd.)
let text = 'word1, word2, word22, "word3" and word4';
let words = [ 'word1', 'word2', 'word3' , 'word4' ];
let regexp = new RegExp(words.map(word =>
'\\b' + regExpEscape(word) + '\\b(?=(?:[^"]*"[^"]*")*[^"]*$)').join('|'), 'g')
text.match(regexp);
// => ["word1", "word2", "word4"]
let m;
while ((m = regexp.exec(text))) {
console.log(m[0], m.index);
}
// word1 0
// word2 7
// word4 34
EDIT: Actually, we can speed the regexp up a bit if we factor out the surrounding conditions:
let regexp = new RegExp(
'\\b(?:' +
words.map(regExpEscape).join('|') +
')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g')
Your way of excluding the quote character is wrong: outside a character class, ^" matches the beginning of the string followed by a quote. Try this instead:
var re = new RegExp('\\b[^"]' + word + '[^"]\\b', 'i');
Also, this site is a great help for debugging regexes: https://regexpal.com
Edit: Because \b also matches next to quotation marks, this needs to be tweaked further. At the time this was written JavaScript didn't support lookbehinds (they arrived in ES2018), so we have to get a little tricky.
var re = new RegExp('(?:^|[^"\\w])' + word + '(?:$|[^"\\w])','i')
So what this is doing is saying
(?: Don't capture this group
^|[^"\w]) either match the start of the line, or any non-word character (word characters are alphanumerics and underscore) that isn't a quote
word match your word here
(?: Don't capture this group either
$|[^"\w]) either match the end of the line, or any non-word character that isn't a quote again
Note that the neighbouring characters are part of the match, so the position reported by search() can be one character before the word itself.
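In engines that implement ES2018 lookbehind assertions (an assumption about the runtime), the same idea can be sketched without consuming the neighbouring characters. The object name found is illustrative, and like the pattern above this only checks the characters directly adjacent to the word, so a word in the middle of a longer quoted phrase would still be reported.

```javascript
// Sketch: lookbehind/lookahead keep the neighbouring characters out of the match,
// so search() reports the position of the word itself.
const text = 'word1, word2, word22, "word3" and word4';
const words = ['word1', 'word2', 'word3', 'word4'];
const found = {};
words.forEach(function (word) {
  // Assumes the words contain no regex metacharacters; escape them otherwise.
  const re = new RegExp('(?<![\\w"])' + word + '(?![\\w"])', 'i');
  const pos = text.search(re);
  if (pos > -1) found[word] = pos;
});
console.log(found);
```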

exec from regex returns null

This is my code snippet:
var myString = '#EXTINF:-1 group-title="|FR| CHAINES FRANÇAISES |FR|",|FR|*****CANALSAT*****|FR|';
var group_title = /(group-title=")(\S*)["]/;
var matchgroup_title = group_title.exec(myString);
console.log(matchgroup_title);
I am not familiar with regex, so I can't understand this code. Why does exec return null here?
The problem is caused by \S*, as you can see if you pop the pattern into regex101.com.
\S* matches zero or more non-whitespace characters, and a space is a whitespace character, so the match stops at the first space and never reaches the closing quote. You can simply use [^"] to match anything that isn't another quote.
You can simplify this to:
var myString = '#EXTINF:-1 group-title="|FR| CHAINES FRANÇAISES |FR|",|FR|*****CANALSAT*****|FR|';
var group_title = /group-title="([^"]+)"/;
var matchgroup_title = group_title.exec(myString);
console.log(matchgroup_title);
(group-title=") matches group-title="
(\S*) matches zero or more non-whitespace characters, so |FR|
["] then matches a ", but there isn't one (well, there is, but there are whitespace characters first, so no match).

Return word before or after a string with newline characters

In short: I want to return the word right before or after a newline character in a string. How would I accomplish that?
I want to return: 1,150 and Svendborg
This is my string:
var newline = /\n/;
var str = "Specialzed Road Expert 2017\nkr.1,150 - Svendborg\n\nSpecialzed"
This matches a whole line delimited by a leading and a trailing newline character, with groups capturing just the first and last "words".
var str = "Specialzed Road Expert 2017\nkr.1,150 - Svendborg\n\nSpecialzed";
var matches = str.match(/\n([^\s]+).*?([^\s]+)\n/);
console.log(matches);
Your words would be in matches[1] and matches[2] with matches[0] being the whole line.
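With the sample string, the first group captures kr.1,150 rather than the 1,150 the question asks for. A small sketch of one way to drop the prefix, assuming "kr." is a fixed currency marker that may or may not be present (that assumption is not stated in the question); the names price and place are illustrative.

```javascript
const str = "Specialzed Road Expert 2017\nkr.1,150 - Svendborg\n\nSpecialzed";
// (?:kr\.)? skips an optional "kr." prefix (assumed fixed); \S is shorthand for [^\s].
const m = str.match(/\n(?:kr\.)?(\S+).*?(\S+)\n/);
const price = m ? m[1] : null; // first word on the line, without the prefix
const place = m ? m[2] : null; // last word on the line
console.log(price, place);
```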

What will be the regular expression for below requirement in javascript

Criteria:
Any word that starts with a, ends with b, and has a single digit in the middle. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far I have tried the following:
var text_content = above_mention_content; // the string given above
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
I get the following output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi matches every occurrence of this sequence: the start of a line, then 1 or 0 chars other than #, then a, a digit and b. It does not check for a whole word, and it cannot match any word that is not at the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
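If using replace() purely for its side effect feels awkward, the same pattern can be iterated with String.prototype.matchAll, available in ES2020 runtimes (an assumption about the environment). This is only a sketch of an equivalent loop; the variable names are illustrative.

```javascript
const s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
// Same alternation as above: the first branch swallows commented lines,
// the second branch captures the wanted words in Group 1.
const pattern = /^[^\S\r\n]*#.*|\b(a\db)\b/gm;
const res = [];
for (const match of s.matchAll(pattern)) {
  // Group 1 is undefined when the "commented line" branch matched.
  if (match[1] !== undefined) res.push(match[1]);
}
console.log(res);
```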
I would suggest using two regexes.
The first regex fetches the non-hashed lines (use it with the g and m flags):
^[^#][a\db\s]+
The second then fetches the individual words from each of those lines (use it with the g flag):
\ba\db\b
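The two-step idea can be sketched as follows. This version splits the text into lines and filters them instead of using the first expression exactly as written, and it treats a # preceded by leading whitespace as a commented line too; both choices are assumptions. The names content and wordsFound are illustrative.

```javascript
const content = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
const wordsFound = content
  .split(/\r?\n/)                                   // step 1: one entry per line
  .filter(line => !/^\s*#/.test(line))              // drop lines that start with #
  .flatMap(line => line.match(/\ba\db\b/g) || []);  // step 2: whole words a<digit>b
console.log(wordsFound);
```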

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: would be nice if the result would be without the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - one or zero occurrences of:
(?: +\w+)* - zero or more occurrences of one or more spaces followed by one or more word characters, followed by
# - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Get rid of the trailing # (using an ES6 arrow function)
m = m.map(s => s.replace(/#$/g, ''));
// ES5 equivalent
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/
document.body.innerHTML = "<pre>" + JSON.stringify(m, null, 4) + "</pre>";
}
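The snippet above strips only the trailing #, while the question also asks for results without the leading one. A minimal sketch that removes both, run against the question's original string and without touching the DOM; the names input and tags are illustrative.

```javascript
const input = "#The test# rain in #SPAIN stays mainly in the #plain";
// match() returns null when nothing matches, so fall back to an empty array.
const tags = (input.match(/#\w+(?:(?: +\w+)*#)?/g) || [])
  .map(s => s.replace(/^#|#$/g, "")); // drop the leading and the trailing #
console.log(tags);
```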
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
Use it with the g (global) flag to collect all matches.
