RegExp to match any string except two reserved strings? - javascript

Probably a simple one but my knowledge of creating regular expressions is a little vague.
I'm trying to match any string followed by a comma and a space except if it is 'Bair Hugger' or 'Fluid Warmer'
Here is what I have so far
var re_comma = new RegExp("\w+[^Bair Hugger|Fluid Warmer]" + ", ", "i");
Any ideas?

New answer
Regarding your example I'd say it is really easier to split the string and iterate over it:
function filter(str, delim, test) {
var parts = str.split(delim),
result = [];
for(var i = 0, len = parts.length; i < len; i++) {
if(test(parts[i])) result.push(parts[i]);
}
return result.join(delim);
}
str = filter(str, ', ', function(s) {
s = s.toLowerCase();
return s === 'bair hugger' || s === 'fluid warmer';
});
Otherwise, your expression becomes something like this:
var re_comma = new RegExp("(^|, )(?!(?:Bair Hugger|Fluid Warmer)(?:$|, )).+?(, |$)", "i");
and you have to use a callback for the replacement to decide whether to remove the preceding , or trailing , or not:
str = str.replace(re_comma, function(str, pre, tail) {
return pre && tail ? tail : '';// middle of the string, leave one
});
The intention of this code is less clear. Maybe there is a simpler expression, but I think filtering the array is still cleaner.
Old answer: (doesn't solve the problem at hand but provides information regarding regular expressions).
[] denotes a character class and will only match one character out of the ones you provided. [^Bair Hugger|Fluid Warmer] is the same as [^Bair Huge|FldWm].
You could use a negative lookahead:
new RegExp("^(?!(Bair Hugger|Fluid Warmer), ).+?, $", "i");
Note that you have to use \\ inside a string to produce one \. Otherwise, "\w" becomes w and is not a special character sequence anymore. You also have to anchor the expression.
Update: As you mentioned you want to match any string before the comma, I decided to use . instead of \w, to match any character.
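To make the two points above concrete, here is a minimal sketch. The name notReserved and the sample list are made up for illustration, and the lookahead is anchored with $ on the assumption that each item is tested on its own after splitting:

```javascript
// Sketch only: `notReserved` and the sample list are hypothetical.
// The negative lookahead rejects an item that is exactly one of the reserved names.
const notReserved = /^(?!(?:Bair Hugger|Fluid Warmer)$).+$/i;

const sampleItems = "Bair Hugger, Blanket, fluid warmer, Pump".split(", ");
const nonReservedItems = sampleItems.filter((item) => notReserved.test(item));
console.log(nonReservedItems); // [ 'Blanket', 'Pump' ]

// In a string literal the backslash must be doubled to reach the RegExp.
console.log(new RegExp("\\w+").source); // \w+
console.log(new RegExp("\w+").source);  // w+
```

A character class such as [^Bair Hugger] could not express this, because it only ever looks at one character.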

I love regex, and Felix Kling's answer is correct :)
However for such simple matching I would normally use something like below
function contains(str, text) {
return str.indexOf(text) >= 0;
}
if(contains(myString, 'random')) {
//myString contains "random"
}

Solution:
reg =/(?:Bair Hugger|Fluid Warmer),( .*)/
str='Bair Hugger, lalala'
reg.exec(str)
>> ["Bair Hugger, lalala", " lalala"]
newStr = reg.exec(str)[1]

Related

Regex - match the better part of a word in a search string

I am using JavaScript and currently looking for a way to match as many of my pattern's letters as possible, maintaining the original order.
For example, the search pattern queued should return the match Queue/queue against any of the following search strings:
queueTable
scheduledQueueTable
qScheduledQueueTable
As of now I've reached as far as this:
var myregex = new RegExp("([queued])", "i");
var result = myregex.exec('queueTable');
but it doesn't seem to work correctly as it highlights the single characters q,u,e,u,e and e at the end of the word Table.
Any ideas?
Generate the regex from nested optional non-capturing groups; the pattern can be built with the Array#reduceRight method.
var myregex = new RegExp("queued"
.split('')
.reduceRight(function(str, s) {
return '(?:' + s + str + ')?';
}, ''), "i");
var result = myregex.exec('queueTable');
console.log(result)
The method generates the regex: /(?:q(?:u(?:e(?:u(?:e(?:d)?)?)?)?)?)?/i
UPDATE: If you want the first longest match, use the g modifier in the regex and pick the longest result with the Array#reduce method.
var myregex = new RegExp(
"queued".split('')
.reduceRight(function(str, s) {
return '(?:' + s + str + ')?';
}, ''), "ig");
var result = 'qscheduledQueueTable'
.match(myregex)
.reduce(function(a, b) {
return a.length >= b.length ? a : b; // >= keeps the earlier match on ties
});
console.log(result);
I think the logic would have to be something like:
Match as many of these letters as possible, in this order.
The only real answer that comes to mind is to get the match to continue if possible, but allow it to bail out. In this case...
myregex = /q(?:u(?:e(?:u(?:e(?:d|)|)|)|)|)/;
You can generate this, of course:
function matchAsMuchAsPossible(word) { // name me something sensible please!
return new RegExp(
word.split("").join("(?:")
+ (new Array(word.length).join("|)"))
);
}
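As a rough sketch of how such a generated pattern might be used on the three example strings: buildPrefixPattern and longestPrefixMatch are hypothetical names, and the sketch assumes the search word contains only letters (no regex metacharacters) and an engine that supports String.prototype.matchAll.

```javascript
// Hypothetical helper: "queued" -> "q(?:u(?:e(?:u(?:e(?:d|)|)|)|)|)"
function buildPrefixPattern(word) {
  return word.split("").join("(?:") + new Array(word.length).join("|)");
}

// Hypothetical helper: returns the longest prefix of `word` found in `text`
// (the earliest one on ties), or "" when not even the first letter occurs.
function longestPrefixMatch(word, text) {
  const re = new RegExp(buildPrefixPattern(word), "gi");
  let best = "";
  for (const m of text.matchAll(re)) {
    if (m[0].length > best.length) best = m[0];
  }
  return best;
}

console.log(longestPrefixMatch("queued", "queueTable"));           // queue
console.log(longestPrefixMatch("queued", "scheduledQueueTable"));  // Queue
console.log(longestPrefixMatch("queued", "qScheduledQueueTable")); // Queue
```

Because the pattern always requires the first letter, it never produces empty matches, so the loop only sees real candidates.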
You are using square brackets, which match a single instance of any character listed inside.
There are a few ways of interpreting your intentions:
You want to match the word queue with an optional 'd' at the end:
var myregex = new RegExp("queued?", "i");
var result = myregex.exec('queueTable');
Note that this can be written more concisely:
'queueTable'.match(/queued?/i);
I also removed the brackets as these were not adding anything here.
This link provides some good examples that may help you further: https://www.w3schools.com/js/js_regexp.asp
When you use [] in a regular expression, it means you want to match any of the characters inside the brackets.
Example: if I use [abc] it means "match a single character, and this character can be 'a', 'b' or 'c'"
So in your code [queued] means "match a single character, and this character can be 'q', 'u', 'e' or 'd'" - note that 'u' and 'e' appear twice so they are redundant in this case. That's why this expression matches just one single character.
If you want to match the whole string "queued", just remove the brackets. But in this case it won't match, because queueTable doesn't have 'd'. If you want 'd' to be optional, you can use queued? as already explained in previous answers.
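A small sketch of the difference described above (the variable names are only for illustration):

```javascript
// A character class matches exactly one character from the set.
const asCharClass = /[queued]/i;
// Without brackets the letters must appear in sequence; `d?` makes the d optional.
const asLiteral = /queued?/i;

console.log(asCharClass.exec("queueTable")[0]); // q
console.log(asLiteral.exec("queueTable")[0]);   // queue

// With the g flag the class also picks up the lone "e" at the end of "Table".
console.log("Table".match(/[queued]/gi)); // [ 'e' ]
```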
Try something like the following :
var myregex = /queued?\B/g;
var result = myregex.exec('queueTable');
console.log(result);

Remove punctuation, retain spaces, toLowerCase, add dashes succinctly

I need to do the following to a string:
Remove any punctuation (but retain spaces) (can include removal of foreign chars)
Add dashes instead of spaces
toLowercase
I'd like to be able to do this as succinctly as possible, so on one line for example.
At the moment I have:
const ele = str.replace(/[^\w\s]/, '').replace(/\s+/g, '-').toLowerCase();
Few problems I'm having. Firstly the line above is syntactically incorrect. I think it's a problem with /[^\w\s] but I am not sure what I've done wrong.
Secondly I wonder if it is possible to write a regex statement that removes the punctuation AND converts spaces to dashes?
An example of what I want to change:
Where to? = where-to
Destination(s) = destinations
Travel dates?: = travel-dates
EDIT: I have updated the missing / from the first regex replace. I am finding that Destination(s) is becoming destinations) which is peculiar.
Codepen: http://codepen.io/anon/pen/mAdXJm?editors=0011
You may use the following regex to only match ASCII punctuation and some symbols (source) - maybe we should remove _ from it:
var punct = /[!"#$%&'()*+,.\/:;<=>?@\[\\\]^`{|}~-]+/g;
or a more contracted one since some of these symbols appear in the ASCII table as consecutive chars:
var punct = /[!-\/:-@\[-^`{-~]+/g;
You may chain 2 regex replacements.
var punct = /[!"#$%&'()*+,.\/:;<=>?@\[\\\]^`{|}~-]+/g;
var s = "Where to?"; // = where-to
console.log(s.replace(punct, '').replace(/\s+/g, '-').toLowerCase());
s = "Destination(s)"; // = destinations
console.log(s.replace(punct, '').replace(/\s+/g, '-').toLowerCase());
s = "Travel dates?:"; // = travel-dates
console.log(s.replace(punct, '').replace(/\s+/g, '-').toLowerCase());
Or use an anonymous function inside the replace call, written as an arrow function (less compatible, but succinct):
var s="Travel dates?:"; // = travel-dates
var o=/([!-\/:-@\[-^`{-~]+)|\s+/g;
console.log(s.replace(o,(m,g)=>g?'':'-').toLowerCase());
Note you may also use XRegExp to match any Unicode punctuation with \pP construct.
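If your target engines support Unicode property escapes (the u flag with \p{...}, part of ES2018), something similar may be possible without a library. This is only a sketch under that assumption; toSlug is a hypothetical name, and note that \p{P} also covers the underscore:

```javascript
// Hypothetical helper, assuming an ES2018+ engine with Unicode property escapes.
function toSlug(text) {
  return text
    .replace(/[\p{P}\p{S}]+/gu, "") // drop Unicode punctuation and symbols
    .trim()                          // avoid leading/trailing dashes
    .replace(/\s+/g, "-")            // collapse whitespace runs into one dash
    .toLowerCase();
}

console.log(toSlug("Where to?"));      // where-to
console.log(toSlug("Destination(s)")); // destinations
console.log(toSlug("Travel dates?:")); // travel-dates
```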
Wiktor touched on the subject, but my first thought was an anonymous function using the regex /(\s+)|([\W])/g like this:
var inputs = ['Where to?', 'Destination(s)', 'Travel dates?:'],
res,
idx;
for( idx=0; idx<inputs.length; idx++ ) {
res = inputs[idx].replace(/(\s+)|([\W])/g, function(a, b) {return b ? '-' : '';}).toLowerCase();
document.getElementById('output').innerHTML += '"' + inputs[idx] + '" -> "'
+ res + '"<br/>';
}
<!DOCTYPE html>
<html>
<body>
<p id='output'></p>
</body>
</html>
The regex captures either whitespace (one or more characters) or a single non-word character. If the first group matched, the anonymous function returns -, otherwise an empty string.

Retrieving several capturing groups recursively with RegExp

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There might be unlimited #-separated words, but definitely the whole string begins with #
I have written the following regexp, though it matches it, but I cannot get each #-separated word, and what I get is the last recursion and the first (as well as the whole string). How can I get an array of every word in an element separately?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this with regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.
You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With'
if (s.substr(0, 1) == "#")
tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]
You could try this regex also,
((?:#|#)\w+)
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
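For a regex-only route, a minimal sketch (variable names are illustrative): a repeated capturing group only remembers its last iteration, which is why the original attempt returned just "#With", whereas a global match returns every occurrence.

```javascript
const tagged = "#someID#tn#company#somethingNew#classing#somethingElse#With";

// A quantified capturing group keeps only the text of its final repetition.
const lastOnly = /^(#\w+)+$/.exec(tagged);
console.log(lastOnly[1]); // #With

// With the g flag, String#match returns all matches as an array.
const allWords = tagged.match(/#\w+/g);
console.log(allWords.length); // 7
console.log(allWords[0]);     // #someID
```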
It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
var strSplit = str.split("#");
for(var i = 1; i < strSplit.length; i++) {
strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]

JavaScript split function does not work correctly with a specific regex

I have a string, "\,str\,i,ing", and I need to split it on every comma that is not preceded by a backslash. For my string the result should be ["\,str\,i", "ing"]. I'm using the following regex:
myString.split("[^\],", 2)
but it doesn't work.
Well, this is ridiculous to avoid the lack of lookbehind but seems to get the correct result.
"\\,str\\,i,ing".split('').reverse().join('').split(/,(?=[^\\])/).map(function(a){
return a.split('').reverse().join('');
}).reverse();
//=> ["\,str\,i", "ing"]
Not sure about your expected output, but you are passing a string, not a regex. Use:
var arr = "\\,str\\,i,ing".split(/[^\\],/, 2);
console.log(arr);
To split using regex, wrap your regex in /..../
This is not easily possible in JS engines without lookbehind support (lookbehind only became part of the language with ES2018). Even if you used a real regex, the character class would eat the last character:
> "xyz\\,xyz,xyz".split(/[^\\],/, 2)
["xyz\\,xy", "xyz"]
If you don't want the z to be eaten, I'd suggest:
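var str = "....";
return str.split(",").reduce(function(res, part) {
var l = res.length;
// append to the previous part if it ended with a backslash (escaped comma),
// or if the limit of 2 parts has already been reached
if (l && res[l-1].substr(-1) == "\\" || l >= 2)
res[l-1] += ","+part;
else
res.push(part);
return res; // the accumulator must be returned on every iteration
}, []);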
Reading between the lines, it looks like you want to split a string by , characters that are not preceded by \ characters.
It would be really great if every JavaScript engine had a regular expression lookbehind (and negative lookbehind) pattern, but engines that predate ES2018 do not. What they all have is a lookahead ((?=)) and negative lookahead ((?!)) pattern. Make sure to review the documentation.
You can use these as a lookbehind if you reverse the string:
var str,
reverseStr,
arr,
reverseArr;
//don't forget to escape your backslashes
str = '\\,str\\,i,ing';
//reverse your string
reverseStr = str.split('').reverse().join('');
//split the array on `,`s that aren't followed by `\`
reverseArr = reverseStr.split(/,(?!\\)/);
//reverse the reversed array, and reverse each string in the array
arr = reverseArr.reverse().map(function (val) {
return val.split('').reverse().join('');
});
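Where lookbehind is available (it was standardized in ES2018, so this assumes a reasonably recent engine), the reversal trick may not be needed. A minimal sketch, with illustrative variable names; it does not try to handle an escaped backslash directly before a comma:

```javascript
// The literal below represents the characters: \,str\,i,ing
const escapedInput = "\\,str\\,i,ing";

// Split on commas that are NOT immediately preceded by a backslash.
const pieces = escapedInput.split(/(?<!\\),/);

console.log(pieces.length); // 2
console.log(pieces[1]);     // ing
```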
You picked a tough character to match: a backslash preceding a comma is apt to disappear while you pass it around in a string, since '\,'==','...
var s= 'My dog, the one with two \\, blue \\,eyes, is asleep.';
var a= [], M, rx=/(\\?),/g;
while((M= rx.exec(s))!= null){
if(M[1]) continue;
a.push(s.substring(0, rx.lastIndex-1));
s= s.substring(rx.lastIndex);
rx.lastIndex= 0;
};
a.push(s);
/* returned value: (Array)
My dog
the one with two \, blue \,eyes
is asleep.
*/
Find something which will not be present in your original string, say "###". Replace "\\," with it. Split the resulting string by ",". Replace "###" back with "\\,".
Something like this:
<script type="text/javascript">
var s1 = "\\,str\\,i,ing";
var s2 = s1.replace(/\\,/g,"###");
console.log(s2);
var s3 = s2.split(",");
for (var i=0;i<s3.length;i++)
{
s3[i] = s3[i].replace(/###/g,"\\,");
}
console.log(s3);
</script>

Case insensitive string replacement in JavaScript?

I need to highlight, case insensitively, given keywords in a JavaScript string.
For example:
highlight("foobar Foo bar FOO", "foo") should return "<b>foo</b>bar <b>Foo</b> bar <b>FOO</b>"
I need the code to work for any keyword, and therefore using a hardcoded regular expression like /foo/i is not a sufficient solution.
What is the easiest way to do this?
(This an instance of a more general problem detailed in the title, but I feel that it's best to tackle with a concrete, useful example.)
You can use regular expressions if you prepare the search string. In PHP e.g. there is a function preg_quote, which replaces all regex-chars in a string with their escaped versions.
Here is such a function for javascript (source):
function preg_quote (str, delimiter) {
// discuss at: https://locutus.io/php/preg_quote/
// original by: booeyOH
// improved by: Ates Goral (https://magnetiq.com)
// improved by: Kevin van Zonneveld (https://kvz.io)
// improved by: Brett Zamir (https://brett-zamir.me)
// bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
// example 1: preg_quote("$40")
// returns 1: '\\$40'
// example 2: preg_quote("*RRRING* Hello?")
// returns 2: '\\*RRRING\\* Hello\\?'
// example 3: preg_quote("\\.+*?[^]$(){}=!<>|:")
// returns 3: '\\\\\\.\\+\\*\\?\\[\\^\\]\\$\\(\\)\\{\\}\\=\\!\\<\\>\\|\\:'
return (str + '')
.replace(new RegExp('[.\\\\+*?\\[\\^\\]$(){}=!<>|:\\' + (delimiter || '') + '-]', 'g'), '\\$&')
}
So you could do the following:
function highlight(str, search) {
return str.replace(new RegExp("(" + preg_quote(search) + ")", 'gi'), "<b>$1</b>");
}
function highlightWords( line, word )
{
var regex = new RegExp( '(' + word + ')', 'gi' );
return line.replace( regex, "<b>$1</b>" );
}
You can enhance the RegExp object with a function that does special character escaping for you:
RegExp.escape = function(str)
{
var specials = /[.*+?|()\[\]{}\\$^]/g; // .*+?|()[]{}\$^
return str.replace(specials, "\\$&");
}
Then you would be able to use what the others suggested without any worries:
function highlightWordsNoCase(line, word)
{
var regex = new RegExp("(" + RegExp.escape(word) + ")", "gi");
return line.replace(regex, "<b>$1</b>");
}
Regular expressions are fine as long as keywords are really words; you can just use a RegExp constructor instead of a literal to create one from a variable:
var re= new RegExp('('+word+')', 'gi');
return s.replace(re, '<b>$1</b>');
The difficulty arises if ‘keywords’ can have punctuation in, as punctuation tends to have special meaning in regexps. Unfortunately unlike most other languages/libraries with regexp support, there is no standard function to escape punctuation for regexps in JavaScript.
And you can't be totally sure exactly what characters need escaping because not every browser's implementation of regexp is guaranteed to be exactly the same. (In particular, newer browsers may add new functionality.) And backslash-escaping characters that are not special is not guaranteed to still work, although in practice it does.
So about the best you can do is one of:
attempting to catch each special character in common browser use today [add: see Sebastian's recipe]
backslash-escape all non-alphanumerics. care: \W will also match non-ASCII Unicode characters, which you don't really want.
just ensure that there are no non-alphanumerics in the keyword before searching
If you are using this to highlight words in HTML which already has markup in, though, you've got trouble. Your ‘word’ might appear in an element name or attribute value, in which case attempting to wrap a <b> around it will cause brokenness. In more complicated scenarios possibly even an HTML-injection to XSS security hole. If you have to cope with markup you will need a more complicated approach, splitting out ‘<...>’ markup before attempting to process each stretch of text on its own.
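For the simpler case where the input is plain text (not existing markup), a sketch along these lines might do. All three function names are hypothetical, and the set of escaped regex characters is a commonly used recipe rather than anything guaranteed by a standard:

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (c) => map[c]);
}

// Hypothetical helper: wraps case-insensitive matches of `keyword` in <b>,
// HTML-escaping both the matches and the text in between.
function highlightPlainText(text, keyword) {
  if (!keyword) return escapeHtml(text);
  const re = new RegExp(escapeRegExp(keyword), "gi");
  let out = "";
  let last = 0;
  for (const m of text.matchAll(re)) {
    out += escapeHtml(text.slice(last, m.index)) + "<b>" + escapeHtml(m[0]) + "</b>";
    last = m.index + m[0].length;
  }
  return out + escapeHtml(text.slice(last));
}

console.log(highlightPlainText("foobar Foo bar FOO", "foo"));
// <b>foo</b>bar <b>Foo</b> bar <b>FOO</b>
```

Text that already contains markup would still need the splitting approach described above; this sketch treats every < in the input as a literal character.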
What about something like this:
if(typeof String.prototype.highlight !== 'function') {
String.prototype.highlight = function(match, spanClass) {
var pattern = new RegExp( match, "gi" );
var replacement = "<span class='" + spanClass + "'>$&</span>";
return this.replace(pattern, replacement);
}
}
This could then be called like so:
var result = "The Quick Brown Fox Jumped Over The Lazy Brown Dog".highlight("brown","text-highlight");
For those poor with disregexia or regexophobia:
function replacei(str, sub, f){
let A = str.toLowerCase().split(sub.toLowerCase());
let B = [];
let x = 0;
for (let i = 0; i < A.length; i++) {
let n = A[i].length;
B.push(str.substr(x, n));
if (i < A.length-1)
B.push(f(str.substr(x + n, sub.length)));
x += n + sub.length;
}
return B.join('');
}
s = 'Foo and FOO (and foo) are all -- Foo.'
t = replacei(s, 'Foo', sub=>'<'+sub+'>')
console.log(t)
Output:
<Foo> and <FOO> (and <foo>) are all -- <Foo>.
Why not just create a new regex on each call to your function? You can use:
new RegExp(pat, flags)
where pat is a string for the pattern, and flags is a string of flags such as "gi".
