javascript regex replace all occurrences, using variables & dynamic strings [duplicate] - javascript

I am trying to search for a single whole word in a textbox. Say I search "me", I should find all occurrences of the word "me" in the text, but not "memmm" per se.
I am using JavaScript's search('my regex expression') to perform the current search (with no success).
After several proposals to use the \b switches (which don't seem to work) I am posting a revised explanation of my problem:
For some reason this doesn't seem to do the trick. Assume the following JavaScript search text:
var lookup = '\n\n\n\n\n\n2 PC Games \n\n\n\n';
lookup = lookup.trim() ;
alert(lookup );
var tttt = 'tttt';
alert((/\b(lookup)\b/g).test(2));
Moving lines is essential

To use a dynamic regular expression see my updated code:
new RegExp("\\b" + lookup + "\\b").test(textbox.value)
Your specific example is backwards:
alert((/\b(2)\b/g).test(lookup));
Regexpal
Regex Object
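As a rough sketch of the dynamic approach, the snippet below escapes the search term before wrapping it in \b, so characters such as . or ( in the user's input are matched literally. The helper names escapeRegExp and countWholeWord are made up for illustration. Note that \b only acts as a whole-word boundary when the term starts and ends with a word character (letter, digit or underscore).

```javascript
// Hypothetical helper: escape regex metacharacters in user-supplied text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count whole-word occurrences of `word` in `text`.
function countWholeWord(text, word) {
  // Backslashes are doubled because the pattern is built from a string.
  const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "g");
  const matches = text.match(re);
  return matches ? matches.length : 0;
}

console.log(countWholeWord("me and memmm, then me again", "me")); // 2
```

The g flag is what makes match() return every occurrence rather than only the first.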

Use the word boundary assertion \b:
/\bme\b/

You may use the following code:
function containsWord(stringToSearch) {
// "\\b" is written with two backslashes because the pattern is built from a string
return new RegExp("\\b" + "string" + "\\b").test(stringToSearch);
}
console.log(containsWord("test ,string, test")); // true
console.log(containsWord("test string test")); // true
console.log(containsWord("test stringtest")); // false
console.log(containsWord("teststring test")); // false

<script type='text/javascript'>
var lookup = '\n\n\n\n\n\n2 PC Games \n\n\n\n';
lookup = lookup.trim() ;
alert(lookup );
var tttt = 'tttt';
alert((/\b(lookup)\b/g).test(2));
</script>
It's a bit hard to tell what you're trying to do here. What is the tttt variable supposed to do?
Which string are you trying to search in? Are you trying to look for 2 within the string lookup? Then you would want:
/\b2\b/.test(lookup)
The following, from your regular expression, constructs a regular expression that consists of a word boundary, followed by the string "lookup" (not the value contained in the variable lookup), followed by a word boundary. It then tries to match this regular expression against the string "2", obtained by converting the number 2 to a string:
(/\b(lookup)\b/g).test(2)
For instance, the following returns true:
(/\b(lookup)\b/g).test("something to lookup somewhere")


How can I inverse matched result of the pattern?

Here is my string:
Organization 2
info#something.org.au more#something.com market#gmail.com single#noidea.com
Organization 3
headmistress#money.com head#skull.com
Also this is my pattern:
/^.*?#[^ ]+|^.*$/gm
As you see in the demo, the pattern matches this:
Organization 2
info#something.org.au
Organization 3
headmistress#money.com
My question: How can I make it inverse? I mean I want to match this:
more#something.com market#gmail.com single#noidea.com
head#skull.com
How can I do that? Actually I can write a new (and completely different) pattern to grab expected result, but I want to know, Is "inverting the result of a pattern" possible?
No, I don't believe there is a way to directly inverse a Regular Expression but keeping it the same otherwise.
However, you could achieve something close to what you're after by using your existing RegExp to replace its matches with an empty string:
var everythingThatDidntMatchStr = str.replace(/^.*?#[^ ]+|^.*$/gm, '');
Alternatively, you can collect the matches of the first RegExp and use Array.prototype.forEach() to replace each one with an empty string via String.prototype.replace():
var re = str.match(/^.*?#[^ ]+|^.*$/gm);
var res = str;
re.forEach(val => res = res.replace(val, "")); // a string search value is matched literally, so a . in val is not treated as a wildcard
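For what it's worth, here is a self-contained sketch of the replace-with-empty-string idea applied to the sample text from the question. The variable names are only illustrative, and the trimming and filtering at the end are an extra step I added to tidy up the leftover whitespace and blank lines.

```javascript
const str = [
  "Organization 2",
  "info#something.org.au more#something.com market#gmail.com single#noidea.com",
  "Organization 3",
  "headmistress#money.com head#skull.com",
].join("\n");

// The original pattern, unchanged.
const pattern = /^.*?#[^ ]+|^.*$/gm;

// Remove everything the pattern matches, then tidy what is left.
const leftovers = str
  .replace(pattern, "")
  .split("\n")
  .map(line => line.trim())
  .filter(line => line.length > 0);

console.log(leftovers);
```

This does not invert the pattern itself; it only keeps the text the pattern did not consume.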

Splitting a string at special character with JavaScript

I am trying to "intelligently" pre-fill a form, I want to prefill the firstname and lastname inputs based on a user email address, so for example,
jon.doe#email.com RETURNS Jon Doe
jon_doe#email.com RETURNS Jon Doe
jon-doe#email.com RETURNS Jon Doe
I have managed to get the string before the #,
var email = letters.substr(0, letters.indexOf('#'));
But I can't work out how to split() when the separator can be one of several values. I can do this,
email.split("_")
but how can I split on other email address valid special characters?
JavaScript's string split method can take a regex.
For example the following will split on ., -, and _.
"i-am_john.doe".split(/[.\-_]/)
Returning the following.
["i", "am", "john", "doe"]
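Putting the pieces together, a minimal sketch of the pre-fill logic might look like the following. guessName is a hypothetical helper, and the marker that ends the local part is passed in as an argument because the addresses in the question are written with #.

```javascript
// Hypothetical helper: turn the part of an address before `marker`
// into a capitalised "First Last" style string.
function guessName(address, marker) {
  const at = address.indexOf(marker);
  const local = at === -1 ? address : address.slice(0, at);
  return local
    .split(/[._-]/)                      // split on ., _ or -
    .filter(part => part.length > 0)     // ignore empty pieces such as "a..b"
    .map(part => part.charAt(0).toUpperCase() + part.slice(1))
    .join(" ");
}

console.log(guessName("jon.doe#email.com", "#")); // "Jon Doe"
console.log(guessName("jon_doe#email.com", "#")); // "Jon Doe"
console.log(guessName("jon-doe#email.com", "#")); // "Jon Doe"
```

Inside the character class the - is placed last, so it is read as a literal hyphen rather than a range.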
You can use a regular expression for what you want to split on. You can for example split on anything that isn't a letter:
var parts = email.split(/[^A-Za-z]/);
Demo: http://jsfiddle.net/Guffa/xt3Lb9e6/
You can split a string using a regular expression. To match ., _ or -, you can use a character class, for example [.\-_]. The syntax for regular expressions in JavaScript is /expression/, so your example would look like:
email.split(/[\.\-_]/);
Note that the backslash before - prevents it from being read as a range operator: in a character class, - specifies ranges such as [a-z]. Outside a character class, . is a special character that matches any character; inside one it is already literal, so escaping it there is harmless but not required.
If you require a dynamic list of characters to split on, you can build a regular expression using the RegExp constructor. For example:
var specialChars = ['.', '\\-', '_'];
var specialRegex = new RegExp('[' + specialChars.join('') + ']');
email.split(specialRegex);
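If the list of separators comes from somewhere you don't control, it is probably safer to escape each character automatically instead of hard-coding '\\-'. A small sketch, where buildSplitter is a hypothetical helper name:

```javascript
// Hypothetical helper: build a character-class regex from a list of
// single characters, escaping the few that are special inside [...].
function buildSplitter(chars) {
  const escaped = chars.map(c => c.replace(/[\\\]^-]/g, "\\$&"));
  return new RegExp("[" + escaped.join("") + "]");
}

const splitter = buildSplitter([".", "-", "_"]);
console.log(splitter.source);                 // [.\-_]
console.log("i-am_john.doe".split(splitter)); // ["i", "am", "john", "doe"]
```

Only \, ], ^ and - need escaping inside a character class, which is why the helper leaves . untouched.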
More information on regular expressions in JavaScript can be found on MDN.
Regular Expressions --
email.split(/[_\.-]/)
This one matches (therefore splits at) any of (a character set, indicated by []) _, ., or -.
Here's a good resource for learning regular expressions: http://qntm.org/files/re/re.html
You can use regex to do it, just provide a list of the characters in square brackets and escape if necessary.
email.split(/[_.-]/);
Is that what you mean?
You are correct that you need to use the split function.
The split function works by taking an argument to split the string on. Multiple separators can be handled with a regular expression. For your usage, try something like
var re = /[\._\-]/;
var split = email.split(re, 2);
This should result in an array with two values, the first and second name. The second argument is the maximum number of elements returned.
I created a jsFiddle to show how this could be done :
function printName(email){
var name = email.split('#')[0];
// source : http://stackoverflow.com/questions/650022/how-do-i-split-a-string-with-multiple-separators-in-javascript
var returnVal = name.split(/[._-]/g);
return returnVal;
}
http://jsfiddle.net/ts6nx9tt/1/
If you define your separators, the code below can return all alternatives for you.
var arr = ["_",".","-"];
var email = letters.substr(0, letters.indexOf('#'));
arr.map(function(val,index,rest){
var r = email.split(val);
if(r.length > 1){
return r.join(' ');
}
return "";
}
);

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^#\n\r]+)#)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^#\n\r]+)#)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
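The same .source trick on a much smaller pattern, in case the URL regex is hard to follow. This is only an illustrative sketch; the date pattern and the names parts and isoDate are my own. Keep in mind that only .source is read, so any flags on the individual pieces are ignored and have to be passed to the RegExp constructor instead.

```javascript
const parts = [
  /^(\d{4})/,   // year
  /-(\d{2})/,   // month
  /-(\d{2})$/,  // day
];

// Join the sources of the pieces into one pattern.
const isoDate = new RegExp(parts.map(r => r.source).join(""));

const m = "2022-08-15".match(isoDate);
console.log(isoDate.source);   // ^(\d{4})-(\d{2})-(\d{2})$
console.log(m[1], m[2], m[3]); // 2022 08 15
```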
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>(),[\\]\\\\.,;:\\s#\"]+)*)',
'|(\\".+\\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
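To make the two notes concrete, here is a short sketch comparing a literal with its string-based equivalents. The variable names are illustrative; String.raw is a standard way to avoid doubling the backslashes.

```javascript
const fromLiteral = /\d+\.\d+/g;
const fromString = new RegExp("\\d+\\.\\d+", "g");        // every backslash doubled
const fromRaw = new RegExp(String.raw`\d+\.\d+`, "g");    // no doubling needed

console.log(fromLiteral.source === fromString.source); // true
console.log(fromLiteral.source === fromRaw.source);    // true
console.log(fromString.flags);                         // g
console.log("v1.5 and v22.75".match(fromString));      // ["1.5", "22.75"]
```

If the backslashes were left single in the plain string, "\d" would collapse to "d" and "\." to ".", which silently changes what the pattern matches.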
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|
(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
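To illustrate the "build it once" advice, here is a small sketch that assembles the pattern a single time when the script loads and reuses it on every call. It repeats the multilineRegExp helper so that it runs on its own; FOO_BAR and extractMiddle are hypothetical names, and the last piece is written as /bar$/ rather than /\bar$/ to keep the demo input simple.

```javascript
function multilineRegExp(regs, options) {
  return new RegExp(regs.map(reg => reg.source).join(""), options);
}

// Built once, outside the function that uses it.
const FOO_BAR = multilineRegExp([
  /^foo/,
  /(.*)/,
  /bar$/,
]);

// Reuses the prebuilt regex instead of rebuilding it on each call.
function extractMiddle(text) {
  const m = FOO_BAR.exec(text);
  return m ? m[1] : null;
}

console.log(extractMiddle("foo-123-bar")); // "-123-"
console.log(extractMiddle("nope"));        // null
```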
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
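If you would rather not touch RegExp.prototype (a change there is visible to every regex on the page), the same idea can be sketched as a plain function. appendRegExp is a hypothetical name, and like the version above it keeps the flags of the first regex only.

```javascript
// Hypothetical helper: concatenate regex sources, keeping the flags of `first`.
function appendRegExp(first, ...rest) {
  const source = rest.reduce((acc, re) => acc + re.source, first.source);
  return new RegExp(source, first.flags);
}

const combined = appendRegExp(/[a-z]/g, /[A-Z]/, /[0-9]/);
console.log(combined.source); // [a-z][A-Z][0-9]
console.log(combined.flags);  // g
```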
When the pattern is written as a string, every backslash has to be doubled, otherwise the regex does not work properly. Please consider this version, which works for 99.99% of email validation cases.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s#\"]+(\\.[^<>()\\[\\]\\\.,;:\\s#\"]+)*)',
'|(".+"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s#"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s#"]+)*)|(".+"))#' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string operation.
var pattenString = "^(([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)|"+
"(\".+\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var patten = new RegExp(pattenString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+#\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")#(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

How can I get a substring located between 2 quotes?

I have a string that looks like this: "the word you need is 'hello' ".
What's the best way to put 'hello' (but without the quotes) into a javascript variable? I imagine that the way to do this is with regex (which I know very little about) ?
Any help appreciated!
Use match():
> var s = "the word you need is 'hello' ";
> s.match(/'([^']+)'/)[1];
"hello"
This will match a starting ', followed by anything except ', and then the closing ', storing everything in between in the first captured group.
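If the string may contain more than one quoted word, a possible extension is to use matchAll with the g flag and collect the first capture group of each match. This is only a sketch; quotedParts is a hypothetical helper name.

```javascript
// Hypothetical helper: return every substring enclosed in single quotes.
function quotedParts(text) {
  // matchAll requires the g flag; each match's [1] is the captured content.
  return Array.from(text.matchAll(/'([^']+)'/g), m => m[1]);
}

console.log(quotedParts("the word you need is 'hello' ")); // ["hello"]
console.log(quotedParts("say 'a' then 'b'"));              // ["a", "b"]
console.log(quotedParts("no quotes here"));                // []
```

Unlike match(...)[1], this returns an empty array instead of throwing when nothing is quoted.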
http://jsfiddle.net/Bbh6P/
var mystring = "the word you need is 'hello'"
var matches = mystring.match(/\'(.*?)\'/); //returns array
alert(matches[1]);
If you want to avoid regular expressions, you can use .split("'") to split the string at single quotes, then use jQuery.map() to return just the odd-indexed substrings, i.e. an array of all single-quoted substrings.
var str = "the word you need is 'hello'";
var singleQuoted = $.map(str.split("'"), function(substr, i) {
return (i % 2) ? substr : null;
});
DEMO
CAUTION
This and other methods will get it wrong if one or more apostrophes (same as single quote) appear in the original string.

How do I make a regular expression that matches everything on a line after a given character?

If I have a String in JavaScript
key=value
How do I make a RegEx that matches key excluding =?
In other words:
var regex = //Regular Expression goes here
regex.exec("key=value")[0]//Should be "key"
How do I make a RegEx that matches value excluding =?
I am using this code to define a language for the Prism syntax highlighter so I do not control the JavaScript code doing the Regular Expression matching nor can I use split.
Well, you could do this:
/^[^=]*/ // anything not containing = at the start of a line
/[^=]*$/ // anything not containing = at the end of a line
It might be better to look into Prism's lookbehind property, and use something like this:
{
'pattern': /(=).*$/,
'lookbehind': true
}
According to the documentation this would cause the = character not to be part of the token this pattern matches.
Use this regex: (^.+?)=(.+?$)
Group 1 contains the key.
Group 2 contains the value.
That said, split is the better solution.
.*=(.*)
This will match anything after =
(.*)=.*
This will match anything before =
Look into greedy vs ungreedy quantifiers if you expect more than one = character.
Edit: as OP has clarified they're using javascript:
var str = "key=value";
var key = str.match(/(.*)=/i)[1]; // before =
var value = str.match(/=(.*)/i)[1]; // after =
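To show what the greedy vs ungreedy remark means in practice, here is a small sketch with a string that contains two = characters. The sample input and variable names are my own.

```javascript
const input = "a=b=c";

const greedyBefore = input.match(/(.*)=/)[1];   // greedy: up to the LAST =
const lazyBefore = input.match(/(.*?)=/)[1];    // lazy: up to the FIRST =
const afterFirst = input.match(/=(.*)/)[1];     // everything after the first =
const afterLast = input.match(/.*=(.*)/)[1];    // everything after the last =

console.log(greedyBefore); // "a=b"
console.log(lazyBefore);   // "a"
console.log(afterFirst);   // "b=c"
console.log(afterLast);    // "c"
```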
var regex = /^[^=]*/;
regex.exec("key=value");
