Javascript Regexp Duplicate Line Matching not working correctly

I am writing JavaScript code to parse some grammar files. It is quite a lot of code, so I will post only the relevant parts here. I am using a JavaScript RegExp to match a duplicate line held within a string. The string contains, for example (assume the string name is lines):
if
else
;
print
{
}
test1
test1
=
+
-
*
/
(
)
num
string
comment
id
test2
test2
What should happen, is a match found on 'test1' and 'test2'. It should then delete the duplicate, leaving 1 instance of test1 and test2. What is happening is no match at all. I am confident in my regex but javascript may be doing something I am not expecting. Here is the code doing the work on the string given above:
var rex = new RegExp("(.*)(\r?\n\1)+","g");
var re = '/(.*)(\r?\n\1)+/g';
rex.lastIndex = 0;
var m = rex.exec(lines);
if (m) {
alert("Found Duplicate");
var linenum = lines.search(re); //Get line number of error
alert("Error: Symbol Defined twice\n");
alert("Error occured on line: " + linenum);
lines = lines.replace(rex,""); //Gets rid of the duplicate
}
It never gets into the if(m) statement. Therefore no match is found. I tested the regex here: http://regexpal.com/ using the regex in my code as well as the example text provided. It matches just fine, so I am at kind of a loss. If anyone can help, it would be great.
Thank you.
Edit:
Forgot to add, I am testing this in firefox, and it only has to work in firefox. Not sure if that matters.

First error: \ in a JS string is also an escape character.
var rex = new RegExp("(.*)(\r?\n\1)+","g");
should be written
var rex = new RegExp("(.*)(\\r?\\n\\1)+","g");
// or, shorter:
var rex = /(.*)(\r?\n\1)+/g;
if you want to make it work. In the case of the RegExp constructor, you’re passing the pattern as a string to the constructor function. This means you need to escape each \ backslash that occurs in the pattern. If you use a regexp literal, you don’t need to escape them, since they’re not in a string, but retain their ‘normal’ properties in the regexp pattern.
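A quick way to see the difference is to compare what the regex engine actually receives. This is only a small sketch; the variable names are illustrative, and it uses "\d" instead of "\1" to show the effect because some contexts (strict mode, for example) reject "\1" in a string literal outright.

```javascript
// In a string literal the backslash is consumed by the string itself,
// so "\d+" is simply the three... two characters "d+".
console.log("\d+" === "d+");                  // true
console.log(new RegExp("\d+").test("123"));   // false, the pattern is d+
console.log(new RegExp("\\d+").test("123"));  // true, the pattern is \d+

// The doubled-backslash string and the regex literal give the same pattern.
var escaped = new RegExp("(.*)(\\r?\\n\\1)+", "g");
var literal = /(.*)(\r?\n\1)+/g;
console.log(escaped.source === literal.source); // true
```

Comparing the source property like this is a cheap sanity check whenever a pattern is moved between the two notations.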
Second error, your expression
var re = '/(.*)(\r?\n\1)+/g';
is wrong. What you’re doing here is assigning a string literal to a variable. I’m assuming you meant to assign a regular expression literal, which should be written like this:
var re = /(.*)(\r?\n\1)+/g;
Third error: the last line
lines = lines.replace(rex,""); //Gets rid of the duplicate
removes both instances of all duplicate lines! If you want to keep the first instance of each duplicate, you should use
lines = lines.replace(rex, "$1");
And finally, this method only detects two consecutive identical lines. Is that what you want, or do you need to detect any duplicates, wherever they may be?
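If duplicates anywhere in the text should be caught, a regex is probably not the easiest tool. The following is a minimal sketch under that assumption; removeDuplicateLines is a hypothetical helper name, not something from the question.

```javascript
// Hypothetical helper: keeps the first occurrence of every line and
// reports the 1-based line numbers of any later repeats.
function removeDuplicateLines(text) {
  var seen = new Set();
  var kept = [];
  var duplicates = [];
  text.split(/\r\n?|\n/).forEach(function (line, index) {
    if (seen.has(line)) {
      duplicates.push({ symbol: line, line: index + 1 });
    } else {
      seen.add(line);
      kept.push(line);
    }
  });
  return { text: kept.join("\n"), duplicates: duplicates };
}

var result = removeDuplicateLines("if\ntest1\nelse\ntest1\ntest2\ntest2");
console.log(result.text);       // if, test1, else, test2 (one per line)
console.log(result.duplicates); // entries for lines 4 and 6
```

Unlike the regex, this also catches repeats that are not adjacent, at the cost of normalizing all line endings to \n.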

var str = 'if\nelse\n;\nprint\n{\n}\ntest1\ntest1\n=\n+\n-\n*\n/\n(\n)\nnum\nstring\ncomment\nid\ntest2\ntest2\ntest2\ntest2\ntest2';
console.log(str);
str = str.replace(/\r\n?/g,'\n');
// I prefer replacing all the newline characters with \n's here
str = str.replace(/(^|\n)([^\n]*)(\n\2)+/g,function(m0,m1,m2,m3,ind) {
var line = str.substr(0,ind).split(/\n/).length + 1;
var msg = '[Found duplicate]';
msg += '\nFollowing symbol defined more than once';
msg += '\n\tsymbol: ' + m2;
msg += '\n\ton line ' + line;
console.log(msg);
return m1 + m2;
});
console.log(str);
Otherwise you can skip the first line and change the pattern into
/(^|\r\n?|\n)([^\r\n]*)((?:\r\n?|\n)\2)+/g
Note that [^\n]* will also catch multiple empty lines. If you want to make sure it matches (and replaces) non-empty lines then you might want to use [^\n]+.
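To make the difference between the two quantifiers concrete, here is a small sketch on a made-up string with two empty lines in the middle. The keepFirst callback is a hypothetical stand-in for the reporting callback used above.

```javascript
var text = "a\n\n\nb"; // two empty lines between a and b
var keepFirst = function (m0, m1, m2) { return m1 + m2; };

// With * an empty line is a valid "symbol", so runs of empty lines are collapsed.
var withStar = text.replace(/(^|\n)([^\n]*)(\n\2)+/g, keepFirst);
// With + only non-empty lines can match, so this text is left untouched.
var withPlus = text.replace(/(^|\n)([^\n]+)(\n\2)+/g, keepFirst);

console.log(JSON.stringify(withStar)); // differs from the input
console.log(JSON.stringify(withPlus)); // "a\n\n\nb"
```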
[EDIT]
For the record, the m parameters are the arguments passed to the callback: m0 is the whole match, m1 is the 1st subgroup ((^|\n)), m2 is the 2nd subgroup (([^\n]*)), m3 is the last subgroup ((\n\2)) and ind is the index of the match within the string. I could have used arguments[n] instead but these are shorter.
As for the return value: due to the lack of lookbehind in the regex flavor used by JavaScript at the time (it was only added in ES2018), this pattern also catches a possible preceding newline (unless it is the first line), so the callback needs to return that preceding newline, if any, together with the line. That's why it shouldn't return m2 only.

Related

Doubts in JavaScript RegExp and String.replace() method

I am trying to enter 'username' in a webpage using VBA. So in the source code of the webpage, there are some modifications done to the 'username' value.
I have attached the code,
function myFunction()
{
document.str.value = "Abc02023";
document.str.value = document.str.value.toUpperCase();
pattern = new RegExp("\\*", "g");
document.str.value = document.str.value.replace(pattern, "");
document.str.value = document.str.value.replace(/^\s+/, "");
document.str.value = document.str.value.replace(/\s+$/, "");
}
I read about these and from my understanding, after the modifications document.str.value is ABC02023.
Obviously I am wrong, as there would be no point in doing all these modifications then. Also, I am getting an 'incorrect username' error.
So can anybody please help me to understand these. What would be the value of document.str.value and how did you figure it out? I am new to JavaScript so please forgive me if I am being too slow...

Looks like you are using some very old code to learn from. ☹
Let's see if we can still learn something by bringing this code up to date, then you go find some newer learning materials. Here is a well-written book series with free online versions available: You Don't Know JS.
function myFunction() {
// Assuming your code runs in a browser, `document` is a property of the
// global object (`window`). If `document.str` already exists (for
// instance an element exposed on `document` under the name `str`), this
// sets its `value` property to 'Abc02023'. If `document` itself is not
// defined (for instance not running in a browser) this throws a
// ReferenceError, and if `document` has no property called `str` it
// throws a TypeError because you cannot set a property on `undefined`.
document.str.value = "Abc02023";
// You probably were just trying to create a new variable `str` so let's
// just start over
}
Second try
function myFunction() {
// create a variable `str` and set it to 'Abc02023'
var str = "Abc02023";
// Take value of str and convert it to all capital letters
// then overwrite current value of str with the result.
// So now `str === 'ABC02023'`
str = str.toUpperCase();
// Create a regular expression representing all occurrences of `*`
// and assign it to variable `pattern`.
var pattern = new RegExp("\\*", "g");
// Remove all instances of our pattern from the string (which does not
// affect this string, but prevents a user from inputting some types of
// bad strings to hack your website).
str = str.replace(pattern, "");
// Remove any leading whitespace from our string (which does not
// affect this string, but cleans up strings input by a user).
str = str.replace(/^\s+/, "");
// Remove any trailing whitespace from our string (which does not
// affect this string, but cleans up strings input by a user).
str = str.replace(/\s+$/, "");
// Let's at least see our result behind the scenes. Press F12
// to see the developer console in most browsers.
console.log("`str` is equal to: ", str );
}
Third try, let's clean this up a little:
// The reason to use functions is so we can keep the logic
// separate from the data. Let's extract our data (the string value)
// and then pass it in as a function parameter
var result = myFunction('Abc02023')
console.log('result = ', result)
function myFunction(str) {
str = str.toUpperCase();
// Nicer syntax for defining regular expression.
var pattern = /\*/g;
str = str.replace(pattern, '');
// Unnecessary use of regular expressions. Let's use trim instead
// to clean leading and trailing whitespace at once.
str = str.trim()
// let's return our result so the rest of the program can use it
return str
}
Last go round. We can make this much shorter and easier to read by chaining together all the modifications to str. And let's also give our function a useful name and try it out against a bad string.
var cleanString1 = toCleanedCaps('Abc02023')
var cleanString2 = toCleanedCaps(' ** test * ')
console.log('cleanString1 = ', cleanString1)
console.log('cleanString2 = ', cleanString2)
function toCleanedCaps(str) {
return str
.toUpperCase()
.replace(/\\*/g, '')
.trim()
}

@skylize's answer is close;
what is equivalent to your code is actually
function toCleanedCaps(str) {
return str
.toUpperCase()
.replace(/\*/g, '') // he got this wrong
.trim()
}
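The difference between the two patterns is easy to miss, so here is a small sketch that runs both on the "bad" test string used earlier. The variable names are only illustrative.

```javascript
var input = " ** test * ";

// /\*/g matches each literal asterisk.
var asterisks = input.toUpperCase().replace(/\*/g, "").trim();
// /\\*/g matches zero or more backslashes, so the asterisks survive.
var backslashes = input.toUpperCase().replace(/\\*/g, "").trim();

console.log(asterisks);   // TEST
console.log(backslashes); // ** TEST *

// The constructor form from the question builds the same pattern as /\*/g.
console.log(new RegExp("\\*", "g").source === /\*/g.source); // true
```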

Let's go over the statements one by one
document.str.value = document.str.value.toUpperCase();
makes the string uppercase
pattern = new RegExp("\\*", "g");
document.str.value = document.str.value.replace(pattern, "");
removes every occurrence of the * character (the string "\\*" produces the pattern \*, an escaped asterisk), so no match in this case.
document.str.value = document.str.value.replace(/^\s+/, "");
replaces any whitespace character occurring between one and unlimited times at the beginning of the string, so no match.
document.str.value = document.str.value.replace(/\s+$/, "");
replaces any whitespace character occurring between one and unlimited times at the end of the string, so no match.
You are right. With "Abc02023" as input, the output is what you suggest.

Capitalize the first letter of each word

var name = "AlbERt EINstEiN";
function nameChanger(oldName) {
var finalName = oldName;
// Your code goes here!
finalName = oldName.toLowerCase();
finalName = finalName.replace(finalName.charAt(0), finalName.charAt(0).toUpperCase());
for(i = 0; i < finalName.length; i++) {
if (finalName.charAt(i) === " ")
finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
}
// Don't delete this line!
return finalName;
};
// Did your code work? The line below will tell you!
console.log(nameChanger(name));
My code as is, returns 'Albert einstein'. I'm wondering where I've gone wrong?
If I add in
console.log(finalName.charAt(i+1));
AFTER the if statement, and comment out the rest, it prints 'e', so it recognizes charAt(i+1) like it should... I just cannot get it to capitalize that first letter of the 2nd word.

There are two problems with your code sample. I'll go through them one-by-one.
Strings are immutable
This doesn't work the way you think it does:
finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
You need to change it to:
finalName = finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
In JavaScript, strings are immutable. This means that once a string is created, it can't be changed. That might sound strange since in your code, it seems like you are changing the string finalName throughout the loop with methods like replace().
But in reality, you aren't actually changing it! The replace() function takes an input string, does the replacement, and produces a new output string, since it isn't actually allowed to change the input string (immutability). So, tl;dr, if you don't capture the output of replace() by assigning it to a variable, the replaced string is lost.
Incidentally, it's okay to assign it back to the original variable name, which is why you can do finalName = finalName.replace(...).
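A tiny sketch of that point, using made-up variable names: the original string is untouched, and only the returned value carries the change.

```javascript
var original = "albert";
var changed = original.replace("a", "A");

console.log(original); // albert (the original string is unchanged)
console.log(changed);  // Albert (the new string returned by replace)
```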
Replace only touches the first match
The other problem you'll run into is that replace() with a string pattern does not know which position you are examining. It searches from the start of the string and replaces only the first matching character it finds. If you tell it to replace 'e' with 'E' in 'Albert einstein', it changes the 'e' in 'Albert', not the one after the space!
What you need to do, essentially, is:
Find a space character (you've already done this)
Grab all of the string up to and including the space; this "side" of the string is good.
Convert the very next letter to uppercase, but only that letter.
Grab the rest of the string, past the letter you converted.
Put all three pieces together (beginning of string, capitalized letter, end of string).
The slice() method will do what you want:
if (finalName.charAt(i) === " ") {
// Get ONLY the letter after the space
var startLetter = finalName.slice(i+1, i+2);
// Concatenate the string up to the letter + the letter uppercased + the rest of the string
finalName = finalName.slice(0, i+1) + startLetter.toUpperCase() + finalName.slice(i+2);
}
Another option is regular expression (regex), which the other answers mentioned. This is probably a better option, since it's a lot cleaner. But, if you're learning programming for the first time, it's easier to understand this manual string work by writing the raw loops. Later you can mess with the efficient way to do it.
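Putting the pieces together, the whole function might look roughly like this. It is a sketch that keeps the question's function name and loop structure, and it uses slice() for the first letter as well, which is my own choice rather than something from the original code.

```javascript
function nameChanger(oldName) {
  var finalName = oldName.toLowerCase();
  // Capitalize the very first letter of the string.
  finalName = finalName.slice(0, 1).toUpperCase() + finalName.slice(1);
  for (var i = 0; i < finalName.length; i++) {
    if (finalName.charAt(i) === " ") {
      // Rebuild the string: everything up to and including the space,
      // the next letter uppercased, then the rest.
      finalName = finalName.slice(0, i + 1) +
        finalName.slice(i + 1, i + 2).toUpperCase() +
        finalName.slice(i + 2);
    }
  }
  return finalName;
}

console.log(nameChanger("AlbERt EINstEiN")); // Albert Einstein
```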
Working jsfiddle: http://jsfiddle.net/9dLw1Lfx/
Further reading:
Are JavaScript strings immutable? Do I need a "string builder" in JavaScript?
slice() method

You can simplify this down a lot if you pass a RegExp /pattern/flags and a function into str.replace instead of using substrings
function nameChanger(oldName) {
var lowerCase = oldName.toLowerCase(),
titleCase = lowerCase.replace(/\b./g, function ($0) {return $0.toUpperCase()});
return titleCase;
};
In this example I've applied the change to any character . after a word boundary \b, but you may want the more specific /(^| )./g
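The two patterns behave differently on names with apostrophes or hyphens. Here is a short comparison on a made-up name; the upper helper is just for illustration.

```javascript
function upper(match) { return match.toUpperCase(); }
var fullName = "mary o'neil-smith";

// \b also fires after an apostrophe or hyphen.
console.log(fullName.replace(/\b./g, upper));    // Mary O'Neil-Smith
// (^| ) only fires at the start of the string or after a space.
console.log(fullName.replace(/(^| )./g, upper)); // Mary O'neil-smith
```

Which result is "right" depends on the names you expect, so it may be worth trying both on real data.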

Another good answer to this question is to use RegEx to do this for you.
var re = /(\b[a-z](?!\s))/g;
var s = "fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon";
s = s.replace(re, function(x){return x.toUpperCase();});
console.log(s); // "Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon"
The regular expression being used may need to be changed up slightly, but this should give you an idea of what you can do with regular expressions
Capitalize Letters with JavaScript

The problem is twofold:
1) You need to assign the return value of finalName.replace, since the method returns a new string but doesn't alter the one it is called on.
2) You're not iterating through the words of the string, so you're only changing the first word. Don't you want to change every word so that it is lower case with a capitalized first letter?
This code would serve you better:
var name = "AlbERt EINstEiN";
function nameChanger(oldName) {
// Your code goes here!
var finalName = [];
oldName.toLowerCase().split(" ").forEach(function(word) {
var newWord = word.replace(word.charAt(0), word.charAt(0).toUpperCase());
finalName.push(newWord);
});
// Don't delete this line!
return finalName.join(" ");
};
// Did your code work? The line below will tell you!
console.log(nameChanger(name));

if (finalName.charAt(i) === " ")
Shouldn't it be
if (finalName.charAt(i) == " ")
Doesn't === also check whether the operand types are equal, which they should not be here, since one is a char and the other a string?

Why do these JavaScript regular expression capture parenthesis snag entire line instead of the suffixes appended to a word?

Can someone please tell me WHY my simple expression doesn't capture the optional arbitrary length .suffix fragments following hello, matching complete lines?
Instead, it matches the ENTIRE LINE (hello.aa.b goodbye) instead of the contents of the capture parenthesis.
Using this code (see JSFIDDLE):
//var line = "hello goodbye"; // desired: suffix null
//var line = "hello.aa goodbye"; // desired: suffix[0]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[0]=.aa suffix[1]=.b
var suffix = line.match(/^hello(\.[^\.]*)*\sgoodbye$/g);
I've been working on this simple expression for OVER three hours and I'm beginning to believe I have a fundamental misunderstanding of how capturing works: isn't there a "cursor" gobbling up each line character-by-character and capturing content inside the parenthesis ()?

I originally started from Perl and then PHP. When I started with JavaScript, I got stuck with this situation once myself.
In JavaScript, match() with the GLOBAL flag does NOT produce a multidimensional array. In other words, a GLOBAL match returns only the full matches (no sub-patterns).
Please note that suffix[0] matches the whole string.
Try this:
//var line = "hello goodbye"; // desired: suffix undefined
//var line = "hello.aa goodbye"; // desired: suffix[1]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[1]=.aa suffix[2]=.b
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
If you have to use a global match, then you have to capture the whole strings first, then run a second RegEx to get the sub-patterns.
Good luck
:)
Update: Further Explanation
If each string only has ONE matchable pattern (like var line = "hello.aa.b goodbye";)
then you can use the pattern I posted above (without the GLOBAL modifier)
If a string has more than ONE matchable pattern, then look at the following:
// modifier g means it will match more than once in the string
// ^ at the start means "starting with", when you want the match to start from the beginning of the string
// $ means the end of the string
// if you have ^.....$ it means the whole string should be ONE match
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/g);
var line = 'hello.aa goodbye and more hello.aa.b goodbye and some more hello.cc.dd goodbye';
// no match here since the whole of the string doesn't match the RegEx
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
// one match here, only the first one since it is not a GLOBAL match (hello.aa goodbye)
// suffix[0] = hello.aa goodbye
// suffix[1] = .aa
// suffix[2] = undefined
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/);
// 3 matches here (but no sub-patterns), only a one dimensional array with GLOBAL match in JavaScript
// suffix[0] = hello.aa goodbye
// suffix[1] = hello.aa.b goodbye
// suffix[2] = hello.cc.dd goodbye
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/g);
I hope that helps.
:)

inside ()
please do not look for a . followed by characters that may include the space. Instead, look for a . followed by non-space characters, and look for that space outside the ().

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations.
// no g flag here, so suffix[1] holds the outer capturing group
var suffix = line.match(/^hello((\.[^\.]*)*)\sgoodbye$/);
if (suffix !== null)
suffix = suffix[1].match(/(\.[^\.\s]*)/g)
I also recommend the regex101 site.

Using the global flag with the match method doesn't return any capturing groups. See the specification.
Although you use ()* it's only one capturing group. The * only defines that the content has to be matched 0 or more time before the space comes.
As @EveryEvery has pointed out, you can use a two-step approach.
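A minimal sketch of such a two-step approach is shown below. The patterns are slightly tightened versions of the ones in this thread (the suffix may not contain whitespace), which is my assumption about what the question wants.

```javascript
var line = "hello.aa.b goodbye";

// Step 1: no g flag, so the capturing group is available at index 1.
var whole = line.match(/^hello((?:\.[^.\s]+)*)\sgoodbye$/);

// Step 2: split the captured run of suffixes with a global match.
// An empty run (as in "hello goodbye") yields an empty array.
var suffixes = whole ? (whole[1].match(/\.[^.\s]+/g) || []) : null;

console.log(suffixes); // [".aa", ".b"]
```

Here null means the line did not match at all, while an empty array means it matched without any suffix.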

Finding image url via using Regex

Any working Regex to find image url ?
Example :
var reg = /^url\(|url\(".*"\)|\)$/;
var string = 'url("http://domain.com/randompath/random4509324041123213.jpg")';
var string2 = 'url(http://domain.com/randompath/random4509324041123213.jpg)';
console.log(string.match(reg));
console.log(string2.match(reg));
I tried but failed with this regex.
The pattern will look like this; I just want the image URL between url(" ") or url( ).
I just want to get output like http://domain.com/randompath/random4509324041123213.jpg
http://jsbin.com/ahewaq/1/edit

I'd simply use this expression:
/url.*\("?([^")]+)/
This returns an array, where the first index (0) contains the entire match, the second will be the url itself, like so:
'url("http://domain.com/randompath/random4509324041123213.jpg")'.match(/url.*\("?([^")]+)/)[1];
//returns "http://domain.com/randompath/random4509324041123213.jpg"
//or without the quotes, same return, same expression
'url(http://domain.com/randompath/random4509324041123213.jpg)'.match(/url.*\("?([^")]+)/)[1];
If there is a chance that either single or double quotes are used, you can simply replace each " in the expression with the character class ["'], in this case:
/url.*\(["']?([^"')]+)/
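Wrapped in a function, this could look roughly as follows. extractCssUrl is a hypothetical helper name, and the pattern is a slightly stricter variant of the one above (it also requires the closing parenthesis), so treat it as a sketch rather than a drop-in replacement.

```javascript
// Hypothetical helper: returns the URL inside url(...), or null if none is found.
function extractCssUrl(value) {
  var match = /\burl\(\s*["']?([^"')]+)["']?\s*\)/.exec(value);
  return match ? match[1].trim() : null;
}

console.log(extractCssUrl('url("http://domain.com/randompath/random4509324041123213.jpg")'));
console.log(extractCssUrl("url('http://domain.com/randompath/random4509324041123213.jpg')"));
console.log(extractCssUrl('url(http://domain.com/randompath/random4509324041123213.jpg)'));
console.log(extractCssUrl('none')); // null
```

Returning null instead of indexing [1] directly avoids a TypeError when the input contains no url(...) at all.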

Try this regexp:
var regex = /\burl\(\"?(.*?)\"?\)/;
var match = regex.exec(string);
console.log(match[1]);
The URL is captured in the first subgroup.

If the string will always be consistent, one option would be simply to remove the first 5 characters url(" and the last two "):
var string = 'url("http://domain.com/randompath/random4509324041123213.jpg")';
// Remove last two characters
string = string.substr(0, string.length - 2);
// Remove first five characters
string = string.substr(5, string.length);
Here's a working fiddle.
Benefit of this approach: You can edit it yourself, without asking StackOverflow to do it for you. RegEx is great, but if you don't know it, peppering your code with it makes for a frustrating refactor.

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)

[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s@\"]+(\\.[^<>(),[\]\\.,;:\\s@\"]+)*)',
'|(\\".+\\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
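If doubling every backslash feels error-prone, String.raw (ES2015) may be an option: it keeps backslashes exactly as written. The sketch below uses a made-up IPv4-like pattern rather than the email regex from the question, purely to keep it short.

```javascript
// String.raw keeps backslashes as written, so no doubling is needed.
// The pieces are concatenated as plain strings and compiled once.
var fromRaw = new RegExp(
  String.raw`(\d{1,3}\.){3}` +
  String.raw`\d{1,3}`,
  "g"
);
var fromLiteral = /(\d{1,3}\.){3}\d{1,3}/g;

console.log(fromRaw.source === fromLiteral.source); // true
console.log("10.0.0.1 and 192.168.1.20".match(fromRaw)); // ["10.0.0.1", "192.168.1.20"]
```

The pieces must not contain a backtick or the sequence ${, since those still have meaning inside a template literal.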
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();

Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
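One way to follow that advice is to build the combined regex at load time and only reuse it inside functions. The sketch repeats the helper from above so that it runs on its own; LOG_LINE, parseLogLine and the log format are made up for illustration.

```javascript
function multilineRegExp(regs, options) {
  return new RegExp(regs.map(
    function (reg) { return reg.source; }
  ).join(''), options);
}

// Built once, when the script loads.
var LOG_LINE = multilineRegExp([
  /^(\d{4}-\d{2}-\d{2})/, // date
  /\s+/,
  /(\w+):/                // level
]);

// Reuses the prebuilt regex on every call instead of rebuilding it.
function parseLogLine(line) {
  return LOG_LINE.exec(line);
}

console.log(parseLogLine("2024-01-02 INFO: started")[2]); // INFO
```

Because LOG_LINE has no g flag, exec() keeps no state between calls, so sharing one instance is safe here.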

Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo (assuming the regex above has been assigned to r)?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only worked on Chrome at the time, because Firefox did not yet support lookbehind assertions or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.

There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g

The regex above is missing some backslashes, so it doesn't work properly. I therefore edited the regex. Please consider this one, which works 99.99% of the time for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));

To avoid the Array join, you can also use the following syntax:
// every backslash is doubled because the pattern is built from string literals
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)|(".+"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');

You can simply use string operation.
// every backslash is doubled because the pattern is built from string literals
var patternString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);

I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);

Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
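To give a feel for what such a loose check accepts and rejects, here is a small sketch. It assumes the short pattern is meant to look for the @ sign of an email address; looksLikeEmail and the sample addresses are made up.

```javascript
// Loose check: something, an @ sign, something, a dot, something.
var looksLikeEmail = /\S+@\S+\.\S+/;

console.log(looksLikeEmail.test("user@example.com")); // true
console.log(looksLikeEmail.test("user@example"));     // false, no dot after the @
console.log(looksLikeEmail.test("no at sign here"));  // false
```

It catches typos such as a forgotten domain, but it says nothing about whether the address actually exists, which is where a confirmation e-mail comes in.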
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");

@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i
