$regex match pattern has different result in MongoDB vs. Javascript

I am using this $regex pattern in a mongo query:
{domain: {$regex: '^(.+\.)?youtube.com$'}}
Expecting it to match youtube.com, and sub.youtube.com.
The problem I'm seeing is that it's matching things like justyoutube.com.
In JavaScript, it does not match:
console.log(/^(.+\.)?youtube.com$/.test('justyoutube.com'));
// this returns `false` as expected.
Is there some better way to get this working? And is the issue with my regex, or the regex library used by MongoDB?
Update: it looks like when I use /pattern/ instead of 'pattern', I get the results I'm expecting. I'm still curious how to make it work with quotes, since that would be easier to debug in MongoDB Compass.

I'm guessing the \ in the string you pass is acting as an escape character in the string itself, and doesn't make it into the actual regex. You can try doubling the backslash to sort of 'escape the escape':
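// The string literal drops the backslash, so this pattern is really ^(.+.)?youtube.com$
const unescaped = new RegExp('^(.+\.)?youtube.com$')
// Doubling the backslash keeps one backslash in the pattern: ^(.+\.)?youtube.com$
const escaped = new RegExp('^(.+\\.)?youtube.com$')
console.log(unescaped.test('justyoutube.com')); // true
console.log(escaped.test('justyoutube.com')); // false
console.log(unescaped.test('sub.youtube.com')); // true
console.log(escaped.test('sub.youtube.com')); // true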
// A plain template literal processes escapes the same way, so \. still becomes a bare dot.
// String.raw keeps the backslash exactly as written:
const raw = new RegExp(String.raw`^(.+\.)?youtube.com$`);
console.log(raw.test('sub.youtube.com')); // true
console.log(raw.test('justyoutube.com')); // false
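To see why the quoted form behaves differently, it can help to print the pattern string before it reaches any regex engine. The sketch below does that and builds a filter object with the backslash doubled. The variable names are illustrative only, and escaping the dot in youtube\.com is an extra tightening that the original pattern did not have.

```javascript
// '\.' is not a recognized string escape, so the backslash is silently dropped.
const asTyped = '^(.+\.)?youtube.com$';
// Doubling each backslash keeps it in the resulting string.
const doubled = '^(.+\\.)?youtube\\.com$';

console.log(asTyped); // ^(.+.)?youtube.com$
console.log(doubled); // ^(.+\.)?youtube\.com$

// Filter object in the same shape as the query in the question
// (the name `filter` is an assumption for illustration).
const filter = { domain: { $regex: doubled } };

// Checking the same pattern locally with JavaScript's RegExp:
const re = new RegExp(filter.domain.$regex);
console.log(re.test('youtube.com'));     // true
console.log(re.test('sub.youtube.com')); // true
console.log(re.test('justyoutube.com')); // false
```

Logging the string first is a cheap way to confirm what the engine will actually receive, whichever quoting style is used.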

Related

How do you sanitize a string to pass a regex?

I have a regex validation I need my string to pass.
/^[0-9a-zA-Z-]+$/
I want to create a function that sanitizes the string for it to pass the regex.
I thought of doing something like
string.replace(/^[0-9a-zA-Z-]+$/,"");
Except I need to invert the above regex.
I tried to look up how to invert a regex but nothing seems to show up.

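Try string.replace(/\W/g, ""). Note that \W also removes hyphens and keeps underscores, so it is not an exact inverse of the pattern above. Also check this web site; I always use it to test regular expressions, and it has hints at the bottom right.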

Negate the collection using ^ inside the []
const str = `abc*ç%ABC&(/())12345=?`
const newString = str.replace(/[^0-9a-zA-Z-]/g,"");
console.log(newString)
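The negated class can be wrapped in a small helper and checked against the original validation pattern. This is only a sketch; the names sanitize and VALID are made up for illustration.

```javascript
// The validation pattern from the question.
const VALID = /^[0-9a-zA-Z-]+$/;

// Remove every character that the validation pattern would reject.
function sanitize(input) {
  return input.replace(/[^0-9a-zA-Z-]/g, '');
}

const cleaned = sanitize('abc*ç%ABC&(/())12345=?');
console.log(cleaned);             // abcABC12345
console.log(VALID.test(cleaned)); // true

// For comparison, \W drops the hyphen and keeps the underscore:
console.log('a-b_c'.replace(/\W/g, '')); // ab_c
console.log(sanitize('a-b_c'));          // a-bc
```

An input made up entirely of rejected characters sanitizes to an empty string, which still fails the validation because + requires at least one character.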

Javascript str.split(/[^a-zA-Z0-9.#]|(username|fname)/) not removing 'username' or 'fname' from string

I have a simple query string in my program:
?username=someone#email.com&fname=
I have come up with a regular expression that selects everything except the data I want:
[^a-zA-Z0-9.#]|(username|fname)
I am trying to use JavaScript's str.split() to split around everything that isn't actually data in the query, like so:
let userinfo = global.location.search.split(/[^a-zA-Z0-9.#]|(username|fname)/).filter(Boolean);
Unfortunately, when this runs, I get the following array:
['username', 'someone#email.com', 'fname'].
I expect just ['someone#email.com'] since "username" and "fname" should be split around from my regex.
I have tested this in https://regex101.com/ and it appears to work fine, so I'm not sure why it doesn't work in JS.
Any ideas?

When you have a capture group in the regexp used with .split() the captured strings are included in the resulting array.
If you need a group but don't want to include it in the result, use a non-capturing group (?:username|fname).
But there's no need for the group in this case at all. /xxx|(yyy|zzz)/ matches the same thing as /xxx|yyy|zzz/, they only differ in what they capture.
/[^a-zA-Z0-9.#]|username|fname/
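The three variants can be compared side by side on the query string from the question. This is a small sketch with illustrative variable names.

```javascript
const query = '?username=someone#email.com&fname=';

// Capturing group: the captured text is spliced into the result array.
const withCapture = query
  .split(/[^a-zA-Z0-9.#]|(username|fname)/)
  .filter(Boolean);
console.log(withCapture); // [ 'username', 'someone#email.com', 'fname' ]

// Non-capturing group: same matches, nothing extra in the result.
const withNonCapture = query
  .split(/[^a-zA-Z0-9.#]|(?:username|fname)/)
  .filter(Boolean);
console.log(withNonCapture); // [ 'someone#email.com' ]

// No group at all: equivalent for splitting purposes.
const withoutGroup = query
  .split(/[^a-zA-Z0-9.#]|username|fname/)
  .filter(Boolean);
console.log(withoutGroup); // [ 'someone#email.com' ]
```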

You don't need a regex for this kind of task; you can use the standard URLSearchParams API:
let searchParams = "?username=someone#email.com&fname="
let parsed = new URLSearchParams(searchParams)
console.log(parsed.get('username'))

RegEx match() in Javascript does not produce result as expected

I'm having trouble working out why a regex in Javascript is not working how I would expect it to.
The pattern is as follows:
\[(.+)\]\((.+)\)
trying to match text in the following format:
[Learn more](https://www.example.com)
const text = 'Lorem ipsum etc [Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com).'
const regex = /\[(.+)\]\((.+)\)/
const found = text.match(regex)
console.log(found)
I am expecting the value of found to be the following:
[
"[Learn more](https://www.google.com)",
"[test link](https://www.google.com)",
"[another test link](https://www.google.com)"
]
But the value seems to be as follows:
[
"[Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com)",
"Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link",
"https://www.google.com"
]
I've tried the /ig flags but this doesn't seem to work. I'm trying in a different application (RegExRX) and getting the expected result but in Javascript, I can't get it to produce the same result.

The + quantifier is greedy and will "eat" as much of the source string as possible. You can use .+? instead:
const regex = /\[(.+?)\]\((.+?)\)/
Better yet, instead of . match "not ]":
const regex = /\[([^\]]+)\]\(([^)]+)\)/
Explicitly excluding the boundary characters can perform better anyway.

TL;DR: The regex \[(.+?)\]\((.+?)\) should do.
The original pattern doesn't work because the + quantifier is "greedy" by default: it will try to match as many characters as possible. Therefore, .+ means "as many characters as possible, of any kind except a newline". A closing bracket fits that definition just fine, so the match runs past it.
To make it work properly, you have to say "as much of anything as possible, until the first closing bracket." To do that, either replace .+ with [^\]]+ ([^\)]+ for the second group), or make the quantifier lazy by appending ? to it, which turns both capturing groups into (.+?).
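The question expects one entry per link, and a regex without the g flag stops after the first match even once the greediness is fixed. The sketch below, with illustrative variable names, combines the negated character classes with the g flag, and uses matchAll to reach the capture groups of each match.

```javascript
const text = 'Lorem ipsum etc [Learn more](https://www.google.com), and produce costly [test link](https://www.google.com). [another test link](https://www.google.com).';

// Negated classes stop at the first ] and ) ; the g flag finds every link.
const linkPattern = /\[([^\]]+)\]\(([^)]+)\)/g;

// With the g flag, match() returns only the full matches.
const fullMatches = text.match(linkPattern);
console.log(fullMatches);
// [
//   '[Learn more](https://www.google.com)',
//   '[test link](https://www.google.com)',
//   '[another test link](https://www.google.com)'
// ]

// matchAll() exposes the capture groups of every match.
const links = Array.from(text.matchAll(linkPattern), (m) => ({
  label: m[1],
  url: m[2],
}));
console.log(links[0]); // { label: 'Learn more', url: 'https://www.google.com' }
```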

Rewrite this regex without using lookbehind - invalid regex with JS

I have only minimal experience with regex. I am trying to implement some email masking in node.js. All was well running it locally, but once it was pushed up to the server I started getting invalid regex errors.
The Regex code example can be found here
https://regexr.com/42uid
var email = 'foo#bar.com'
const regex = /(.)[^#\n](?=[^#\n]*[^#\n]#)|(?:(#.)|(?!^)\G(?=[^#]*$)).(?!$)/g;
const maskedEmail = email.replace(regex, '*');
maskedEmail should return
f*o#b*r.com
I have narrowed the issue down to being the 'lookbehind/lookahead' which as I understand it is not available in JS. However I am not aware how best to re-write it.

You can capture it in multiple groups and then retrieve that data in the replace with $1, $2, etc.
By using this regex: ^(.).*(.#.).*(.\.[^\.]+)$
and using the following replace string: $1*$2*$3
it will result in: f*o#b*r.com
Link to my Fiddle: https://regexr.com/42um8
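Applied in JavaScript, the capture-group approach could look like the sketch below. The function name maskEmail is made up for illustration, and the separator is written as # to match the sample input above.

```javascript
// Keep the first character, the characters around the separator,
// and the last character before the final dot plus the suffix.
function maskEmail(email) {
  return email.replace(/^(.).*(.#.).*(.\.[^.]+)$/, '$1*$2*$3');
}

console.log(maskEmail('foo#bar.com'));       // f*o#b*r.com
console.log(maskEmail('someone#email.com')); // s*e#e*l.com

// Input that does not fit the pattern is returned unchanged.
console.log(maskEmail('not-an-email'));      // not-an-email
```

Each hidden run collapses into a single *, so the masked result does not reveal how long the original parts were. This differs from masking every hidden character individually.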

javascript new regexp dynamic

I am creating a dynamic regex, but I have a problem with how to escape characters. Can anyone shed some light on this?
I am using PHP with some backend configuration. An admin can add a regexp from the backend to check for invalid characters, and I receive this value in PHP, so this is what I did:
var regex = RegExp(<?php echo $regex ?>);
But I am getting an error like SyntaxError: Invalid regular expression. I know I need to escape the dynamic characters, but I'm not sure how.
EDIT
I am trying this value from backend
<>{}[\]!##$+=%^*()/;
New EDIT
As @anubhava suggested, I am escaping the special characters with preg_quote(), but test() always fails: it always returns false even though it should return true.
Here is my code,
var invalidCharRe = new RegExp(SOME_MY_VARIABLE);
var result = invalidCharRe.test(value)
Where SOME_MY_VARIABLE is the dynamic list of special characters (which I get from PHP via preg_quote()) and value is my textbox value.

Since you're using php to echo your regex you can leverage php's preg_quote function to escape all special regex meta-characters beforehand like this:
var regex = /<?php echo preg_quote($regex, '/'); ?>/
Note that there is no need to call new RegExp here, since JavaScript will always be getting a static string for the regex.
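One possible reason test() keeps returning false after quoting: a fully escaped list matches only when the whole list appears in the input as one literal sequence. If the intent is "any one of these characters is invalid", which is an assumption about the question, the list would need to sit inside a character class. The sketch below does that on the JavaScript side; the helper names and the sample list are hypothetical.

```javascript
// Escape only the characters that are special inside a character class:
// backslash, ], [, ^ and -.
function escapeForCharClass(chars) {
  return chars.replace(/[\\\]\[^-]/g, '\\$&');
}

// Build a regex that matches if ANY of the listed characters is present.
function buildInvalidCharRegex(chars) {
  return new RegExp('[' + escapeForCharClass(chars) + ']');
}

// Hypothetical list of forbidden characters, standing in for the backend value.
const invalidCharRe = buildInvalidCharRegex('<>{}[]!#$+=%^*()/;');

console.log(invalidCharRe.source);          // [<>{}\[\]!#$+=%\^*()\/;]
console.log(invalidCharRe.test('hello'));   // false
console.log(invalidCharRe.test('a<b'));     // true
console.log(invalidCharRe.test('50% off')); // true
```

Here new RegExp is needed because the pattern is assembled from a string at run time rather than written as a literal.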
