Mongoose.js and the query object with Regexp - javascript

In a locomotive app, in a search engine that queries my models (with rather specific metadata that I won't go into here for the usual reasons) I need to include a regexp engine to check against the keywords field.
My approach is as follows:
this.keywords = strings.makeSafe(this.param("keywords")).toLowerCase();
console.log(this.keywords);
if(strings.exists(this.keywords)) {
keywords = this.keywords.split(", ");
var len = keywords.length - 1;
do {
query.regex("/" + this.keywords[len] + "/ig", "keywords");
} while(len--);
}
(It's this.keywords so that I can pass it to the view should I need to).
However, I'm not matching data that I know to be available in the documents in the collection.
The strings.makeSafe call simply does this:
strings.makeSafe = function(str) {
str = String(str);
var re = /\$/gi;
str = str.replace(re, "U+FF04");
re = /\./gi;
return str.replace(re, "U+FF08");
};
and is an attempt to deal with mongoose's vulnerability to code injection via the "." and "$" characters. It's been tested and shouldn't be driving the issue.
Right now I suspect it has something to do with the structure of the regexp or the way the method is called. Is this the correct syntax for searching on a comma-separated list of keywords in Mongoose?

There are a few problems with your approach:
Assuming that query is a Mongoose Query object, you've got the order swapped for the parameters to the regex method as the path comes first, then the regex value.
You need to construct the regular expression using the RegExp constructor function as the literal notation with the / chars can't be dynamically constructed from strings.
Calling query.regex in a loop like that doesn't OR the conditions together as each call simply overwrites the previous one. Instead, you can join the keywords together into a combined regex that matches any of them by using |.
Putting it all together it should be something like:
this.keywords = strings.makeSafe(this.param("keywords")).toLowerCase();
console.log(this.keywords);
if (strings.exists(this.keywords)) {
keywords = this.keywords.split(", ");
query.regex("keywords", new RegExp(keywords.join("|"), "i"));
}
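One thing the combined regex above does not handle is keywords that contain regex metacharacters such as "+" or ".". A minimal sketch of one way to deal with that, assuming two hypothetical helpers (escapeRegExp and buildKeywordRegex are names made up for this example, not part of Mongoose):

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive regex that matches any of the keywords.
// Returns null when no usable keyword is left after trimming.
function buildKeywordRegex(rawKeywords) {
  const parts = String(rawKeywords)
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0)
    .map(escapeRegExp);
  return parts.length > 0 ? new RegExp(parts.join("|"), "i") : null;
}

const re = buildKeywordRegex("c++, node.js");
console.log(re.source); // c\+\+|node\.js
console.log(re.test("Learning C++ today")); // true
console.log(re.test("nodeXjs")); // false
```

The resulting regex could then be passed as the second argument to query.regex("keywords", ...) in the same way as in the answer.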

Related

Possible to get 'regex source' from match?

I can get the source of a regex when it's defined separately. For example:
let r1 = new RegExp("el*");
console.log(r1.source);
// el*
Or:
let r2 = /el*/;
console.log(r2.source);
// el*
Is there a way to extract that if the regex isn't defined separately? For example, something along the lines of:
let m = "Hello".match(/el*/);
console.log(m.source?);
No. Quoting the documentation of the match() function:
Return value
An Array whose contents depend on the presence or absence of the
global (g) flag, or null if no matches are found.
So the return value is an array (you can check with Array.isArray(m), which returns true).
The returned array carries some extra information about the match (such as groups, index and the original input), but none of it includes the regex that produced the match.
So there is no way to get that information from the match, because the matching function does not return it.
The match result by itself cannot lead to the original regex, simply because different regexes can lead to the same result, even on the same string. Take for example the string "abcd" - all the following regexes: /abcd/, /a..d/ /a.*/ and many more, would match the string exactly the same way.
The only way you could retrieve the original regex is if match() stored a reference to the regex inside the returned object. There is no reason to think that's the case, but you can write your own match function that does. Something like:
function myMatch(str, regex) {
var match = str.match(regex);
if (match === null) {
match = [null];
}
match.source = regex;
return match;
}
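A variation on the same idea, as a rough sketch: instead of attaching a property to the match array, return the match and the regex side by side. The helper name matchWithRegex is made up for this example.

```javascript
// Hypothetical helper: return the match together with the regex that produced it.
// match is null when nothing matched, exactly as String.prototype.match returns it.
function matchWithRegex(str, regex) {
  return { match: str.match(regex), regex: regex };
}

const result = matchWithRegex("Hello", /el*/);
console.log(result.match[0]); // ell
console.log(result.regex.source); // el*
```

This keeps the null result of a failed match intact, so callers can still test for it in the usual way.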

How to combine two regex using AND operator

I have a dilemma in using regex as I am very new in using this:
I have the URL below:
var url = "https://website.com/something-here/page.html?p=null#confirmation?order=123";
My expected result is:
/something-here/page.html #confirmation
The two parts could be separated by a space or a comma, or simply joined together (/something-here/page.html#confirmation).
I can do this using two regex below:
var a= url.match(/som([^#]+).html/)[0];
var b= url.match(/#([^#]+).tion/)[0];
console.log(a,b);
But I would like to have it done as a single regex with the same result.
You can use RegExp's group system to your advantage. Here's a snippet:
var matches = url.match(/(\/som[^#]+\.html).*?(#[^#]+.tion)/);
console.log(matches[1] + " " + matches[2]); // prints /something-here/page.html #confirmation
I combined your two RegExp conditions into one, enclosing them in parentheses in the right places to create two groups. I also added the leading \/ so that the first group includes the slash from your expected result.
That way, you can get the specified group and add a space in between.
Aside from the fact that your example URL is malformed (it has two query strings) and therefore not very suitable to work with, I have a proposition:
Why not use the URL object and its properties?
url = new URL("https://website.com/something-here/page.html?p=null#confirmation?order=123");
and precisely grab the properties with explicit syntax as in:
url.pathname; // "/something-here/page.html"
url.hash; // "#confirmation?order=123"
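Putting the two properties together, a small sketch of how the expected result could be assembled. Cutting the hash at the first "?" is an assumption made for this example, based on the expected output in the question.

```javascript
const parsed = new URL(
  "https://website.com/something-here/page.html?p=null#confirmation?order=123"
);

// Everything after the first "#" belongs to the hash, including "?order=123",
// so cut it at the first "?" to keep only "#confirmation".
const fragment = parsed.hash.split("?")[0];

console.log(parsed.pathname + " " + fragment);
// /something-here/page.html #confirmation
```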
But in case you explicitly need a RegExp variant, here is one:
var url = "https://website.com/something-here/page.html?p=null#confirmation?order=123";
var match = url.match(/\/som.*?html|\#.*?tion/g);
console.log(match.join(" "));
Put each of your conditions in its own group, "( )".

How to search for accented characters in mongodb collection using nodejs

MongoDB treats É and E as two separate things, so when I search for E it will not find É.
Is there a way to make MongoDB think of them as the same thing?
I am running
var find =Users.find();
var re = new RegExp(name, 'i');
find.where('info.name').equals(re);
How do I match for strings containing accented characters and get the result?
MongoDB does not support this directly, and I doubt it will in the near future. To work around it, you could store an extra field in each document that holds the simple form of the name, in lowercase.
{
info: {"name": "Éva", "search": "eva"}
}
{
info: {"name": "Eva", "search": "eva"}
}
With this document structure you gain some advantages.
You can create an index on the search field,
db.user.createIndex({"info.search": 1})
and fire a simple query to find the match. When you search for a particular term, convert that term to its simple form and to lowercase, and then do a find.
User.find({"info.search": "eva"});
This would make use of the index as well, which a regex query would not.
See Also: Mongodb match accented characters as underlying character
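The "simple form" of a name can be computed in Node with Unicode normalization. A minimal sketch, assuming a helper called toSearchKey (a name invented for this example) that is applied both when saving a document and when building the query:

```javascript
// Hypothetical helper: strip accents and lowercase a name.
function toSearchKey(name) {
  return name
    .normalize("NFD") // split "É" into "E" plus a combining acute accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining diacritical marks
    .toLowerCase();
}

console.log(toSearchKey("Éva")); // eva
console.log(toSearchKey("Eva")); // eva
```

Letters that do not decompose into a base letter plus a combining mark (for example "ø" or "ß") pass through unchanged, so they would need their own mapping if they matter for your data.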
But if you want to do it the hard way, which is not recommended, I am posting it here just for the record.
You need a mapping between the plain letters and their possible accented forms. For example:
var map = {"A":"[AÀÁÂÃÄÅ]"};
Say the search term is a, but the database document holds its accented form. You then need to build a dynamic regex yourself before passing it to find().
var searchTerm = "a".toUpperCase();
var term = [];
for(var i=0;i<searchTerm.length;i++){
var char = searchTerm.charAt(i);
var reg = map[char] || char; // fall back to the plain character when there is no mapping
term.push(reg);
}
var regexp = new RegExp(term.join(""));
User.find({"info.name":{$regex:regexp}})
Note that the example above can handle search terms longer than one character too.

Search and replace for a specific pattern

I have the following requirement:
I have a url as
var url = '/test/mybin/processes/{testName}/test?id={testId}';
The url can contain any number of {} placeholders. The example above has two, {testName} and {testId}, but there could be any number of such params.
I need to replace each such param with its real value.
I first need to know the name of the attribute inside each {} and then replace each one with its actual value before loading the url.
In my method:
testName="Test" and testId=1
so, I want the url value to be
'/test/mybin/processes/Test/test?id=1'.
Could you please let me know the best way to achieve this?
Thanks, Tani
Something like this should do the trick:
var obj = { testName : "mytest", testId:1 };
var url = '/test/mybin/processes/{testName}/test?id={testId}';
var regex = /{(.*?)}/g;
var replaced = url.replace(regex, function(m,p1) {
return obj[p1];
});
alert(replaced);
The regex /{(.*?)}/g matches anything between curly braces (non-greedily) and puts the contents in a group, which the callback uses to look up the replacement.
The regex from your comments could easily be fixed by adding the g flag:
/\{[^\}]*\}/g
But you may find it easier to add brackets for grouping to make it easier to extract the actual contents between the braces without the braces themselves:
/\{([^\}]*)\}/g
Also note that in this context the { and } in your regex don't need escaping, but it doesn't hurt to include the \ escapes.
A regex combined with String.prototype.replace() is a nice way to go:
function f(map,url) {
var pattern = /{(\w+)}/g;
var replacer = function(a,b){
return map[b];
};
var o = url.replace(pattern,replacer);
return o;
}
http://jsfiddle.net/sean9999/poctcdLp
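The answers above replace every placeholder, but the question also asks for the names first. A rough sketch that covers both steps, under a few assumptions that are mine rather than the asker's: the helper names placeholderNames and fillTemplate are invented, unknown placeholders are left untouched, and values are URL-encoded.

```javascript
// List the names that appear inside {...} in the template.
function placeholderNames(template) {
  return Array.from(template.matchAll(/\{(\w+)\}/g), (m) => m[1]);
}

// Replace each {name} with values[name]; leave unknown names as they are.
function fillTemplate(template, values) {
  return template.replace(/\{(\w+)\}/g, function (placeholder, name) {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      return placeholder; // unknown name: keep "{name}" unchanged
    }
    return encodeURIComponent(String(values[name]));
  });
}

const template = "/test/mybin/processes/{testName}/test?id={testId}";
console.log(placeholderNames(template)); // [ 'testName', 'testId' ]
console.log(fillTemplate(template, { testName: "Test", testId: 1 }));
// /test/mybin/processes/Test/test?id=1
```

Leaving unknown placeholders in place avoids the string "undefined" ending up in the url when a value is missing.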

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\\]\\\\.,;:\\s@\\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\\"]+)*)',
'|(\\".+\\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
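The backslash note can be checked directly by comparing the source of each form. The sketch below also shows String.raw, a standard template tag that keeps backslashes as written; the decimal-number pattern is just an illustration chosen for this example.

```javascript
const fromLiteral = /\d+\.\d+/g;
const fromEscapedString = new RegExp("\\d+\\.\\d+", "g");
const fromRawString = new RegExp(String.raw`\d+\.\d+`, "g");

console.log(fromLiteral.source === fromEscapedString.source); // true
console.log(fromLiteral.source === fromRawString.source); // true

// Without doubling, the string literal swallows the backslash:
console.log(new RegExp("\d+").source); // d+
```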
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
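To illustrate building the regex just once, here is a small sketch that assembles it at module level and reuses it inside a function. The date pattern and the names DATE_RE and parseDate are assumptions made up for this example.

```javascript
function multilineRegExp(regs, options) {
  return new RegExp(regs.map((reg) => reg.source).join(""), options);
}

// Built once, when the module is loaded, not on every call.
const DATE_RE = multilineRegExp([
  /^(\d{4})/, // year
  /-(\d{2})/, // month
  /-(\d{2})$/ // day
]);

function parseDate(text) {
  const m = DATE_RE.exec(text);
  return m ? { year: m[1], month: m[2], day: m[3] } : null;
}

console.log(parseDate("2024-03-09")); // { year: '2024', month: '03', day: '09' }
console.log(parseDate("nope")); // null
```

Because the regex has no g or y flag, exec does not carry state between calls, so sharing one instance is safe here.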
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo (with the regex above stored in r)?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though at the time it only worked on Chrome, because Firefox did not yet support lookbehinds or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention JavaScript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
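If you would rather not modify a built-in prototype, the same idea works as a plain function. A minimal sketch, assuming a helper named concatRegex (invented for this example) that merges the flags of all parts instead of keeping only the flags of the first one:

```javascript
// Concatenate the sources of several regexes and merge their flags.
function concatRegex(...parts) {
  const source = parts.map((p) => p.source).join("");
  // A Set removes duplicate flag letters such as "g" given twice.
  const flags = Array.from(new Set(parts.map((p) => p.flags).join(""))).join("");
  return new RegExp(source, flags);
}

const combined = concatRegex(/[a-z]/g, /[A-Z]/, /[0-9]/i);
console.log(combined.source); // [a-z][A-Z][0-9]
console.log(combined.flags); // gi
```

Whether merging flags is what you want depends on the use case; keeping only the flags of the first part, as the prototype version does, is an equally reasonable choice.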
A string-based version of this regex is easy to get wrong by dropping backslashes, and then it does not work properly. Here is an edited version, which works for the vast majority of email addresses.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)|(".+"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation, as long as every backslash is doubled inside the string.
var pattenString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var patten = new RegExp(pattenString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
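For the first case, catching accidental typos, a check along these lines may be enough. This is only a sketch: the ^ and $ anchors, the trimming, and the names SIMPLE_EMAIL_RE and looksLikeEmail are my additions for this example, not part of the pattern above.

```javascript
// Deliberately loose: something, an "@", something, a dot, something.
const SIMPLE_EMAIL_RE = /^\S+@\S+\.\S+$/;

function looksLikeEmail(text) {
  return SIMPLE_EMAIL_RE.test(text.trim());
}

console.log(looksLikeEmail("user@example.com")); // true
console.log(looksLikeEmail("user@example")); // false
console.log(looksLikeEmail("user example.com")); // false
```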
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i
