Exact match multiple words in regex (datatables)

I'm trying to create a regex search in Datatables which matches a user who belongs to all selected groups.
Example:
User A belongs to Users, Other, Other Users
User B belongs to Users, Other
If Users and Other Users are selected as filters, only user A should show in the table. The problem I'm having is that both users are showing when these filters are selected. I don't think the regex is matching exact strings and, after looking through multiple other answers, I don't seem to be able to get it to do so.
My solution:
if (!this.items.length) {
  table.column(i).search('');
} else {
  let regex = '^';
  this.items.forEach(v => regex += `(?=.*\\b${v}\\b)`);
  regex += '.*$';
  table.column(i).search(regex, true, false, true);
}
Which results in: ^(?=.*\bUsers\b)(?=.*\bOther Users\b).*$
However, the user belonging to Users,Other is still being returned.

You can require a comma or the start/end of the string before and after each search term. Note that the pattern must be a template literal (backticks) for ${v} to be interpolated:
this.items.forEach(v => regex += `(?=.*(?:,|^)${v}(?![^,]))`);
Or, if the JavaScript environment supports lookbehinds:
this.items.forEach(v => regex += `(?=.*(?<![^,])${v}(?![^,]))`);
The (?:,|^) / (?<![^,]) (equivalent to (?<=,|^)) part requires the start of the string or a comma right before your search term, and the (?![^,]) negative lookahead requires a comma or the end of the string immediately after it ((?![^,]) is equivalent to the positive lookahead (?=,|$)).
This assumes the cell text has no whitespace around the commas (e.g. Users,Other,Other Users).
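The idea can be checked outside DataTables with a minimal sketch like the one below. The helper names escapeRegex and buildAllGroupsRegex are hypothetical (they are not part of DataTables), and the sample strings assume a comma-separated cell with no spaces around the commas. Escaping the group names is an addition here, so that names containing regex metacharacters are matched literally.

```javascript
// Escape characters that have special meaning in a regex, so a group
// name such as "C++" is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern that matches only when every selected group appears
// as a complete comma-delimited entry.
function buildAllGroupsRegex(groups) {
  const lookaheads = groups
    .map(g => `(?=.*(?:,|^)${escapeRegex(g)}(?![^,]))`)
    .join('');
  return `^${lookaheads}.*$`;
}

const pattern = new RegExp(buildAllGroupsRegex(['Users', 'Other Users']));
console.log(pattern.test('Users,Other,Other Users')); // true
console.log(pattern.test('Users,Other'));             // false
```

In the question's code, the resulting string could then be passed on as before: table.column(i).search(buildAllGroupsRegex(this.items), true, false, true).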

Related

How to get editable categories from a MediaWiki page

I am trying to get all the editable categories from a page for a user script I am working on for Wikipedia.
What I am struggling with is figuring out how to filter the uneditable categories from the editable categories.
Here is the code that is supposed to get the list of editable categories (so far):
var categorylist = mw.config.get("wgCategories");
if (categorylist.length == 0) {
  ...
} else {
  $.get(mw.config.get("wgScriptPath") + "/api.php", {
    action: "parse",
    format: "json",
    page: mw.config.get("wgPageName"),
    prop: "wikitext"
  }).done(function(result) {
    var editablelist = result.parse.wikitext.match(/* some regex here */);
  })
}
I have been experimenting with RegExr to figure out what will match all the category links: /\[\[Category:.*\]\]/g
I do not want to match the spaces between category links, just the individual category links.
Is there an efficient way to match all the editable categories from the wikitext of a MediaWiki page?

To avoid matching the text between category links, use the following regex:
\[\[Category:[^\[\]]*\]\]
[^\[\]] means "a character that is neither an opening nor a closing square bracket".
If you want to match neither the square brackets nor any sort keys, you will need a more complex regex:
(?<=\[\[)Category:[^\[\]|]*(?=(?:\|[^\[\]]*)?\]\])
This ensures that the expression is preceded by two opening square brackets and followed by an optional sort key (\|[^\[\]]*) and finally by two closing square brackets.
Text matched inside a lookaround ((?<=X) or (?=X)) is not included in the match.
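If lookbehind support is a concern, a capture group gives the same result. The following is a minimal sketch, not a definitive implementation: extractCategories is a hypothetical helper name and the sample wikitext is made up for illustration; in the user script the string would come from the API response.

```javascript
// Hypothetical sample wikitext; in the user script this would be the
// wikitext string returned by the API.
const wikitext =
  'Some text.\n[[Category:Living people]] [[Category:Physicists|Curie, Marie]]\n' +
  '{{Infobox}}';

// Capture the category name, skipping an optional "|sortkey" part.
function extractCategories(text) {
  const pattern = /\[\[Category:([^\[\]|]*)(?:\|[^\[\]]*)?\]\]/g;
  return Array.from(text.matchAll(pattern), m => m[1].trim());
}

console.log(extractCategories(wikitext));
// [ 'Living people', 'Physicists' ]
```

Because only categories written directly in the wikitext are found, categories added by templates do not appear in the result.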

DataTables global Regex Search in each column

I'd like to implement a feature, where you can start your search string with $ to get a startsWith search in all searchable columns.
This is what I've tried so far:
$('#myinput').on('keyup', (event) => {
  let searchValue = $(event.currentTarget).val();
  if (searchValue.startsWith('$')) {
    searchValue = `^${searchValue.substr(1)}`;
  }
  this.dataTable.search(searchValue, true, false, false).draw();
});
But apparently, this only searches in the first column. If I don't use ^ in my search, it searches all columns. How can I check if any of the columns start with myValue instead of the entire row data?
Can be reproduced on https://datatables.net/examples/api/regex.html.
Enable global Regex and search ^Airi and after that ^Accountant.
You will get results for Airi, but not for Accountant.
How can I make the search for ^Accountant still display the first entry, as the second column starts with Accountant?

That happens because the global search treats the whole row as a single string, so ^ only matches at the start of the first column. Therefore your search should look like this (note the doubled backslash: inside a template literal, a single \b would be a backspace character rather than the regex token):
$('#myinput').on('keyup', (event) => {
  let searchValue = $(event.currentTarget).val();
  if (searchValue.startsWith('$')) {
    searchValue = `\\b${searchValue.slice(1)}`;
  }
  this.dataTable.search(searchValue, true, false, false).draw();
});
The regex token \b is a word boundary, and the start of every word is one (see this regex101 demo: https://regex101.com/r/s0MdwW/1). You can also try it on the DataTables example page by searching for "\bTok", which should make "Airi" appear.
Note however that there is a catch: searching for "\bSat" will match "Airi Satou", because there is a word boundary between the two words. To avoid that, you would need to perform a ^ regex search (like your current code) on each column separately and then combine the results, since a per-column regex search only uses that column's content for the match.
Update: the per-column search turned out to be more complex than expected, because simply searching each column creates an AND search. For example, searching for "Airi" would also require "Airi" to appear in every other column.
To solve this, a custom search function must be built. Here you can find one: https://codepen.io/adelriosantiago/pen/XWKGoLx?editors=1011
This custom function performs an OR search across the columns and is registered with $.fn.dataTable.ext.search.push.
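The core of such an OR search can be sketched as a plain function. This is not the code from the linked pen; anyColumnStartsWith and currentPrefix are hypothetical names, and the sample row only resembles the DataTables example data.

```javascript
// Return true when at least one column value starts with the prefix
// (case-insensitive). An empty prefix matches every row.
function anyColumnStartsWith(columnValues, prefix) {
  const needle = prefix.toLowerCase();
  return columnValues.some(value =>
    String(value).toLowerCase().startsWith(needle)
  );
}

// Hypothetical row resembling the DataTables example data.
const row = ['Airi Satou', 'Accountant', 'Tokyo', '33'];
console.log(anyColumnStartsWith(row, 'Acc')); // true
console.log(anyColumnStartsWith(row, 'Sat')); // false

// Possible wiring (assumption: `currentPrefix` is kept up to date by the
// keyup handler, which then calls draw()):
// $.fn.dataTable.ext.search.push((settings, searchData) =>
//   !currentPrefix || anyColumnStartsWith(searchData, currentPrefix));
```

Working on the per-column values avoids both the word-boundary catch and the AND behaviour of chained column searches.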

Looking for ways to validate a username

I'm trying to validate usernames when using a tag function against these criteria:
Only contains alphanumeric characters, underscore and dot.
Dot can't be at the end or start of a username (e.g .username / username.).
Dot and underscore can't be next to each other (e.g user_.name).
Dot can't be used multiple times in a row (e.g. user..name).
The username ends when there is a character other than the allowed characters (e.g. #user#next or #user/nnnn would only validate user as a valid username)
Another username can only be written after a space. (e.g. #user #next would validate two usernames while #user#next would only validate user)
I have tried this so far:
^(?=.{8,20}$)(?![.])(?!.*[.]{2})[a-zA-Z0-9.]+(?<![.])$ and have dealt with the multiple-usernames problem using for loops.
I was wondering what would be the best way to implement something like this (e.g. regex, for loops, combination). I tried using regex but realised it is very specific and can get complicated.

Use the following regex:
(?!\.)(?![a-zA-Z._]*(?:\._|_\.|\.\.))[a-zA-Z._]*[a-zA-Z_]
(?!\.): Negative lookahead assertion to ensure the name cannot begin with a dot.
(?![a-zA-Z._]*(?:\._|_\.|\.\.)): Negative lookahead assertion that the name contains none of ._, _. or .. in succession.
[a-zA-Z._]*[a-zA-Z_]: Ensures the name is at least one character long and does not end with a dot.
Note that the character classes as written accept only letters, underscore and dot; add 0-9 to each class if usernames may contain digits.
However, the results are not necessarily what you might expect. You want to stop scanning a name at the first character that cannot be part of a valid name, but the regex engine then continues scanning for more matches. So when the input is, for example, .user, a match cannot start at the dot because a name cannot begin with one, but scanning resumes at the next character and user is still reported as a valid name:
let text = 'user user_xyx_abc user__def user_.abc user._abc user..abc user_abc. .user';
let regexp = /(?!\.)(?![a-zA-Z._]*(?:\._|_\.|\.\.))[a-zA-Z._]*[a-zA-Z_]/g;
let matches = text.matchAll(regexp);
for (let match of matches) {
  console.log(match);
}
Ideally, your input would contain only a single user name that you are validating and the entire input should match your regex. Then, you would use anchors in your regex:
^(?!\.)(?![a-zA-Z._]*(?:\._|_\.|\.\.))[a-zA-Z._]*[a-zA-Z_]$
But given your current circumstances, you might consider trimming whitespace from the beginning and end of the input, splitting it on whitespace, and then using the above regex on each individual user name:
let text = 'user user_xyx_abc user__def user_.abc user._abc user..abc user_abc. .user ';
let names = text.trim().split(/\s+/);
let regexp = /^(?!\.)(?![a-zA-Z._]*(?:\._|_\.|\.\.))[a-zA-Z._]*[a-zA-Z_]$/;
for (const name of names) {
  if (regexp.test(name)) {
    console.log(name);
  }
}
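For the tagging rules in the question (a name starts after a # and a second name is only recognised after a space), one possible variation is sketched below. It is an assumption-laden sketch rather than a definitive answer: extractTaggedUsernames and tagPattern are hypothetical names, digits are added to the character classes, and a tag is taken to be a # at the start of the text or after whitespace.

```javascript
// Same rules as the anchored pattern above, but digits are allowed and the
// name must follow a "#" that sits at the start of the text or after whitespace.
const tagPattern =
  /(?:^|\s)#((?!\.)(?![a-zA-Z0-9._]*(?:\._|_\.|\.\.))[a-zA-Z0-9._]*[a-zA-Z0-9_])/g;

// Collect the captured names (without the leading "#").
function extractTaggedUsernames(text) {
  return Array.from(text.matchAll(tagPattern), m => m[1]);
}

console.log(extractTaggedUsernames('#user #next')); // [ 'user', 'next' ]
console.log(extractTaggedUsernames('#user#next'));  // [ 'user' ]
console.log(extractTaggedUsernames('#user/nnnn'));  // [ 'user' ]
```

How inputs such as #user..name should be treated is not fully specified in the question, so that case is deliberately left out of the examples.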

Get id from url

I have the following example URL: #/reports/12/expense/11.
I need to get the id just after reports, i.e. 12. What I am asking for is the most suitable way to do this. I can search for reports in the URL and take the content just after it, but if at some point I decide to change the URL, I will have to change my algorithm.
What do you think is the best way here? Some code examples would also be very helpful.

It's hard to write code that is future-proof since it's hard to predict the crazy things we might do in the future!
However, if we assume that the id will always be the first run of consecutive digits in the URL, then you could simply look for that:
function getReportId(url) {
  var match = url.match(/\d+/);
  return (match) ? Number(match[0]) : null;
}
getReportId('#/reports/12/expense/11'); // => 12
getReportId('/some/new/url/report/12'); // => 12

You should use a regular expression to find the number inside the string. Passing the regular expression to the string's .match() method will return an array containing the matches based on the regular expression. In this case, the item of the returned array that you're interested in will be at the index of 1, assuming that the number will always be after reports/:
var text = "#/reports/12/expense/11";
var id = text.match(/reports\/(\d+)/);
alert(id[1]);
\d+ here means one or more consecutive digits.
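A slightly more defensive variant of this idea is sketched below. getIdAfter is a hypothetical helper, and it assumes the segment name contains only plain word characters (so it needs no regex escaping). It returns null instead of throwing when the segment is missing.

```javascript
// Return the numeric id that follows "/<segment>/" in the URL, or null.
// `segment` is assumed to contain only plain word characters.
function getIdAfter(url, segment) {
  const match = url.match(new RegExp(`(?:^|/)${segment}/(\\d+)(?:/|$)`));
  return match ? Number(match[1]) : null;
}

console.log(getIdAfter('#/reports/12/expense/11', 'reports')); // 12
console.log(getIdAfter('#/reports/12/expense/11', 'expense')); // 11
console.log(getIdAfter('#/reports/12/expense/11', 'invoice')); // null
```

Passing the segment name as a parameter keeps the URL knowledge in one place if the route ever changes.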

var text = "#/reports/12/expense/11";
var id = text.match("#/[a-zA-Z]*/([0-9]*)/[a-zA-Z]*/")
console.log(id[1])
Regex explanation:
#/ matches the characters #/ literally
[a-zA-Z]* - matches a word
/ matches the character / literally
1st Capturing group - ([0-9]*) - this matches a number.
[a-zA-Z]* - matches a word
/ matches the character / literally

Regular expressions can be tricky (and expensive), so if you can efficiently do the same thing without them, you usually should. Looking at your URL format, you would probably want to put at least a few constraints on it, otherwise the problem becomes very complex. For instance, you probably want to assume that the value always appears directly after the key, so in your sample reports=12 and expense=11, but reports and expense could be switched (e.g. expense/11/reports/12) and you would get the same result.
I would just use string split:
var parts = url.split("/");
for (var i = 0; i < parts.length; i++) {
  if (parts[i] === "reports") {
    this.reportValue = parts[i + 1];
    i++; // skip the value that was just read
  } else if (parts[i] === "expense") {
    this.expenseValue = parts[i + 1];
    i++; // skip the value that was just read
  }
}
This way your key/value parts can appear anywhere in the array.
Note: you will also want to check that i+1 is in the range of the parts array. But that would just make this sample code ugly and it is pretty easy to add in. Depending on what values you are expecting (or not expecting) you might also want to check that values are numbers using isNaN
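With the range check and the number check from the note added, the split approach might look like the sketch below. parseKeyedIds is a hypothetical helper that returns a plain object instead of assigning to this; treat it as one possible shape, not the only one.

```javascript
// Collect numeric values that directly follow the given keys in a
// slash-separated URL. Keys without a valid numeric value are skipped.
function parseKeyedIds(url, keys) {
  const parts = url.split('/');
  const result = {};
  // Stop one element early so parts[i + 1] is always in range.
  for (let i = 0; i < parts.length - 1; i++) {
    if (keys.includes(parts[i])) {
      const value = Number(parts[i + 1]);
      // Number('') is 0, so reject empty segments explicitly.
      if (parts[i + 1] !== '' && !isNaN(value)) {
        result[parts[i]] = value;
        i++; // skip the value that was just consumed
      }
    }
  }
  return result;
}

console.log(parseKeyedIds('#/reports/12/expense/11', ['reports', 'expense']));
// { reports: 12, expense: 11 }
console.log(parseKeyedIds('#/expense/11/reports/12', ['reports', 'expense']));
// { expense: 11, reports: 12 }
```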

Regular Expression with multiple words (in any order) without repeat

I'm trying to execute a search of sorts (using JavaScript) on a list of strings. Each string in the list has multiple words.
A search query may also include multiple words, but the ordering of the words should not matter.
For example, on the string "This is a random string", the query "trin and is" should match. However, these terms cannot overlap. For example, "random random" as a query on the same string should not match.
I'm going to be sorting the results based on relevance, but I should have no problem doing that myself, I just can't figure out how to build up the regular expression(s). Any ideas?

The query trin and is becomes the following regular expression:
/trin.*(?:and.*is|is.*and)|and.*(?:trin.*is|is.*trin)|is.*(?:trin.*and|and.*trin)/
In other words, don't use regular expressions for this.

It probably isn't a good idea to do this with just a regular expression. A (pure, computer science) regular expression "can't count". The only "memory" it has at any point is the state of the DFA. To match multiple words in any order without repeat you'd need on the order of 2^n states. So probably a really horrible regex.
(Aside: I mention "pure, computer science" regular expressions because most implementations are actually an extension, and let you do things that are non-regular. I'm not aware of any extensions, certainly none in JavaScript, that make doing what you want to do any less painful with a single pattern.)
A better approach would be to keep a dictionary (Object, in JavaScript) that maps from words to counts. Initialize it to your set of words with the appropriate counts for each. You can use a regular expression to match words, and then for each word you find, decrement the corresponding entry in the dictionary. If the dictionary contains any non-zero values at the end, or if somewhere along the way you try to over-decrement a value (or decrement one that doesn't exist), then you have a failed match.
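One reading of this counting approach is sketched below, under two assumptions that differ from the question: words are matched as whole words (so "trin" would not match "string"), and the dictionary is built from the target string while the query words decrement it. Under that reading a leftover non-zero count only means the target has extra words, so it is not treated as a failure here. countWords and matchesWithoutReuse are hypothetical names.

```javascript
// Count how often each whole word occurs in the text.
function countWords(text) {
  const counts = new Map();
  for (const word of text.toLowerCase().match(/\w+/g) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

// True when every query word can be paired with its own occurrence in the target.
function matchesWithoutReuse(target, query) {
  const available = countWords(target);
  for (const word of query.toLowerCase().match(/\w+/g) || []) {
    const left = available.get(word) || 0;
    if (left === 0) return false; // missing word, or used more often than it occurs
    available.set(word, left - 1);
  }
  return true;
}

console.log(matchesWithoutReuse('This is a random string', 'string is this')); // true
console.log(matchesWithoutReuse('This is a random string', 'random random'));  // false
```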

I'm totally not sure if I get you right there, so I'll just post my suggestion for it.
var query = "trin and is",
    target = "This is a random string",
    search = { },
    matches = 0;
query.split( /\s+/ ).forEach(function( word ) {
  search[ word ] = true;
});
Object.keys( search ).forEach(function( word ) {
  matches += +new RegExp( word ).test( target );
});
// do something useful with "matches" for the query, should be "3"
alert( matches );
So, the variable matches will contain the number of unique matches for the query. The first split loop just makes sure that no duplicates are counted, since a repeated word overwrites the same key in our search object. The second loop checks for the individual words within the target string and uses the unary + to cast the result (either true or false) into a number, hence +1 on a match or +0 otherwise.

I was looking for a solution to this issue and none of the solutions presented here was good enough, so this is what I came up with:
function filterMatch(itemStr, keyword) {
  var words = keyword.split(' '), i = 0, w, reg;
  for (; w = words[i++];) {
    // no "g" flag: only one occurrence is consumed per query word
    reg = new RegExp(w, 'i');
    if (reg.test(itemStr) === false) return false; // word not found
    // replace the matched text with a space so the neighbouring text cannot join into a new match
    itemStr = itemStr.replace(reg, ' ');
  }
  return true;
}
// test
filterMatch('This is a random string', 'trin and is'); // true
filterMatch('This is a random string', 'trin not is'); // false
