DataTables global Regex Search in each column

I'd like to implement a feature where you can start your search string with $ to get a startsWith search in all searchable columns.
This is what I've tried so far:
$('#myinput').on('keyup', (event) => {
  let searchValue = $(event.currentTarget).val();
  if (searchValue.startsWith('$')) {
    searchValue = `^${searchValue.substr(1)}`;
  }
  this.dataTable.search(searchValue, true, false, false).draw();
});
But apparently this only matches against the first column. If I don't use ^ in my search, it searches all columns. How can I check whether any of the columns starts with the search value, instead of the entire row data?
This can be reproduced on https://datatables.net/examples/api/regex.html.
Enable regex for the global search, then search for ^Airi and afterwards for ^Accountant.
You will get results for Airi, but not for Accountant.
How can I make the search for ^Accountant still display the first entry, as the second column starts with Accountant?

That happens because, with regex enabled, the global search treats the whole row as a single string built from all searchable cells, so ^ only anchors to the start of the first column. Therefore your search should look like this:
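$('#myinput').on('keyup', (event) => {
  let searchValue = $(event.currentTarget).val();
  if (searchValue.startsWith('$')) {
    // "\\b" puts the regex token \b into the string; a bare \b inside
    // a template literal would be a backspace character instead.
    searchValue = `\\b${searchValue.substr(1)}`;
  }
  this.dataTable.search(searchValue, true, false, false).draw();
});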
The regex token \b stands for "word boundary": the start of every word creates a new boundary (here is a demo on regex101: https://regex101.com/r/s0MdwW/1). You can also try it on the DataTables example page by searching for "\bTok", which should make "Airi" appear.
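The difference between ^ and \b can be reproduced outside the browser. This minimal sketch assumes, for illustration only, that the global search sees the row as its cells joined into one string (the separator used here is an assumption and does not matter for the result). The row values come from the DataTables example page.

```javascript
// Row from the DataTables example page, modelled as one string made of all
// cells, which is how the global search sees it (separator assumed).
const row = ['Airi Satou', 'Accountant', 'Tokyo', '33'];
const rowText = row.join('  ');

console.log(/^Accountant/i.test(rowText)); // false: ^ only anchors at the row start
console.log(/\bAccountant/i.test(rowText)); // true: \b matches at any word start
console.log(/\bSat/i.test(rowText));        // true: the "Sat" catch ("Satou")

// Building the pattern from user input in a template literal needs "\\b":
const prefix = 'Tok';
console.log(new RegExp(`\\b${prefix}`, 'i').test(rowText)); // true
console.log(`\b${prefix}` === '\u0008Tok'); // true: a bare \b is a backspace
```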
Note, however, that there is a catch: searching for "Sat" will match "Airi Satou", since there is a word boundary between the two names. If you want to avoid that case, you would need to perform a ^ regex search (like your current code) on each column separately and then combine the results. A per-column regex search matches only against that column's content.
Update: the per-column search turned out to be more complex than expected, because simply searching each column combines the searches with AND. For example, a search for "Airi" would also require "Airi" to appear in every other column.
To solve this, a custom search function is needed. Here is one: https://codepen.io/adelriosantiago/pen/XWKGoLx?editors=1011
This custom function ORs the per-column matches and is registered through $.fn.dataTable.ext.search.push.
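A minimal sketch of such an OR search, written independently of the CodePen. It uses the `$.fn.dataTable.ext.search` hook, whose functions receive `(settings, searchData, dataIndex, ...)` and return true to keep a row, with `searchData` holding the search strings of the row's cells. The table id `#example`, the helper `anyColumnStartsWith` and the variable `startsWithPrefix` are hypothetical names; `#myinput` comes from the question. The browser wiring only runs when jQuery and DataTables are loaded, so the predicate can also be tried in Node.

```javascript
// Hypothetical predicate: true when any searchable cell starts with `prefix`
// (case-insensitive). An empty prefix keeps every row.
function anyColumnStartsWith(searchData, prefix) {
  if (!prefix) return true;
  const needle = prefix.toLowerCase();
  return searchData.some((cell) => String(cell).toLowerCase().startsWith(needle));
}

// Prefix typed after "$"; read by the search hook on every draw.
let startsWithPrefix = '';

// Browser-only wiring, skipped when jQuery/DataTables are not loaded.
// `#example` is an assumed table id; adjust to your markup.
if (typeof $ !== 'undefined' && $.fn && $.fn.dataTable) {
  const table = $('#example').DataTable();

  // Note: ext.search functions apply to every DataTable on the page.
  $.fn.dataTable.ext.search.push((settings, searchData) =>
    anyColumnStartsWith(searchData, startsWithPrefix));

  $('#myinput').on('keyup', (event) => {
    const value = $(event.currentTarget).val();
    if (value.startsWith('$')) {
      startsWithPrefix = value.slice(1);
      table.search(''); // let the custom hook do the filtering
    } else {
      startsWithPrefix = '';
      table.search(value); // normal global search
    }
    table.draw();
  });
}
```

Because each cell is tested on its own, "$Accountant" keeps the Airi Satou row while "$Sat" does not, which avoids the \b catch described above.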

Related

Exact match multiple words in regex (datatables)

I'm trying to create a regex search in DataTables that matches a user who belongs to all selected groups.
Example:
User A belongs to Users, Other, Other Users
User B belongs to Users, Other
If Users and Other Users are selected as filters, only user A should show in the table. The problem is that both users are shown when these filters are selected. I don't think the regex is matching exact strings, and after looking through multiple other answers, I can't get it to do so.
My solution:
if (!this.items.length) {
  table.column(i).search('');
} else {
  let regex = '^';
  this.items.forEach(v => regex += `(?=.*\\b${v}\\b)`);
  regex += '.*$';
  table.column(i).search(regex, true, false, true);
}
Which results in: ^(?=.*\bUsers\b)(?=.*\bOther Users\b).*$
However, the user belonging to Users,Other is still being returned.

You can enforce a comma or start/end of string check before and after each of your search terms:
this.items.forEach(v => regex += `(?=.*(?:^|,)${v}(?![^,]))`);
Or, if the JavaScript environment supports lookbehinds:
this.items.forEach(v => regex += `(?=.*(?<![^,])${v}(?![^,]))`);
The (?:^|,) / (?<![^,]) (equivalent to (?<=^|,)) part requires the start of the string or a comma right before your search term, and the (?![^,]) negative lookahead requires a comma or the end of the string immediately to the right of the current position ((?![^,]) is equivalent to the positive lookahead (?=,|$)).
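A quick check of the lookahead version against the two users from the question, assuming the cells hold comma-separated group names without spaces (e.g. "Users,Other,Other Users"). `escapeRegExp` and `buildAllGroupsRegex` are hypothetical helper names; the escaping is an addition so that group names containing characters such as `.` or `(` are matched literally.

```javascript
// Escape regex metacharacters so a group name is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// One lookahead per selected group: the group must be a whole
// comma-separated item (start or comma before, comma or end after).
function buildAllGroupsRegex(items) {
  let regex = '^';
  items.forEach((v) => {
    regex += `(?=.*(?:^|,)${escapeRegExp(v)}(?![^,]))`;
  });
  return regex + '.*$';
}

const re = new RegExp(buildAllGroupsRegex(['Users', 'Other Users']), 'i');
console.log(re.test('Users,Other,Other Users')); // true  (user A)
console.log(re.test('Users,Other'));             // false (user B)

// The \b version accepts "Users" inside "Other Users"; the comma version does not.
console.log(/^(?=.*\bUsers\b).*$/i.test('Other,Other Users'));                  // true
console.log(new RegExp(buildAllGroupsRegex(['Users']), 'i').test('Other,Other Users')); // false
```

In DataTables the resulting string would be passed as in the question, e.g. `table.column(i).search(buildAllGroupsRegex(this.items), true, false, true).draw()`.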

How to get editable categories from a MediaWiki page

I am trying to get all the editable categories from a page, for a Wikipedia user script I am working on.
What I am struggling with is figuring out how to separate the uneditable categories from the editable ones.
Here is the code that is supposed to get the list of editable categories (so far):
var categorylist = mw.config.get("wgCategories");
if (categorylist.length == 0) {
  ...
} else {
  $.get(mw.config.get("wgScriptPath") + "/api.php", {
    action: "parse",
    format: "json",
    formatversion: 2, // return parse.wikitext as a plain string, not {"*": ...}
    page: mw.config.get("wgPageName"),
    prop: "wikitext"
  }).done(function(result) {
    var editablelist = result.parse.wikitext.match(/* some regex here */);
  });
}
I have been experimenting with RegExr to figure out what will match all the category links: /\[\[Category:.*\]\]/g
I do not want to match the spaces between category links, just the individual category links.
Is there an efficient way to match all the editable categories from the wikitext of a MediaWiki page?

To avoid matching the text (such as spaces) between category links, use the following regex:
\[\[Category:[^\[\]]*\]\]
See a demo here.
[^\[\]] means "a character that is neither an opening nor a closing square bracket".
If you want to match neither the square brackets nor any sort keys, you will need a more complex regex:
(?<=\[\[)Category:[^\[\]|]*(?=(?:\|[^\[\]]*)?\]\])
Take a look here for a regex demo.
This ensures that the expression is preceded by two opening square brackets and followed by an optional sort key (\|[^\[\]]*) and finally by two closing square brackets.
Sub-expressions inside lookarounds ((?<=X) or (?=X)) are zero-width: they are checked but are not part of the match.
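A small check of both patterns with `String.prototype.match` and the `g` flag, on made-up wikitext (not from a real page). The lookbehind `(?<=...)` requires an ES2018-capable engine, and `match` returns null when nothing matches.

```javascript
// Made-up wikitext: two category links on one line, the second with a sort key.
const wikitext = 'Some text.\n[[Category:Foo]] [[Category:Bar baz|Sortkey]]\n{{Some template}}';

// The greedy attempt from the question swallows both links and the space.
const greedy = wikitext.match(/\[\[Category:.*\]\]/g);

// Whole links (brackets and sort keys included), one match per link.
const links = wikitext.match(/\[\[Category:[^\[\]]*\]\]/g);

// Only "Category:Name": no brackets, no sort key (lookbehind needs ES2018+).
const names = wikitext.match(/(?<=\[\[)Category:[^\[\]|]*(?=(?:\|[^\[\]]*)?\]\])/g);

console.log(greedy); // [ '[[Category:Foo]] [[Category:Bar baz|Sortkey]]' ]
console.log(links);  // [ '[[Category:Foo]]', '[[Category:Bar baz|Sortkey]]' ]
console.log(names);  // [ 'Category:Foo', 'Category:Bar baz' ]
```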

Regex or substring operation to strip out a URL from a keyword onwards [duplicate]

I'm struggling to figure out the best way to strip out all the content in a URL from a specific keyword onwards (including the keyword), using either a regex or a substring operation. For example, given the dynamic URL http://example.com/category/subcat/filter/size/1/, I would like to strip out the /filter/size/1 part and keep the remaining URL as a separate string. Grateful for any pointers.
I should clarify that the number of path segments after the filter keyword isn't fixed and could be more than in my example, and the number of category segments before the filter keyword isn't fixed either.

To be a little safer, you could use the URL object to handle most of the parsing and then just sanitize the pathname:
const filteredUrl = 'http://example.com/category/subcat/filter/test?param1&param2=test';
console.log(unfilterUrl(filteredUrl)); // 'http://example.com/category/subcat/?param1&param2=test'
function unfilterUrl(urlString) {
  const url = new URL(urlString);
  // Remove "filter" and everything after it in the path; the query string is untouched.
  url.pathname = url.pathname.replace(/(?<=\/)filter(\/|$).*/i, '');
  return url.toString();
}

You can tweak this a little based on your needs; for example, filter might not be present in the URL at all. Assuming it is present, consider the following regex:
/(.*)\/filter\/(.*)/g
The first captured group (available as $1) is the portion of the string before the filter keyword, and the second captured group (available as $2) contains all the filters after the filter keyword.
Have a look at the example I tried on regextester.com.
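A sketch of how the two groups can be used in JavaScript. The `g` flag is dropped here on purpose: with `g`, `String.prototype.match` returns only the full matches and no capture groups. `replace` with `'$1'` keeps just the part before the keyword.

```javascript
const url = 'http://example.com/category/subcat/filter/size/1/';
const filterRe = /(.*)\/filter\/(.*)/; // no g flag, so match() returns the groups

const m = url.match(filterRe);
if (m) {
  console.log(m[1]); // 'http://example.com/category/subcat'
  console.log(m[2]); // 'size/1/'
}

// $1 in the replacement string refers to the first captured group.
console.log(url.replace(filterRe, '$1')); // 'http://example.com/category/subcat'
```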

Use the split() function:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
const url = 'http://example.com/category/subcat/filter/size/1/';
console.log(url.split('/filter')[0]); // 'http://example.com/category/subcat'

Split
The simplest solution that occurs to me is the following:
const url = 'http://example.com/category/subcat/filter/size/1/';
const [base, filter] = url.split('/filter/');
// where:
// base == 'http://example.com/category/subcat'
// filter == 'size/1/'
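If '/filter/' can occur more than once, note that the limit parameter of String.prototype.split() (url.split('/filter/', 2)) does not keep the rest of the string: everything after the second occurrence is dropped. To keep everything after the first occurrence, cut the string at url.indexOf('/filter/') with slice() instead.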
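A sketch of the indexOf approach. `splitAtFirst` is a hypothetical helper, and the URL with two filter segments is made up to show the difference from `split` with a limit.

```javascript
// Split at the first occurrence of `keyword` only, keeping everything after it
// intact, even if the keyword appears again later in the URL.
function splitAtFirst(url, keyword) {
  const i = url.indexOf(keyword);
  if (i === -1) return [url, null]; // keyword absent: nothing to strip
  return [url.slice(0, i), url.slice(i + keyword.length)];
}

const u = 'http://example.com/category/subcat/filter/size/1/filter/color/red/';
console.log(u.split('/filter/', 2));      // [ 'http://example.com/category/subcat', 'size/1' ]
console.log(splitAtFirst(u, '/filter/')); // [ 'http://example.com/category/subcat', 'size/1/filter/color/red/' ]
```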
RegExp
The above assumes that everything after the filter keyword is part of the filter. If you need more granularity, you can use a regex that stops at the '?', for example. The following removes 'filter/anything/that/follows' starting right after a / and ending just before the first query-string separator ?, which is kept.
const filterRegex = /(?<=\/)filter(\/|$)[^?]*/i;
function parseURL(url) {
  const match = url.match(filterRegex);
  if (!match) { return [url, url, null]; } // no filter segment: nothing to strip
  const stripped = url.replace(filterRegex, '');
  return [url, stripped, match[0]];
}
const [full, stripped, filter] = parseURL('http://example.com/category/subcat/filter/size/1/?query=string');
// where:
// stripped == 'http://example.com/category/subcat/?query=string'
// filter == 'filter/size/1/'

I'm sadly not able to post the full answer here, as it's telling me 'it looks like spam'. I created a gist with the original answer. In it I talk about the details of String.prototype.match and of JS/ES regex in general, including named capture groups and pitfalls, and include a link to a great regex tool: regex101. I'm not posting the link here for fear of triggering the filter again. But back to the topic:
In short, a simple regex can be used to split and format it (using filter as the keyword):
/^(.*)(\/filter\/.*)$/
or with named groups:
/^(?<main>.*)(?<stripped>\/filter\/.*)$/
(note that the forward slashes need to be escaped in a regex literal)
Using String.prototype.match with that regex returns an array of the matches: index 1 is the first capture group (everything before the keyword), and index 2 is everything after it (including the keyword).
Again, all the details can be found in the gist.
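A short sketch of the named-group variant in use; the group names `main` and `stripped` are the ones from the regex above.

```javascript
const url = 'http://example.com/category/subcat/filter/size/1/';
const m = url.match(/^(?<main>.*)(?<stripped>\/filter\/.*)$/);

if (m) {
  // Named groups are exposed on m.groups (indexes 1 and 2 still work too).
  console.log(m.groups.main);     // 'http://example.com/category/subcat'
  console.log(m.groups.stripped); // '/filter/size/1/'
} else {
  console.log('keyword not found'); // match() returns null when there is no /filter/
}
```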

remove Niqqud from string in javascript

I have the exact problem described here:
removing Hebrew "niqqud" using R
I have been struggling to remove niqqud (diacritical signs used to represent vowels or distinguish between alternative pronunciations of letters of the Hebrew alphabet). For instance, I have this variable: sample1 <- "הֻסְמַק"
And I cannot find an effective way to remove the signs below the letters.
But in my case I have to do this in JavaScript.
Based on the UTF-8 values table described here, I have tried this regex without success.

Just a slight problem with your regex. Try the following:
const input = "הֻסְמַק";
console.log(input)
console.log(input.replace(/[\u0591-\u05C7]/g, ''));
/*
$ node index.js
הֻסְמַק
הסמק
*/

nj_’s answer is great.
Just to add a bit (because I don’t have enough reputation points to comment directly) -
[\u0591-\u05C7] may be too broad a brush. See the relevant table here: https://en.wikipedia.org/wiki/Unicode_and_HTML_for_the_Hebrew_alphabet#Compact_table
Rows 059x and 05Ax are for t'amim (accents/cantillation marks).
Niqud per se is in rows 05Bx and 05Cx.
And as Avraham commented, you can run into an issue if two words are joined by a makaf (U+05BE): removing it leaves the two words run together.
If you want to remove only t’amim but keep nikud, use /[\u0591-\u05AF]/g. If you want to avoid the issue raised by Avraham, you have 2 options - either keep the maqaf, or replace it with a dash:
//keep the original makafim
const input = "כִּי־טוֹב"
console.log(input)
console.log(input.replace(/([\u05B0-\u05BD]|[\u05BF-\u05C7])/g,""));
//replace makafim with dashes
console.log(input.replace(/\u05BE/g,"-").replace(/[\u05B0-\u05C7]/g,""))
/*
$ node index.js
כִּי־טוֹב
כי־טוב
כי-טוב
*/
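An alternative sketch, not taken from the answers above: instead of listing ranges by hand, remove only those characters in the Hebrew block (U+0591 to U+05C7) whose Unicode general category is Mn (nonspacing mark), using a Unicode property escape (requires the `u` flag, ES2018). Niqqud and t'amim are Mn, while the maqaf (U+05BE), paseq (U+05C0) and sof pasuq (U+05C3) are punctuation, so they are kept. `stripHebrewMarks` is a hypothetical helper name.

```javascript
// Remove Hebrew combining marks (niqqud and t'amim) but keep Hebrew
// punctuation such as maqaf, paseq and sof pasuq.
function stripHebrewMarks(text) {
  return text.replace(/[\u0591-\u05C7]/g, (ch) => (/\p{Mn}/u.test(ch) ? '' : ch));
}

// Kaf with dagesh and hiriq, yod, maqaf, tet, vav with holam, bet.
const word = '\u05DB\u05BC\u05B4\u05D9\u05BE\u05D8\u05D5\u05B9\u05D1';
const bare = stripHebrewMarks(word);
console.log(bare === '\u05DB\u05D9\u05BE\u05D8\u05D5\u05D1'); // true (maqaf kept)

// Optionally turn the maqaf into an ASCII hyphen afterwards.
console.log(bare.replace(/\u05BE/g, '-')); // prints כי-טוב
```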

how do I get this regex to mimic a look-behind?

/(([$]*)([A-Z]{1,3})([$]*)([0-9]{1,5}))/gi
Regex running on Debuggex
This is for pulling cell refs out of spreadsheet formulas and checking to see if the formula contains an absolute ref. The problem is that it's matching an invalid cell, the last one here:
a1
$a1
$A$5
A5*4
A20+45
A34/A$23
A1*6
A1*A45
$AAA11
AAA33
AA33:A33
$AAAAA44 // <-- not a valid cell!
It's matching the AAA44 in $AAAAA44, but it shouldn't. All the rest of the capture groups etc. are working correctly: every row except the last one correctly grabs one or more cell refs. A negative lookahead seems like the right way to go, but after mucking with it for a good long while I must admit to being stuck.

If you can't match for ^...$, you may still be able to introduce some \b matching:
/foo\bbar/.test('foobar'); // false
/foo\b\d/.test('foo1'); // false
/foo\b.\d/.test('foo+1'); // true
So your RegExp would look like this (I kept your capture groups):
var re = /(?:\b|^)((\$?)([a-z]{1,3})(\$?)(\d{1,5}))(?:\b|$)/i;
re.test('$AAAAA44'); // false
re.test('$AAA44'); // true
Demo
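Engines supporting ES2018 have real lookbehind, so the workaround is no longer required there. A sketch, under the assumption that a cell reference must not follow a letter, digit or `$` and must not be followed by a letter or digit. Unlike the \b version, it also keeps the leading `$` of an absolute ref that directly follows an operator (e.g. `A1/$B2`), since there is no word boundary between `/` and `$`. `cellRefs` and `hasAbsoluteRef` are hypothetical helper names.

```javascript
// A cell ref may not be preceded by a letter, digit or "$" (so "AAA44" inside
// "$AAAAA44" is rejected) and may not be followed by a letter or digit.
const cellRef = /(?<![A-Z0-9$])(\$?)([A-Z]{1,3})(\$?)(\d{1,5})(?![A-Z0-9])/gi;

// Hypothetical helpers for illustration.
function cellRefs(formula) {
  return [...formula.matchAll(cellRef)].map((m) => m[0]);
}
function hasAbsoluteRef(formula) {
  return cellRefs(formula).some((ref) => ref.includes('$'));
}

console.log(cellRefs('A34/A$23'));   // [ 'A34', 'A$23' ]
console.log(cellRefs('AA33:A33'));   // [ 'AA33', 'A33' ]
console.log(cellRefs('A1/$B2'));     // [ 'A1', '$B2' ]
console.log(cellRefs('$AAAAA44'));   // []
console.log(hasAbsoluteRef('A1*6')); // false
```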
