How to get editable categories from a MediaWiki page - javascript

I am trying to get all the editable categories from a page for a Wikipedia user script I am working on.
What I am struggling with is figuring out how to filter the uneditable categories from the editable ones.
Here is the code that is supposed to get the list of editable categories (so far):
var categorylist = mw.config.get("wgCategories");
if (categorylist.length == 0) {
...
} else {
$.get(mw.config.get("wgScriptPath") + "/api.php", {
action: "parse",
format: "json",
page: mw.config.get("wgPageName"),
prop: "wikitext"
}).done(function(result) {
var editablelist = result.parse.wikitext.match(/* some regex here */);
})
}
I have been experimenting with RegExr to figure out what will match all the category links: /\[\[Category:.*\]\]/g
I do not want to match the spaces between category links, just the individual category links.
Is there an efficient way to match all the editable categories from the wikitext of a MediaWiki page?

To avoid matching the spaces between category links, use the following regex:
\[\[Category:[^\[\]]*\]\]
See here a demo.
[^\[\]] means "a character that is neither an opening nor a closing square bracket".
If you want to match neither the square brackets nor any sort keys, you will need a more complex regex:
(?<=\[\[)Category:[^\[\]|]*(?=(?:\|[^\[\]]*)?\]\])
Take a look here for a regex demo.
Now you are ensuring that the expression is preceded by two opening square brackets and followed by an optional sort key (\|[^\[\]]*) and finally by two closing square brackets.
Anything matched inside a lookaround ((?<=X) or (?=X)) is checked but not included in the match. Note that lookbehind needs an engine that supports it (ES2018 or later in JavaScript).
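If lookbehind is not available, a capture group gives the same result. The following is a minimal sketch, not the answer's own code: extractCategories is a hypothetical helper name, and the sample wikitext is made up for illustration.

```javascript
// Hypothetical helper: returns the category names found in a wikitext string.
function extractCategories(wikitext) {
  // Group 1 captures "Category:" plus the name; an optional sort key after "|"
  // is matched but left out of the capture.
  const pattern = /\[\[(Category:[^\[\]|]*)(?:\|[^\[\]]*)?\]\]/g;
  return Array.from(wikitext.matchAll(pattern), (m) => m[1].trim());
}

// Made-up sample wikitext for illustration.
const sample = "Text.\n[[Category:Rivers of France]] [[Category:Loire basin|Loire]]";
console.log(extractCategories(sample));
// [ 'Category:Rivers of France', 'Category:Loire basin' ]
```

In the user script, the wikitext string would come from the parse API response. Its exact shape depends on the API format version, so inspect the result before passing it in.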

Related

Exact match multiple words in regex (datatables)

I'm trying to create a regex search in Datatables which matches a user who belongs to all selected groups.
Example:
User A belongs to Users, Other, Other Users
User B belongs to Users, Other
If Users and Other Users are selected as filters, only user A should show in the table. The problem I'm having is that both users are showing when these filters are selected. I don't think the regex is matching exact strings and, after looking through multiple other answers, I don't seem to be able to get it to do so.
My solution:
if (!this.items.length) {
table.column(i).search('');
} else {
let regex = '^';
this.items.forEach(v => regex += `(?=.*\\b${v}\\b)`);
regex += '.*$'
table.column(i).search(regex, true, false, true)
}
Which results in: ^(?=.*\bUsers\b)(?=.*\bOther Users\b).*$
However, the user belonging to Users,Other is still being returned.
You can enforce a comma or start/end of string check before and after each of your search terms (note the backticks, which are needed for ${v} to be interpolated):
this.items.forEach(v => regex += `(?=.*(?:,|^)${v}(?![^,]))`);
Or, if the JavaScript environment supports lookbehinds:
this.items.forEach(v => regex += `(?=.*(?<![^,])${v}(?![^,]))`);
The (?:,|^) / (?<![^,]) part (the latter is equivalent to (?<=,|^)) requires either the start of the string or a comma immediately before your search term. The (?![^,]) negative lookahead requires a comma or the end of the string immediately after it; it is equivalent to the positive lookahead (?=,|$).
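Here is a small sketch of the comma-or-start check, `(?:,|^)`, run outside DataTables. It assumes the column text is a comma-separated list with no spaces after the commas. buildAllGroupsPattern and escapeRegExp are hypothetical helper names; the escaping step is an addition so that group names containing regex metacharacters are matched literally.

```javascript
// Escape regex metacharacters so a group name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds a pattern that requires every selected group
// to appear as a whole comma-delimited entry.
function buildAllGroupsPattern(groups) {
  const lookaheads = groups
    .map((g) => `(?=.*(?:,|^)${escapeRegExp(g)}(?![^,]))`)
    .join("");
  return `^${lookaheads}.*$`;
}

const pattern = new RegExp(buildAllGroupsPattern(["Users", "Other Users"]));
console.log(pattern.test("Users,Other,Other Users")); // true
console.log(pattern.test("Users,Other"));             // false
```

The returned string can be passed to table.column(i).search(pattern, true, false, true) as in the question.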

DataTables global Regex Search in each column

I'd like to implement a feature, where you can start your search string with $ to get a startsWith search in all searchable columns.
This is what I've tried so far:
$('#myinput').on('keyup', (event) => {
let searchValue = $(event.currentTarget).val();
if (searchValue.startsWith('$')) {
searchValue = `^${searchValue.substr(1)}`;
}
this.dataTable.search(searchValue, true, false, false).draw();
});
But apparently, this only searches in the first column. If I don't use ^ in my search, it searches all columns. How can I check if any of the columns start with myValue instead of the entire row data?
Can be reproduced on https://datatables.net/examples/api/regex.html.
Enable global Regex and search ^Airi and after that ^Accountant.
You will get results for Airi, but not for Accountant.
How can I make the search for ^Accountant still display the first entry, as the second column starts with Accountant?
That happens because, with global regex search enabled, DataTables treats the whole row as a single string, so ^ anchors only at the start of the first column. Your search should therefore look like this:
$('#myinput').on('keyup', (event) => {
let searchValue = $(event.currentTarget).val();
if (searchValue.startsWith('$')) {
searchValue = `\\b${searchValue.substr(1)}`; // double backslash: a single \b in a template literal is a backspace character
}
this.dataTable.search(searchValue, true, false, false).draw();
});
The regex token \b matches a word boundary, and every start of a word is one (here is a regex demo: https://regex101.com/r/s0MdwW/1). You can also try it on the DataTables website by searching for "\bTok", which should make "Airi" appear.
Note, however, that there is a catch: searching for "Sat" this way will match "Airi Satou", since there is a word boundary between the two words. To avoid that particular case, you would need to perform a ^ regex search (as in your current code) on each column separately and then collect all the results. A per-column regex search uses only that column's content for the match.
Update: The per-column search turned out to be more complex than expected, because simply searching each column creates an AND search. For example, searching for "Airi" would also require "Airi" to be present in all other columns.
To solve this particular problem, a custom search function must be built. You can find one here: https://codepen.io/adelriosantiago/pen/XWKGoLx?editors=1011
This custom function OR searches through the columns using the $.fn.dataTable.ext.search.push function.
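The core of such an OR search can be sketched as a plain function over a row's cell values. This is an illustration only and not necessarily what the linked pen does. anyCellStartsWith and currentPrefix are hypothetical names, and the DataTables wiring is shown as a comment because it needs a live table.

```javascript
// Hypothetical helper: true when any cell of the row starts with the prefix
// (compared case-insensitively).
function anyCellStartsWith(rowData, prefix) {
  const needle = prefix.toLowerCase();
  return rowData.some((cell) => String(cell).toLowerCase().startsWith(needle));
}

const row = ["Airi Satou", "Accountant", "Tokyo"];
console.log(anyCellStartsWith(row, "Accountant")); // true
console.log(anyCellStartsWith(row, "Sat"));        // false

// Assumed wiring into DataTables (not run here); currentPrefix would be the
// text typed after the leading "$":
// $.fn.dataTable.ext.search.push((settings, data) => anyCellStartsWith(data, currentPrefix));
```

Because the check runs per cell, "Sat" no longer matches "Airi Satou", which avoids the \b catch described above.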

Velocity RegEx for case insensitive match

I want to implement a sort of Glossary in a Velocity template (using Javascript). The following is the use case:
there are a large number of items whose description may contain references to predefined terms
there is a list of terms which are defined -> allGlossary
I want to automatically find and mark all items in the allGlossary list that appear in the description of all my items
Example:
allGlossary = ["GUI","RGB","fine","Color Range"]
Item Description: The interface (GUI) shall be generated using only a pre-defined RGB color range.
After running the script, I would expect the Description to look like this:
"The interface (GUI) shall be generated using only a pre-defined RGB Color Range."
NOTE: even though "fine" does appear in the Description (defined), it shall not be marked.
I thought of splitting the description of each item into words but then I miss all the glossary items which have more than 1 word. My current idea is to look for each item in the list in each of the descriptions but I have the following limitations:
I need to find only matches for the exact items in the 2 lists (single and multiple words)
The search has to be case insensitive
Items found may be at the beginning or end of the text and separated by various symbols: space, comma, period, parentheses, brackets, etc.
I have the following code which works but is not case-insensitive:
#set($desc = $item.description)
#foreach($g in $allGlossary)
#set($desc = $desc.replaceAll("\b$g\b", "*$g*"))
#end##foreach
Can someone please help with making this case-insensitive? Or does anyone have a better way to do this?
Thanks!
UPDATE:
Based on the answer below, I tried to do the following in my Velocity template page:
#set($allGlossary = ["GUI","RGB","fine","Color Range"])
#set($itemDescription = "The interface (GUI) shall be generated using only a pre-defined RGB color range.")
<script type="text/javascript">
var allGlossary = new Array();
var itemDescription = "$itemDescription";
</script>
#foreach($a in $allGlossary)
<script type="text/javascript">
allGlossary.push("$a");
console.log(allGlossary);
</script>
#end##foreach
<script type="text/javascript">
console.log(allGlossary[0]);
</script>
The issue is that if I try to display the whole allGlossary Array, it contains the correct elements. As soon as I try to display just one of them (as in the example), I get the error Uncaught SyntaxError: missing ) after argument list.
You mentioned that you are using JavaScript for these calculations. One simple way would be to iterate over your allGlossary array, create a regular expression for each item, and use that expression to find and replace all occurrences in the text.
To find only values that sit between word boundaries, you can use \b. This allows matches like (RGB) or Color Range?. To match case-insensitively, you can use the i flag, and to find every instance in a string (not just the first one), you can use the global flag g.
In JavaScript, creating a regular expression dynamically (with variables in it) requires the constructor notation (don't forget to escape backslashes inside the string). For a static regular expression, you could also use the literal form: /\bRGB\b/ig. Here is the dynamic one:
new RegExp("\\b("+item+")\\b", 'gi');
Here is a fully functional example based on your sample string. It replaces every item of the allGlossary array with an HTML-wrapped version of it.
var allGlossary = ["GUI","RGB","fine","Color Range"];
var itemDescription = "The interface (GUI) shall be generated using only a pre-defined RGB color range.";
for(var i=0; i<allGlossary.length; i++) {
var item = allGlossary[i];
var regex = new RegExp("\\b("+item+")\\b", 'gi');
itemDescription = itemDescription.replace(regex, "<b>$1</b>");
}
console.log(itemDescription);
If this is not your expected solution, you can leave a comment below.
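One caveat with building patterns from glossary terms: a term containing regex metacharacters (such as a dot) would be interpreted as a pattern. A hedged variant that escapes each term first is sketched below. markGlossaryTerms and escapeRegExp are hypothetical helper names, and the "v1.0" term is made up for illustration.

```javascript
// Escape regex metacharacters so a glossary term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wraps every whole-word, case-insensitive occurrence
// of each glossary term in <b> tags.
function markGlossaryTerms(description, glossary) {
  return glossary.reduce((text, term) => {
    const regex = new RegExp("\\b(" + escapeRegExp(term) + ")\\b", "gi");
    return text.replace(regex, "<b>$1</b>");
  }, description);
}

console.log(markGlossaryTerms(
  "The interface (GUI) shall be generated using only a pre-defined RGB color range.",
  ["GUI", "RGB", "fine", "Color Range"]
));
// The interface (<b>GUI</b>) shall be generated using only a pre-defined <b>RGB</b> <b>color range</b>.

console.log(markGlossaryTerms("Release v1.0 replaces v1x0.", ["v1.0"]));
// Release <b>v1.0</b> replaces v1x0.
```

Note that \b only works as expected when a term starts and ends with a word character (a letter, digit, or underscore).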

How to highlight a substring containing a random character between two known characters using javascript?

I have a bunch of strings in a data frame as given below.
v1 v2
ARSTNFGATTATNMGATGHTGNKGTEEFR SEQUENCE1
BRCTNIGATGATNLGATGHTGNQGTEEFR SEQUENCE2
ARSTNFGATTATNMGATGHTGNKGTEEFR SEQUENCE3
I want to search for and highlight selected substrings within each string in the v1 column. For example, the substring starts with "N", ends with "G", and has any single letter in between, as in "NAG", "NBG", "NCG", "NDG", and so on. To highlight these three-character substrings as shown below, I am currently writing 26 lines of code for display in an R Shiny tab, one for each of the 26 letters that could appear between "N" and "G". I am trying to optimize the code. I am new to JS. I hope this is clear; if not, please let me know whether you need more explanation or details before downvoting.
ARSTNFGATTATNMGATGHTGNKGTEEFR
BRCTNIGATGATNLGATGHTGNQGTEEFR
ARSTNFGATTATNMGATGHTGNKGTEEFR
The abridged code, showing two representative lines (the first and the last) of the 26 lines I use, is provided here.
datatable(DF, options = list(rowCallback=JS("function(row,data) {
data[0] = data[0].replace(/NAG/g,'<span style=\"color:blue; font-weight:bold\">NAG</span>');
.....
data[0] = data[0].replace(/NZG/g, '<span style=\"color:blue; font-weight:bold\">NZG</span>');
$('td:eq(0)', row).html(data[0]);}"), dom = 't'))
I think the regex you want is: /N[A-Z]G/g
If you also want it to work for lower case: /N[A-Za-z]G/g
I found a simple solution. Maybe it will be useful to someone like me.
datatable(DF, options = list(rowCallback = JS("function(row,data) {
data[0] = data[0].replace(/N[A-Z]G/g,'<span style=\"color:blue; font-weight:bold\">$&</span>');
$('td:eq(0)', row).html(data[0]);}"), dom = 't'))
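The replacement works because $& in the replacement string stands for whatever the pattern matched, so one call covers all 26 cases. Here is a standalone sketch of just that step, outside DataTables. highlightMotif is a hypothetical helper name.

```javascript
// Hypothetical helper: wraps every N?G triplet (any uppercase letter in the
// middle) in a styled span; "$&" inserts the matched text itself.
function highlightMotif(sequence) {
  return sequence.replace(
    /N[A-Z]G/g,
    '<span style="color:blue; font-weight:bold">$&</span>'
  );
}

console.log(highlightMotif("TNFGA"));
// T<span style="color:blue; font-weight:bold">NFG</span>A
```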

Wrap variable words in a span

Based on a variety of user inputs, we put an array of words in a hidden div (#words) and then perform functions using that info.
What I would like to do is check the div for the existing words, i.e.:
terms = $("#words").html();
And then, in a visible and separate div elsewhere on the page (.ocrText), wrap only those words in a strong tag.
$('.ocrText').each(function() {
$(this).html($(this).html().replace(/term/g, "<strong>term</strong>"));
});
So, if they'd searched for "Tallant France" and we stored that, then the following sentence:
"Mark Tallant served in France."
Would become:
"Mark <strong>Tallant</strong> served in <strong>France</strong>."
But I don't know how to inject that variable into .replace().
///
EDIT: The terms are inserted into the #words div in this format:
["Tallant","France","War"] ... and so on.
$('.ocrText').each(function() {
var term = 'term'
var pattern = RegExp(term, 'g')
$(this).html($(this).html().replace(pattern, "<strong>" + term + "</strong>"));
});
Assuming your words contain only alphanumeric characters, you can construct a single regexp to search for all of them at once as follows:
html = html.replace(
new RegExp(terms.split(/\s*,\s*|\s+/).join('|'), 'g'), '<strong>$&</strong>');
The split converts the terms string into an array containing the individual words. In this example, it splits on commas optionally surrounded by whitespace, or on whitespace alone.
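Since the question's edit says the terms are stored as a JSON-style array such as ["Tallant","France","War"], splitting on commas would leave the brackets and quotes in place. A hedged alternative is to parse the string first, sketched below. wrapTerms and escapeRegExp are hypothetical helper names, and the sketch assumes the stored text is valid JSON.

```javascript
// Escape regex metacharacters so each term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wraps every occurrence of any term in <strong> tags.
// termsJson is assumed to be a JSON array of strings.
function wrapTerms(html, termsJson) {
  const terms = JSON.parse(termsJson);
  if (terms.length === 0) {
    return html; // an empty alternation would match everywhere
  }
  const pattern = new RegExp(terms.map(escapeRegExp).join("|"), "g");
  return html.replace(pattern, "<strong>$&</strong>");
}

console.log(wrapTerms("Mark Tallant served in France.", '["Tallant","France","War"]'));
// Mark <strong>Tallant</strong> served in <strong>France</strong>.
```

On the page, this could be applied inside the .ocrText loop with the text content of #words as termsJson. Wrapping the alternation in \b(?:...)\b would keep a term like "War" from matching inside a longer word.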
