How to exclusively detect subdomains of a URL with a regular expression

I am making a Chrome extension that is given a list of domains to compare against the active URL of a tab. For example, if the list of domains has "google", then the extension should detect "docs.google.com" as part of the domain list. I have gotten this part to work. The issue is when the domain list contains a subdomain. For example, if "docs.google" is on the list and the user is on "google.com", the extension should not recognize this as a URL on the domain list.
I am attempting this by constructing a regular expression for each domain and subdomain. As I said, it works properly when given a domain (as opposed to a subdomain), but when I tested it with subdomains it did not seem to work. I assume the issue is with how I constructed the RegEx. Does anything stand out? Thank you in advance!
let onDomainList = false;
for(let i = 0; i < domainListLength-1; i++){
  if(!domainList[i].includes(".")){ //if this domain is not a subdomain
    let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern,'i');
    if(domainRegEx.test(activeTab.url)){
      onDomainList = true;
      execute_script(activeTab);
    }
  } else{ //if this domain is a subdomain
    let strPattern = "^https://www\\." + list.domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern,'i');
    if(domainRegEx.test(activeTab.url)){
      onDomainList = true;
      execute_script(activeTab);
    }
  }
}
EDIT: Changed RegEx to what Wiktor Stribizew suggested, although still the issue of not detecting subdomains.

Here is a fixed snippet:
let onDomainList = false;
for (let i = 0; i < domainListLength - 1; i++) {
  if (!domainList[i].includes(".")) { // this entry is a plain domain, not a subdomain
    let strPattern = "^https://www\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "|https://[a-z_]+\\." + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern, 'i');
    if (domainRegEx.test(activeTab.url)) {
      onDomainList = true;
      execute_script(activeTab);
    }
  } else { // this entry is a subdomain
    // Allow an optional "<anything>." prefix before the escaped entry
    let strPattern = "^https://(?:[^\\s/]*\\.)?" + domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
    let domainRegEx = new RegExp(strPattern, 'i');
    if (domainRegEx.test(activeTab.url)) {
      onDomainList = true;
      execute_script(activeTab);
    }
  }
}
Notes:
Since you are using the RegExp constructor notation and define the regex with a regular string literal, you need to double the backslashes used to escape special chars. Here, there is no need to escape /, and the . needs two backslashes: the "\\." string literal is actually the text \.
The variable text needs escaping to be used safely in the pattern, hence domainList[i].replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&').
The / before ^ renders the regex useless since there can be no / before the start of string, and thus /^ is a regex that never matches any string. / as regex delimiters should not be used in RegExp constructor notation.
The subdomain regex does not actually match anything but https://www. + the domain from your list. To allow anything before the domain, you can replace www\. with (?:[^\s/]*\.)?, which matches an optional sequence ((?:...)? is an optional non-capturing group) of zero or more chars other than whitespace and / (the [^\s/]* negated character class) followed by a dot.
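
The notes above can be folded into one small helper. This is only a sketch, not the asker's code: the names escapeRegExp, buildEntryRegExp and isOnDomainList are hypothetical, and the trailing \. (which requires a dot, such as the one in ".com", right after the list entry) is an added assumption so that "google" does not match "googleplex.com".

```javascript
// Hypothetical helper: escape regex metacharacters in a list entry.
function escapeRegExp(text) {
  return text.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical helper: optional "<anything>." prefix, the escaped entry,
// then a literal dot (assumption: the entry is always followed by ".tld").
function buildEntryRegExp(entry) {
  return new RegExp('^https://(?:[^\\s/]*\\.)?' + escapeRegExp(entry) + '\\.', 'i');
}

// True if the URL matches at least one entry of the list.
function isOnDomainList(url, domainList) {
  return domainList.some(function (entry) {
    return buildEntryRegExp(entry).test(url);
  });
}

console.log(isOnDomainList('https://docs.google.com/a', ['google']));  // true
console.log(isOnDomainList('https://google.com/', ['docs.google']));   // false
```

Because both plain domains and subdomains go through the same pattern, the includes(".") branch is no longer needed in this variant.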

Related

Javascript Regular Expression with multiple conditions

I am using Nginx njs module for some url modifications.
My use case is to return the redirection uri for the given uri.
URI's will be as follows:
/books
/books/economic-genious
/books/flight-mechanics
My regular expression to match the above URI's as follows -
/books/(.*)|/books$
First part of expression /books/(.*) is to match below URI's:
/books/economic-genious
/books/flight-mechanics
Second part of expression /books$ is to match below URI's:
/books
My destination is configured as follows: /ebooks/$1. So that the above URI's will be converted to:
/ebooks
/ebooks/economic-genious
/ebooks/flight-mechanics
Javascript code:
function getMappedURI(uri) {
var exp = new RegExp('/books/(.*)|/books$');
var destUri = '/ebooks/$1';
var redirectUri = uri.replace(exp, destUri);
return redirectUri;
}
Above code is working fine for the below URI's:
/books/economic-genious
/books/flight-mechanics
But for the URI /books, it should return /ebooks/. Instead, it appends a non-printable special character to the end of /ebooks/.
I think it is trying to replace $1 with some special character.
How can I avoid the special character being added at the end?

Try with this regex: \/books(\/(.*))?$
code:
function getMappedURI(uri) {
var exp = new RegExp('\/books(\/(.*))?$');
var destUri = '/ebooks$1';
var redirectUri = uri.replace(exp, destUri);
return redirectUri;
}
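
If the engine's handling of $1 for a group that did not take part in the match is the problem, one way around it is to pass a replacer function and deal with the missing group explicitly. The following is a sketch under assumptions: mapBooksUri is a hypothetical name, the ^ anchor assumes the URI always starts with /books, and it is written for standard JavaScript and has not been checked against njs.

```javascript
// Hypothetical helper: map /books and /books/<rest> to /ebooks and /ebooks/<rest>.
function mapBooksUri(uri) {
  return uri.replace(/^\/books(?:\/(.*))?$/, function (match, rest) {
    // `rest` is undefined when the optional "/..." part is absent.
    return rest === undefined ? '/ebooks' : '/ebooks/' + rest;
  });
}

console.log(mapBooksUri('/books'));                  // "/ebooks"
console.log(mapBooksUri('/books/flight-mechanics')); // "/ebooks/flight-mechanics"
```

URIs that do not match the pattern are returned unchanged.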

The | operator has the lowest precedence, so it helps to wrap the whole alternation in parentheses: (/books/(.*)|/books$). I would also drop the $ and use (/books/(.*)|/books). The inner (.*) is then capture group 2, so you'll have to use $2 instead of $1 in the substitute.
function getMappedURI(uri) {
var exp = new RegExp('(/books/(.*)|/books)');
var destUri = '/ebooks/$2';
var redirectUri = uri.replace(exp, destUri);
return redirectUri;
}
But if you only want to map everything from /books/foo to /ebooks/foo, use this instead: /books/(.*) with $1 as substitute.
function getMappedURI(uri) {
var exp = new RegExp('/books/(.*)');
var destUri = '/ebooks/$1';
var redirectUri = uri.replace(exp, destUri);
return redirectUri;
}

Javascript replace all "%20" with a space

Is there a way to replace every "%20" with a space using JavaScript. I know how to replace a single "%20" with a space but how do I replace all of them?
var str = "Passwords%20do%20not%20match";
var replaced = str.replace("%20", " "); // "Passwords do%20not%20match"

Check this out:
How to replace all occurrences of a string in JavaScript?
Short answer:
str.replace(/%20/g, " ");
EDIT:
In this case you could also do the following:
decodeURI(str)

The percent sign % followed by two hexadecimal digits (one byte of the character's UTF-8 representation) typically denotes a string which has been encoded to be part of a URI. This ensures that characters that would otherwise have special meaning don't interfere. In your case %20 is immediately recognisable as a space character. While a space does not really have any meaning in a URI, it is encoded in order to avoid breaking the string into multiple "parts".
Don't get me wrong, regex is the bomb! However, any web technology worth caring about will already have tools available in its library to handle standards like this for you. Why re-invent the wheel...?
var str = 'xPasswords%20do%20not%20match';
console.log( decodeURI(str) ); // "xPasswords do not match"
Javascript has both decodeURI and decodeURIComponent which differ slightly in respect to their encodeURI and encodeURIComponent counterparts - you should familiarise yourself with the documentation.
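
To make the difference concrete, here is a small sketch. The sample string and the safeDecode helper are made up for illustration: decodeURI leaves escapes of reserved URI characters such as ? and # alone, decodeURIComponent decodes them, and both throw a URIError on malformed input.

```javascript
var encoded = 'a%20b%3Fc%23d';

console.log(decodeURI(encoded));          // "a b%3Fc%23d"
console.log(decodeURIComponent(encoded)); // "a b?c#d"

// Hypothetical helper: fall back to the raw text if decoding fails.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch (e) {
    return text; // e.g. a lone "%" is not a valid escape sequence
  }
}

console.log(safeDecode('100%')); // "100%"
```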

Use the global flag in regexp:
var replaced = str.replace(/%20/g, " ");
                                ^
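
For comparison, here are three ways to replace every occurrence, as a quick sketch. The variable names are made up, and replaceAll assumes an engine with ES2021 support.

```javascript
var str = 'Passwords%20do%20not%20match';

var viaRegex = str.replace(/%20/g, ' ');        // global flag replaces every match
var viaSplit = str.split('%20').join(' ');      // works in old engines too
var viaReplaceAll = str.replaceAll('%20', ' '); // assumes ES2021 support

console.log(viaRegex); // "Passwords do not match"
```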

Using unescape(stringValue) (note that unescape is deprecated):
var str = "Passwords%20do%20not%20match%21";
document.write(unescape(str))
//Output
Passwords do not match!
Or use decodeURI(stringValue):
var str = "Passwords%20do%20not%20match%21";
document.write(decodeURI(str))
Space = %20
? = %3F
! = %21
# = %23
...etc

This approach uses decodeURIComponent() (see the edit below), which is the best option.
var str = "Passwords%20do%20not%20match%21";
alert(decodeURIComponent(str))
Here is how it works:
Space = %20
? = %3F
! = %21
# = %23
...etc
There's a good example of that at the Mozilla docs [https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/decodeURI]
Edit: decodeURIComponent works better, look at the example.

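In modern JavaScript you can use the built-in String.prototype.replaceAll(), for example str.replaceAll("%20", " "). Note that jQuery's .replaceAll() replaces DOM elements, not substrings.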

If you need to remove white spaces at the end then here is a solution:
https://www.geeksforgeeks.org/urlify-given-string-replace-spaces/
const stringQ1 = (string) => {
  // remove white space at the end
  const arrString = string.split("");
  for (let i = arrString.length - 1; i >= 0; i--) {
    let char = arrString[i];
    if (char.indexOf(" ") >= 0) {
      arrString.splice(i, 1);
    } else {
      break;
    }
  }
  let start = 0;
  let end = arrString.length - 1;
  // add %20
  while (start < end) {
    if (arrString[start].indexOf(' ') >= 0) {
      arrString[start] = "%20";
    }
    start++;
  }
  return arrString.join('');
};
console.log(stringQ1("Mr John Smith "));
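
The same result can probably be had with two regular expressions instead of the manual loops. This is a sketch with a hypothetical function name (urlify), not the code from the linked article.

```javascript
// Hypothetical helper: drop trailing spaces, then encode the remaining ones.
function urlify(text) {
  return text.replace(/ +$/, '').replace(/ /g, '%20');
}

console.log(urlify('Mr John Smith ')); // "Mr%20John%20Smith"
```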

Remove special character from the starting of a string and search # symbol.in javascript

I want to remove special characters from the start of the string only.
i.e., if my string is like {abc#xyz.com then I want to remove the { from the start. The string should look like abc#xyz.com
But if my string is like abc{#xyz.com then I want to keep the string as it is, i.e., abc{#xyz.com.
I also want to check whether the # symbol is present in my string. If it is present then OK, else show a message.

The following demonstrates what you specified (or it's close):
// Pattern: optional non-alphanumeric prefix, then an alphanumeric character, with a '#' somewhere after it
var pat = /^[^a-z0-9]*([a-z0-9].*?#.*?$)/i;
var testString = "{abc#xyz.com"; // Try with {abcxyz.com for the alert
var arr = pat.exec(testString);
var adjustedString;
if (arr != null) {
  // The adjusted string (non-alphanumeric prefix chopped off) is in capture group 1
  adjustedString = arr[1];
} else {
  adjustedString = "";
  alert(testString + " does not conform to pattern");
}
adjustedString;
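
The two requirements can also be handled separately, which keeps each step easy to read. A minimal sketch, assuming "special character" means anything that is not a letter or digit; normalizeAddress and its return shape are hypothetical.

```javascript
// Hypothetical helper: strip leading non-alphanumerics, then report whether '#' is present.
function normalizeAddress(input) {
  var stripped = input.replace(/^[^a-z0-9]+/i, '');
  return { value: stripped, hasHash: stripped.indexOf('#') !== -1 };
}

console.log(normalizeAddress('{abc#xyz.com').value);   // "abc#xyz.com"
console.log(normalizeAddress('abc{#xyz.com').value);   // "abc{#xyz.com"
console.log(normalizeAddress('{abcxyz.com').hasHash);  // false
```

The caller can then show a message whenever hasHash is false.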

I have used two separate regex objects to achieve what you require. They check for both conditions in the string. I know it's not very efficient, but it will serve your purpose.
var regex = new RegExp(/(^{)/);
var regex1 = new RegExp(/(^[^#]*$)/);
var str = "abc#gmail.com";
if (!regex1.test(str)) {
  if (regex.test(str))
    alert("Bracket found at the beginning");
  else
    alert("Bracket not found at the beginning");
}
else {
  alert("doesnt contain #");
}
Hope this helps

Regex Wildcard for Array Search

I have a json array that I currently search through by flipping a boolean flag:
for (var c=0; c<json.archives.length; c++) {
if ((json.archives[c].archive_num.toLowerCase().indexOf(query)>-1)){
inSearch = true;
} }
And I have been trying to create a wildcard regex search by using a special character '*' but I haven't been able to loop through the array with my wildcard.
So what I'm trying to accomplish is when query = '199*', replace the '*' with /[\w]/ and essentially search for 1990,1991,1992,1993,1994 + ... + 199a,199b, etc.
All my attempts turn literal and I end up searching '199/[\w]/'.
Any ideas on how to create a regex wildcard to search an array?
Thanks!

You should write something like this:
var query = '199*';
var queryPattern = query.replace(/\*/g, '\\w');
var queryRegex = new RegExp(queryPattern, 'i');
Next, to check each word:
if(json.archives[c].archive_num.match(queryRegex))
Notes:
Consider using ? instead of *, * usually stands for many letters, not one.
Note that we have to escape the backslash so it will create a valid string literal. The string '\w' is the same as the string w - the escape is ignored in this case.
You don't need delimiters (/.../) when creating a RegExp object from a string.
[\w] is the same as \w. Yeah, minor one.
You can avoid partial matching by using the pattern:
var queryPattern = '\\b' + query.replace(/\*/g, '\\w') + '\\b';
Or, similarly:
var queryPattern = '^' + query.replace(/\*/g, '\\w') + '$';
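
Putting the notes together, a wildcard search over the array might look like the sketch below. The function name wildcardToRegExp and the sample archive data are made up for illustration. Other regex metacharacters in the query are escaped first, so only * acts as a wildcard.

```javascript
// Hypothetical helper: turn a query such as "199*" into an anchored RegExp.
function wildcardToRegExp(query) {
  // Escape everything special, which turns "*" into "\*" ...
  var escaped = query.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
  // ... then turn each escaped "*" into "\w" (exactly one word character).
  return new RegExp('^' + escaped.replace(/\\\*/g, '\\w') + '$', 'i');
}

// Sample data (assumed shape, mirroring json.archives[c].archive_num).
var json = { archives: [
  { archive_num: '1994' }, { archive_num: '199a' },
  { archive_num: '2001' }, { archive_num: '19945' }
] };

var queryRegex = wildcardToRegExp('199*');
var hits = json.archives
  .filter(function (a) { return queryRegex.test(a.archive_num); })
  .map(function (a) { return a.archive_num; });

console.log(hits); // [ '1994', '199a' ]
```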

var qre = query.replace(/[^\w\s]/g, "\\$&") // escape special chars so they dont mess up the regex
.replace("\\*", "\\w"); // replace the now escaped * with '\w'
qre = new RegExp(qre, "i"); // create a regex object from the built string
if(json.archives[c].archive_num.match(qre)){
//...
}

Validate twitter URL

I am trying to validate a twitter url, so that at least it contains a username. I do not care if it exists or not, just that there is one.
I am using the below javascript regex
var re = new RegExp('((http://)|(www\.))twitter\.com/(\w+)');
alert(re.test('http://twitter.com/test_user'));
but it is not working.
The strange thing is that I created and tested the above regex at this URL
http://www.regular-expressions.info/javascriptexample.html
where it works just fine.
Any ideas?
Thanks

function isTwitterHandle(handle) {
if (handle.match(/^((?:http:\/\/)?|(?:https:\/\/)?)?(?:www\.)?twitter\.com\/(\w+)$/i) || handle.match(/^#?(\w+)$/)) return true;
return false;
}

You need to escape the backslashes in the escape sequences too:
var re = new RegExp('((http://)|(www\\.))twitter\\.com/(\\w+)');
And I would recommend this regular expression:
new RegExp('^(?:http://)?(?:www\\.)?twitter\\.com/(\\w+)$', 'i')
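
A quick way to see what goes wrong is to compare the patterns side by side. The variable names below are made up for this sketch.

```javascript
// In a string literal, "\w" is just "w" and "\." is just ".".
console.log('\w' === 'w'); // true

// The pattern from the question: the backslashes are lost before RegExp sees them.
var wrong = new RegExp('((http://)|(www\.))twitter\.com/(\w+)');
// Doubled backslashes survive the string literal.
var right = new RegExp('((http://)|(www\\.))twitter\\.com/(\\w+)');

console.log(wrong.test('http://twitter.com/test_user')); // false
console.log(right.test('http://twitter.com/test_user')); // true
```

The first pattern ends up looking for one or more literal "w" characters after the slash, which is why the test in the question fails.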

It's because you're defining the regex with a string literal. You need to escape the escape characters (double backslash):
'^(http://)?(www\\.)?twitter\\.com/(\\w+)'
In the above, I also changed the start so that it would match http://www.twitter.com/test_user.
Alternatively, use the RegExp literal syntax, though this means you have to escape /:
var re = /^(http:\/\/)?(www\.)?twitter\.com\/(\w+)/;
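
With the literal syntax, pulling the username out is a small step further. This is a sketch: extractTwitterUsername and TWITTER_URL are hypothetical names, and accepting https and an optional trailing slash are assumptions beyond the original question.

```javascript
// Optional scheme, optional "www.", then the username as capture group 1.
var TWITTER_URL = /^(?:https?:\/\/)?(?:www\.)?twitter\.com\/(\w+)\/?$/i;

// Hypothetical helper: returns the username, or null if the URL has none.
function extractTwitterUsername(input) {
  var match = TWITTER_URL.exec(input);
  return match ? match[1] : null;
}

console.log(extractTwitterUsername('http://twitter.com/test_user')); // "test_user"
console.log(extractTwitterUsername('http://twitter.com/'));          // null
```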

This accepts the following formats:
-http://twitter.com/username (this is http)
-https://twitter.com/username (this is https)
-twitter.com/username (without http)
-#username (with #)
-username (without #)
// Wrapper function (hypothetical name) added so that the return statements are valid
function isTwitterUsername(username) {
  var r1 = new RegExp('^((?:http://)?|(?:https://)?)?(?:www\\.)?twitter\\.com/(\\w+)$', 'i');
  if (r1.test(username)) {
    return true;
  }
  // Not a URL: accept a bare username with an optional leading #
  var r2 = new RegExp('^#?(\\w+)$', 'i');
  return r2.test(username);
}
var username = "#test";
isTwitterUsername(username); // true
