JS regex return string from url - javascript

I have the following URL structure:
https://api.bestschool.com/student/1102003120009/tests/json
I want to cut the student ID from the URL. So far I've come up with this:
/(student\/.*[^\/]*)/
which returns
student/1102003120009/tests/json
I only want the ID.

Your regex (student\/.*[^\/]*) matches and captures into Group 1 a literal sequence student/, then matches any characters other than a newline, 0 or more occurrences (.*) - that can match the whole line at once! - and then 0 or more characters other than /. It does not work because of .*. Also, a capturing group should be moved to the [^\/]* pattern.
You can use the following regex and grab Group 1 value:
student\/([^\/]*)
See regex demo
The regex matches student/ literally, and then matches and captures into Group 1 zero or more symbols other than /.
Alternatively, if you want to avoid using capturing, and assuming that the ID is always numeric and is followed by /tests/, you can use the following regex:
\d+(?=\/tests\/)
The \d+ matches 1 or more digits, and (?=\/tests\/) checks if right after the digits there is a /tests/ character sequence.
var re = /student\/([^\/]*)/;
var str = 'https://api.bestschool.com/student/1102003120009/tests/json';
var m = str.match(re);
if (m !== null) {
document.getElementById("r").innerHTML = "First method : " + m[1] + "<br/>";
}
var m2 = str.match(/\d+(?=\/tests\/)/);
if (m2 !== null) {
document.getElementById("r").innerHTML += "Second method: " + m2;
}
<div id="r"/>
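For use outside the browser snippet above, the same two patterns could be wrapped in small functions. This is only a sketch: the names extractStudentId and extractIdBeforeTests are made up for illustration, and returning null when nothing matches is an assumption about what the caller wants.

```javascript
// Hypothetical helpers (the names are not from the original answer).

// Method 1: capture whatever follows "student/" up to the next slash.
function extractStudentId(url) {
  var m = url.match(/student\/([^\/]*)/);
  return m !== null ? m[1] : null;
}

// Method 2: match digits that are directly followed by "/tests/".
function extractIdBeforeTests(url) {
  var m = url.match(/\d+(?=\/tests\/)/);
  return m !== null ? m[0] : null;
}

var sampleUrl = 'https://api.bestschool.com/student/1102003120009/tests/json';
console.log(extractStudentId(sampleUrl));     // 1102003120009
console.log(extractIdBeforeTests(sampleUrl)); // 1102003120009
```

The first helper also works for non-numeric IDs, while the second depends on the ID being numeric and on /tests/ following it.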

Related

Check if first and last character contains given special char

I have input string
..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''
I want to check whether the first and last characters are -, ' or . (dot).
If so, trim them until only the name is left.
Expected output : VAibhavs.sharma
I am currently doing it like this:
while (
myString.charAt(0) == "." ||
myString.charAt(0) == "'" ||
myString.charAt(0) == "-" ||
myString.charAt(myString.length - 1) == "." ||
myString.charAt(myString.length - 1) == "'" ||
myString.charAt(myString.length - 1) == "-"
)
I know this is not the correct way. How can I use a regex?
I tried /^\'$. But this only checks the first char for a single special char.
You can use regular expression:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.replace(/^[-'\.]+/,"").replace(/[-'\.]+$/,"")
console.log(output)
[-'\.] ... -, ' or . character
+ ... one or more times
^ ... beginning of the string
$ ... end of the string
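The two replace calls can probably be folded into one by joining both patterns with | and adding the g flag. A minimal sketch, where trimSpecial is a hypothetical name:

```javascript
// Hypothetical helper: strip leading and trailing -, ' and . in one pass.
function trimSpecial(input) {
  // ^[-'.]+ matches a run at the start, [-'.]+$ a run at the end;
  // the g flag lets replace() apply both alternatives.
  return input.replace(/^[-'.]+|[-'.]+$/g, "");
}

console.log(trimSpecial("..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"));
// VAibhavs.sharma
```

Inside a character class the dot needs no backslash, so [-'.] is equivalent to [-'\.].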
EDIT:
using match:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.match(/^[-'\.]+(.*?)[-'\.]+$/)[1]
console.log(output)
(...) ... (1st) group
.*? ... any character, zero or more times, ? means non-greedy
.match(...)[1] ... 1 means 1st group
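Note that .match(...)[1] throws when there is no match, which happens with the pattern above whenever the string does not both start and end with one of the special characters. One possible variation, sketched here under the assumption that such input should simply be returned as-is, uses * instead of + and guards against null (extractCore is a made-up name):

```javascript
// Hypothetical helper: like the match() version, but tolerant of input
// that has no special characters at one or both ends.
function extractCore(input) {
  // * instead of + lets the leading and trailing runs be empty.
  var m = input.match(/^[-'.]*(.*?)[-'.]*$/);
  // m is null only if the input contains a line break, which . does not match.
  return m !== null ? m[1] : null;
}

console.log(extractCore("..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"));
// VAibhavs.sharma
console.log(extractCore("abc")); // abc
```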
There is already one accepted answer but still, this is how I would do.
var pattern = /\b[A-Za-z.]+\b/gm;
var str = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''";
console.log(str.match(pattern));
// Output
// ["VAibhavs.sharma"]
\b is a zero-width word boundary. It matches positions where one side is a word character (usually a letter, digit or underscore) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).
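One thing to keep in mind with the \b approach is that it returns every run of letters and dots rather than one trimmed string, so input with special characters in the middle yields several matches. A quick sketch with a made-up input (findLetterRuns is a hypothetical name):

```javascript
// Hypothetical helper: return all runs matched by the \b-based pattern.
function findLetterRuns(str) {
  return str.match(/\b[A-Za-z.]+\b/g) || [];
}

// Made-up input with a run of dashes in the middle of the string.
console.log(findLetterRuns("..ab--cd..")); // ["ab", "cd"]
```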

Finding exact words in text, excluding quoted words

In the javascript code below I need to find in a text exact words, but excluding the words that are between quotes. This is my attempt, what's wrong with the regex? It should find all the words excluding word22 and "word3". If I use only \b in the regex it selects exact words but it doesn't exclude the words between quotes.
var text = 'word1, word2, word22, "word3" and word4';
var words = [ 'word1', 'word2', 'word3' , 'word4' ];
words.forEach(function(word){
var re = new RegExp('\\b^"' + word + '^"\\b', 'i');
var pos = text.search(re);
if (pos > -1)
alert(word + " found in position " + pos);
});
First, we'll use a function to escape the characters of the word, just in case there's some that have special meaning for regexp.
// from https://stackoverflow.com/a/30851002/240443
function regExpEscape(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
Then, we construct a regular expression as an alternation between individual word regexps. For each word, we assert that it starts with a word boundary, ends with a word boundary, and has an even number of quote characters between its end, and the end of string. (Note that from the end of word3, there is only one quote till the end of string, which is odd.)
let text = 'word1, word2, word22, "word3" and word4';
let words = [ 'word1', 'word2', 'word3' , 'word4' ];
let regexp = new RegExp(words.map(word =>
'\\b' + regExpEscape(word) + '\\b(?=(?:[^"]*"[^"]*")*[^"]*$)').join('|'), 'g')
text.match(regexp)
// => word1, word2, word4
let m;
while ((m = regexp.exec(text))) {
console.log(m[0], m.index);
}
// word1 0
// word2 7
// word4 34
EDIT: Actually, we can speed the regexp up a bit if we factor out the surrounding conditions:
let regexp = new RegExp(
'\\b(?:' +
words.map(regExpEscape).join('|') +
')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g')
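Put together, the factored-out pattern might be wrapped in a function like the one below. This is a sketch only: findUnquotedWords and escapeForRegExp are made-up names, the escape helper is a shorter variant than regExpEscape above, and the words array is assumed to be non-empty. It relies on String.prototype.matchAll, so it needs an ES2020-capable engine.

```javascript
// Escape regex metacharacters in a literal string (a simpler variant
// of the regExpEscape function shown earlier).
function escapeForRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns { word, index } for every whole-word
// occurrence that is followed by an even number of double quotes.
function findUnquotedWords(text, words) {
  const regexp = new RegExp(
    '\\b(?:' + words.map(escapeForRegExp).join('|') +
    ')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g');
  return Array.from(text.matchAll(regexp), m => ({ word: m[0], index: m.index }));
}

const foundWords = findUnquotedWords(
  'word1, word2, word22, "word3" and word4',
  ['word1', 'word2', 'word3', 'word4']);
console.log(foundWords);
// [ { word: 'word1', index: 0 }, { word: 'word2', index: 7 }, { word: 'word4', index: 34 } ]
```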
Your exclusion of the quote character is wrong: ^" actually matches the beginning of the string followed by a quote. Try this instead:
var re = new RegExp('\\b[^"]' + word + '[^"]\\b', 'i');
Also, this site is amazing to help you debug regex : https://regexpal.com
Edit: Because \b will match on quotation marks, this needs to be tweaked further. JavaScript did not support lookbehinds when this was written (they were added in ES2018), so we have to get a little tricky.
var re = new RegExp('(?:^|[^"\\w])' + word + '(?:$|[^"\\w])','i')
So what this is doing is saying
(?: Don't capture this group
^|[^"\w]) either match the start of the string, or any non-word character (word characters being alphanumerics and underscore) that isn't a quote
word match your word here
(?: Don't capture this group either
$|[^"\w]) either match the end of the string, or any non-word character that isn't a quote again
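On engines that implement ES2018 lookbehind assertions, the same idea can be expressed without consuming the neighbouring characters. The sketch below is an assumption-laden variation, not the answer's own code: findBareWord is a made-up name, and the word is assumed to contain no regex metacharacters.

```javascript
// Assumes an engine with ES2018 lookbehind support (for example current
// Node.js and evergreen browsers).
function findBareWord(text, word) {
  // (?<!["\w]) : not preceded by a quote or a word character
  // (?!["\w])  : not followed by a quote or a word character
  var re = new RegExp('(?<!["\\w])' + word + '(?!["\\w])', 'i');
  return text.search(re);
}

var sampleText = 'word1, word2, word22, "word3" and word4';
console.log(findBareWord(sampleText, 'word2')); // 7
console.log(findBareWord(sampleText, 'word3')); // -1
```

Because the assertions are zero-width, the returned position is that of the word itself, whereas the non-capturing version also consumes the character before the word. Like that version, this only rejects words that directly touch a quote; it does not track whether a word sits somewhere inside a longer quoted phrase.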

regex to remove number (year only) from string

I know the regex that separates two words as following:
input:
'WonderWorld'
output:
'Wonder World'
"WonderWorld".replace(/([A-Z])/g, ' $1');
Now I am looking to remove number in year format from string, what changes should be done in the above code to get:
input
'WonderWorld 2016'
output
'Wonder World'
You can match the location before an uppercase letter (but excluding the start of a word) with \B(?=[A-Z]) and match a four-digit number together with any whitespace before it with \s*\b\d{4}\b. In a callback, check whether the match is empty and replace accordingly: if the match is empty, we matched the location before an uppercase letter (=> replace with a space); if not, we matched the year (=> replace with an empty string). The four-digit chunks are only matched as whole words due to the \b word boundaries around the \d{4}.
var re = /\B(?=[A-Z])|\s*\b\d{4}\b/g;
var str = 'WonderWorld 2016';
var result = str.replace(re, function(match) {
return match ? "" : " ";
});
document.body.innerHTML = "<pre>'" + result + "'</pre>";
A similar approach, just a different pattern for matching glued words (might turn out more reliable):
var re = /([a-z])(?=[A-Z])|\s*\b\d{4}\b/g;
var str = 'WonderWorld 2016';
var result = str.replace(re, function(match, group1) {
return group1 ? group1 + " " : "";
});
document.body.innerHTML = "<pre>'" + result + "'</pre>";
Here, ([a-z])(?=[A-Z]) matches and captures into Group 1 a lowercase letter that is followed with an uppercase one, and inside the callback, we check if Group 1 matched (with group1 ?). If it matched, we return the group1 + a space. If not, we matched the year at the end, and remove it.
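As a rough sketch, the second pattern could be wrapped in a reusable function that does not touch the DOM. The name splitWordsAndDropYear is made up for illustration.

```javascript
// Hypothetical wrapper around the ([a-z])(?=[A-Z]) pattern above.
function splitWordsAndDropYear(str) {
  return str.replace(/([a-z])(?=[A-Z])|\s*\b\d{4}\b/g, function (match, group1) {
    // group1 is set only when the lowercase-before-uppercase branch matched.
    return group1 ? group1 + " " : "";
  });
}

console.log(splitWordsAndDropYear("WonderWorld 2016"));      // Wonder World
console.log(splitWordsAndDropYear("OneTwo 1000 ThreeFour")); // One Two Three Four
```

Because of the \b on both sides of \d{4}, a longer number such as 20167 is left alone.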
Try this:
"WonderWorld 2016".replace(/([A-Z])|\b[0-9]{4}\b/g, ' $1')
How about this, a single regex to do what you want:
"WonderWorld 2016".replace(/([A-Z][a-z]+)([A-Z].*)\s.*/g, '$1 $2');
"Wonder World"
get everything apart from digits and spaces.
re-code of @Wiktor Stribiżew's solution:
str can be any "WonderWorld 2016" | "OneTwo 1000 ThreeFour" | "Ruby 1999 IamOnline"
str.replace(/([a-z])(?=[A-Z])|\s*\d{4}\b/g, function(m, g) {
return g ? g + " " : "";
});
import re
remove_year_regex = re.compile(r"[0-9]{4}")
Test regex expression here

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: would be nice if the result would be without the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
See the demo here
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - one or zero occurrence of:
(?: +\w+)* - zero or more occurrences of one or more spaces followed with one or more word characters followed with
# - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
See another demo
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Using ES6 Arrow functions
m = m.map(s => s.replace(/#$/g, ''));
// ES5 Equivalent
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/ // getting rid of the trailing #
document.body.innerHTML = "<pre>" + JSON.stringify(m, 0, 4) + "</pre>";
}
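To also cover the "Extra" request (results without any #), the leading # can be stripped along with the trailing one. A minimal sketch, with extractTags as a made-up name and an empty array assumed as the result when nothing matches:

```javascript
// Hypothetical helper: returns the tags without their # delimiters.
function extractTags(str) {
  var matches = str.match(/#[^ #]+(?:(?: +[^ #]+)*#)?/g) || [];
  // Strip the leading # and, for "#two words#" tags, the trailing # as well.
  return matches.map(function (s) {
    return s.replace(/^#|#$/g, "");
  });
}

console.log(extractTags("#The test# rain in #SPAIN stays mainly in the #plain"));
// ["The test", "SPAIN", "plain"]
```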
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
See demo at regex101 (use with g global flag)

JS string replace only replacing every other occurrence

I have the following JS:
"a a a a".replace(/(^|\s)a(\s|$)/g, '$1')
I expect the result to be '', but am instead getting 'a a'. Can anyone explain to me what I am doing wrong?
Clarification: What I am trying to do is remove all occurrences of 'a' that are surrounded by whitespace (i.e. a whole token)
It's because the regex /(^|\s)a(\s|$)/g consumes the character before and the character after each a.
In the string "a a a a" the regex matches:
"a " (the first a); the rest of the string to check is "a a a", but its first a is no longer at the start of the string and has no space before it, so it is skipped
" a " (the third a); the rest is "a", which does not match because there is no space before it
Edit:
Little bit tricky but working (without regex):
var a = "a a a a";
// Handle beginning case 'a '
var startI = a.indexOf("a ");
if (startI === 0){
var off = a.charAt(startI + 2) !== "a" ? 2 : 1; // test if "a" come next to keep the space before
a = a.slice(startI + off);
}
// Handle middle case ' a '
var iOf = -1;
while ((iOf = a.indexOf(" a ")) > -1){
var off = a.charAt(iOf + 3) !== "a" ? 3 : 2; // same here
a = a.slice(0, iOf) + a.slice(iOf+off, a.length);
}
// Handle end case ' a'
var endI = a.lastIndexOf(" a");
if (endI === a.length - 2){
a = a.slice(0, endI);
}
a; // ""
The first match is "a ".
The search then continues in the remaining "a a a": it skips the first a (nothing the pattern can match precedes it) and matches " a ".
The search then continues in the remaining "a", which does not match.
The first match is replaced with $1, the empty match at the start of the string => ""
Then we have an "a" that didn't match => "a"
The second match is replaced with $1, the captured space => " "
Then we have an "a" that didn't match => "a"
The result will be "a a".
To get your desired result you can do this:
"a a a a".replace(/(?:\s+a(?=\s))+\s+|^a\s+(?=[^a]|$|a\S)|^a|\s*a$/g, '')
As others have tried to point out, the issue is that the regex consumes the surrounding spaces as part of the match. Here's a [hopefully] more straightforward explanation of why that regex doesn't work as you expect:
First let's break down the regex: it says match a space or the start of the string, followed by an 'a', followed by a space or the end of the string.
Now let's apply it to the string. I've added character indexes beneath the string to make things easier to talk about:
a a a a
0123456
The regex looks at the 0 index char, and finds an 'a' at that location, followed by a space at index 1. This is a match because it is the start of the string, followed by an a followed by a space. The length of our match is 2 (the 'a' and the space), so we consume two characters and start our next search at index 2.
Character 2 ('a') is neither a space nor the start of the string, and therefore it doesn't match the start of our regular expression, so we consume that character (without replacing it) and move on to the next.
Character 3 is a space, followed by an 'a' followed by another space, which is a match for our regex. We replace it with $1 (the captured leading space), consume the length of the match (3 characters - " a ") and move on to index 6.
Character 6 ('a') is neither a space nor the start of the string, and therefore it doesn't match the start of our regular expression, so we consume that character (without replacing it) and move on to the next.
Now we're at the end of the string, so we're done.
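The walkthrough above can be checked by listing what the regex actually consumes. A small sketch using String.prototype.matchAll (ES2020); traceMatches is a made-up name:

```javascript
// List the index and text of every match of /(^|\s)a(\s|$)/g.
function traceMatches(str) {
  return Array.from(str.matchAll(/(^|\s)a(\s|$)/g), function (m) {
    return { index: m.index, text: m[0] };
  });
}

console.log(traceMatches("a a a a"));
// [ { index: 0, text: 'a ' }, { index: 3, text: ' a ' } ]
console.log("a a a a".replace(/(^|\s)a(\s|$)/g, "$1")); // "a a"
```

Only the a characters at indexes 0 and 4 are part of a match; the ones at indexes 2 and 6 are skipped because the space they would need was already consumed.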
The reason why the regex @caeth suggested (/(^|\s+)a(?=\s|$)/g) works is because of the (?=...) lookahead. From the MDN Regexp Documentation:
Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.
So, in this case, the (?=...) lookahead checks whether the following character is a space (or the end of the string), without actually consuming that character.
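A sketch of how the lookahead version might be used to remove a whole token follows. Several things here are assumptions rather than part of the answer: the name removeToken, replacing with an empty string instead of $1, escaping the token, and the final trim().

```javascript
// Hypothetical helper: remove every whitespace-delimited occurrence of token.
function removeToken(str, token) {
  // Escape regex metacharacters so the token is matched literally.
  var escaped = token.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // The lookahead checks for a following space or the end of the string
  // without consuming it, so that space can start the next match.
  var re = new RegExp("(^|\\s+)" + escaped + "(?=\\s|$)", "g");
  return str.replace(re, "").trim();
}

console.log(removeToken("a a a a", "a"));                    // "" (empty string)
console.log(removeToken("a ba sl lf a df a a df r a", "a")); // ba sl lf df df r
```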
(^|\s)a(?=\s|$)
Try this. Replace by $1. See demo.
https://regex101.com/r/gQ3kS4/3
Use this instead:
"a a a a".replace(/(^|\s*)a(\s|$)/g, '$1')
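With \s* in place of \s, every "a" occurrence is replaced. Note that \s* can also match nothing, so an "a" that ends a longer word (as in "ba ") is removed as well.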
Greetings
Or you can just split the string up, filter it and glue it back:
"a ba sl lf a df a a df r a".split(/\s+/).filter(function (x) { return x != "a" }).join(" ")
>>> "ba sl lf df df r"
"a a a a".split(/\s+/).filter(function (x) { return x != "a" }).join(" ")
>>> ""
Or in ECMAScript 6:
"a ba sl lf a df a a df r a".split(/\s+/).filter(x => x != "a").join(" ")
>>> "ba sl lf df df r"
"a a a a".split(/\s+/).filter(x => x != "a").join(" ")
>>> ""
I assume that there are no leading or trailing spaces. You can change the filter to x && x != 'a' if you want to remove the assumption.
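With that change to the filter, the split/filter/join approach might look like the sketch below; dropToken is a made-up name.

```javascript
// Hypothetical helper: drop every occurrence of token, tolerating
// leading and trailing whitespace in the input.
function dropToken(str, token) {
  return str
    .split(/\s+/)
    // x is "" for the empty pieces that leading/trailing spaces produce.
    .filter(function (x) { return x && x !== token; })
    .join(" ");
}

console.log(dropToken("  a ba sl a  ", "a")); // ba sl
```

As a side effect, any run of whitespace between the remaining tokens is collapsed to a single space.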
