Remove Last Instance Of Character From String - Javascript - Revisited - javascript

According to the accepted answer from this question, the following is the syntax for removing the last instance of a certain character from a string (In this case I want to remove the last &):
function remove(string) {
  string = string.replace(/&([^&]*)$/, '$1');
  return string;
}
console.log(remove("height=74&width=12&"));
But I'm trying to fully understand why it works.
According to regex101.com,
/&([^&]*)$/
& matches the character & literally (case sensitive)
1st Capturing Group ([^&]*)
Match a single character not present in the list below [^&]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
& matches the character & literally (case sensitive)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So if we're matching the character & literally with the first &:
Then why are we also "matching a single character not present in the following list"?
Seems counter productive.
And then, "$ asserts position at the end of the string" - what does this mean? That it starts searching for matches from the back of the string first?
And finally, what is the $1 doing in the replaceValue? Why is it $1 instead of an empty string? ""

I think the solution to that problem is different from the one you want.
That regex will replace the last "&" no matter where it is, in the middle or at the end of the string.
If you apply this regex to these two examples, you will see that the first one gets "incorrectly" replaced:
height=74&width=12&test=1
height=74&width=12&test=1&
They become:
height=74&width=12test=1
height=74&width=12&test=1
So to remove the "&" only when it is the very last character, the only thing you need to do is:
string.replace(/&$/, '');
Now, if you want to replace the last occurrence of "&" no matter where it is, I will explain that regex:
$1 refers to the first capturing group: whatever ([^&]*) matched is available as $1 in the replacement. This is an oversimplification.
&([^&]*)$
& will match a literal "&". The capturing group that follows then looks for any number (zero or more) of characters that are not "&" (explained later), up to the end of the string, or the end of a line if you use the m flag. Anything captured in this group is available as $1 when you apply the replacement.
So if you apply this logic in your mind, you will see that it always matches the last "&" and replaces the match with whatever is to its right, which cannot contain a "&".
&(<nothing-like-a-&>*)<until-we-reach-the-end> is replaced by whatever was found inside (<nothing-like-a-&>*), that is, $1. Because * means zero or more times, the capturing group $1 will sometimes be empty.
NOT EQUAL TO part:
The regex uses [^...]. In simple terms, [] represents a set of individual characters. For example, [ab] and [ba] are the same: each matches "a" or "b". Inside the brackets you can also use ranges, such as 0 to 9 in [0-9ba], which matches any digit from 0 to 9, "a", or "b".
The "^" in [^...] negates the set, so it matches any character that is not in it. For example, [^0-9] matches any character that is not a digit. In your regex, [^&] is used to match anything that is not a "&".
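To make the difference between the two patterns concrete, here is a small sketch. The function names removeLastAmp and removeTrailingAmp are made up for illustration; the regexes are the ones discussed above.

```javascript
// Remove the last "&" wherever it occurs; $1 puts back the text that followed it.
function removeLastAmp(str) {
  return str.replace(/&([^&]*)$/, '$1');
}

// Remove "&" only when it is the final character of the string.
function removeTrailingAmp(str) {
  return str.replace(/&$/, '');
}

const withTrailing = 'height=74&width=12&test=1&';
const withoutTrailing = 'height=74&width=12&test=1';

console.log(removeLastAmp(withTrailing));        // height=74&width=12&test=1
console.log(removeLastAmp(withoutTrailing));     // height=74&width=12test=1
console.log(removeTrailingAmp(withTrailing));    // height=74&width=12&test=1
console.log(removeTrailingAmp(withoutTrailing)); // height=74&width=12&test=1

// With '' instead of '$1', the text after the last "&" is deleted as well.
const emptyReplacement = withoutTrailing.replace(/&([^&]*)$/, '');
console.log(emptyReplacement); // height=74&width=12
```

This also answers the "$1 versus empty string" part of the question: the match covers the "&" plus everything after it, so the replacement has to put that trailing text back.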

Related

How to form regex to match everything up to a "("

In javascript, how can a regular expression be formed to match everything up to and NOT including an opening parenthesis "("?
example input:
"12(pm):00"
"12(am):))"
"8(am):00"
I've found /^(.*?)\(/ to be successful with the "up to" part, but the match returned includes the "(".
On regex101.com, it says the first capturing group is what I'm looking for. Is there a way to return only the captured group?
There are three ways to deal with this. The first is to restrict the characters you match to not include the parenthesis:
let match = "12(pm):00".match(/[^(]*/);
console.log(match[0]);
The second is to only get the part of the match you are interested in, using capture groups:
let match = "12(pm):00".match(/(.*?)\(/);
console.log(match[1]);
The third is to use lookahead to explicitly exclude the parenthesis from the match:
let match = "12(pm):00".match(/.*?(?=\()/);
console.log(match[0]);
As in the OP, note the non-greedy modifier in the second and third cases: it is necessary to restrict the quantifier in case there is another open parenthesis further inside the string. This is not necessary in the first case, since the quantifier is explicitly forbidden from gobbling up the parenthesis.
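The three approaches can be compared side by side. This is only a sketch: the helper names are invented here, and the 'noon' input is an extra case added to show what happens when there is no parenthesis at all.

```javascript
// 1. Negated character class: consumes everything that is not "(".
const byNegatedClass = (s) => s.match(/[^(]*/)[0];

// 2. Capture group: the "(" is part of the match, but not of group 1.
const byCaptureGroup = (s) => {
  const m = s.match(/(.*?)\(/);
  return m ? m[1] : null;
};

// 3. Lookahead: the "(" must follow, but is never consumed.
const byLookahead = (s) => {
  const m = s.match(/.*?(?=\()/);
  return m ? m[0] : null;
};

for (const input of ['12(pm):00', '12(am):))', '8(am):00']) {
  console.log(byNegatedClass(input), byCaptureGroup(input), byLookahead(input));
}
// 12 12 12
// 12 12 12
// 8 8 8

// Without any "(" the approaches differ: only the first one still matches.
console.log(byNegatedClass('noon')); // noon
console.log(byCaptureGroup('noon')); // null
console.log(byLookahead('noon'));    // null
```

Which behavior is right for input without a parenthesis depends on the use case, so it is worth deciding that before picking one of the three.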
Try
^\d+
^ asserts position at start of a line
\d matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/r/C9XNT4/1

JavaScript - making my regular expression work

I have these 2 expressions:
1: [^a-zA-Z0-9]
2: [^a-zA-Z]
The first one must be used whenever my string starts with data- and the second one if it doesn't. However, I need this built-in into my regular expression (so using .slice(0, 5) == "data-" is no option for this situation).
Is it possible to do this inlined (so by just having to use 1 regular expression)? Or do I first have to validate (if string starts with data-) and then use the correct expression?
Some examples:
data-attribute#!#!19 => data-attribute19
data-attribute17 => data-attribute17
attribute19 => attribute
attribute1#!#!##183 => attribute
You can do something a bit like this:
/^(data-[a-zA-Z0-9]+).+?(\d*)$|^([a-zA-Z]+).+$/
Which will match what you want, and then return the results inside either one or two capture groups (depending on which option it matches).
Breaking it Down
Going from left to right:
The ^ character means "beginning of line" - in this case, the beginning of a single string.
The parentheses () indicate a capture group - some substring that you want to capture and output separately from your main match string.
data- indicates the literal string "data-", with the hyphen at the end.
[a-zA-Z0-9]+ is a character class, repeated one or more times.
.+? is one or more of any character, matched lazily - meaning it takes as few characters as possible and only expands when the rest of the pattern cannot match otherwise.
\d* means zero or more digits (equivalent to [0-9]*).
The $ character means "match the end of the line" (again, in this case, the end of your string).
The | character means "alternate" - basically, it will match either the pattern on the left or the pattern on the right, enabling this single regex to match either of your two strings.
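As a rough check of the breakdown above, the pattern can be run against the sample inputs. The extract helper is hypothetical; it just joins whichever capture groups took part in the match.

```javascript
const pattern = /^(data-[a-zA-Z0-9]+).+?(\d*)$|^([a-zA-Z]+).+$/;

// Join the groups of whichever alternative matched (helper name is made up).
function extract(str) {
  const m = str.match(pattern);
  if (m === null) return null;
  // Groups 1 and 2 belong to the "data-" alternative, group 3 to the other one.
  return m[1] !== undefined ? m[1] + m[2] : m[3];
}

console.log(extract('data-attribute#!#!19')); // data-attribute19
console.log(extract('attribute19'));          // attribute
console.log(extract('attribute1#!#!##183'));  // attribute

// Caveat: .+? needs at least one character, so when nothing separates the
// name from the digits, the first group has to give one character back.
console.log(extract('data-attribute17'));     // data-attribute1
```

The last case shows that the pattern relies on at least one character sitting between the two captured parts, so input such as data-attribute17 needs separate handling.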
str = str.replace(/[#!]/g, '') // regex literal (no quotes), g flag, result assigned back
/^data-(.+)$/.test(str) // true or false
This should do the trick.
First we remove all special characters (you can add your own).
[abc] is a character class, which tells JavaScript to match any one of the characters between the square brackets.
Then we test whether the result matches data-attribute.
^ and $ match the beginning and end of the input (so it can't start or end with a space or any other character).
() captures the characters inside it. You can access what was captured through the array returned by match, or through the legacy RegExp.$1 to RegExp.$9 properties.
. means any character except line terminators.
+ is a quantifier for one or more times. It is the same as {1,}.
Now you just have to test whether the input matches. If it does, the attribute starts with data-.
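One way to combine the prefix check and the two character classes in a single replace call is sketched below. The cleanAttribute name and the callback approach are assumptions for illustration, not something from the answers above.

```javascript
// Strip unwanted characters; digits survive only when the string starts with "data-".
function cleanAttribute(str) {
  return str.replace(/^(data-)?([\s\S]*)$/, (whole, prefix, rest) =>
    prefix
      ? prefix + rest.replace(/[^a-zA-Z0-9]/g, '') // keep letters and digits
      : rest.replace(/[^a-zA-Z]/g, '')             // keep letters only
  );
}

console.log(cleanAttribute('data-attribute#!#!19')); // data-attribute19
console.log(cleanAttribute('data-attribute17'));     // data-attribute17
console.log(cleanAttribute('attribute19'));          // attribute
console.log(cleanAttribute('attribute1#!#!##183'));  // attribute
```

The optional group (data-)? stays undefined when the prefix is absent, which is what lets the callback choose between the two classes. Capturing the prefix separately also keeps its hyphen out of reach of the inner replace.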

Grab full regex word if pattern inside it matches

How do I retrieve an entire word that has a specific portion of it that matches a regex?
For example, I have the below text.
Using ^.[\.\?\!:;,]{2,} , I match the first 3, but not the last. The last should be matched as well, but $ doesn't seem to produce anything.
a!!!!!!
n.......
c..,;,;,,
huhuhu..
I want to get all strings that have an occurrence of certain characters equal to or more than twice. I produced the aforementioned regex, but on Rubular it only matches the characters themselves, not the entire string. Using ^ and $
I've read a few stackoverflow posts similar, but not quite what I'm looking for.
Change your regex to:
/^.*[.?!:;,]{2,}/gm
i.e. match zero or more characters before two or more of those special characters.
If I understand correctly, you are trying to match an entire string that contains the same punctuation character twice in a row:
^.*?([.?!:;,])\1.*
Note: if your string has newline characters, change .* to [\s\S]*
The trick is here:
([.?!:;,]) # captures the punct character in group 1
\1 # refers to the character captured in group 1
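The two suggested patterns behave differently when the punctuation characters are mixed, which a short sketch can show. The lines 'hello', 'ok.' and 'a.,b' are extra test inputs invented here.

```javascript
const text = ['a!!!!!!', 'n.......', 'c..,;,;,,', 'huhuhu..', 'hello', 'ok.'].join('\n');

// Any two or more characters from the set, per line (m flag), for all lines (g flag).
const anyTwo = text.match(/^.*[.?!:;,]{2,}/gm);
console.log(anyTwo); // [ 'a!!!!!!', 'n.......', 'c..,;,;,,', 'huhuhu..' ]

// The backreference \1 demands the SAME character twice in a row.
const sameTwice = /^.*?([.?!:;,])\1.*/;
console.log(sameTwice.test('huhuhu..')); // true
console.log(sameTwice.test('a.,b'));     // false: "." followed by "," is not a repeat

// The character-class version accepts the mixed pair.
console.log(/^.*[.?!:;,]{2,}/.test('a.,b')); // true
```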

what's the meaning of the below regex in javascript

data.replace(/(.*)/g, '$1')
I encountered the above in smashing nodejs, can someone quickly explain this syntax? I'm new to Regex.
. matches any character except a newline.
* matches 0 or more of the preceding token. This is a greedy match, and will match as many characters as possible before satisfying the next token.
$1 refers to the matched group.
g modifier means global, which in turn means,
"don't stop at the first match. Continue to match even after that"
Basically what it is doing is capturing every character into a group until it encounters a \n(newline) and replacing it with the same.
There is no change in this operation and you should avoid doing this.
. can be any character except the newline character, and the * quantifier means that . can be matched 0 to unlimited times. So it matches all the characters on each line of the data. The parentheses around .* collect the matched characters into a group, and $1 refers to the first captured group. So we basically match all the characters and replace them with the matched characters.
It is similar to doing
str.replace(str1, str1)
You found it in "Smashing Node.js". I looked and found it too. The code there is data.replace(/(.*)/g, '  $1'). Please notice the two leading spaces before $1. They indent the whole text.
.* matches the whole line,
replaces it with "  " + the same line,
repeats it until EOF because the g modifier is there
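Both readings can be checked quickly. In this sketch the sample data string is made up; the first call uses the replacement from the question and the second uses the version with two leading spaces.

```javascript
const data = 'first line\nsecond line';

// Replacing every match with its own capture group changes nothing.
const unchanged = data.replace(/(.*)/g, '$1');
console.log(unchanged === data); // true

// With two spaces in front of $1, every line ends up indented.
const indented = data.replace(/(.*)/g, '  $1');
const allIndented = indented.split('\n').every((line) => line.startsWith('  '));
console.log(allIndented); // true
```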

Help interpreting a javascript Regex

I have found the following expression which is intended to modify the id of a cloned html element e.g. change contactDetails[0] to contactDetails[1]:
var nel = 1;
var s = $(this).attr(attribute);
s.replace(/([^\[]+)\[0\]/, "$1["+nel+"]");
$(this).attr(attribute, s);
I am not terribly familiar with regex. I have tried to interpret it with the help of The Regex Coach, but I am still struggling. It appears that ([^\[]+) matches one or more characters which are not '[' and \[0\]/ matches [0]. The / in the middle I interpret as an 'include both', so I don't understand why the author has even included the first expression.
I don't understand what the $1 in the replace string is. If I use the Regex Coach replace functionality with simply [0] as the search and 1 as the replace, I get the correct result. However, if I change the javascript to s.replace(/\[0\]/, "["+nel+"]"); the string s remains unchanged.
I would be grateful for any advice as to what the original author intended, and for help in finding a solution which will successfully replace a number in square brackets anywhere within a search string.
Find
/ # Marks the start of a regex literal, like " for a string
([^\[]+) # Capture one or more characters that are not [ into $1
\[0\] # Find [0]
/ # Marks the end of the regex literal
Replace
"$1[" # Insert the item captured above And [
+nel+ # New index
"]" # Close with ]
To create an expression that captures any digit, you can replace the 0 with \d+ which will match a digit 1 or more times.
s.replace(/([^\[]+)\[\d+\]/, "$1["+nel+"]");
The $1 is a backreference to the first group in the regex. Groups are the pieces inside (). So, in this case $1 will be replaced by whatever the ([^\[]+) part matched.
If the string was contactDetails[0] the resulting string would be contactDetails[1].
Note that this regex only replaces 0s inside square brackets. If you want to replace any number you will need something like:
([^\[]+)\[\d+\]
The \d matches any digit character. \d+ then becomes any sequence of at least one digit.
But your code will still not work, because Javascript strings are immutable. That means they can't be changed once created. The replace method returns a new string, instead of changing the original one. You should use:
s = s.replace(...)
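The immutability point is easy to see in isolation. This is a minimal sketch with the jQuery calls left out; nel comes from the question, and the other variable names are chosen for illustration.

```javascript
const nel = 1;
const original = 'contactDetails[0]';

// Calling replace without using its return value leaves the string untouched.
original.replace(/([^\[]+)\[\d+\]/, '$1[' + nel + ']');
console.log(original); // contactDetails[0]

// The new string only exists in the return value, so it has to be stored.
const updated = original.replace(/([^\[]+)\[\d+\]/, '$1[' + nel + ']');
console.log(updated); // contactDetails[1]

// With \d+ the index does not have to be 0.
const fromTwelve = 'contactDetails[12]'.replace(/([^\[]+)\[\d+\]/, '$1[' + nel + ']');
console.log(fromTwelve); // contactDetails[1]
```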
It looks like it replaces an array index of 0 with 1.
For example: array[0] goes to array[1]
Explanation:
([^\[]+) - This part means save everything that is not a [ into variable $1
\[0\] - This part limits part 1 to the text that comes directly before a literal [0]
"$1["+nel+"]" - Print out the contents of $1 (loaded from part 1) and add the brackets with the value of nel. (in your example nel = 1)
Square brackets define a set of characters to match. [abc] will match the letters a, b or c.
By adding the caret you are now specifying that you want characters not in the set. [^abc] will match any character that is not an a, b or c.
Because square brackets have special meaning in RegExps, you need to escape them with a backslash if you want to match one. [ starts a character set, \[ matches a literal bracket. (Same concept for closing brackets.)
So, [^\[]+ captures 1 or more characters that are not [.
Wrapping that in parentheses "captures" the matched portion of the string (in this case "contactDetails") so that you can use it in the replacement.
$1 uses the "captured" string (i.e. "contactDetails") in the replacement string.
This regex matches "something" followed by a [0].
"something" is identified by the expression [^\[]+ which matches all characters that are not a [. You can see the () around this expression, because the match is reused with $1 later. The rest of your regex, that is \[0\], just matches the index [0]. The author had to write \[ and \] because [ and ] are special characters in regular expressions and have to be escaped.
$1 is a reference to the value of the first pair of parentheses. In your case the value of
[^\[]+
which matches one or more characters which are not a '['
The remaining part of the regexp matches string '[0]'.
So if s is 'foobar[0]' the result will be 'foobar[1]'.
[^\[] will match any character that is not [, and the '+' means one or more times. So [^\[]+ will match contactDetails. The parentheses will capture this for later use. The '\' is an escape symbol, so the \[0\] at the end will match [0]. The replace string will use $1, which is what was captured in the parentheses, and add the new index.
Your interpretation of the regular expression is correct. It is intended to match one or more characters which are not [, followed by a literal [0]. And used in the replace method, the match would be replaced with the match of the first grouping (that’s what $1 is replaced with) together with the sequence [ followed by the value of nel and ] (that’s how "$1["+nel+"]" is to be interpreted).
And again, a simple s.replace(/\[0\]/, "["+nel+"]") does the same, except when there is nothing in front of [0], because in that case the first regex wouldn’t find a match.
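The one difference between the two patterns can be sketched like this. The helper names withGroup and simple are hypothetical.

```javascript
// Original pattern: requires at least one non-"[" character before "[0]".
const withGroup = (str, n) => str.replace(/([^\[]+)\[0\]/, '$1[' + n + ']');

// Simplified pattern: only looks for the literal "[0]".
const simple = (str, n) => str.replace(/\[0\]/, '[' + n + ']');

console.log(withGroup('contactDetails[0]', 1)); // contactDetails[1]
console.log(simple('contactDetails[0]', 1));    // contactDetails[1]

// With nothing in front of "[0]" the capture group cannot match, so nothing is replaced.
console.log(withGroup('[0]', 1)); // [0]
console.log(simple('[0]', 1));    // [1]
```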
