Remove beginning of string where characters do not match - javascript

Say I have this url:
git+https://github.com/ORESoftware/npp.git
I want to remove the leading characters that come before "http". I also want to remove the .git, but I'm not sure how to do that reliably.
So I am looking to get this string:
https://github.com/ORESoftware/npp
As a total side question, I'm not sure how that URL differs from:
www.github.com/ORESoftware/npp

You could try this:
let s = 'git+https://github.com/ORESoftware/npp.git';
console.log(s.replace(/^.*?(http.*?)\.git$/, '$1'))
Output:
https://github.com/ORESoftware/npp
This regex works as follows:
^.*? is a non-greedy match from the beginning of the string until the next element which does match, in this case the (http.*?) capturing group.
(http.*?) is a capturing group which captures everything from the http until the next match (since .*? is again non-greedy)
\.git$ matches a trailing .git on the string.
The replacement string $1 replaces the contents of the original string with only the contents of the capturing group. In this case that is everything from http until the last character before .git.
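If the .git suffix is not always present, a small variation of the same regex can make it optional. This is a sketch under the assumption that the wanted part always starts with http:// or https://; the helper name toWebUrl is hypothetical:

```javascript
// Hypothetical helper: drop everything before "http" and an optional trailing ".git".
function toWebUrl(s) {
  // (?:\.git)? makes the suffix optional; strings without "http://" or
  // "https://" do not match at all and are returned unchanged.
  return s.replace(/^.*?(https?:\/\/.*?)(?:\.git)?$/, '$1');
}

console.log(toWebUrl('git+https://github.com/ORESoftware/npp.git')); // https://github.com/ORESoftware/npp
console.log(toWebUrl('https://github.com/ORESoftware/npp'));         // https://github.com/ORESoftware/npp
```

Because the inner .*? is non-greedy, it stops as soon as the rest of the pattern (an optional .git followed by the end of the string) can match, so a .git at the end is excluded from the capture.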

Related

Regex to Extract Last Part of URL and Before Another Character

In the URLs
https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg
https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg
I'm trying to capture 5b026cdb06921e7ca5f7a24aff46512e in both of these strings. The string will always happen after the last slash, it will be a random assortment of letters and numbers, it may or may not have --randomtext appended, and it will have .jpg at the end.
I currently have ([^\/]+)$ to extract any string after the last slash, but would like to know how to capture everything before .jpg and --randomtext (if present). I will be using this in JavaScript.
If what comes after the last forward slash is a random assortment of letters and digits a-z0-9, one option is to use a capturing group.
^.*\/([a-z0-9]+).*\.jpg$
In parts
^ Start of string
.*\/ Match up to and including the last /
([a-z0-9]+) Capture in group 1 matching 1+ chars a-z or digits 0-9
.* Match any char except a newline 0+ times
\.jpg Match .jpg
$ End of string
const regex = /^.*\/([a-z0-9]+).*\.jpg$/;
["https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg",
"https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg"
].forEach(s => console.log(s.match(regex)[1]));
You can split by / and take the last part, and then replace anything from -- to the end, or a trailing .jpg, with an empty string:
let arr = [
  "https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg",
  "https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg"
];
let getText = (url) => {
  return url.split('/').pop().replace(/(--.*|\.jpg)$/g, '');
};
arr.forEach(url => console.log(getText(url)));
If -- may appear more than once, then instead of replacing you can simply use match(/^[a-z0-9]+/g) on the last part and take the first element of the matched array.
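A sketch of that match-based variant (the name getId is illustrative, not from the answer). The g flag is left off here because the ^ anchor allows only one match anyway, so m[0] is simply the matched text:

```javascript
// Take the text after the last "/" and keep only its leading run of
// lowercase letters and digits; "--..." and ".jpg" are never included.
const getId = (url) => {
  const last = url.split('/').pop();
  const m = last.match(/^[a-z0-9]+/);
  return m ? m[0] : null; // null if the segment does not start with [a-z0-9]
};

[
  'https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg',
  'https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg',
].forEach((url) => console.log(getId(url)));
```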
Use:
([^\/]*?)(?:--.*)?\.jpg$
and your desired match will be in $1
https://regex101.com/r/gZ9kSi/1
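In JavaScript, group 1 of that pattern is index 1 of the array returned by match(). A minimal sketch; the helper name extractId is hypothetical:

```javascript
// The pattern from the answer above; group 1 holds the id.
const idPattern = /([^\/]*?)(?:--.*)?\.jpg$/;

function extractId(url) {
  const m = url.match(idPattern);
  return m ? m[1] : null; // null if the URL does not end in .jpg
}

console.log(extractId('https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e--wedding-vendors-wedding-receptions.jpg'));
console.log(extractId('https://image/4x/c1/abc/5b026cdb06921e7ca5f7a24aff46512e.jpg'));
```

The lazy [^\/]*? cannot cross a /, and it stops at the first point where an optional --... plus .jpg can finish the match at the end of the string.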

Remove Last Instance Of Character From String - Javascript - Revisited

According to the accepted answer from this question, the following is the syntax for removing the last instance of a certain character from a string (In this case I want to remove the last &):
function remove (string) {
string = string.replace(/&([^&]*)$/, '$1');
return string;
}
console.log(remove("height=74&width=12&"));
But I'm trying to fully understand why it works.
According to regex101.com,
/&([^&]*)$/
& matches the character & literally (case sensitive)
1st Capturing Group ([^&]*)
Match a single character not present in the list below [^&]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
& matches the character & literally (case sensitive)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So if we're matching the character & literally with the first &:
Then why are we also "matching a single character not present in the following list"?
It seems counterproductive.
And then, "$ asserts position at the end of the string" - what does this mean? That it starts searching for matches from the back of the string first?
And finally, what is the $1 doing in the replaceValue? Why is it $1 instead of an empty string? ""
1. I think the solution to that problem is different from the solution you want:
That regex will replace the last "&" no matter where it is, in the middle or at the end of the string.
If you apply this regex to these two examples, you will see that the first one gets "incorrectly" replaced:
height=74&width=12&test=1
height=74&width=12&test=1&
They get replaced as :
height=74&width=12test=1
height=74&width=12&test=1
So if you only want to remove an "&" at the very end of the string, all you need is:
string.replace(/&$/, '');
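To see the difference side by side, here is a small sketch that runs both patterns on the two examples above (the function names are illustrative):

```javascript
// Last "&" anywhere in the string (the accepted answer's pattern).
const removeLastAmp = (s) => s.replace(/&([^&]*)$/, '$1');
// Only an "&" that is the very last character.
const removeTrailingAmp = (s) => s.replace(/&$/, '');

for (const s of ['height=74&width=12&test=1', 'height=74&width=12&test=1&']) {
  console.log(removeLastAmp(s), '|', removeTrailingAmp(s));
}
```

On the first string only removeLastAmp changes anything; on the second both give height=74&width=12&test=1.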
Now, if you want to replace the last occurrence of "&" no matter where it is, here is how that regex works:
$1 represents a capturing group: everything matched inside ([^&]*) is captured into $1. This is an oversimplification.
&([^&]*)$
& matches a literal "&". The following capturing group then matches any number (0 to infinite) of characters that are NOT "&" (explained later), up to the end of the string, or the end of a line if you use the /m flag. Whatever this capturing group matches goes into $1 when you apply the replacement.
So if you follow this logic, you will see that it always matches the last & together with everything to its right (which, by construction, contains no "&"), and replaces that whole match with just the text after the &.
In other words, &(<anything-but-&>*)<end-of-string> is replaced by whatever (<anything-but-&>*) matched, i.e. $1. Because * means 0 or more times, $1 will sometimes be empty.
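This also answers why the replacement is $1 and not an empty string. A short sketch comparing the two on the first example:

```javascript
const input = 'height=74&width=12&test=1';
// '$1' re-inserts what ([^&]*) captured, so only the "&" itself disappears.
const withGroup = input.replace(/&([^&]*)$/, '$1');
// '' drops the entire match: the last "&" AND everything after it.
const withEmpty = input.replace(/&([^&]*)$/, '');
console.log(withGroup); // height=74&width=12test=1
console.log(withEmpty); // height=74&width=12
```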
NOT EQUAL TO part:
The regex uses [^...]. In simple terms, [] is a character class, a set of individual characters: [ab] and [ba] mean the same thing and will always match either "a" or "b". Inside it you can also use ranges such as 0 to 9, so [0-9ba] matches any digit from 0 to 9, "a", or "b".
The "^" at the start, as in [^...], negates the set, so it matches any character that is not in it: [^0-9] matches anything that is not a digit. In your regex, [^&] is used to match anything that is not a "&".
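A quick sketch of those three character classes applied to one sample string (the sample is made up for illustration):

```javascript
const sample = 'ab09&x';
// [0-9ba]: any digit, or "b", or "a"
const inSet = sample.match(/[0-9ba]/g);
// [^0-9]: the leading ^ negates the set, so anything that is not a digit
const notDigit = sample.match(/[^0-9]/g);
// [^&]: anything that is not "&", as in the regex from the question
const notAmp = sample.match(/[^&]/g);
console.log(inSet);    // [ 'a', 'b', '0', '9' ]
console.log(notDigit); // [ 'a', 'b', '&', 'x' ]
console.log(notAmp);   // [ 'a', 'b', '0', '9', 'x' ]
```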

JavaScript Regex filtering on words

I have the following regex. It is in a bit of code in our app, and I can see it splits words. It obviously removes characters such as $#* and so on. I need it to do exactly the same thing but allow a hash, since the words can now have #hashtags.
"Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) // returns ["test", "words"]
The current regex removes the hash; I want it to remain, so that I get:
["test", "#words"]
Your "Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) does the following:
The whole string is turned to lower case
The string is split at every word boundary (zero-width positions between a word and a non-word character), so "test #words" is split into ["test", " #", "words"]
The parts that match the ^\w+$ regex (1+ word chars from the start till end of string) are kept in the array.
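The steps above can be checked directly; this sketch just prints the intermediate array before the filter:

```javascript
// Split at word boundaries: " #" is a non-word chunk, so the hash is glued to the space.
const parts = 'Test #words'.toLowerCase().split(/\b/);
console.log(parts); // [ 'test', ' #', 'words' ]

// Only chunks made entirely of word characters survive, so " #" is dropped.
const kept = parts.filter((w) => /^\w+$/.test(w));
console.log(kept); // [ 'test', 'words' ]
```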
You may use a similar matching approach that also includes # with /(?:\B#)?\w+/g:
console.log("Test #words".toLowerCase().match(/(?:\B#)?\w+/g))
The pattern matches:
(?:\B#)? - an optional # preceded with a non-word boundary
\w+ - 1 or more word chars (from [a-zA-Z0-9_] ranges)
If context is not so important, use the simpler /#?\w+/g regex, which will match an optional # anywhere in the string, followed by 1+ word chars.
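A small sketch of where the two patterns differ; the input string is made up to include a # in the middle of a word:

```javascript
const text = 'test a#b #tag';
// \B# only accepts a "#" that is NOT directly preceded by a word character.
const strict = text.match(/(?:\B#)?\w+/g);
// #? accepts a "#" anywhere, even in the middle of a word.
const loose = text.match(/#?\w+/g);
console.log(strict); // [ 'test', 'a', 'b', '#tag' ]
console.log(loose);  // [ 'test', 'a', '#b', '#tag' ]
```

For input like "Test #words", where every # follows a space, both give the same result.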
Just add an optional # at the beginning of the regexp to support #hashtags.
"Test #words".toLowerCase().match(/#?\w+/g);

Regex: How do I remove the character BEFORE the matched string?

I am intercepting messages which contain the following characters:
*_-
However, whenever any one of these characters comes through, it will always be preceded by a \. The \ is just for formatting though and I want to remove it before sending it off to my server. I know how to easily create a regex which would remove this backslash from a single letter:
'omg\\_bbq\\_everywhere'.replace(/\\_/g, '_')
And I recognize I could just do this operation 3 times: once for each character I want to remove the preceding backslash for. But how can I create a single regex which would detect all three characters and remove the preceding backslash in all 3 cases?
You can use a character class like [*_-].
To remove only the backslash before these characters:
document.body.innerHTML =
"omg\\-bbq\\*everywhere\\-".replace(/\\([*_-])/g, '$1');
When you place a subpattern into a capturing group ((...)), you capture that subtext into a numbered buffer, and then you can reference it with a $1 backreference (1 because there is only one (...) in the pattern.)
This is a good time to use a lookahead. Specifically, you want to match the backslash and then use a positive lookahead for any of those characters.
Ignoring the code, the raw regex you want is:
\\(?=[*_-])
A literal backslash, followed by one of these characters: *_-
So now you are matching only the backslash. The lookahead is a zero-length match: it doesn't consume anything, but it sets a requirement that "for this to be a valid match, it needs to be followed by [*_-]".
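Used with replace, that lookahead pattern needs no $1 at all, because the following character is never part of the match. A minimal sketch (the sample string is made up):

```javascript
const raw = 'omg\\-bbq\\*everywhere\\_'; // contains literal backslashes
// \\ matches one backslash; (?=[*_-]) only checks the next character,
// so the *, _ or - is not consumed and survives the replacement.
const cleaned = raw.replace(/\\(?=[*_-])/g, '');
console.log(cleaned); // omg-bbq*everywhere_
```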
Atomic groups: http://www.regular-expressions.info/atomic.html
Lookaround statements: http://www.regular-expressions.info/lookaround.html
Positive and negative lookahead and lookbehind matches are available.

Help interpreting a javascript Regex

I have found the following expression which is intended to modify the id of a cloned html element e.g. change contactDetails[0] to contactDetails[1]:
var nel = 1;
var s = $(this).attr(attribute);
s.replace(/([^\[]+)\[0\]/, "$1["+nel+"]");
$(this).attr(attribute, s);
I am not terribly familiar with regex, but I have tried to interpret it with the help of The Regex Coach, and I am still struggling. It appears that ([^\[]+) matches one or more characters which are not '[' and \[0\]/ matches [0]. The / in the middle I interpret as an 'include both', so I don't understand why the author has even included the first expression.
I don't understand what the $1 in the replace string is. In the Regex Coach replace functionality, if I simply use [0] as the search and 1 as the replacement, I get the correct result; however, if I change the JavaScript to s.replace(/\[0\]/, "["+nel+"]"); the string s remains unchanged.
I would be grateful for any advice as to what the original author intended, and for help finding a solution which will successfully replace a number in square brackets anywhere within a search string.
Find
/ # Signifies the start of a regex expression like " for a string
([^\[]+) # Capture one or more characters that aren't [ into $1
\[0\] # Find [0]
/ # Signifies the end of a regex expression
Replace
"$1[" # Insert the item captured above And [
+nel+ # New index
"]" # Close with ]
To create an expression that matches any number, you can replace the 0 with \d+, which will match one or more digits.
s.replace(/([^\[]+)\[\d+\]/, "$1["+nel+"]");
The $1 is a backreference to the first group in the regex. Groups are the pieces inside (). So, in this case $1 will be replaced by whatever the ([^\[]+) part matched.
If the string was contactDetails[0] the resulting string would be contactDetails[1].
Note that this regex only replaces 0s inside square brackets. If you want to replace any number you will need something like:
([^\[]+)\[\d+\]
The \d matches any digit character. \d+ then becomes any sequence of at least one digit.
But your code will still not work, because JavaScript strings are immutable. That means they can't be changed once created. The replace method returns a new string instead of changing the original one. You should use:
s = s.replace(...)
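A minimal sketch of the difference, using a plain string as a stand-in for $(this).attr(attribute) so it runs without jQuery:

```javascript
// Plain string standing in for the jQuery attribute value.
let id = 'contactDetails[0]';
const nel = 1;

id.replace(/([^\[]+)\[\d+\]/, '$1[' + nel + ']'); // result discarded: id is unchanged
console.log(id); // contactDetails[0]

id = id.replace(/([^\[]+)\[\d+\]/, '$1[' + nel + ']'); // assign the returned string
console.log(id); // contactDetails[1]
```

In the original code the assigned value is what should then be passed to $(this).attr(attribute, s).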
It looks like it replaces an index of 0 with 1.
For example: array[0] goes to array[1]
Explanation:
([^[]+) - This part means save everything that is not a [ into variable $1
\[0\] - This part requires that what Part 1 saves is followed by a literal [0]
"$1["+nel+"]" - Print out the contents of $1 (loaded from part 1) and add the brackets with the value of nel. (in your example nel = 1)
Square brackets define a set of characters to match. [abc] will match the letters a, b or c.
By adding the caret you are now specifying that you want characters not in the set. [^abc] will match any character that is not an a, b or c.
Because square brackets have special meaning in RegExps, you need to escape them with a backslash if you want to match one. [ starts a character set, while \[ matches a literal bracket. (Same concept for closing brackets.)
So, [^\[]+ matches 1 or more characters that are not [.
Wrapping that in parentheses "captures" the matched portion of the string (in this case "contactDetails") so that you can use it in the replacement.
$1 uses the "captured" string (i.e. "contactDetails") in the replacement string.
This regex matches "something" followed by a [0].
"Something" is identified by the expression [^\[]+, which matches all characters that are not a [. There are () around this expression because the match is reused later with $1. The rest of your regex, \[0\], just matches the index [0]. The author had to write \[ and \] because [ and ] are special characters in regular expressions and have to be escaped.
$1 is a reference to the value of the first pair of parentheses. In your case, the value of
[^\[]+
which matches one or more characters which are not a '['
The remaining part of the regexp matches string '[0]'.
So if s is 'foobar[0]' the result will be 'foobar[1]'.
[^\[] will match any character that is not [, and the + means one or more times. So [^\[]+ will match contactDetails. The parentheses capture this for later use. The \ is an escape symbol, so the trailing \[0\] will match [0]. The replacement string uses $1, which is what was captured in the parentheses, and adds the new index.
Your interpretation of the regular expression is correct. It is intended to match one or more characters which are not [, followed by a literal [0]. And used in the replace method, the match would be replaced with the match of the first grouping (that’s what $1 is replaced with) together with the sequence [ followed by the value of nel and ] (that’s how "$1["+nel+"]" is to be interpreted).
And again, a simple s.replace(/\[0\]/, "["+nel+"]") does the same, except when there is nothing in front of [0]: in that case the first regex wouldn’t find a match.
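A small sketch of that edge case (the function names are illustrative):

```javascript
const n = 1;
// The original pattern needs at least one non-"[" character before [0].
const withPrefix = (s) => s.replace(/([^\[]+)\[0\]/, '$1[' + n + ']');
// The simpler pattern only looks for the literal [0].
const bareIndex = (s) => s.replace(/\[0\]/, '[' + n + ']');

console.log(withPrefix('contactDetails[0]'), bareIndex('contactDetails[0]')); // contactDetails[1] contactDetails[1]
console.log(withPrefix('[0]'), bareIndex('[0]'));                             // [0] [1]
```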
