Regex Positive Lookbehind on url segment - javascript

I am parsing a number from a URL string. The URL looks like:
https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false
I would like to match the number between 'users/' and '&', in this case '11468859', so I am using a positive lookahead and lookbehind to accomplish this.
This is what I have so far:
(?<=users/)([0-9]*?)(?=\&)
This doesn't match anything, so my lookbehind must be wrong. If I omit the lookbehind, I can match on users/11468859:
([0-9]*?)(?=\&) matches >> 'users/11468859'
How do I correctly create a positive lookbehind to match on users/?
Thanks!

Putting aside your lookbehind question for a moment, this regex works:
users/([0-9]+)
The id is in capture group one.
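A minimal JavaScript sketch of the capture-group approach, run against the URL from the question. The variable names are illustrative only:

```javascript
// URL taken from the question.
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// Match "users/" followed by one or more digits; the digits land in group 1.
const match = /users\/([0-9]+)/.exec(url);

// exec() returns null when there is no match, so guard before reading group 1.
const id = match !== null ? match[1] : null;

console.log(id); // prints 11468859
```

Because the id is read from group 1, this works in any JavaScript engine, including ones without lookbehind support.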
In Debuggex your lookbehind works fine, but not in JavaScript engines without lookbehind support (lookbehind was only added to JavaScript in ES2018):
(?<=users/)([0-9]*?)(?=\&)
(You could also get away with just
(?<=users/)([0-9]*)
since [0-9]* is greedy.)
However, as you're using JavaScript, I recommend the regex at the top of my answer.
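Engines that implement the ES2018 lookbehind assertions (recent Node.js and current browsers) do accept a lookbehind like the one in the question. The sketch below assumes such an engine. Note that inside a /.../ literal the slash must be escaped:

```javascript
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// (?<=users\/) requires "users/" immediately before the match without consuming it;
// (?=&) requires "&" immediately after it, also without consuming it.
const result = url.match(/(?<=users\/)[0-9]+(?=&)/);

// Without the g flag, match() returns the first match (or null); element 0 is the matched text.
console.log(result && result[0]); // prints 11468859
```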

If you're certain that the desired segment will be a series of digits immediately after users/, you don't need the lookahead. I would also recommend escaping any forward slash as \/ (inside a /.../ regex literal in JavaScript, an unescaped / would end the literal):
(?<=users\/)([0-9]*?)
You also don't need to tell the regex not to be greedy unless you know it will run into other numbers. In fact, once the lookahead is gone, the lazy [0-9]*? is satisfied by zero digits and matches an empty string. I would also tell the regex that there must be at least one digit, so it won't match if the number is missing:
thus
([0-9]*?)
becomes
(\d+)
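The quick check below compares the lazy pattern with the \d+ version on the question's URL. It assumes an ES2018+ engine for the lookbehind:

```javascript
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// Without a lookahead after it, the lazy [0-9]*? is satisfied by zero digits.
const lazy = url.match(/(?<=users\/)([0-9]*?)/);

// \d+ demands at least one digit and, being greedy, takes all of them.
const greedy = url.match(/(?<=users\/)(\d+)/);

console.log(JSON.stringify(lazy[0])); // prints ""
console.log(greedy[1]); // prints 11468859
```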

There are a couple of approaches available in most languages. To require a prefix without including it in the match, use the positive lookbehind format (?<=STUFF). To match the digits, try \d+ or [0-9]+. Each of the following lines works. The second uses a positive lookahead, so it also accepts letters in an id, but it will fail if the ampersand is moved.
(?<=users.)\d+
(?<=users.).*?(?=&)
(?<=users.)[0-9]+
For more information: http://myregextester.com/index.php#highlighttab
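The three patterns can be compared directly in an ES2018+ engine. In this sketch, noAmpersand is a hypothetical URL with nothing after the id. It shows that only the second pattern depends on the ampersand:

```javascript
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";
// Hypothetical URL in which nothing follows the id.
const noAmpersand = "https://myapi.com/users/11468859";

const patterns = [
  /(?<=users.)\d+/,
  /(?<=users.).*?(?=&)/,
  /(?<=users.)[0-9]+/,
];

// Return the matched text, or null when the pattern finds nothing.
const firstMatch = (text, re) => {
  const m = text.match(re);
  return m ? m[0] : null;
};

const onOriginal = patterns.map((re) => firstMatch(url, re));
const onVariant = patterns.map((re) => firstMatch(noAmpersand, re));
console.log(onOriginal); // → ["11468859", "11468859", "11468859"]
console.log(onVariant); // → ["11468859", null, "11468859"]
```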

How do I correctly create a positive lookbehind to match on users/?
At the time of this answer you couldn't, because JavaScript did not support lookbehinds. They were only added in ES2018, so older engines still reject them:
From javascript regex - look behind alternative?:
Javascript doesn't have regex lookbehind.
http://regexadvice.com/forums/thread/58678.aspx:
The JavaScript regex engine does not support look-behinds
As an alternative, you can capture the number like this:
users\/(.*?)\&
And just access the first capturing group. Explanation and demonstration: http://regex101.com/r/aZ3bL0

Try:
const string = "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";
const regex = /users.([\d]*)/;
const arr = regex.exec(string); // null if nothing matches
const result = arr[1]; // "11468859"

Related

JS Regexp to exclude forward slash after .com in url

I have a URL such as https://www.example.com/filters/test.jpg, and in JS I want to retrieve this part: filters/test.jpg.
I am using match(), but the first element of the match result is /filters/test.jpg.
This is my regexp: /(?!com)\/((\w+)\/(.*))/
What am I missing to remove the forward slash / from the match array?
If your interest is in regex itself rather than just the result, how about this expression?
(?<=.+\.com\/).+
This uses a positive lookbehind and will give you everything after any amount of text ending in ".com/". Note my use of backslash escapes for the period and the forward slash. If you want more specificity, you can do the same thing with the word group and second slash from your original regex:
(?<=\.com\/)((\w+)\/(.*))
UPDATE: As requested, here is a note on negative vs. positive lookahead/lookbehind. A lookahead tells the engine to "look for X, but match only if followed by Y." A negative lookahead means "look for X, but match only if not followed by Y." In your case you want a lookbehind, because that will "look for X, but match only if preceded by Y." What you were trying, (?!com), is a negative lookahead. It only checks that the text at the current position is not "com", so it cannot keep the slash itself out of the match. For more information, see https://javascript.info/regexp-lookahead-lookbehind
If your goal is just to get the result, I think using the URL object in javascript (as in the comment) is actually better than regex because it's more tuned to the specific problem. See https://dev.to/attacomsian/introduction-to-javascript-url-object-27hn.
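This is a minimal sketch of the URL-object route, using the built-in WHATWG URL class available in browsers and Node.js:

```javascript
// Let the built-in URL parser split the address instead of using a regex.
const parsed = new URL("https://www.example.com/filters/test.jpg");

// pathname includes the leading "/", so drop the first character.
const path = parsed.pathname.slice(1);

console.log(path); // prints filters/test.jpg
```

Unlike the .+ regexes, pathname excludes any ?query or #fragment part, which is usually what you want for a file path.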
For newer JS engines (with lookbehind support): /(?<=\/)(\w+)\/.*/
For older JS engines: /\b(?!(?:com|net|org|uk)\/)(\w+)\/.*/
The best way, though, is to capture the path in a group and read it from the match array: /\/((\w+)\/.*)/
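The sketch below checks all three expressions against the sample URL. The results only hold for URLs shaped like this one, and the lookbehind line assumes an ES2018+ engine:

```javascript
const href = "https://www.example.com/filters/test.jpg";

// Newer engines: a word preceded by "/" and followed by "/", then the rest of the string.
const viaLookbehind = href.match(/(?<=\/)(\w+)\/.*/)[0];

// Older engines: a word boundary plus a negative lookahead that skips "com/", "net/", etc.
const viaLookahead = href.match(/\b(?!(?:com|net|org|uk)\/)(\w+)\/.*/)[0];

// Portable: the slash is consumed, but the path is read from capture group 1.
const viaGroup = href.match(/\/((\w+)\/.*)/)[1];

console.log(viaLookbehind, viaLookahead, viaGroup); // each is filters/test.jpg
```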

Javascript RegEx positive lookahead not working as expected

First of all, I am not very good at dealing with regex, but I am trying to create a regex that matches a specific string and replaces it while skipping the first character of the matched string, using a positive lookahead. Please see the details below.
Test string: asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka
RegEx: (?=[^\w])df\.
Replacement string: kkk.
Expected result: asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
But the regex above does not find any match, so it replaces nothing and returns the original test string.
Without the positive lookahead (i.e., without trying to skip the first character), it matches my requirement: see the matching regex sample on regex101.com.
With the positive lookahead, it gives unexpected results: see the regex with positive lookahead on regex101.com.
Thanks in advance for any help.
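In your pattern, the lookahead (?=[^\w]) tests the character at the current position, which is the same character that df\. must then match. Since d is a word character, the pattern can never match. Without the lookahead, [^\w] means you match and consume a char other than a word char before the text you need to replace.
However, this char is consumed and you cannot restore it without capturing it first. You might use Gurman's approach to match /(^|\W)df\./g and replace with '$1kkk.', but you may also use a word boundary: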
\bdf\.
See the regex demo
JS demo:
var s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/\bdf\./g, 'kkk.')
);
However, if you do not want to replace df. at the start of the string, use
var s = "df. asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/(\W)df\./g, '$1kkk.')
);
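In ES2018+ engines, a lookbehind does what the question originally attempted: it checks the preceding character without consuming it, so no $1 is needed in the replacement. The sketch below reuses the second sample string and also confirms that the question's lookahead pattern changes nothing:

```javascript
const s = "df. asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";

// The question's pattern: the lookahead tests the same character that "d" must match,
// and "d" is a word character, so [^\w] can never succeed there.
const original = s.replace(/(?=[^\w])df\./g, "kkk.");

// Lookbehind: require a non-word character before "df." without consuming it.
// "df." at the very start has no preceding character, so it is left alone.
const viaLookbehind = s.replace(/(?<=\W)df\./g, "kkk.");

console.log(original === s); // prints true
console.log(viaLookbehind);
// prints df. asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
```

To also replace df. at the start of the string, (?<!\w)df\. behaves like the \bdf\. pattern.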

How to match all words starting with dollar sign but not slash dollar

I want to match all words which are starting with dollar sign but not slash and dollar sign.
I have already tried a few regexes:
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After finding all the expected matching words, I want to replace them using my object.
One option is to eliminate escape sequences first, and then match the cleaned-up string:
const s = String.raw`$10<i class="">$i01d</i>\$id`
const found = s.replace(/\\./g, '').match(/\$\w+/g)
console.log(found)
The big problem here is that you need a negative lookbehind, which JavaScript did not support when this was written (ES2018 later added it). It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work:
var input = '$10<i class="">$i01d</i>\\$id';
var regex = /\b\w+\b\$(?!\\)/g;
//sample implementation of a string reversal function. There are better implementations out there
function reverseString(string) {
return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = reverseInput
.match(regex)
.map(reverseString);
console.log(matches);
It is not elegant but it will do the job. Here is how it works:
JavaScript does support a positive lookahead ((?=...)) and a negative lookahead ((?!...)). Since a negative lookahead is the mirror image of a negative lookbehind, you can reverse the string and reverse the regex, which will match exactly what you want. All the matches come out reversed (and in reverse order), so you also need to reverse each of them back to the original.
It is not elegant, as I said, since it does a lot of string manipulations but it does produce exactly what you want.
See this in action on Regex101
Regex explanation: normally, "match x as long as it's not preceded by y" is expressed as (?<!y)x, so in your case the regex would be
/(?<!\\)\$\b\w+\b/g
demonstration (not JavaScript)
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, the order of all the tokens must be reversed as well. Furthermore, the negative lookbehind turns into a negative lookahead of the form x(?!y), so the new regular expression is
/\b\w+\b\$(?!\\)/g;
This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This is NOT available in older JS engines (lookbehind arrived with ES2018). Alternatively, you could specify a boundary that you know exists and use a capture group like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via \b, because there's no such boundary before '\'.
Also, this is a cool site for testing your regex expressions. And this explains the word boundary anchor.
If you're using a language that supports negative lookbehind assertions, you can use something like this:
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
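Current JavaScript engines (ES2018 and later) accept this lookbehind. This is a minimal sketch using the sample string from the question. String.raw keeps the backslash before $id exactly as written:

```javascript
// The sample input from the question, with the literal backslash preserved.
const input = String.raw`$10<i class="">$i01d</i>\$id`;

// Match "$" plus word characters only when the "$" is not preceded by a backslash.
const found = input.match(/(?<!\\)\$\w+/g);

console.log(found); // → ["$10", "$i01d"]
```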
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
A literal $ at the beginning of the string (or of a line, with the m flag) followed by one or more word characters. Or...
A literal $ that is preceded by any character except a backslash.
Here is a working example.
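With this pattern, the token ends up in group 1 or group 2, depending on which alternative matched. A minimal sketch that collects whichever group is set, using String.prototype.matchAll (which requires the g flag):

```javascript
const input = String.raw`$10<i class="">$i01d</i>\$id`;
const re = /(?:(^\$\w+)|[^\\](\$\w+))/g;

// Group 1 holds a match at the start of the string; group 2 holds one that
// followed a non-backslash character (that character is consumed but not captured).
const found = [...input.matchAll(re)].map((m) => m[1] ?? m[2]);

console.log(found); // → ["$10", "$i01d"]
```

Because the preceding character is consumed, two tokens written back to back, such as $a$b, yield only the first one in this sketch. The lookbehind version does not have that limitation.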

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
For the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or suffix. The regexp I am using now is \b(IF|ELSE)\b, and it gives me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for testing.
It needs to work in JS.
I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It's comprised of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single space character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single space character.
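Here is a minimal sketch that runs this pattern over the sample lines from the question. The full match includes the surrounding whitespace, so the keyword is read from group 1:

```javascript
const code = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// m makes ^ and $ match at line breaks; g is required by matchAll.
const re = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;
const hits = [...code.matchAll(re)].map((m) => m[1]);

console.log(hits); // → ["IF", "IF"]
```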
You can do it with lookaround (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for. You don't want to check for other characters like the string start or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots":
/(?<!\.)(IF|ELSE)(?!\.)/g
explanation:
use the g-flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
working example: https://regex101.com/r/oS2dZ6/1
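Running the lookaround version on the same sample lines reports which lines contain a standalone keyword. This sketch assumes an ES2018+ engine for the lookbehind, and the line-number computation is only for illustration:

```javascript
const code = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// (?<!\.) rejects a keyword preceded by a dot; (?!\.) rejects one followed by a dot.
const re = /(?<!\.)(IF|ELSE)(?!\.)/g;

// Convert each match offset into a 1-based line number.
const lines = [...code.matchAll(re)].map(
  (m) => code.slice(0, m.index).split("\n").length
);

console.log(lines); // → [1, 2]
```

Because only dots are excluded, a keyword embedded in a longer word (for example the IF in LIFE) would still match. Adding \b on both sides would rule that out.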
PS: at the time of writing, JS did not support lookbehind. If you don't have to write the regex for JS, test it with a flavor that does support lookbehind, such as PCRE on regex101.com.

Javascript regex: is there anyway to write a regex which gives true if backreference is NOT matched

So here is my problem: I'm checking an input of 2 years separated by a hyphen, like:
2001-2015
To test this, I use the simple regex
/^([0-9]{4})-([0-9]{4})$/
I know groups aren't needed, and (19|20)[0-9]{2} is a closer match for a basic year expression, but bear with me.
Now, if my requirement was to match the two years only if they are the same, I could have used a backreference like:
/^([0-9]{4})-\1$/
which matches 2000-2000 but not 2000-2014
My actual requirement is exactly the opposite. I want it to match if the years are different but not if they're same. That is, 2000-2014 should match. 2000-2000 should not.
And using the negative of the boolean I find is not an option. I need this for a huuuge regex which is supposed to match a whole lot of different date formats. This is just a part of it.
Is there any way to achieve this?
You can use a negative lookahead to achieve this:
^([0-9]{4})-(?!\1)[0-9]{4}$
This is almost the same pattern, except it inserts a condition check using the backreference.
(?!\1) will fail if \1 matches at its position.
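This is a quick check of the anchored pattern on a few sample inputs. The sample list is made up for illustration:

```javascript
// Two four-digit years separated by "-", where the second must differ from the first.
const differentYears = /^([0-9]{4})-(?!\1)[0-9]{4}$/;

const samples = ["2001-2015", "2000-2014", "2000-2000", "2015-2015"];
const results = samples.map((s) => differentYears.test(s));

console.log(results); // → [true, true, false, false]
```

The regex has no g flag, so test() keeps no lastIndex state between calls.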
You can use negative lookahead:
\b(\d{4})-(?!\1)\d{4}\b
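Unlike ^ and $, the \b anchors let the check run on year ranges embedded in a longer string, which suits the asker's larger date regex. A minimal sketch with a made-up sample sentence:

```javascript
// g collects every range whose two years differ.
const re = /\b(\d{4})-(?!\1)\d{4}\b/g;
const text = "Terms: 2000-2000, 2001-2015 and 1999-2000.";

// With the g flag, match() returns all full matches (or null if there are none).
const ranges = text.match(re);

console.log(ranges); // → ["2001-2015", "1999-2000"]
```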
Use a negative lookahead, like this:
^([0-9]{4})-(?!\1)[0-9]{4}$
It does work on your example.
Explanation: (?!\1) asserts that \1 cannot be matched at this position. Then you just add your 4-digit requirement.
