Javascript RegEx positive lookahead not working as expected - javascript

First of all, I am not very good with regex, but I am trying to create a regex that matches a specific string and replaces it while skipping the first character of the matched string, using a positive lookahead. Please see the details below.
Test String asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka
RegEx (?=[^\w])df\.
Replacement String kkk.
Expected Result asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
But the regex above does not find any match, so it replaces nothing and returns the original test string.
Without the positive lookahead (the skip-first-character strategy) it matches my requirement. See the matching regex sample on regex101.com
With the positive lookahead it gives unexpected results: regex with positive lookahead on regex101.com
Thanks in advance for any help.

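In the lookahead version, (?=[^\w]) is a zero-width check on the char at the current position, and df\. then requires that same char to be d. Since d is a word char, the pattern can never match. Without the lookahead, using [^\w] means you want to match and consume a char other than a word char before the match you need to replace.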
However, this char is consumed and you cannot restore it without capturing it first. You might use Gurman's approach to match /(^|\W)df\./g and replace with '$1kkk.', but you may also use a word boundary:
\bdf\.
See the regex demo
JS demo:
var s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/\bdf\./g, 'kkk.')
);
However, if you do not want to replace df. at the start of the string, use
var s = "df. asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/(\W)df\./g, '$1kkk.')
);
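To compare the three patterns discussed above side by side, here is a small sketch using the test string from the question. The variable names (expected, lookaheadAttempt, wordBoundary, captured) are only illustrative.

```javascript
const s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
const expected = "asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka";

// The lookahead inspects the same char that df\. must match as "d".
// "d" is a word char, so (?=[^\w]) fails everywhere and nothing is replaced.
const lookaheadAttempt = s.replace(/(?=[^\w])df\./g, "kkk.");

// \b is zero-width: it only requires a word/non-word transition before "d".
const wordBoundary = s.replace(/\bdf\./g, "kkk.");

// (^|\W) consumes the preceding char, so it is restored with $1.
const captured = s.replace(/(^|\W)df\./g, "$1kkk.");

console.log(lookaheadAttempt === s);    // true
console.log(wordBoundary === expected); // true
console.log(captured === expected);     // true
```

For this input the word boundary and the capture group give the same result; they differ mainly in how a df. at the very start of the string is treated when ^ is left out of the group.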

Related

Regex replace not removing characters properly

I have the regular expression:
const regex = /^\d*\.?\d{0,2}$/
and its inverse (I believe) of
const inverse = /^(?!\d*\.?\d{0,2}$)/
The first regex is validating the string fits any positive number, allowing a decimal and two decimal digits (e.g. 150, 14., 7.4, 12.68). The second regex is the inverse of the first, and doing some testing I'm fairly confident it's giving the expected result, as it only validates when the string is anything but a number that may have a decimal and two digits after (e.g. 12..05, a5, 54.357).
My goal is to remove any characters from the string that do not fit the first regex. I thought I could do that this way:
let myString = '123M.45';
let fixed = myString.replace(inverse, '');
But this does not work as intended. To debug, I tried having the replace character changed to something I would be able to see:
let fixed = myString.replace(inverse, 'ZZZ');
When I do this, fixed becomes: ZZZ123M.45
Any help would be greatly appreciated.
I think I understand your logic here trying to find a regex that is the inverse of the regex that matches your valid string, in the hopes that it will allow you to remove any characters that make your string invalid and leave only the valid string. However, I don't think replace() will allow you to solve your problem in this way. From the MDN docs:
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.
In your inverse pattern you are using a negative lookahead. If we take a simple example of X(?!Y) we can think of this as "match X if not followed by Y". In your pattern your "X" is ^ and your "Y" is \d*\.?\d{0,2}$. From my understanding, the reason you are getting ZZZ123M.45 is that it is finding the first ^ (i.e., the start of the string) that is not followed by your pattern \d*\.?\d{0,2}$, and since 123M.45 doesn't match your "Y" pattern, your negative lookahead is satisfied and the beginning of your string is matched and "replaced" with ZZZ. The match itself is zero-width, so nothing is removed; ZZZ is simply inserted at position 0.
That (I think) is an explanation of what you are seeing.
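A quick way to check this explanation is to look at what the inverse pattern actually matches. This is only a sketch; the names hit, replaced and untouched are mine.

```javascript
const inverse = /^(?!\d*\.?\d{0,2}$)/;

// exec() shows the match is an empty string at index 0:
// the pattern only asserts something about the start of the string.
const hit = inverse.exec("123M.45");
console.log(hit[0] === "", hit.index); // true 0

// Replacing an empty match inserts text instead of removing anything.
const replaced = "123M.45".replace(inverse, "ZZZ");
console.log(replaced); // "ZZZ123M.45"

// For a valid string the lookahead fails, so there is no match at all.
const untouched = "12.68".replace(inverse, "ZZZ");
console.log(untouched); // "12.68"
```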
I would propose an alternative solution to your problem that better fits with how I understand the .replace() method. Instead of your inverse pattern, try this one:
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g
const myString = '123M..456444';
const fixed = myString.replace(invalidChars, '');
Here I am using a pattern that I think will match the individual characters that you want to remove. Let's break down what this one is doing:
[^\d\.]: match characters that are neither digits nor dots
\.(?=\.): match . character if it is followed by another . character.
(?<=\.\d\d)\d*: match digits that are preceded by a decimal and 2 digits
Then I join all these with ORs (|) so it will match any one of the above patterns, and I use the g flag so that it will replace all the matches, not just the first one.
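Here is a small sketch of how the combined pattern behaves on a few inputs. It assumes an engine with lookbehind support (ES2018 or later); the helper name clean and the extra inputs are my own.

```javascript
// Same three alternatives as above, joined with | and applied globally.
const invalidChars = /[^\d.]|\.(?=\.)|(?<=\.\d\d)\d*/g;
const clean = (input) => input.replace(invalidChars, "");

console.log(clean("123M..456444")); // "123.45"
console.log(clean("123M.45"));      // "123.45"

// Dots that are not adjacent are not caught by any alternative,
// so a string like this is returned as-is and still fails validation.
console.log(clean("1.2.3"));        // "1.2.3"
```

The last case is one of the uncovered inputs, so validating the cleaned string again with the original regex is probably still a good idea.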
I am not sure if this will cover all your use cases, but I thought I would give it a shot. Here's a link to a breakdown that might be more helpful than mine, and you can use this tool to tweak the pattern if necessary.
I don't think you can do this
remove any characters from the string that do not fit the first regex
Because your validation regex is meant to match the entire string, while replace is used to replace just a PART of that string. So the regex passed to replace must match only the unwanted characters; it cannot simply be the inverted regex.
What you could do is to validate the string with your original regex, then if it's not valid, replace and validate again.
//if (notValid), replace unwanted character
// replace everything that's not a dot or digit
const replaceRegex = /[^\d.]/g; // notice g flag here to match every occurrence
const myString = '123M.45';
const fixed = myString.replace(replaceRegex, '');
console.log(fixed)
// validate again
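The validate, replace, validate again flow could look roughly like this. It is only a sketch: the function name cleanAmount and the shape of its return value are assumptions of mine, not something from the question.

```javascript
const valid = /^\d*\.?\d{0,2}$/; // the original validation regex
const unwanted = /[^\d.]/g;      // anything that is not a digit or a dot

// Returns the (possibly cleaned) value plus a flag saying whether it is valid.
function cleanAmount(input) {
  if (valid.test(input)) {
    return { value: input, ok: true };
  }
  const stripped = input.replace(unwanted, "");
  return { value: stripped, ok: valid.test(stripped) };
}

console.log(cleanAmount("123M.45")); // { value: "123.45", ok: true }
console.log(cleanAmount("12..05"));  // { value: "12..05", ok: false }
```

The second call shows why the final validation matters: stripping characters cannot repair a string whose digits and dots are themselves in the wrong arrangement.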

How to replace string between two string with the same length

I have an input string like this:
ABCDEFG[HIJKLMN]OPQRSTUVWXYZ
How can I replace each character in the string between the [] with an X (resulting in the same number of Xs as there were characters)?
For example, with the input above, I would like an output of:
ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
I am using JavaScript's RegEx for this and would prefer if answers could be an implementation that does this using JavaScript's RegEx Replace function.
I am new to RegEx so please explain what you do and (if possible) link articles to where I can get further help.
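Use replace() with a function as the replacement: the function receives the match m (brackets included) and builds the X's with Array(m.length-1).join("X"). Array(n).join("X") produces n-1 X's, so this yields m.length-2 of them, one per character between the brackets: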
var str = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ"
str = str.replace(/\[[A-Z]*\]/g,(m)=>"["+Array(m.length-1).join("X")+"]")
console.log(str);
We could also use .* instead of [A-Z]* in the regex to match any character.
There are thousands of resources about regular expressions, including JavaScript-specific ones such as Regular Expressions on MDN, but the best way to learn, in my opinion, is practice. I find regex101 useful.
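The off-by-one in the Array(n).join trick is easy to get wrong, so here is a short sketch that compares it with String.prototype.repeat, which states the count directly. The function name maskBrackets is hypothetical.

```javascript
// Array(n) has n empty slots, so join() emits n - 1 separators.
console.log(Array(4).join("X")); // "XXX"
console.log("X".repeat(3));      // "XXX"

// m includes both brackets, hence the "- 2".
function maskBrackets(str) {
  return str.replace(/\[[A-Z]*\]/g, (m) => "[" + "X".repeat(m.length - 2) + "]");
}

console.log(maskBrackets("ABCDEFG[HIJKLMN]OPQRSTUVWXYZ")); // "ABCDEFG[XXXXXXX]OPQRSTUVWXYZ"
console.log(maskBrackets("A[BC]D[E]"));                    // "A[XX]D[X]"
```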
const str="ABCDEFG[HIJKLMN]OPQRSTUVWXYZ";
const run = str => str.replace(/\[.*]/, m => m.replace(/[^\[\]]/g, "X"));
console.log(run(str));
The first pattern /\[.*]/ selects the bracketed part, [] included, and the second pattern /[^\[\]]/g replaces every character in it that is not a bracket with "X".
We can observe that every individual letter you wish to match is followed by a series of zero or more non-'[' characters, until a ']' is found. This is quite simple to express in JavaScript-friendly regex:
/[A-Z](?=[^\[]*\])/g
regex101 example
(?= ) is a "positive lookahead assertion"; it peeks ahead of the current matching point, without consuming characters, to verify its contents are matched. In this case, "[^\[]*\]" matches exactly what I described above.
Now you can substitute each [A-Z] matched with a single 'X'.
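As a sketch of that substitution (the variable names are only illustrative):

```javascript
const pattern = /[A-Z](?=[^\[]*\])/g;

// Only letters that can reach a "]" without crossing a "[" are replaced.
const masked = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(pattern, "X");
console.log(masked); // "ABCDEFG[XXXXXXX]OPQRSTUVWXYZ"

// Caveat: the lookahead never checks for an opening bracket,
// so letters before a stray "]" are replaced too.
const stray = "AB]C".replace(pattern, "X");
console.log(stray); // "XX]C"
```

If unbalanced brackets can occur in the input, one of the approaches that matches the whole [...] group first is probably safer.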
You can use the following solution to replace a string between two square brackets:
const rxp = /\[.*?\]/g;
"ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(rxp, (x) => {
// x includes both brackets, so subtract 2 to get the inner length
return "[" + "X".repeat(x.length - 2) + "]";
});

How to match all words starting with dollar sign but not slash dollar

I want to match all words that start with a dollar sign, but not those that start with a backslash followed by a dollar sign.
I have already tried a few regexes.
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After finding all the expected matching words, I want to replace them using my object.
One option is to eliminate escape sequences first, and then match the cleaned-up string:
s = String.raw`$10<i class="">$i01d</i>\$id`
found = s.replace(/\\./g, '').match(/\$\w+/g)
console.log(found)
The big problem here is that you need a negative lookbehind, which JavaScript did not support before ES2018. It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work without it:
var input = '$10<i class="">$i01d</i>\\$id';
var regex = /\b\w+\b\$(?!\\)/g;
//sample implementation of a string reversal function. There are better implementations out there
function reverseString(string) {
return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = reverseInput
.match(regex)
.map(reverseString);
console.log(matches);
It is not elegant but it will do the job. Here is how it works:
JavaScript does support a lookahead expression ((?=)) and a negative lookahead ((?!)). Since this is the reverse of a negative lookbehind, you can reverse the string and reverse the regex, which will match exactly what you want. Since all the matches are going to be in reverse, you need to also reverse them back to the original.
It is not elegant, as I said, since it does a lot of string manipulations but it does produce exactly what you want.
See this in action on Regex101
Regex explanation: Normally, "match x as long as it's not preceded by y" will be expressed as (?<!y)x, so in your case, the regex will be
/(?<!\\)\$\b\w+\b/g
demonstration (not JavaScript)
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, all the tokens have to be reversed as well in order to match. Furthermore, the negative lookbehind gets inverted into a negative lookahead of the form x(?!y) so the new regular expression is
/\b\w+\b\$(?!\\)/g;
This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This is NOT available in JS engines older than ES2018. Alternatively, you could specify a boundary that you know exists and use a capture group like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via \b because there's no such boundary before '\'.
Also, this is a cool site for testing your regex expressions. And this explains the word boundary anchor.
If you're using a language that supports negative lookbehind assertions you can use something like this.
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
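For reference, in a JavaScript engine that implements ES2018 lookbehind assertions (an assumption about your runtime), the pattern can be used directly. A minimal sketch with the string from the question:

```javascript
// String.raw keeps the backslash literal, as in the question's input.
const input = String.raw`$10<i class="">$i01d</i>\$id`;

// (?<!\\) rejects any "$" that is immediately preceded by a backslash.
const found = input.match(/(?<!\\)\$\w+/g);

console.log(found); // ["$10", "$i01d"]
```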
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
A literal $ at the beginning of a line followed by one or more word characters. Or...
A literal $ that is preceded by any character except a backslash.
Here is a working example.
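Because the wanted text ends up in either group 1 or group 2, the caller has to pick whichever one is set. A rough sketch, assuming String.prototype.matchAll is available (ES2020); the names re and words are mine:

```javascript
const input = String.raw`$10<i class="">$i01d</i>\$id`;
const re = /(?:(^\$\w+)|[^\\](\$\w+))/g;

// Group 1 is set for a match at the start, group 2 for one after a non-backslash char.
const words = Array.from(input.matchAll(re), (m) => m[1] || m[2]);

console.log(words); // ["$10", "$i01d"]
```

Note that the second branch consumes the character in front of the $, so two dollar words placed directly next to each other would not both be found this way.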

Regex exclude doesn't exclude string only first character

Firstly we have the following string:
aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd
We want our regex to find the whitespace and any special characters like {}, but if { is followed by exactly ignoreme}, it should be excluded.
This is where we are right now:
(?!{ignoreme})[\s\[\]{}()<>\\'"|^`]
The problem is that our regex finds the } after ignoreme
Here is the link https://regex101.com/r/bU1oG0/2
Any help is appreciated,
Thanks
The point is that the } is matched because your (?!{ignoreme}) lookahead only skips a { that is followed by ignoreme}; the closing } does not start a {ignoreme} char sequence, so it is still matched. Also, in JS engines without ES2018 support, you cannot use a lookbehind, like (?<!{ignoreme)}.
This is a kind of issue that can be handled with a regex that matches what you do not need, and matches and captures what you need:
/{ignoreme}|([\s[\]{}()<>\\'"|^`])/g
See the regex demo
Now, {ignoreme} is matched (and you do not have to use this value) and ([\s[\]{}()<>\\'"|^`]) is captured into Group 1, the value of which you need to use.
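A sketch of how the two branches can be told apart in code, using the string from the question. It assumes matchAll is available; hits and replaced are illustrative names, and replacing with "_" is just an example action.

```javascript
const re = /{ignoreme}|([\s[\]{}()<>\\'"|^`])/g;
const text = "aaa{ignoreme}asdebla bla f{}asdfdsaignoreme}asd";

// Keep only matches where Group 1 participated; {ignoreme} leaves it undefined.
const hits = [];
for (const m of text.matchAll(re)) {
  if (m[1] !== undefined) hits.push({ char: m[1], index: m.index });
}
console.log(hits.map((h) => h.char)); // [" ", " ", "{", "}", "}"]

// The same idea with replace(): return the match unchanged for {ignoreme}.
const replaced = text.replace(re, (match, special) =>
  special === undefined ? match : "_"
);
console.log(replaced); // "aaa{ignoreme}asdebla_bla_f__asdfdsaignoreme_asd"
```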

Regex Positive Lookbehind on url segment

I am parsing a number from a URL string. The URL looks like:
https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false
I would like to match the number between 'users/' and '&'. In this case '11468859'. So I am using a positive lookahead and lookbehind to accomplish this.
This is what I have so far:
(?<=users/)([0-9]*?)(?=\&)
This doesn't match anything. My lookbehind is wrong. So if I omit the lookbehind I can match on users/11468859
([0-9]*?)(?=\&) matches >> 'users/11468859'
How do I correctly create a positive lookbehind to match on users/?
Thanks!
Putting aside your lookbehind question for a moment, this regex works:
users/([0-9]+)
Debuggex Demo
The id is in capture group one.
In debuggex your lookbehind works fine but not in JavaScript:
(?<=users/)([0-9]*?)(?=\&)
Debuggex Demo
(You could also get away with just
(?<=users/)([0-9]*)
Debuggex Demo
since [0-9]* is greedy.)
However, as you're using JavaScript, I recommend the regex at the top of my answer.
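In JavaScript the capture-group version could be used like this. It is a minimal sketch; the function name extractUserId and the null fallback are my own choices.

```javascript
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// Returns the digits after "users/" or null when the segment is missing.
function extractUserId(input) {
  const m = input.match(/users\/([0-9]+)/);
  return m ? m[1] : null;
}

console.log(extractUserId(url));                    // "11468859"
console.log(extractUserId("https://www.myapi.com")); // null
```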
If you're certain that the desired segment will be a series of digits immediately after users/, you don't need the lookahead. Also, I would recommend escaping any sort of slash: \/
(?<=users\/)([0-9]*?)
Also, you don't need to tell the regex not to be greedy unless you know it will run into other numbers, and I would consider telling the regex that there must be numbers so it won't match if they are missing:
thus
([0-9]*?)
becomes
(\d+)
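The difference between the lazy and the greedy quantifier is easy to see once the lookahead is dropped. A small sketch, assuming an engine with lookbehind support (ES2018 or later); the variable names are illustrative.

```javascript
const url =
  "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";

// With nothing after it, a lazy *? is satisfied by zero digits.
const lazyAlone = url.match(/(?<=users\/)([0-9]*?)/)[0];
console.log(lazyAlone === ""); // true

// The lookahead forces the lazy quantifier to expand up to "&".
const lazyWithLookahead = url.match(/(?<=users\/)([0-9]*?)(?=&)/)[0];
console.log(lazyWithLookahead); // "11468859"

// \d+ is greedy and requires at least one digit.
const greedy = url.match(/(?<=users\/)(\d+)/)[0];
console.log(greedy); // "11468859"
```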
There are a couple of approaches available in most languages. To match text that follows a known prefix, use the positive lookbehind format (?<=STUFF). To match numbers try \d+ or [0-9]+. Each of the following lines works. The second adds a positive lookahead so that ids containing letters are included, but it will fail if the ampersand is moved.
(?<=users.)\d+
(?<=users.).*?(?=&)
(?<=users.)[0-9]+
For more information: http://myregextester.com/index.php#highlighttab
How do I correctly create a positive lookbehind to match on users/?
You don't, because JavaScript engines before ES2018 do not support lookbehinds:
From javascript regex - look behind alternative?:
Javascript doesn't have regex lookbehind.
http://regexadvice.com/forums/thread/58678.aspx:
The JavaScript regex engine does not support look-behinds
As an alternative, you can capture the number like this:
users\/(.*?)\&
And just access the first capturing group. Explanation and demonstration: http://regex101.com/r/aZ3bL0
try
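const string = "https://www.myapi.com/player/?url=https%3A//myapi.com/users/11468859&color=788b78&auto_play=false&show_artwork=false";
const regex = /users.([\d]*)/;
// exec() returns the full match at index 0 and the captured digits at index 1
const arr = regex.exec(string);
const result = arr[1];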
