Regex replace not removing characters properly

I have the regular expression:
const regex = /^\d*\.?\d{0,2}$/
and its inverse (I believe) of
const inverse = /^(?!\d*\.?\d{0,2}$)/
The first regex validates that the string is a non-negative number, optionally with a decimal point and up to two decimal digits (e.g. 150, 14., 7.4, 12.68). The second regex is meant to be the inverse of the first, and from some testing I'm fairly confident it gives the expected result: it only matches when the string is anything other than such a number (e.g. 12..05, a5, 54.357).
My goal is to remove any characters from the string that do not fit the first regex. I thought I could do that this way:
let myString = '123M.45';
let fixed = myString.replace(inverse, '');
But this does not work as intended. To debug, I tried changing the replacement string to something I would be able to see:
let fixed = myString.replace(inverse, 'ZZZ');
When I do this, fixed becomes: ZZZ123M.45
Any help would be greatly appreciated.

I think I understand your logic here trying to find a regex that is the inverse of the regex that matches your valid string, in the hopes that it will allow you to remove any characters that make your string invalid and leave only the valid string. However, I don't think replace() will allow you to solve your problem in this way. From the MDN docs:
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.
In your inverse pattern you are using a negative lookahead. If we take a simple example of X(?!Y), we can think of this as "match X if not followed by Y". In your pattern your "X" is ^ and your "Y" is \d*\.?\d{0,2}$. From my understanding, the reason you are getting ZZZ123M.45 is that it finds the first ^ (i.e., the start of the string) that is not followed by your pattern \d*\.?\d{0,2}$. Since 123M.45 doesn't match your "Y" pattern, your negative lookahead is satisfied, and the zero-length position at the beginning of your string is matched and "replaced" with ZZZ.
That (I think) is an explanation of what you are seeing.
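The zero-length match can be checked directly by inspecting the match object. This is only a small sketch using the pattern and string from the question; the variable names debugged and validMatch are just for illustration:

```javascript
const inverse = /^(?!\d*\.?\d{0,2}$)/;
const myString = '123M.45';

// exec returns the match: an empty string found at index 0.
const match = inverse.exec(myString);
console.log(match[0].length, match.index); // 0 0

// Nothing is consumed, so the replacement is inserted, not substituted.
const debugged = myString.replace(inverse, 'ZZZ');
console.log(debugged); // ZZZ123M.45

// For a valid string the negative lookahead fails, so there is no match.
const validMatch = inverse.exec('12.68');
console.log(validMatch); // null
```

Because the pattern never consumes a character, no replacement string can make it delete anything.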
I would propose an alternative solution to your problem that better fits with how I understand the .replace() method. Instead of your inverse pattern, try this one:
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g
const myString = '123M..456444';
const fixed = myString.replace(invalidChars, '');
Here I am using a pattern that I think will match the individual characters that you want to remove. Let's break down what this one is doing:
[^\d\.]: match characters that are neither digits nor a . character
\.(?=\.): match . character if it is followed by another . character.
(?<=\.\d\d)\d*: match digits that are preceded by a decimal and 2 digits
Then I join all these with ORs (|) so it will match any one of the above patterns, and I use the g flag so that it will replace all the matches, not just the first one.
I am not sure if this will cover all your use cases, but I thought I would give it a shot. Here's a link to a breakdown that might be more helpful than mine, and you can use this tool to tweak the pattern if necessary.
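To see which cases the pattern above does and does not cover, it may help to run it against a few inputs and re-check the result with the original validation regex. This is a rough sketch; the sample strings other than the first are my own, and it assumes an engine with lookbehind support (ES2018):

```javascript
// Pattern from the answer above; the lookbehind needs an ES2018 engine.
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g;
// Validation regex from the question.
const valid = /^\d*\.?\d{0,2}$/;

const samples = ['123M..456444', '123M.45', 'a5', '1.2.3'];
const results = samples.map((s) => {
  const fixed = s.replace(invalidChars, '');
  return { input: s, fixed, isValid: valid.test(fixed) };
});
console.log(results);
// '1.2.3' comes back unchanged and still invalid: its dots are not adjacent,
// so none of the three alternatives matches anything in it.
```

The first three samples end up valid, while a string with two separated dots shows one case the pattern does not handle.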

I don't think you can do this
remove any characters from the string that do not fit the first regex
because your validation regex is anchored to match the entire string, whereas replace substitutes only the PART of the string that the pattern matches. So the regex passed to replace must match the unwanted characters themselves; it cannot simply be an inverted version of the validation regex.
What you could do is to validate the string with your original regex, then if it's not valid, replace and validate again.
//if (notValid), replace unwanted character
// replace everything that's not a dot or digit
const replaceRegex = /[^\d.]/g; // notice g flag here to match every occurrence
const myString = '123M.45';
const fixed = myString.replace(replaceRegex, '');
console.log(fixed)
// validate again
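Put together, the validate, replace, validate-again idea might look like the sketch below. The helper name sanitize and the choice to return null for strings that are still invalid are assumptions made for illustration, not part of the question:

```javascript
const valid = /^\d*\.?\d{0,2}$/; // validation regex from the question
const unwanted = /[^\d.]/g;      // everything that is not a digit or a dot

// Hypothetical helper: returns the cleaned string, or null if it is still invalid.
function sanitize(input) {
  if (valid.test(input)) return input;       // already fine, nothing to do
  const fixed = input.replace(unwanted, ''); // strip unwanted characters
  return valid.test(fixed) ? fixed : null;   // validate again
}

console.log(sanitize('123M.45')); // 123.45
console.log(sanitize('12..05'));  // null
```

Stripping characters cannot repair every input (a doubled dot survives), which is why the second validation is still needed.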

Related

How to replace string between two string with the same length

I have an input string like this:
ABCDEFG[HIJKLMN]OPQRSTUVWXYZ
How can I replace each character in the string between the [] with an X (resulting in the same number of Xs as there were characters)?
For example, with the input above, I would like an output of:
ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
I am using JavaScript's RegEx for this and would prefer if answers could be an implementation that does this using JavaScript's RegEx Replace function.
I am new to RegEx so please explain what you do and (if possible) link articles to where I can get further help.

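Use replace() and pass a function as the replacement; it receives the match m, and Array(m.length - 1).join("X") generates the X's needed. The match includes both brackets, and joining an array of n empty slots yields n - 1 separators, so this produces m.length - 2 X's: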
var str = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ"
str = str.replace(/\[[A-Z]*\]/g,(m)=>"["+Array(m.length-1).join("X")+"]")
console.log(str);
We could also use .*? instead of [A-Z]* in the regex to match any character (lazy, so that with several bracketed groups each match stops at its own closing ]).
There are thousands of resources about regular expressions. For JavaScript specifically, you could see Regular Expressions on MDN, but the best way to learn, in my opinion, is practicing; I find regex101 useful.

const str="ABCDEFG[HIJKLMN]OPQRSTUVWXYZ";
const run = (str) => str.replace(/\[.*]/, (match) => match.replace(/[^\[\]]/g, "X"));
console.log(run(str));
The first pattern, /\[.*]/, selects the bracketed part of the string (brackets included), and the second pattern, /[^\[\]]/g, replaces every character in it other than the brackets with "X".

We can observe that every individual letter you wish to match is followed by a series of zero or more non-'[' characters, until a ']' is found. This is quite simple to express in JavaScript-friendly regex:
/[A-Z](?=[^\[]*\])/g
regex101 example
(?= ) is a "positive lookahead assertion"; it peeks ahead of the current matching point, without consuming characters, to verify its contents are matched. In this case, "[^\[]*\]" matches exactly what I described above.
Now you can substitute each [A-Z] matched with a single 'X'.
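As a small sketch of that substitution, using the input string from the question (the variable names are illustrative):

```javascript
const str = 'ABCDEFG[HIJKLMN]OPQRSTUVWXYZ';

// A capital letter that is followed by non-'[' characters and then a ']'.
const insideBrackets = /[A-Z](?=[^\[]*\])/g;

const masked = str.replace(insideBrackets, 'X');
console.log(masked); // ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
```

Letters before the opening bracket are skipped because the lookahead runs into '[' before it can reach a ']', and letters after the closing bracket are skipped because no ']' follows them.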

You can use the following solution to replace a string between two square brackets:
const rxp = /\[.*?\]/g;
const result = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(rxp, (x) => {
// x includes both brackets, so the inner length is x.length - 2
return "[" + "X".repeat(x.length - 2) + "]";
});
console.log(result);

Javascript RegEx positive lookahead not working as expected

First of all, I am not very good at dealing with regex, but I am trying to create a regex that matches a specific string and replaces it, using a positive lookahead to skip the first character of the matched string. Please see the details below.
Test String asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka
RegEx (?=[^\w])df\.
Replacement String kkk.
Expected Result asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
But the regex above does not find any match, so it replaces nothing and returns the original test string.
Without the positive lookahead (the skip-first-character strategy) it matches my requirement; see the matching regex sample on regex101.com.
With the positive lookahead it gives unexpected results; see the regex with positive lookahead on regex101.com.
Thanks in advance for any help.

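Using [^\w] means you want to match and consume a char other than a word char before the match you need to replace. Wrapping it in a positive lookahead does not help: a lookahead checks the text that follows the current position, so (?=[^\w])df\. requires the next char to be both a non-word char and d, which can never match.
However, once this char is consumed, you cannot restore it without capturing it first. You might use Gurman's approach to match /(^|\W)df\./g and replace with '$1kkk.', but you may also use a word boundary: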
\bdf\.
See the regex demo
JS demo:
var s = "asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/\bdf\./g, 'kkk.')
);
However, if you do not want to replace df. at the start of the string, use
var s = "df. asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka";
console.log(
s.replace(/(\W)df\./g, '$1kkk.')
);
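To illustrate why the lookahead version finds nothing, and what a lookbehind would do instead, here is a short sketch. It assumes an engine with lookbehind support (ES2018); the lookbehind pattern is my own variation and not part of the answer above:

```javascript
const s = 'asdf.wakawaka asdf.waka df.waka [df.waka (df.waka _df.waka {df,waka';

// The lookahead and the literal "d" are tested at the same position,
// so the next char would have to be a non-word char and "d" at once.
const lookaheadMatches = /(?=[^\w])df\./.test(s);
console.log(lookaheadMatches); // false

// A lookbehind checks the preceding char without consuming it,
// so no capture group or $1 is needed in the replacement.
const replaced = s.replace(/(?<=\W)df\./g, 'kkk.');
console.log(replaced);
// asdf.wakawaka asdf.waka kkk.waka [kkk.waka (kkk.waka _df.waka {df,waka
```

Like the (\W) version, this does not replace df. at the very start of the string, since there is no preceding char for the lookbehind to match.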

Find number that follows certain string which includes both letters and punctuation

I am trying to find a way to extract the numbers that occur after abc/ immediately succeeding the / and before any further letters, numbers or punctuation.
E.g:
abc/1234567/something should return 1234567
abc/1234567?foo=bar should still only return 1234567
blah/1234/abc/678 should only return 678 as I'm looking only for the number that succeeds abc/
I'm aware there are two options: regex or substring match.
In order to perform the substring match I need the index point but I'm dubious about merely doing an indexOf("abc/") as it only returns the index of the first letter - a - which could be present elsewhere in the string.
With regex I have struggled as I find that searching for a mixture of the letters and the slashes seems to cause it to return null.
So what's the best way?

You can use this regular expression:
var rgx = new RegExp("abc\/([0-9]+)","gi");
Then:
var m = rgx.exec("abc/1234567?foo=bar");
console.log(m[1]); // m[0] is the whole match, m[1] is the captured number

You could use a regular expression and search for abc/ and the digits that follow it.
var array = ['abc/134567/something', 'abc/1234567?foo=bar', 'blah/1234/abc/678'];
console.log(array.map(s => s.match(/abc\/(\d+)/)[1]));

We accept a string that contains abc/ followed by an integer, which is captured as a group, and then either the end of the string or a non-digit character.
abc\/(\d+)(?:$|\D)
test
You'll use in Javascript for matched group extraction:
var myRegexp = /abc\/(\d+)(?:$|\D)/g;
var match = myRegexp.exec(inputString);
var result=match[1]; // the number after abc/
In regex engines that support it, a lookbehind could be used instead. JavaScript has long supported lookahead, but lookbehind was only added in ES2018, so older engines reject it. :-( To stay compatible with them we have to use this slightly more complicated way.
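Engines that implement ES2018 do accept lookbehind, so where that can be assumed, the number can be matched directly without a capture group. A minimal sketch, with illustrative variable names and the first input written with the digits the question expects back:

```javascript
const inputs = ['abc/1234567/something', 'abc/1234567?foo=bar', 'blah/1234/abc/678'];

// Digits that are immediately preceded by "abc/" (lookbehind needs ES2018).
const afterAbc = /(?<=abc\/)\d+/;

const numbers = inputs.map((s) => {
  const m = s.match(afterAbc);
  return m ? m[0] : null; // null when the string has no abc/<digits> part
});
console.log(numbers); // [ '1234567', '1234567', '678' ]
```

The null check keeps the code from throwing on strings that do not contain abc/ followed by digits.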

Are you after something like this:
^(.*\/)(\d+)(.*)
Where the second group will give you the digits after the slash.
Look at the regex here

JQuery match with RegEx not working

I have a filename that will be something along the lines of this:
Annual-GDS-Valuation-30th-Dec-2016-082564K.docx
It will contain 5 digits followed by a single letter, but this may be in a different position in the file name. A leading zero may or may not be there; it is not required.
This is the code I came up with after checking examples; however, SelectedFileClientID is always null.
var SelectedFileClientID = files.match(/^d{5}\[a-zA-Z]{1}$/);
I'm not sure what it is I am doing wrong.
Edit:
The 0 has nothing to do with the code I am trying to extract. It may or may not be there, and it could even be a completely different character, or more than one, but has nothing to do with it at all. The client has decided they want to put additional characters there.

There are at least 3 issues with your regex: 1) the pattern is enclosed with anchors, and thus requires a full string match, 2) the d matches a letter d, not a digit, you need \d to match a digit, 3) a \[ matches a literal [, so the character class is ruined.
Use
/\d{5}[a-zA-Z]/
Details:
\d{5} - 5 digits
[a-zA-Z] - an ASCII letter
JS demo:
var s = 'Annual-GDS-Valuation-30th-Dec-2016-082564K.docx';
var m = s.match(/\d{5}[a-zA-Z]/);
console.log(m[0]);

All right, there are a few things wrong...
var matches = files.match(/\-0?(\d{5}[a-zA-Z])\.[a-z]{3,}$/);
var SelectedFileClientID = matches ? matches[1] : '';
So:
First, I get the matches on your string -- .match()
Then, your file name will not start with the digits - so drop the ^
You had forgotten the backslash for digits: \d
Do not backslash your square bracket - it's here used as a regular expression token
no need for the {1} for your letters: the square bracket content is enough as it will match one, and only one letter.
Hope this helps!

Try this pattern: \d{5}[a-zA-Z]

Try - 0?\d{5}[a-zA-Z]
As you mentioned, the 0 may or may not be there, so 0? takes that into account.
Alternatively, it can be done like this, which can match any random characters before the digits:
(\w+|\W+|\d+)?\d{5}[a-zA-Z]

javascript find protocol, domain, plus first slash with regexp from a src tag, replace with empty string

I tried to construct a regex for this task but I'm afraid I am still failing to have an intuitive understanding of regexp.
The problem is the regex matches until the last slash in a string. I want it to stop at the first match of the string.
My pathetic attempt at regex:
/^http(s?):\/\/.+\/{1}/
Test subject:
http://foo.com/bar/test/foo.jpeg
The goal is to obtain bar/test/foo.jpeg, so that I may then split the string, pop the last element and then join the remainder, resulting in having the path to the JavaScript file.
Example
var str = 'http://foo.com/bar/test/foo.jpeg';
str.replace(regexp,'');

While the other answer shows how to match a part of a string, I think a replace solution is more appropriate for the current task.
The issue you have is that .+ matches one or more characters other than a newline greedily, that is, all the string is grabbed first in one go, and then the regex engine starts backtracking (moving backwards along the input string looking for a / to accommodate in the match). Thus, you get the match from http until the last /.
To restrict the match from http to the first / use a negated character class [^/]+ instead of .+.
^https?:\/\/[^\/]+\/
            ^^^^^^
See the regex demo
Note that you do not need to place s into a capturing group to make it optional, unescaped ? is a quantifier that makes the preceding character match one or zero times. Also, {1} is a redundant quantifier since this is default behavior, c will only match 1 c, (?:something) will only match one something.
var re = /^https?:\/\/[^\/]+\//;
var str = 'http://foo.com/bar/test/foo.jpeg';
var result = str.replace(re, '');
document.getElementById("r").innerHTML = result;
<div id="r"/>
Note that you will need to assign the replace result to some variable, since in JS, strings are immutable.
Regex explanation:
^ - start of string
https? - either http or https substring
:\/\/ - a literal sequence of ://
[^\/]+ - 1 or more characters other than a /
\/ - a literal / symbol
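For the follow-up step described in the question (split, pop the last element, join the remainder), a possible sketch building on the same regex is shown below; the variable names are illustrative:

```javascript
const re = /^https?:\/\/[^\/]+\//;
const str = 'http://foo.com/bar/test/foo.jpeg';

const relative = str.replace(re, '');  // "bar/test/foo.jpeg"
const segments = relative.split('/');  // ["bar", "test", "foo.jpeg"]
segments.pop();                        // drop the file name
const directory = segments.join('/');

console.log(directory); // bar/test
```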

Use capturing group based regex.
> var s = "http://foo.com/bar/test/foo.jpeg"
> s.match(/^https?:\/\/[^\/]+((?:\/[^\/]*)*)/)[1]
'/bar/test/foo.jpeg'
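As a different technique from the regex answers here, the built-in URL class (available in browsers and in Node.js) can parse the address and expose the path. This is only a sketch and assumes the src value is an absolute URL:

```javascript
const s = 'http://foo.com/bar/test/foo.jpeg';

// URL parses the string; pathname is everything after the host.
const pathname = new URL(s).pathname;
console.log(pathname); // /bar/test/foo.jpeg

// Strip the leading slash to get the form asked for in the question.
const withoutSlash = pathname.replace(/^\//, '');
console.log(withoutSlash); // bar/test/foo.jpeg
```

Note that new URL() throws a TypeError for relative or malformed input, so the regex approach may still be preferable when the src values are not guaranteed to be absolute.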
