Use regex to find and replace every second backtick in a string - javascript

I am trying to write a script that will replace every SECOND backtick with a backtick and semi colon. See below for expected behavior:
"`Here is my string`"
Needs to become:
"`Here is my string`;"
I have found a few helpful answers on stack, such as this one, this one and this one but when I try the replacement on this solution it selects all occurrences, rather than every second occurrence. And on this solution it selects every FIRST occurrence instead of every second one.
As of now I have tried...
str.replace(/\`.*?\`*/g, '`;')
...as well as...
str.replace('\w*\`\b/gm, '`;')
Both have gotten me close but I can't seem to just get every SECOND backtick by itself.

If you want to replace every second backtick, you might use a capturing group and a negated character class.
In the replacement you could use $1`;
(`[^`]*)`
Explanation
( Capture group
`[^`]* Match a backtick, match 0+ times any char except a backtick using a negated character class
)` Close group 1 and match a backtick
Regex demo
const regex = /(`[^`]*)`/g;
const str = `\`Here is my string\` this is another test \`Here is my string\``;
const result = str.replace(regex, `$1\`;`);
console.log(result);

Finding the second backtick is easy, but finding every second backtick is harder. I think this should work:
"`Here is my string` and `another` and `another`".replace(/`.*?(`.*?`)*?`/g, '$&;');
// -> "`Here is my string`; and `another`; and `another`;"
Let's dig into what that regex means.
it finds the 1st backtick, followed by anything. Note the ? in .*?: this makes the match lazy, so that it finds the shortest match, not the longest.
it then finds an even number (0, 2, 4) of following backticks (1 + even = odd number of backticks in total), again separated by anything lazily (.*?).
it then finds a final backtick
it replaces that in the string with $& (= everything that was matched) then adds the semicolon.
The g flag at the end then makes it global, so we replace every available match, not just the first one.
Depending on your input you might want to make this more rigorous; it's just a proof of concept. For potentially large inputs especially, you may need to watch out for catastrophic backtracking with regular expressions that include multiple .* sections like this.
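If backtracking is a concern, one alternative (not part of the answer above, only a sketch) is to match single backticks and count them in a replacer callback. The helper name markEverySecond is made up for illustration.

```javascript
// Hypothetical helper: appends ";" after every second backtick.
// Each match is a single character, so there are no nested quantifiers to backtrack over.
function markEverySecond(str) {
  let seen = 0; // number of backticks encountered so far
  return str.replace(/`/g, (tick) => (++seen % 2 === 0 ? tick + ";" : tick));
}

const marked = markEverySecond("`Here is my string` and `another` and `another`");
console.log(marked);
```

An unpaired trailing backtick is left untouched, because the counter never reaches an even value for it.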

You can try this
(`[^`]*)`
let str = "`Here is my string` some more string with ``` some more ``` and` and `"
let final = str.replace(/(`[^`]*)`/g,'$1`;')
console.log(final)

I will share a trick I spotted when I was learning regex. To match everything between two occurrences of a specific character, use this construction:
# -- the character that encloses some content
#[^#]*#
In your case, I believe you want to use this approach:
`[^`]+(`)
     ^
     use * here if you also want to match the case where the two
     backticks contain nothing
Here the second backtick of each pair is captured in the first group. After that you can replace this group with `;
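To see how the choice between + and * plays out, here is a small sketch. Since JavaScript's replace substitutes the whole match rather than a single group, it captures everything before the closing backtick and puts it back with $1; the sample input is made up.

```javascript
// Sample input containing an empty pair of backticks.
const input = "a `` b `c`";

// With *, the empty pair counts as a pair.
const withStar = input.replace(/(`[^`]*)`/g, "$1`;");
// With +, the empty pair cannot match, so the pairing shifts by one backtick.
const withPlus = input.replace(/(`[^`]+)`/g, "$1`;");

console.log(withStar); // a ``; b `c`;
console.log(withPlus); // a `` b `;c`
```

So if empty pairs can occur in the input, * is probably the safer quantifier.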

This expression might simply do that:
const regex = /(.*`.*`)/gm;
const str = `\`Here is my string\``;
const subst = `$1;`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
The expression is explained on the top right panel of regex101.com, if you wish to explore/simplify/modify it, and in this link, you can watch how it would match against some sample inputs, if you like.

Related

Regex replace not removing characters properly

I have the regular expression:
const regex = /^\d*\.?\d{0,2}$/
and its inverse (I believe) of
const inverse = /^(?!\d*\.?\d{0,2}$)/
The first regex is validating the string fits any positive number, allowing a decimal and two decimal digits (e.g. 150, 14., 7.4, 12.68). The second regex is the inverse of the first, and doing some testing I'm fairly confident it's giving the expected result, as it only validates when the string is anything but a number that may have a decimal and two digits after (e.g. 12..05, a5, 54.357).
My goal is to remove any characters from the string that do not fit the first regex. I thought I could do that this way:
let myString = '123M.45';
let fixed = myString.replace(inverse, '');
But this does not work as intended. To debug, I tried having the replace character changed to something I would be able to see:
let fixed = myString.replace(inverse, 'ZZZ');
When I do this, fixed becomes: ZZZ123M.45
Any help would be greatly appreciated.
I think I understand your logic here trying to find a regex that is the inverse of the regex that matches your valid string, in the hopes that it will allow you to remove any characters that make your string invalid and leave only the valid string. However, I don't think replace() will allow you to solve your problem in this way. From the MDN docs:
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.
In your inverse pattern you are using a negative lookahead. If we take a simple example of X(?!Y) we can think of this as "match X if not followed by Y". In your pattern your "X" is ^ and your "Y" is \d*\.?\d{0,2}$. From my understanding, the reason you are getting ZZZ123M.45 is that it is finding the first ^ (i.e., the start of the string) that is not followed by your pattern \d*\.?\d{0,2}$. Since 123M.45 doesn't match your "Y" pattern, your negative lookahead is satisfied, and the zero-width match at the beginning of your string is "replaced" with ZZZ.
That (I think) is an explanation of what you are seeing.
I would propose an alternative solution to your problem that better fits with how I understand the .replace() method. Instead of your inverse pattern, try this one:
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g
const myString = '123M..456444';
const fixed = myString.replace(invalidChars, '');
Here I am using a pattern that I think will match the individual characters that you want to remove. Let's break down what this one is doing:
[^\d\.]: match characters that are neither digits nor a . character
\.(?=\.): match . character if it is followed by another . character.
(?<=\.\d\d)\d*: match digits that are preceded by a decimal and 2 digits
Then I join all these with ORs (|) so it will match any one of the above patterns, and I use the g flag so that it will replace all the matches, not just the first one.
I am not sure if this will cover all your use cases, but I thought I would give it a shot. Here's a link to a breakdown that might be more helpful than mine, and you can use this tool to tweak the pattern if necessary.
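As a quick check of the pattern above, the following sketch runs it against a few inputs and re-tests the result with the asker's original validation regex. The helper name cleanNumber and the extra sample inputs are assumptions for illustration; the lookbehind needs an engine with ES2018 support.

```javascript
// The pattern from the answer (the backslash before . inside the class is optional).
const invalidCharsDemo = /[^\d.]|\.(?=\.)|(?<=\.\d\d)\d*/g;
// The asker's original validation regex.
const validNumber = /^\d*\.?\d{0,2}$/;

// Hypothetical helper: strips whatever the pattern considers invalid.
function cleanNumber(input) {
  return input.replace(invalidCharsDemo, "");
}

for (const sample of ["123M..456444", "123M.45", "1.2.3"]) {
  const cleaned = cleanNumber(sample);
  console.log(sample, "->", cleaned, "valid:", validNumber.test(cleaned));
}
// "1.2.3" comes back unchanged: dots separated by digits are not covered by the pattern.
```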
I don't think you can do this
remove any characters from the string that do not fit the first regex
Your validation regex is anchored, so it tests the entire string, whereas replace is used to replace just a PART of that string. So the regex passed to replace must match only the unwanted characters; it cannot simply be the inverse of the validation regex.
What you could do is to validate the string with your original regex, then if it's not valid, replace and validate again.
//if (notValid), replace unwanted character
// replace everything that's not a dot or digit
const replaceRegex = /[^\d.]/g; // notice g flag here to match every occurrence
const myString = '123M.45';
const fixed = myString.replace(replaceRegex, '');
console.log(fixed)
// validate again
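The validate, replace, validate again flow could be sketched like this. The function name sanitize and the choice to return null when the cleaned string is still invalid are assumptions, not something the answer specifies.

```javascript
const validRegex = /^\d*\.?\d{0,2}$/; // the asker's original validation regex
const unwanted = /[^\d.]/g;           // everything that is not a digit or a dot

// Hypothetical helper: returns a valid string, or null if stripping was not enough.
function sanitize(input) {
  if (validRegex.test(input)) return input;     // already valid
  const stripped = input.replace(unwanted, ""); // remove unwanted characters
  return validRegex.test(stripped) ? stripped : null; // validate again
}

console.log(sanitize("123M.45")); // 123.45
console.log(sanitize("150"));     // 150
console.log(sanitize("12..05"));  // null
```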

How to code a Regex for a shared character between two correspondent patterns?

I am going to find all 'aa' sub-strings in the 'caaab'. So, I've used the following regular expression.
/aa/g
Using the cited expression, I expect that JavaScript's match method returns two correspondent patterns. As you can see, the middle, shared 'a' causes two 'aa' patterns! Nonetheless, it merely returns the first one. What is the problem with the Regex, and how can I fix it?
let foundArray=d.match(/aa/g);
Here is one way to approach this. We can first record the length of the input string, for use later. Then, do a global regex replacement of a(?=a) with empty string. One by one, this will replace each occurrence of the substring aa in the input. Then, we can compare the length of the output against the input to figure out how many times aa occurred.
var input = "caaab";
var sLen = input.length;
var output = input.replace(/a(?=a)/g, "");
var eLen = output.length;
console.log("There were " + (sLen - eLen) + " occurrences of aa in the input");
Note that the difficulty you are encountering has to do with the behavior of JavaScript's regex engine. If you replace aa, it will consume everything, and so might be consuming the first letter a of the next sequential aa match. Using a(?=a) gets around this problem, because the lookahead (?=a) does not consume the next a.
Use a lookahead
As mentioned in a comment that's how regexes are designed to work:
it's working exactly as it's supposed to; once it consumes a character, it moves past it
Matches do not overlap, this isn't a limitation of js it's simply how regular expressions work.
The way to get around that is to use a zero-length match, i.e. a look-ahead or look-behind
Tim's existing answer already does this, but can be simplified as follows:
const match = "caaab".match(/a(?=a)/g);
console.log(match);
This is finding an a followed by another a (which is not returned as part of the match). So technically it's finding:
caaab
 ^ first match, single character
  ^ second match, single character
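If you want the overlapping 'aa' substrings themselves rather than single characters, a variation on the same idea is to put a capture group inside the lookahead. This sketch assumes an engine that supports String.prototype.matchAll (ES2020).

```javascript
// The match itself is zero-length; the lookahead's capture group holds the "aa".
const overlaps = [..."caaab".matchAll(/(?=(aa))/g)].map((m) => ({
  index: m.index, // where the overlapping match starts
  text: m[1],     // the captured substring
}));

console.log(overlaps.length); // 2
console.log(overlaps);        // matches starting at index 1 and index 2
```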

How to replace string between two string with the same length

I have an input string like this:
ABCDEFG[HIJKLMN]OPQRSTUVWXYZ
How can I replace each character in the string between the [] with an X (resulting in the same number of Xs as there were characters)?
For example, with the input above, I would like an output of:
ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
I am using JavaScript's RegEx for this and would prefer if answers could be an implementation that does this using JavaScript's RegEx Replace function.
I am new to RegEx so please explain what you do and (if possible) link articles to where I can get further help.
Use replace() and pass it a function that receives the match m, then use Array(m.length-1).join("X") to generate the Xs needed (m includes both brackets, and joining n empty slots yields n-1 Xs):
var str = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ"
str = str.replace(/\[[A-Z]*\]/g,(m)=>"["+Array(m.length-1).join("X")+"]")
console.log(str);
We could also use .* instead of [A-Z]* in the regex to match any character (or [^\]]* if the string may contain more than one bracketed section, since .* is greedy).
About regular expressions there are thousands of resources, specifically in JavaScript, you could see Regular Expressions MDN but the best way to learn, in my opinion, is practicing, I find regex101 useful.
const str="ABCDEFG[HIJKLMN]OPQRSTUVWXYZ";
const run = str => str.replace(/\[.*]/, m => m.replace(/[^\[\]]/g, "X"));
console.log(run(str));
The first pattern /\[.*]/ selects the bracketed section, and the second pattern /[^\[\]]/g replaces every character in it other than the brackets with "X".
We can observe that every individual letter you wish to match is followed by a series of zero or more non-'[' characters, until a ']' is found. This is quite simple to express in JavaScript-friendly regex:
/[A-Z](?=[^\[]*\])/g
regex101 example
(?= ) is a "positive lookahead assertion"; it peeks ahead of the current matching point, without consuming characters, to verify its contents are matched. In this case, "[^[]*]" matches exactly what I described above.
Now you can substitute each [A-Z] matched with a single 'X'.
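A minimal sketch of that substitution, applied to the input from the question plus a made-up second input with two bracketed sections:

```javascript
// Matches an uppercase letter only if a ] appears before any further [.
const insideBrackets = /[A-Z](?=[^\[]*\])/g;

const masked = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(insideBrackets, "X");
const maskedTwo = "AB[CD]EF[GH]IJ".replace(insideBrackets, "X");

console.log(masked);    // ABCDEFG[XXXXXXX]OPQRSTUVWXYZ
console.log(maskedTwo); // AB[XX]EF[XX]IJ
```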
You can use the following solution to replace a string between two square brackets:
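const rxp = /\[.*?\]/g;
const result = "ABCDEFG[HIJKLMN]OPQRSTUVWXYZ".replace(rxp, (x) => {
  // x includes both brackets, so the content is x.length - 2 characters long
  return "[" + "X".repeat(x.length - 2) + "]";
});
console.log(result);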

Need help writing a regex pattern

I am trying to find a pattern in a string that has a value that starts with ${ and ends with }. There will be a word between the curly brackets, but I won't know what word it is.
This is what I have \$\\{[a-zA-Z]\\}
${a} works, but ${aa} doesn't. It seems it's only looking for a single character.
I am unsure what I am doing wrong, or how to fix it and would appreciate any help anyone can provide.
I think this could help you
var str = "The quick brown ${fox} jumps over the lazy ${dog}";
var re = /\$\{([a-z]+)\}/gi;
var match;
while (match = re.exec(str)) {
console.log(match[1]);
}
Running this logs the following to the console:
"fox"
"dog"
Explanation
+ means match 1 or more of the previous term — in this example, match 1 or more of [a-z]
the (...) parentheses will "capture" the match so you can actually do something with it — in my example, I'm just using console.log to output it
the i modifier (at the end of the regexp) means perform a case-insensitive match
the g modifier means match all instances of this regexp in the target string
The while loop will continue running for each match that re.exec finds. Once re.exec cannot match another instance, it will return null and the loop will exit.
Additional information
Try console.log(match) using the code above. Each match comes with other useful information such as the string index where the match occurred
Gotchas
This will not work for nested ${} sets
For example, this regexp will not work on "The quick brown ${fox jumps ${over}} the lazy ${dog}."
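To see what actually happens with the nested example, here is a short sketch using String.prototype.matchAll (ES2020) instead of the while loop; that substitution is only for brevity.

```javascript
const re = /\$\{([a-z]+)\}/gi;
const nested = "The quick brown ${fox jumps ${over}} the lazy ${dog}.";

// Collect the captured names from every match.
const names = [...nested.matchAll(re)].map((m) => m[1]);
console.log(names); // logs the names "over" and "dog"
```

The outer ${fox jumps ...} is skipped because [a-z]+ cannot cross the space, so only the innermost ${over} and the later ${dog} are reported.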
You're close!
All you need is to use a + to tell the expression that there will be one or more of whatever was just before it (in this case [a-zA-Z]) like this:
\${[a-zA-Z]+}
A good website for regex reference and testing is http://rubular.com/
It looks like you need to add a +, which tells the regex to look for one or more of a character.
Try: \${[a-zA-Z]+}
You need to use * (zero or more) or + (one or more). So this [a-zA-Z] would be [a-zA-Z]+, meaning 1 or more letters. The entire regex would look like:
\$\{[a-zA-Z]+\}

Nice way to do this regex substitution

I'm writing a JavaScript function which takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say I'm passed this regex
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), e.g. 789. So, I want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but I can't swap out the whole regex for 789 as that will wipe out the "[cmw_step_attributes][" bit. There must be a clean and simple way to do this but I can't get my head round it. Any ideas?
thanks, max
Capture the first part and put it back into the string.
.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// note I removed the closing ] from the end - quantifiers are greedy so all numbers are selected
// alternatively:
.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]')
Either literally rewrite part that must remain the same in replacement string, or place it inside capturing brackets and reference it in replace.
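Since the new number varies, a replacer function may be more convenient than building a replacement string: a string like '$1' followed by digits would be read as a different group reference if the pattern ever gained enough capture groups. The helper name swapStepId below is hypothetical.

```javascript
// Hypothetical helper: keeps the captured prefix and swaps in the new number.
function swapStepId(name, newId) {
  return name.replace(
    /(cmw_step_attributes\]\[)\d*/,
    (match, prefix) => prefix + newId
  );
}

const swapped = swapStepId("foo[bar][]chicken[123][cmw_step_attributes][456][name]", 789);
console.log(swapped); // foo[bar][]chicken[123][cmw_step_attributes][789][name]

// \d* also matches an empty number, so a missing id gets filled in.
console.log(swapStepId("foo[cmw_step_attributes][][name]", 789)); // foo[cmw_step_attributes][789][name]
```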
See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]');
