Regarding JavaScript RegEx - Replace all punctuation including underscore


In JavaScript, how can I replace all punctuation marks (including underscore) with a hyphen? Moreover, the result should never contain more than one hyphen in a row.
I tried "h....e l___l^^0".replace(/[^\w]/g, "-") but it gives me h----e-l___l--0
What should I do so that it returns h-e-l-l-0 instead?

+ repeats the previous token one or more times.
> "h....e l___l^^0".replace(/[\W_]+/g, "-")
'h-e-l-l-0'
[\W_]+ matches one or more consecutive characters that are either non-word characters or _, so each run is replaced by a single hyphen.
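The same one-liner can be wrapped in a small helper and checked in Node.js or a browser console. The name `hyphenatePunctuation` is hypothetical; it simply applies the answer's `[\W_]+` pattern.

```javascript
// Hypothetical helper: collapse every run of non-word characters
// or underscores into a single hyphen.
function hyphenatePunctuation(text) {
  // [\W_] = anything that is not [A-Za-z0-9] (\W excludes _, so it is added back).
  // The + makes each run one match, so each run becomes one "-".
  return text.replace(/[\W_]+/g, "-");
}

console.log(hyphenatePunctuation("h....e l___l^^0")); // h-e-l-l-0
console.log(hyphenatePunctuation("a_b--c"));          // a-b-c
```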

All you need to do is add the quantifier + to the regex:
"h....e l___l^^0".replace(/[^a-zA-Z0-9]+/g, "-")
NOTE
Use [^a-zA-Z0-9]+ instead of [^\w], because \w includes _, so underscores won't be replaced if you use [^\w].
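Both `[\W_]+` and `[^a-zA-Z0-9]+` are ASCII-only, so accented letters such as `é` are treated as punctuation too. If that matters, one possible variation (my own assumption about what should count as a letter, not part of the answer) uses Unicode property escapes, which require the `u` flag (ES2018+):

```javascript
// ASCII-only version from the answer: é is not in [a-zA-Z0-9].
const asciiOnly = (s) => s.replace(/[^a-zA-Z0-9]+/g, "-");

// Variation: keep any Unicode letter (\p{L}) or number (\p{N}).
// Underscore is connector punctuation, in neither category, so it is still replaced.
const unicodeAware = (s) => s.replace(/[^\p{L}\p{N}]+/gu, "-");

console.log(asciiOnly("caf\u00e9__au lait"));    // caf-au-lait
console.log(unicodeAware("caf\u00e9__au lait")); // café-au-lait
```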

Regex101
[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+
Description
[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+ Match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
!"#$%&'()*+, a single character in the list !"#$%&'()*+, literally (case sensitive)
\- matches the character - literally
. the literal character .
\/ matches the character / literally
:;<=>?@[ a single character in the list :;<=>?@[ literally (case sensitive)
\\ matches the character \ literally
\] matches the character ] literally
^_`{|}~ a single character in the list ^_`{|}~ literally, plus the space character
g modifier: global. All matches (don't return on first match)
JS
console.log("h....e l___l^^0".replace(/[!"#$%&'()*+,\-.\/ :;<=>?@[\\\]^_`{|}~]+/g, "-"));
Result:
h-e-l-l-0
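The explicit list covers only the ASCII punctuation characters plus the space, so symbols outside it (for example ©) are left in place, whereas `[\W_]+` replaces anything that is not an ASCII letter or digit. A small comparison on a made-up input, assuming the full ASCII punctuation set `!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~`:

```javascript
// Explicit ASCII punctuation list plus space, as in the answer.
const punctuation = /[!"#$%&'()*+,\-.\/:;<=>?@[\\\]^_`{|}~ ]+/g;
// Broader alternative: anything that is not an ASCII letter or digit.
const nonAlnum = /[\W_]+/g;

const input = "h....e l___l^^0 \u00a9 2024";
console.log(input.replace(punctuation, "-")); // h-e-l-l-0-©-2024
console.log(input.replace(nonAlnum, "-"));    // h-e-l-l-0-2024
```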

Related

how does javascript regex lazy match work?

For this string
abc.com/file/some.png?v=123
how do I match .png? I use
/\..*?\?/
but it is matching .com/file/some.png?, so why is the lazy match rule not working here?

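There are lots of variants to this answer. First, the lazy quantifier is working: the engine reports the match that starts at the leftmost possible position, here the . in .com, and laziness only makes that match as short as possible from that starting point. So instead I propose matching the first file suffix after the last / character.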
That can be done with this regex
/(?!.*\/)\.\w+\?/
Explanation
(?!.*/)\.\w+\?
Options: Case insensitive; Dot doesn’t match line breaks; ^$ match at line breaks
Assert that it is impossible to match the regex below starting at this position (negative lookahead) (?!.*/)
Match any single character that is NOT a line break character (line feed, carriage return, line separator, paragraph separator) .*
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) *
Match the character “/” literally /
Match the character “.” literally \.
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) \w+
Between one and unlimited times, as many times as possible, giving back as needed (greedy) +
Match the question mark character \?
Created with RegexBuddy
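A quick way to compare the patterns on the question's URL. The third pattern, `/\.\w+\?/`, is my own simpler alternative and not part of the answer; it works here because `\w+` cannot cross the `/` after `.com`.

```javascript
const url = "abc.com/file/some.png?v=123";

// Lazy pattern from the question: the match starts at the first "." that
// can lead to a "?", i.e. ".com", then expands as little as possible.
console.log(url.match(/\..*?\?/)[0]);         // .com/file/some.png?

// Lookahead pattern from the answer: a "." is only accepted if no "/"
// appears anywhere after it, which rules out ".com".
console.log(url.match(/(?!.*\/)\.\w+\?/)[0]); // .png?

// Alternative: after ".com" the next char is "/", so \w+\? fails there.
console.log(url.match(/\.\w+\?/)[0]);         // .png?
```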

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of     spaces     in this  !  '
console.log(a.replace(/\s+(\W)/g, '$1'))
The log shows ' lots of spaces in this! '
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the words this and !
I get that, but what I don't get is HOW all the other spaces are being removed. I don't believe I have asked for that (although I am happy they are).
I get this one, console.log(a.replace(/\s+/g, ' ')), because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first regex, /\s+(\W)/g with '$1', replaces 1 or more spaces with a single space.

What your regex says is "match one or more whitespace characters followed by a single non-word character, and replace that whole match with that non-word character". The key is that \s+ is greedy, meaning it tries to match as many characters as possible, so in any run of spaces it first grabs all of them. However, your regex also requires one non-word character, (\W), after them. Because the character after each run of spaces in your string is a word character (a letter), the engine backtracks: \s+ gives up the last space so that (\W) can match it.
Therefore, given the string a   b (three spaces), and using parentheses to mark the \s+ and (\W) matches, a(  )( )b is the only way for the regex to succeed (\s+ matches the first two spaces and (\W) matches the last space). Now it's just a simple substitution. (\W) is the first and only capturing group, so replacing the match with $1 replaces it with that final space.
As another example, running this replace against a !b gives the match a( )(!)b (since ! is now the non-word character after the spaces), so the final result is a!b.

Let's take the string 'aaa   &bbb' (three spaces) and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand, so '   &' is replaced with '&'
The same principle applies to the spaces: you are forcing one of the spaces to satisfy the (\W) capture group used in the replacement. It's also why your exclamation point isn't removed.

The list of matches is below, with each space replaced by ☹ so it is easier to see and the capture group in parentheses:
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
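The match list above can be reproduced with `matchAll`, which yields each full match together with its capture group. The input assumes the runs of spaces implied by the ☹ listing (five, five, two and two):

```javascript
const a = " lots of     spaces     in this  !  ";
const re = /\s+(\W)/g;

// Print each full match (spaces shown as ☹) and what $1 captured.
for (const m of a.matchAll(re)) {
  const shown = m[0].replace(/ /g, "☹");
  console.log(JSON.stringify(shown), "-> $1 =", JSON.stringify(m[1]));
}

// Replacing each match with $1 keeps only its last character.
console.log(JSON.stringify(a.replace(re, "$1"))); // " lots of spaces in this! "
```

The single space between lots and of survives untouched because \s+ there is followed by a letter and has no extra space to give up to (\W).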

Regex related confusion

I would like to extract mkghj.bmg and pp.kp from the following string using a regex in JavaScript:
avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp
Everything enclosed within ,,,, needs to be ignored. There could be multiple instances of ,,,, but they will always occur an even number of times in the string (they may also not occur at all).
Also, avb#gh.lk has a # sign, so it needs to be ignored.
I guess the rule I am looking for is this: if there is a dot (.), look ahead and look behind:
If the dot is enclosed inside ,,,, then ignore it
If the dot has a # before it with no space between the dot and the #, ignore it
In all other cases, capture the unbroken run of characters (until a space is encountered) on either side of the dot
I came up with this regex, but it doesn't work:
[^\, ]+([^# \,]+\w+)[^\, ]+

Generally speaking, the pattern is (note the capturing group):
not_this | neither_this_nor_this | (but_this_interesting_stuff)
For your specific example, this could be
,,,,.*?,,,,|\S+#\S+|(\S+)
You need to check whether group 1 exists for each match; see a demo on regex101.com.
In JS this would be:
var myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
var myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
var match;
// Only the last alternative captures, so match[1] is undefined
// for the skipped ,,,,...,,,, blocks and tokens containing #.
while ((match = myRegexp.exec(myString)) !== null) {
  if (match[1] !== undefined) {
    console.log(match[1]);
  }
}
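On engines with `String.prototype.matchAll` (ES2020), the same "consume what you don't want, capture what you do" idea can be written without the manual `exec` loop. The helper name `extractTokens` is hypothetical:

```javascript
// Hypothetical helper: the first two alternatives consume ,,,,...,,,, blocks
// and tokens containing "#" without capturing; only the last one captures.
function extractTokens(str) {
  const re = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
  return [...str.matchAll(re)]
    .map((m) => m[1])               // undefined for the skipped alternatives
    .filter((t) => t !== undefined);
}

const myString =
  "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
console.log(extractTokens(myString)); // [ 'mkghj.bmg', 'pp.kp' ]
```

Alternation order matters: the skip alternatives come first, so at a position where ,,,, starts, the whole block is consumed before (\S+) gets a chance.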

Regex
^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)
Description
^ asserts position at start of the string
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
(space) matches the space character literally (case sensitive)
(space) matches the space character literally (case sensitive)
1st Capturing Group ([^ ]+)
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
(space) matches the space character literally (case sensitive)
,,,, matches the characters ,,,, literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
,,,, matches the characters ,,,, literally (case sensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
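Applied to the sample string, the two capturing groups hold the wanted tokens. Note that this pattern depends on the exact layout of this input (one token before the target, a single ,,,, block, then the last token), so it is a sketch for this string rather than a general rule:

```javascript
const str = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
const m = str.match(/^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)/);

// m[1] and m[2] are the two capturing groups; m is null if the layout differs.
console.log(m ? [m[1], m[2]] : null); // [ 'mkghj.bmg', 'pp.kp' ]
```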

You can replace everything between each pair of ,,,, with an empty string, then split the remainder on spaces and filter out the empty strings and anything containing a # sign.
let str = `avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp`
let op = str.replace(/,,,,.*?,,,,/g, '').split(' ').filter(e => e && !e.includes('#'));
console.log(op)

Javascript RegEx : Match repeating character exactly twice (ignore if it matches 3 times)

I am parsing strings for tokens that have 2 types of delimiter (similar to mustache templates).
I need a pure regex solution that matches {{bob}} in "this is {{bob}} a double token" but does NOT match in "this is {{{bob}}} a triple token".
I am matching the double with
\{\{[^\{]([\s\S]+?)[^\}]\}\}
However, it matches the {{bob}} within the triple {{{bob}}}.
Without negative lookbehind, I'm struggling to find a pure regex solution. Any pointers?

You could require a whitespace character on each side of the brackets, something like this:
\s(\{\{([^\{\}]+?)\}\})\s
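A quick check of this pattern on the two sentences from the question; group 2 holds the token name. Because a whitespace character is required on both sides, a token at the very start or end of the string is missed, and two tokens separated by a single space share that space, so only the first one matches.

```javascript
const re = /\s(\{\{([^\{\}]+?)\}\})\s/;

const double = "this is {{bob}} a double token";
const triple = "this is {{{bob}}} a triple token";

const m = double.match(re);
console.log(m && m[2]);        // bob
console.log(triple.match(re)); // null
```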

RegEx:
^(\}?)\{\{[^\{]([\s\S]+?)[^\}]\}\}
Auto Generated Explanation:
^ asserts position at start of the string
1st Capturing Group (\}?)
\}? matches the character } literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
\{ matches the character { literally (case sensitive)
\{ matches the character { literally (case sensitive)
Match a single character not present in the list below [^\{]
\{ matches the character { literally (case sensitive)
2nd Capturing Group ([\s\S]+?)
Match a single character present in the list below [\s\S]+?
+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
Match a single character not present in the list below [^\}]
\} matches the character } literally (case sensitive)
\} matches the character } literally (case sensitive)
\} matches the character } literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)

You may use the following to extract the matches:
var s = "this is {{{ bad bob triple}}} a triple token {{bob double}} {{bob}} a double token {{{bad token}} {{bad token}}}";
var rx = /(?:^|[^{]){{([^{}]*)}}(?!})/g;
var m, res=[];
while(m=rx.exec(s)) {
res.push(m[1]);
}
console.log(res);
See the regex demo here.
(?:^|[^{]) - either start of string or any char but {
{{ - double {
([^{}]*) - Group 1: any char but { and } zero or more times
}} - double }
(?!}) - not immediately followed by }.
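Since ES2018, JavaScript regexes support lookbehind (Node.js 10+ and current browsers), so the negative lookbehind the question tried to avoid is now available. A sketch using the same test string; it swaps the answer's `(?:^|[^{])` prefix for `(?<!\{)` and is not part of the original answer:

```javascript
const s = "this is {{{ bad bob triple}}} a triple token {{bob double}} {{bob}} a double token {{{bad token}} {{bad token}}}";

// (?<!\{) : not preceded by "{"     (?!\}) : not followed by "}"
// Group 1 is the token text, which may not contain braces.
const rx = /(?<!\{)\{\{([^{}]*)\}\}(?!\})/g;

const res = [...s.matchAll(rx)].map((m) => m[1]);
console.log(res); // [ 'bob double', 'bob' ]
```

Because a lookbehind does not consume a character, adjacent tokens such as {{a}}{{b}} are both found, whereas the answer's (?:^|[^{]) prefix needs to consume the character before the second token, which the first match has already used.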

Get word between two sets of characters

I'm trying to write a regular expression that extracts any word between these two sets of characters: 3D and &sa
Examples:
3DEvb31p5vFs4_&sa, output: Evb31p5vFs4_
3D_Ve8_LBztG50_&sa, output: _Ve8_LBztG50_
I have used the expression: /\w[3D][A-Za-z0-9_-].*sa/g
The next step is to exclude "3D" and "&sa" from the result.
Thanks in advance!

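You can use match() with the regex /3D(.*)&sa/ and read the first capturing group:
var a='3DEvb31p5vFs4_&sa';
var b='3D_Ve8_LBztG50_&sa';
console.log(a.match(/3D(.*)&sa/)[1]); // Evb31p5vFs4_
console.log(b.match(/3D(.*)&sa/)[1]); // _Ve8_LBztG50_
Explanation:
3D(.*)&sa
3D and &sa match those characters literally, and (.*) captures everything between them, so match()[1] holds just the word.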
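One caveat: `(.*)` is greedy, so if a string contains several 3D/&sa pairs it captures from the first `3D` to the last `&sa`. A lazy `(.*?)` combined with `matchAll` extracts each pair separately; the combined input string below is made up for illustration:

```javascript
const text = "x=3DEvb31p5vFs4_&sa&y=3D_Ve8_LBztG50_&sa";

// Greedy: a single match spanning from the first 3D to the last &sa.
console.log(text.match(/3D(.*)&sa/)[1]);
// Evb31p5vFs4_&sa&y=3D_Ve8_LBztG50_

// Lazy + global: each 3D ... &sa pair becomes its own match.
const words = [...text.matchAll(/3D(.*?)&sa/g)].map((m) => m[1]);
console.log(words); // [ 'Evb31p5vFs4_', '_Ve8_LBztG50_' ]
```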

Try this (JavaScript does not accept an inline (?s) modifier, so the single-line option is written as the s flag, available since ES2018):
/3D(.*)&sa/gs
Explanation:
3D matches the characters 3D literally (case sensitive)
1st Capturing group (.*)
.* matches any character
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
&sa matches the characters &sa literally (case sensitive)
s modifier: single line (dotAll). Dot matches newline characters
g modifier: global. All matches (don't return on first match)
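A small check of what the `s` (dotAll) flag changes, using a made-up input in which the token is split across a line break:

```javascript
const input = "3DEvb31p5v\nFs4_&sa";

// Without s, "." stops at "\n", so no match is found.
console.log(input.match(/3D(.*)&sa/)); // null

// With s, "." also matches "\n" and the whole token is captured.
console.log(JSON.stringify(input.match(/3D(.*)&sa/s)[1])); // "Evb31p5v\nFs4_"
```

If the tokens never contain line breaks, the flag makes no difference and the plain /3D(.*)&sa/ is enough.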

We Keep Coding

JavaScript is the programming language of the Web.
