Match two equal characters in a word and replace them with one (RegExp) - JavaScript

I want to convert two equal characters into a single one, e.g. bannana should become banana (the "nn" is collapsed into a single "n"). The only exception is "aa": every other pair should be converted as above.
i/p : khuddar >> o/p : khudar
i/p : maanas >> o/p : maanas
i/p : hello >> o/p : helo
i/p : apple >> o/p : aple
I need a regular expression to do this kind of work.

Use capturing group and backreference.
Here's a JavaScript example:
"khuddar".replace(/([^a])\1/g, "$1")
// => "khudar"
"maanas".replace(/([^a])\1/g, "$1")
// => "maanas"
[^a] - matches a character that is not a.
(...) - matches the regular expression and saves the match as group 1 (2, 3, ... if there are more parentheses after it).
\1 - backreference to group 1. If the matched part was b, \1 also refers to b.
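A minimal sketch that wraps the same pattern in a hypothetical helper, collapsePairs, and checks it against the inputs from the question. Because regex matches do not overlap, a run of three identical letters only loses one of them; the collapseRuns variant (also hypothetical, using \1+) is one way to collapse the whole run if that is what you want.

```javascript
// Hypothetical helper: collapse each pair of identical characters into one.
// [^a] never starts a match on "a", so "aa" is left untouched.
function collapsePairs(word) {
  return word.replace(/([^a])\1/g, "$1");
}

// Hypothetical variant: \1+ consumes the whole run, so "lll" becomes "l".
function collapseRuns(word) {
  return word.replace(/([^a])\1+/g, "$1");
}

const cases = [
  ["bannana", "banana"],
  ["khuddar", "khudar"],
  ["maanas", "maanas"],
  ["hello", "helo"],
  ["apple", "aple"],
];

for (const [input, expected] of cases) {
  const output = collapsePairs(input);
  console.log(input, "->", output, output === expected);
}

console.log(collapsePairs("helllo")); // hello (only the first pair is collapsed)
console.log(collapseRuns("helllo"));  // helo
```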

If you need to match only letters other than a, you can use
.replace(/([b-z])\1/ig, "$1")
Regex explanation:
([b-z]) - Capture group 1, capturing any ASCII letter from b to z (and from B to Z, because the /i modifier makes the pattern case-insensitive)
\1 - an inline backreference that matches the text captured by the group above (thus, the pattern matches 2 identical ASCII letters)
In the replacement pattern, the numbered replacement backreference $1 replaces the 2 identical ASCII letters with 1 occurrence of that letter.
var re = /([b-z])\1/gi;
var str = 'khuddar<br/>maanas<br/>hello<br/>apple<br/>F11';
var subst = '$1';
var result = str.replace(re, subst);
document.body.innerHTML = result;
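Because the /i flag also makes the backreference \1 compare case-insensitively, a pair such as "Oo" counts as two identical letters and is replaced by the first one. A small Node-friendly sketch of the same regex, logging to the console instead of document.body; the words "Oops" and "AAron" are illustrative additions, not from the question.

```javascript
const re = /([b-z])\1/gi;

// "Oops" and "AAron" are illustrative inputs added for this sketch.
const words = ["khuddar", "maanas", "hello", "apple", "F11", "Oops", "AAron"];

for (const w of words) {
  // Under /i, \1 matches the captured letter in either case, so "Oo" -> "O".
  // "A" is outside [b-z] even with /i, so "AA" is left alone.
  console.log(w, "->", w.replace(re, "$1"));
}
```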

Related

Replace non-number/non-word characters with underscores, but keep dashes and remove the spaces around them

I have this string:
'word word - word word 24/03/21'
And I would like to convert it to
'word_word-word_word_24_03_21'
I have tried this
replace(/[^aA-zZ0-9]/g, '_')
But I get this instead
word_word___word_word_24_03_21
You can use 2 .replace() calls:
const s = 'word word - word word 24/03/21'
var r = s.replace(/\s*-\s*/g, '-').replace(/[^-\w]+/g, '_')
console.log(r)
//=> "word_word-word_word_24_03_21"
Explanation:
.replace(/\s*-\s*/g, '-'): Remove the whitespace around each hyphen
.replace(/[^-\w]+/g, '_'): Replace every run of characters that are neither a hyphen nor a word character with a single underscore
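To make the role of each step visible, a sketch that logs the intermediate string. It also shows why the order matters: if the underscore replacement ran first, the spaces around the hyphen would already be underscores and the whitespace-squeezing step would have nothing left to remove.

```javascript
const s = 'word word - word word 24/03/21';

// Step 1: squeeze the whitespace around each hyphen.
const step1 = s.replace(/\s*-\s*/g, '-');
console.log(step1); // word word-word word 24/03/21

// Step 2: each run of chars that is neither "-" nor a word char becomes one "_".
const step2 = step1.replace(/[^-\w]+/g, '_');
console.log(step2); // word_word-word_word_24_03_21

// Running the steps in the opposite order leaves underscores around the hyphen.
const wrongOrder = s.replace(/[^-\w]+/g, '_').replace(/\s*-\s*/g, '-');
console.log(wrongOrder); // word_word_-_word_word_24_03_21
```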
You can use
console.log(
'word word - word word 24/03/21'.replace(/\s*(-)\s*|[^\w-]+/g, (x,y) => y || "_")
)
Here,
/\s*(-)\s*|[^\w-]+/g - either matches a - surrounded by zero or more whitespace characters, capturing the - into Group 1, or matches one or more non-word characters other than -
(x,y) => y || "_" - replaces the match with Group 1 if it was matched; otherwise, the replacement is a _ char.
With a function for replace and an alternation in the pattern, you could also match:
(\s*-\s*) Match a - between optional whitespace chars
| Or
[^a-zA-Z0-9-]+ Match 1+ chars that are not in any of the listed ranges and are not -
In the callback, check if group 1 exists. If it does, return only a -, else return _
Note that the notation [^aA-zZ0-9] is not the same as [^a-zA-Z0-9]: the range A-z covers ASCII codes 65 to 122, so it also includes [, \, ], ^, _ and the backtick.
let s = "word word - word word 24/03/21";
s = s.replace(/(\s*-\s*)|[^a-zA-Z0-9-]+/g, (_, g1) => g1 ? "-" : "_");
console.log(s);
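A quick way to check what the range A-z actually contains: loop over its character codes and keep the ones that are not ASCII letters. The last two lines show the practical consequence for the [^aA-zZ0-9] class, which leaves "_" and "[" untouched; the sample string 'a_b[c' is made up for this sketch.

```javascript
// Collect every character in the A-z range that is not an ASCII letter.
const extras = [];
for (let code = 'A'.charCodeAt(0); code <= 'z'.charCodeAt(0); code++) {
  const ch = String.fromCharCode(code);
  if (!/[a-zA-Z]/.test(ch)) extras.push(ch);
}
console.log(extras.join('')); // [\]^_`

// The A-z version does not replace "_" or "[", the a-zA-Z version does.
console.log('a_b[c'.replace(/[^aA-zZ0-9]/g, '*')); // a_b[c
console.log('a_b[c'.replace(/[^a-zA-Z0-9]/g, '*')); // a*b*c
```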
You can use the + quantifier to replace 1 or more consecutive matching characters at once.
let s = 'word word - word word 24/03/21';
let r = s
.replace(/[^a-zA-Z0-9]*-[^a-zA-Z0-9]*/g, '-')
.replace(/[^a-zA-Z0-9-]+/g, '_');
console.log(r);
// 'word_word-word_word_24_03_21'

Regex to match many times

I'm trying to match a type definition
def euro : t1 -> t2 -> t3 (and this pattern may repeat further in other examples)
I came up with this regex
^def ([^\s]*)\s:\s([^\s]*)(\s->\s[^\s]*)*
But while it matches euro and t1, it
then matches -> t2 rather than t2
and fails to match anything with t3.
I can't see what I am doing wrong. My goal is to capture
euro t1 t2 t3
as four separate items, and what I currently get is:
0: "def euro : t1 -> t2 -> t3"
1: "euro"
2: "t1"
3: " -> t3"
You can't use a repeated capturing group in a JS regex to collect every value: all but the last value are "dropped", overwritten on each subsequent iteration.
When creating a regular expression that needs a capturing group to grab part of the text matched, a common mistake is to repeat the capturing group instead of capturing a repeated group. The difference is that the repeated capturing group will capture only the last iteration, while a group capturing another group that's repeated will capture all iterations.
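The difference is easy to see by running both forms against the question's input: the asker's pattern with the repeated capturing group, and the same pattern with a capturing group wrapped around a repeated non-capturing group.

```javascript
const s = "def euro : t1 -> t2 -> t3";

// Repeated capturing group (the question's pattern): only the last iteration survives.
const repeated = s.match(/^def ([^\s]*)\s:\s([^\s]*)(\s->\s[^\s]*)*/);
console.log(repeated[3]); // the string " -> t3"

// Capturing group around a repeated non-capturing group: the whole tail is kept.
const captured = s.match(/^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/);
console.log(captured[3]); // the string " -> t2 -> t3"
```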
The way out is to capture the whole substring and then split it. Here is an example:
var s = "def euro : t1 -> t2 -> t3";
var rx = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;
var res = [];
var m = s.match(rx);
if (m) {
res = [m[1], m[2]];
// m[3] is " -> t2 -> t3"; splitting leaves a leading "", which filter(Boolean) drops
for (var t of m[3].split(" -> ").filter(Boolean)) {
res.push(t);
}
}
console.log(res);
Pattern details
^ - start of string
def - a literal substring
(\S*) - Capturing group 1: 0+ non-whitespace chars
\s:\s - a : enclosed with single whitespaces
(\S*) - Capturing group 2: 0+ non-whitespace chars
((?:\s->\s\S*)*) - Capturing group 3: 0+ repetitions of the following pattern sequences:
\s->\s - whitespace, ->, whitespace
\S* - 0+ non-whitespace chars
Details:
?: - creates a non-capturing group
$1 - receives the result of the first capturing group, i.e. \w+
\s[\:\-\>]+\s - matches " : " or " -> " (one or more of :, - and > between whitespace chars)
\w+ - matches one or more word characters (letters, digits, underscore)
let str = 'def euro : t1 -> t2 -> t3';
let regex = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;
let match = str.replace(regex, '$1\n').trim().split('\n');
console.log(match);
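If your environment supports String.prototype.matchAll (ES2020), Group 1 can be collected directly instead of building and splitting a newline-joined string. A sketch using the same pattern, with the character class simplified to [:>-] (same set of characters):

```javascript
let str = 'def euro : t1 -> t2 -> t3';

// Same idea as above; matchAll yields every match and m[1] is the captured word.
let regex = /(?:def\s|\s[:>-]+\s)(\w+)/g;
let names = Array.from(str.matchAll(regex), m => m[1]);
console.log(names); // [ 'euro', 't1', 't2', 't3' ]
```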

What is the regular expression for the requirement below in JavaScript?

Criteria:
Any word that starts with a and ends with b, with a digit as the middle character. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
I need it for JavaScript.
So far I have tried the following:
var text_content = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"; // the content above; the ' a7b' result implies line 3 starts with a space
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
Getting below output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi matches only at the start of each line: 1 or 0 chars other than #, then a, a digit and b. There is no check for a whole word, and words further along a line are never matched.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (the m multiline modifier makes ^ match at the start of each line)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
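The same alternation also works without using replace() for its side effect: String.prototype.matchAll (ES2020) yields every match, and the matches of the "#..." branch are the ones where Group 1 is undefined. A sketch:

```javascript
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";

// Same pattern as above; keep only matches where Group 1 participated.
var res = Array.from(s.matchAll(/^[^\S\r\n]*#.*|\b(a\db)\b/gm))
  .filter(m => m[1] !== undefined)
  .map(m => m[1]);
console.log(res); // [ 'a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b' ]
```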
I would suggest using 2 regexes:
The first regex fetches the lines that do not start with # (use it with the g and m flags):
^[^#][a\db\s]+
and then another regex fetches the individual words from each line (use it with the g flag):
\ba\db\b
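A sketch of this two-step approach on the sample input, using /\ba\db\b/g for the per-line step so that every matching word in a line is found. Note that the first pattern only keeps a line's leading run of a, b, digits and whitespace, so it suits input shaped like the example rather than arbitrary text.

```javascript
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";

// Step 1: lines that do not start with "#" (m makes ^ match at each line start).
var lines = s.match(/^[^#][a\db\s]+/gm) || [];

// Step 2: whole words of the form a<digit>b inside each kept line.
var words = lines.flatMap(line => line.match(/\ba\db\b/g) || []);
console.log(words); // [ 'a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b' ]
```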

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: it would be nice if the result did not include the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - zero or one occurrence of:
(?: +\w+)* - zero or more occurrences of one or more spaces followed by one or more word characters, followed by
# - a hash symbol
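Applied to the question's string, the \w version returns the three expected tags. For the "Extra" part, one option is to strip the leading # and an optional closing # from each match afterwards; a sketch:

```javascript
var str = "#The test# rain in #SPAIN stays mainly in the #plain";

var tags = str.match(/#\w+(?:(?: +\w+)*#)?/g) || [];
console.log(tags); // [ '#The test#', '#SPAIN', '#plain' ]

// Remove the leading "#" and, if present, the closing "#".
var names = tags.map(t => t.replace(/^#|#$/g, ""));
console.log(names); // [ 'The test', 'SPAIN', 'plain' ]
```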
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Using ES6 Arrow functions
m = m.map(s => s.replace(/#$/g, ''));
// ES5 Equivalent
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/ // getting rid of the trailing #
document.body.innerHTML = "<pre>" + JSON.stringify(m, null, 4) + "</pre>";
}
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non-capturing group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
Use it with the g (global) flag.
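A comparison of this lazy pattern with the \w-based one from the previous answer. On the question's string both give the same three matches, but because [\s\S]*? may cross spaces and other # characters, the lazy version can join a single-word tag with a following #...# tag. The second input string is hypothetical, chosen to show that case.

```javascript
var lazy = /#(?:\b[\s\S]*?\b#|\w+)/g;
var strict = /#\w+(?:(?: +\w+)*#)?/g;

var question = "#The test# rain in #SPAIN stays mainly in the #plain";
console.log(question.match(lazy)); // [ '#The test#', '#SPAIN', '#plain' ]

// Hypothetical input: a single-word tag followed by a #...# tag.
var other = "#SPAIN has #the test#";
console.log(other.match(lazy));   // [ '#SPAIN has #the test#' ]
console.log(other.match(strict)); // [ '#SPAIN', '#the test#' ]
```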

JavaScript: insert a space after every digit

I am trying to add a space after every occurrence of a digit with javascript.
"2tim" will be "2 tim"
var v = '2tim';
v.replace(/(\d+)/, /\1 /);
There are 3 things wrong with your code:
The second argument to replace should be a string.
To use a captured group, use the dollar sign.
You don't want to capture all digits into the same group (\d+). Just capture one digit, and make the regex global.
var v = '2tim';
v = v.replace(/(\d)/g, '$1 ');
Here's the fiddle: http://jsfiddle.net/qujsq/
If you want to add a space only after a whole group of digits, use +:
var v = '12times';
v = v.replace(/(\d+)/g, '$1 ');
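As an alternative to a capture group, the special replacement pattern $& inserts the whole match, so the parentheses can be dropped. A sketch covering both the per-digit and the per-group variants:

```javascript
// $& in the replacement string stands for the whole match.
console.log('2tim'.replace(/\d/g, '$& '));     // 2 tim
console.log('12times'.replace(/\d/g, '$& '));  // 1 2 times
console.log('12times'.replace(/\d+/g, '$& ')); // 12 times
```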
