I have the following JS:
"a a a a".replace(/(^|\s)a(\s|$)/g, '$1')
I expect the result to be '', but am instead getting 'a a'. Can anyone explain to me what I am doing wrong?
Clarification: What I am trying to do is remove all occurrences of 'a' that are surrounded by whitespace (i.e. a whole token).
It's because the regex /(^|\s)a(\s|$)/g also matches (and consumes) the character before and the character after each a.
In the string "a a a a" the regex matches:
"a " first; the string left to check becomes "a a a"$ (but its start is no longer the beginning of the string, and there is no space before the first a)
" a " next (the third a); the string left to check becomes "a"$ (which does not match because there is no space before it)
Edit:
A little tricky, but it works (without regex):
var a = "a a a a";
// Handle beginning case 'a '
var startI = a.indexOf("a ");
if (startI === 0){
var off = a.charAt(startI + 2) !== "a" ? 2 : 1; // if an "a" comes next, keep the space before it
a = a.slice(startI + off);
}
// Handle middle case ' a '
var iOf = -1;
while ((iOf = a.indexOf(" a ")) > -1){
var off = a.charAt(iOf + 3) !== "a" ? 3 : 2; // same here
a = a.slice(0, iOf) + a.slice(iOf+off, a.length);
}
// Handle end case ' a'
var endI = a.indexOf(" a");
if (endI === a.length - 2){
a = a.slice(0, endI);
}
a; // ""
The first "a " matches.
Then it tries to match against "a a a", which skips the first a and then matches " a ".
Then it tries to match against "a", which does not match.
The first match is replaced with $1, the empty start of the string => ""
Then we have an "a" that didn't match => "a"
The second match is replaced with $1, the captured space => " "
Then we have an "a" that didn't match => "a"
The result is "a a".
To get your desired result you can do this:
"a a a a".replace(/(?:\s+a(?=\s))+\s+|^a\s+(?=[^a]|$|a\S)|^a|\s*a$/g, '')
As others have tried to point out, the issue is that the regex consumes the surrounding spaces as part of the match. Here's a [hopefully] more straightforward explanation of why that regex doesn't work as you expect:
First let's break down the regex: it says match a space or the start of the string, followed by an 'a', followed by a space or the end of the string.
Now let's apply it to the string. I've added character indexes beneath the string to make things easier to talk about:
a a a a
0123456
The regex looks at the char at index 0 and finds an 'a' at that location, followed by a space at index 1. This is a match because it is the start of the string, followed by an 'a', followed by a space. The length of our match is 2 (the 'a' and the space), so we consume two characters and start our next search at index 2.
Character 2 ('a') is neither a space nor the start of the string, and therefore it doesn't match the start of our regular expression, so we consume that character (without replacing it) and move on to the next.
Character 3 is a space, followed by an 'a' followed by another space, which is a match for our regex. We replace it with $1 (the captured leading space), consume the length of the match (3 characters - " a ") and move on to index 6.
Character 6 ('a') is neither a space nor the start of the string, and therefore it doesn't match the start of our regular expression, so we consume that character (without replacing it) and move on to the next.
Now we're at the end of the string, so we're done.
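The walk-through above can be checked by listing every match together with the index where it starts. This is only a minimal sketch; it assumes a runtime that supports String.prototype.matchAll (ES2020), and the variable names are illustrative:

```javascript
const input = "a a a a";
const pattern = /(^|\s)a(\s|$)/g;

// Collect every match together with the index where it starts.
const matches = [...input.matchAll(pattern)].map(m => ({ index: m.index, text: m[0] }));
console.log(matches); // two matches: "a " at index 0 and " a " at index 3

// Each match is replaced by group 1, so the unmatched 'a's at index 2 and 6 survive.
const result = input.replace(pattern, '$1');
console.log(JSON.stringify(result)); // "a a"
```

Only two of the four a's are ever part of a match, which is why two of them remain.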
The reason why the regex #caeth suggested (/(^|\s+)a(?=\s|$)/g) works is the ?= lookahead (an assertion, not a quantifier). From the MDN RegExp documentation:
Matches x only if x is followed by y. For example, /Jack(?=Sprat)/ matches "Jack" only if it is followed by "Sprat". /Jack(?=Sprat|Frost)/ matches "Jack" only if it is followed by "Sprat" or "Frost". However, neither "Sprat" nor "Frost" is part of the match results.
So, in this case, the ?= lookahead checks whether the following character is a space (or the end of the string), without actually consuming that character.
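To see the effect of not consuming the trailing space, here is a small sketch that lists where the lookahead version matches. Replacing each match with an empty string is an assumption on my part about the intended replacement; the variable names are illustrative:

```javascript
const lookahead = /(^|\s+)a(?=\s|$)/g;
const text = "a a a a";

// The trailing space is only inspected, so it is still available
// as the leading whitespace of the next match.
const found = [...text.matchAll(lookahead)].map(m => m.index);
console.log(found); // [0, 1, 3, 5]

// Replacing every match with an empty string removes all four tokens.
const stripped = text.replace(lookahead, '');
console.log(JSON.stringify(stripped)); // ""
```

All four a's are now matched, because each match ends right after the a instead of after the following space.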
(^|\s)a(?=\s|$)
Try this. Replace by $1. See demo.
https://regex101.com/r/gQ3kS4/3
Use this instead:
"a a a a".replace(/(^|\s*)a(\s|$)/g, '$1')
With the "*", this replaces all the "a" occurrences.
Greetings
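One thing worth checking with this variant: because \s* can match zero whitespace characters, the pattern no longer requires a boundary before the a. A quick check with a made-up input ("ba a" is my own example, not from the question) suggests it also strips an a at the end of a longer word:

```javascript
const loose = /(^|\s*)a(\s|$)/g;

// Works for the input from the question.
const onlyTokens = "a a a a".replace(loose, '$1');
console.log(JSON.stringify(onlyTokens)); // ""

// But the trailing "a" of the word "ba" is removed as well.
const insideWord = "ba a".replace(loose, '$1');
console.log(JSON.stringify(insideWord)); // "b"
```

If only whole tokens should be removed, that difference matters.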
Or you can just split the string up, filter it and glue it back:
"a ba sl lf a df a a df r a".split(/\s+/).filter(function (x) { return x != "a" }).join(" ")
>>> "ba sl lf df df r"
"a a a a".split(/\s+/).filter(function (x) { return x != "a" }).join(" ")
>>> ""
Or in ECMAScript 6:
"a ba sl lf a df a a df r a".split(/\s+/).filter(x => x != "a").join(" ")
>>> "ba sl lf df df r"
"a a a a".split(/\s+/).filter(x => x != "a").join(" ")
>>> ""
I assume that there are no leading or trailing spaces. You can change the filter to x && x != 'a' if you want to drop that assumption.
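The variant with the extra x && check could be wrapped up like this. It is a sketch only; removeToken is a hypothetical helper name, not something from the answer:

```javascript
// Remove every whitespace-delimited token equal to `token`.
function removeToken(str, token) {
  return str
    .split(/\s+/)
    // `x &&` drops the empty strings that split() produces
    // when the input has leading or trailing whitespace.
    .filter(x => x && x !== token)
    .join(" ");
}

console.log(JSON.stringify(removeToken("  a ba a  ", "a"))); // "ba"
console.log(JSON.stringify(removeToken("a a a a", "a")));    // ""
```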
Related
I would like to replace text using javascript/regex
"TV "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched off."
with
TV 'my-samsung' is already switched off.
by removing the text (UUID: ) and replacing " with '
Looks like regex can be used
\([\s\S]*?\)
https://regex101.com/r/xXDncn/1
I have also tried using the replace method in JS
str = str.replace("(UUID", "");
You can use
const str = '" "Tv "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched-off""';
console.log(
str.replace(/\s*\(UUID:[^()]*\)/g, '').replace(/^[\s"]+|[\s"]+$/g, '').replaceAll('"', "'")
)
See the first regex demo. It matches
\s* - zero or more whitespaces
\(UUID: - (UUID: string
[^()]* - zero or more chars other than ( and )
\) - a ) char.
The g flag makes it replace all occurrences.
The second regex removes trailing and leading whitespace and double quotation marks:
^[\s"]+ - one or more whitespaces and double quotes at the start of string
| - or
[\s"]+$ - one or more whitespaces and double quotes at the end of string.
The .replaceAll('"', "'") is necessary to replace all " with ' chars.
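For reference, the same three steps applied to the exact string from the question might look like the sketch below. cleanMessage is a hypothetical helper name, and .replace(/"/g, "'") is swapped in for .replaceAll purely so that it also runs on older runtimes:

```javascript
function cleanMessage(str) {
  return str
    .replace(/\s*\(UUID:[^()]*\)/g, '') // drop "(UUID: ...)" and the whitespace before it
    .replace(/^[\s"]+|[\s"]+$/g, '')    // trim whitespace and double quotes at both ends
    .replace(/"/g, "'");                // turn the remaining double quotes into single quotes
}

const message = '"TV "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched off."';
console.log(cleanMessage(message)); // TV 'my-samsung' is already switched off.
```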
It is not a good idea to merge these two operations into one as the replacements are different. Here is how it could be done, just for learning purposes:
const str = '" "Tv "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched-off""';
console.log(
str.replace(/^[\s"]+|[\s"]+$|\s*\(UUID:[^()]*\)|(")/g, (x,y) => y ? "'" : "")
)
That is, " is captured into Group 1, the replacement is now a callable, where x is the whole match and y is the Group 1 contents. If Group 1 matched, the replacement is ', else, the replacement is an empty string (to remove the match found).
you can try this
str.replace(/\(.*?\)/, "")
str.replace(/\(.*?\)/, "with")
--- update ---
const str = `"TV "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched off."`;
const a = str.replace(/"(.*?)\s*\(.*\)(.*)"/, (a, b, c) => { // \s* keeps the space before "(" out of group 1, avoiding a double space
return b.replace(/"/g, "'") + c
});
console.log(a); //TV 'my-samsung' is already switched off.
I have input string
..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''
I want to check whether the first and last characters are -, ' or .
If so, trim them until only the name is left.
Expected output : VAibhavs.sharma
I am using like this.
while (
myString.charAt(0) == "." ||
myString.charAt(0) == "'" ||
myString.charAt(0) == "-" ||
myString.charAt(myString.length - 1) == "." ||
myString.charAt(myString.length - 1) == "'" ||
myString.charAt(myString.length - 1) == "-"
)
I know this is not the correct way. How can I use a regex?
I tried /^\'$. But this only checks the first char, and only for a single special char.
You can use regular expression:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.replace(/^[-'\.]+/,"").replace(/[-'\.]+$/,"")
console.log(output)
[-'\.] ... -, ' or . character
+ ... one or more times
^ ... beginning of the string
$ ... end of the string
EDIT:
using match:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.match(/^[-'\.]+(.*?)[-'\.]+$/)[1]
console.log(output)
(...) ... (1st) group
.*? ... any character, zero or more times, ? means non-greedy
.match(...)[1] ... 1 means 1st group
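One caveat with the match version: the + on both sides means the pattern only matches when there is at least one character to trim at each end, so for an input such as "plain" .match(...) returns null and [1] throws. A hedged sketch that tolerates such inputs (trimNameChars is a hypothetical name):

```javascript
function trimNameChars(input) {
  // `*` instead of `+` lets the pattern match even when one side has nothing to trim.
  const m = input.match(/^[-'.]*(.*?)[-'.]*$/);
  // m is null only when the input contains a line terminator, which `.` does not match.
  return m ? m[1] : input;
}

console.log(trimNameChars("..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''")); // VAibhavs.sharma
console.log(trimNameChars("plain")); // plain
```

The lazy (.*?) still stops as early as possible, so dots inside the name are kept while the trailing run of -, ' and . is dropped.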
There is already one accepted answer but still, this is how I would do.
var pattern = /\b[A-Za-z.]+\b/gm;
var str = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''";
console.log(str.match(pattern));
// Output
// ["VAibhavs.sharma"]
\b is a zero-width word boundary. It matches positions where one side is a word character (usually a letter, digit or underscore) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).
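A quick way to see where \b matches is to insert a marker at every boundary position. The inputs below are made-up examples, not from the question:

```javascript
// \b matches positions, not characters, so replace() inserts the marker
// without removing anything.
const marked = "foo bar".replace(/\b/g, "|");
console.log(marked); // |foo| |bar|

// With the pattern from this answer, the match has to start and end
// next to a word character, so the surrounding punctuation is left out.
const found = "..--'VAibhavs.sharma'--..".match(/\b[A-Za-z.]+\b/g);
console.log(found); // [ 'VAibhavs.sharma' ]
```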
Given a string containing CamelCase and also uppercase acronyms e.g. 'ManualABCTask';
How can it be split to a string with a space between all words and acronyms in a less wordy way?
I had the following process:
let initial = 'ManualABCTask'
//Split on single upper case followed by any number of lower case:
.split(/(['A-Z'][a-z]*)/g)
//the returned array includes empty string entries e.g. ["", "", "Manual", "A", "", "B", "", "C","", "Task", ""] so remove these:
.filter(x => x != '');
//When joining the array, the acronym's single uppercase letters each get a space e.g. 'Manual A B C Task' so instead, reduce and add a space only if the array entry has more than one character
let word = initial.reduce((prevVal,currVal) => {
return (currVal.length == 1) ? prevVal + currVal : prevVal + ' ' + currVal + ' ';
}, '');
This does the job on the combinations it needs to e.g:
'ManualABCTask' => 'Manual ABC Task'
'ABCManualTask' => 'ABC Manual Task'
'ABCManualDEFTask' => 'ABC Manual DEF Task'
But it was a lot of code for the job done and surely could be handled in the initial regex.
I was experimenting while writing the question and with a tweak to the regex, got it down to one line, big improvement! So posting anyway with solution.
My regex know-how isn't great so this could maybe be improved on still.
I know next to nothing about JavaScript but I had a bash at it:
let initial = 'ManualABCTask'
initial = initial.replace(/([A-Z][a-z]+)/g, ' $1 ').trim();
There are 2 groups: one starting from a capital letter followed by lowercase letters, and one starting from a capital letter and running until the next letter is lowercase:
find = new RegExp(
"(" +
"[A-Z][a-z]+" + // Group starting from head letter with following lowercases
"|" +
"[A-Z]+(?![a-z])" + // Group with head letters until next letter isn't lowercase:
")",
"g"
)
initial = 'ManualABCTask'.split(find)
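Since split() with a capturing group also returns the empty strings between the matches, one alternative is to use the same two alternatives with match() and join the pieces. A sketch only; spaceOut is a hypothetical name:

```javascript
// Same two alternatives as above, without the capturing group.
const words = /[A-Z][a-z]+|[A-Z]+(?![a-z])/g;

function spaceOut(str) {
  const parts = str.match(words); // null when nothing matches
  return parts ? parts.join(' ') : str;
}

console.log(spaceOut('ManualABCTask'));    // Manual ABC Task
console.log(spaceOut('ABCManualTask'));    // ABC Manual Task
console.log(spaceOut('ABCManualDEFTask')); // ABC Manual DEF Task
```

Note that match() only returns what the pattern matches, so any characters outside the two alternatives (digits, lowercase-only prefixes) would be dropped by this sketch.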
As mentioned in post, changed to handle in regex:
initial = 'ManualABCTask'.split(/(['A-Z']{2,99})(['A-Z'][a-z]*)/g).join(' ');
Group any consecutive uppercase characters with a length of 2 to 99 to get the acronyms, and any single uppercase character followed by any number of lowercase characters to get the other words. Join with a space.
I need to match numbers that are not preceded by "/" in a group.
In order to do this I made the following regex:
var reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;
The first part matches the start of the string or any character except "," and "/"; the second part matches a number. Everything works ok regarding the regex (it matches what I need). I use https://regex101.com/ for testing. Example here: https://regex101.com/r/7UwEUn/1
The problem is that when I use it in js (script below) it goes into an infinite loop if first character of the string is not a number. At a closer look it seems to keep matching the start of the string, never progressing further.
var reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;
var text = "a 1 b";
while (match = reg.exec(text)) {
if (typeof match[2] != 'undefined' && match[2] != '') {
numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
}
If the string starts with a number ("1 a b") all is fine.
The problem appears to be here (^|[^,/]) - removing ^| will fix the issue with infinite loop but it will not match what I need in strings starting with numbers.
Any idea why the internal index is not progressing?
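The infinite loop is caused by the fact that your regex can match an empty string: after a zero-length match, exec does not advance lastIndex, so the same match is found again. You are not likely to need empty strings (even judging by your code), so make it match at least one digit by replacing the last * with +: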
var reg = /(^|[^,\/])([0-9]*\.?[0-9]+)/g;
var text = "a 1 b a 2 ana 1/2 are mere (55";
var numbers=[];
while (match = reg.exec(text)) {
numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
console.log(numbers);
Note that this regex will not match numbers like 34. and in that case you may use /(^|[^,\/])([0-9]*\.?[0-9]+|[0-9]*\.)/g, see this regex demo.
Alternatively, you may use another "trick", advance the regex lastIndex manually upon no match:
var reg = /(^|[^,\/])([0-9]*\.?[0-9]+)/g;
var text = "a 1 b a 2 ana 1/2 are mere (55";
var numbers=[];
while (match = reg.exec(text)) {
if (match.index === reg.lastIndex) {
reg.lastIndex++;
}
if (match[2]) numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
console.log(numbers);
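The cause can be observed directly, and on runtimes with String.prototype.matchAll (ES2020, an assumption about the target environment) the iteration advances past zero-length matches by itself. A sketch using the original pattern from the question; findNumbers is a hypothetical name:

```javascript
const reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;

// exec() finds a zero-length match at index 0 and leaves lastIndex at 0,
// so calling it again in a loop would return the same match forever.
const first = reg.exec("a 1 b");
console.log(JSON.stringify(first[0]), reg.lastIndex); // "" 0

// matchAll() steps past zero-length matches on its own.
function findNumbers(text) {
  const numbers = [];
  for (const match of text.matchAll(/(^|[^,\/])([0-9]*\.?[0-9]*)/g)) {
    if (match[2]) { // skip matches where the number group is empty
      numbers.push({ index: match.index + match[1].length, value: match[2] });
    }
  }
  return numbers;
}

console.log(findNumbers("a 1 b")); // [ { index: 2, value: '1' } ]
```

The original pattern still accepts a lone "." as a number, so tightening the pattern as shown above remains worthwhile.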
I have the following URL structure:
https://api.bestschool.com/student/1102003120009/tests/json
I want to cut the student ID from the URL. So far I've come up with this:
/(student\/.*[^\/]*)/
which returns
student/1102003120009/tests/json
I only want the ID.
Your regex (student\/.*[^\/]*) matches and captures into Group 1 a literal sequence student/, then matches any characters other than a newline, 0 or more occurrences (.*) - that can match the whole line at once! - and then 0 or more characters other than /. It does not work because of .*. Also, a capturing group should be moved to the [^\/]* pattern.
You can use the following regex and grab Group 1 value:
student\/([^\/]*)
See regex demo
The regex matches student/ literally, and then matches and captures into Group 1 zero or more symbols other than /.
Alternatively, if you want to avoid using capturing, and assuming that the ID is always numeric and is followed by /tests/, you can use the following regex:
\d+(?=\/tests\/)
The \d+ matches 1 or more digits, and (?=\/tests\/) checks if right after the digits there is a /tests/ character sequence.
var re = /student\/([^\/]*)/;
var str = 'https://api.bestschool.com/student/1102003120009/tests/json';
var m = str.match(re);
if (m !== null) {
document.getElementById("r").innerHTML = "First method : " + m[1] + "<br/>";
}
var m2 = str.match(/\d+(?=\/tests\/)/);
if (m2 !== null) {
document.getElementById("r").innerHTML += "Second method: " + m2;
}
<div id="r"/>
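As a non-regex alternative, the path can be split into segments with the standard URL class, taking the segment that follows student. This is a sketch under the assumption that the ID is always the path segment right after "student"; getStudentId is a hypothetical name:

```javascript
function getStudentId(url) {
  // pathname is "/student/1102003120009/tests/json"; filter(Boolean) drops empty segments.
  const segments = new URL(url).pathname.split('/').filter(Boolean);
  const i = segments.indexOf('student');
  // Return null when there is no "student" segment or nothing follows it.
  return i !== -1 && i + 1 < segments.length ? segments[i + 1] : null;
}

console.log(getStudentId('https://api.bestschool.com/student/1102003120009/tests/json')); // 1102003120009
```

Unlike the regex versions, this throws on a string that is not a valid absolute URL, so it fits best when the input is known to be well-formed.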