Cannot get all possible overlapping regular expression matches - javascript

I have a string:
Started: 11.11.2014 11:19:28.376<br/>Ended: 1.1.4<br/>1:9:8.378<br/>Request took: 0:0:0.2
I need to add leading zeros: if I encounter 1:1:8, it should become 01:01:08, and the same goes for the date. I tried using
/((:|\.|\s)[0-9](:|\.))/g
but it did not give all possible overlapping matches. How can I fix it?
var str = "Started: 11.11.2014 11:19:28.376<br/>Ended: 11.11.2014<br/>11:19:28.378<br/>Request took: 0:0:0.2";
var re = /((:|\.|\s)[0-9](:|\.))/g
while ((match = re.exec(str)) != null) {
//alert("match found at " + match.index);
str = [str.slice(0,match.index), '0', str.slice(match.index+1,str.length)];
}
alert(str);

This will probably do what you want:
str.replace(/\b\d\b/g, "0$&")
It searches for lone digits (\d) and pads a 0 in front.
The first word boundary \b checks that there is no [a-zA-Z0-9_] in front, and the second checks there is no [a-zA-Z0-9_] behind the digit.
$& in the replacement string refers to the whole match.
If you want to pad 0 as long as the character before and after are not digits:
str.replace(/(^|\D)(\d)(?!\d)/g, "$10$2")
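A minimal sketch comparing the two replacements above. The function names are made up for illustration, and the second one uses a replacer function instead of the "$10$2" string only to make the grouping explicit:

```javascript
// Hypothetical helper names, for illustration only.

// Pads digits that stand alone between two word boundaries.
function padLoneDigits(str) {
  return str.replace(/\b\d\b/g, "0$&");
}

// Pads any single digit whose neighbours are not digits,
// even when a neighbour is a letter or an underscore.
function padDigitsBetweenNonDigits(str) {
  return str.replace(/(^|\D)(\d)(?!\d)/g, (match, before, digit) => before + "0" + digit);
}

console.log(padLoneDigits("Ended: 1.1.2014 1:9:8")); // Ended: 01.01.2014 01:09:08
console.log(padLoneDigits("a1b"));                   // a1b (no word boundary around the digit)
console.log(padDigitsBetweenNonDigits("a1b"));       // a01b
```

The pattern in the question misses matches because each match consumes the separator that the next digit would need. \b and (?!\d) are zero-width, so neighbouring digits no longer compete for the same character.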

Related

Finding exact words in text, excluding quoted words

In the JavaScript code below I need to find exact words in a text while excluding words that are between quotes. This is my attempt; what's wrong with the regex? It should find all the words except word22 and "word3". If I use only \b in the regex, it selects exact words but doesn't exclude the words between quotes.
var text = 'word1, word2, word22, "word3" and word4';
var words = [ 'word1', 'word2', 'word3' , 'word4' ];
words.forEach(function(word){
var re = new RegExp('\\b^"' + word + '^"\\b', 'i');
var pos = text.search(re);
if (pos > -1)
alert(word + " found in position " + pos);
});
First, we'll use a function to escape the characters of the word, in case some of them have special meaning in a regexp.
// from https://stackoverflow.com/a/30851002/240443
function regExpEscape(literal_string) {
return literal_string.replace(/[-[\]{}()*+!<=:?.\/\\^$|#\s,]/g, '\\$&');
}
Then, we construct a regular expression as an alternation between individual word regexps. For each word, we assert that it starts with a word boundary, ends with a word boundary, and has an even number of quote characters between its end, and the end of string. (Note that from the end of word3, there is only one quote till the end of string, which is odd.)
let text = 'word1, word2, word22, "word3" and word4';
let words = [ 'word1', 'word2', 'word3' , 'word4' ];
let regexp = new RegExp(words.map(word =>
'\\b' + regExpEscape(word) + '\\b(?=(?:[^"]*"[^"]*")*[^"]*$)').join('|'), 'g')
text.match(regexp)
// => word1, word2, word4
let m; // declare the loop variable instead of leaking an implicit global
while ((m = regexp.exec(text))) {
console.log(m[0], m.index);
}
// word1 0
// word2 7
// word4 34
EDIT: Actually, we can speed the regexp up a bit if we factor out the surrounding conditions:
let regexp = new RegExp(
'\\b(?:' +
words.map(regExpEscape).join('|') +
')\\b(?=(?:[^"]*"[^"]*")*[^"]*$)', 'g')
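Put together, the factored-out version might look like the sketch below. The name findUnquotedWords and the return shape are assumptions for illustration, and the escape helper is a shorter variant of the one above so that the block runs on its own:

```javascript
// Hypothetical helpers, for illustration only.
function regExpEscape(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function findUnquotedWords(text, words) {
  // Whole word, followed by an even number of quotes up to the end of the string.
  const pattern = "\\b(?:" + words.map(regExpEscape).join("|") + ")\\b" +
    '(?=(?:[^"]*"[^"]*")*[^"]*$)';
  const regexp = new RegExp(pattern, "g");
  const hits = [];
  let m;
  while ((m = regexp.exec(text)) !== null) {
    hits.push({ word: m[0], index: m.index });
  }
  return hits;
}

const sample = 'word1, word2, word22, "word3" and word4';
console.log(JSON.stringify(findUnquotedWords(sample, ["word1", "word2", "word3", "word4"])));
// [{"word":"word1","index":0},{"word":"word2","index":7},{"word":"word4","index":34}]
```

Note that the even-quote check assumes the quotes in the text are balanced.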
Your exclusion of the quote character is wrong: ^" actually matches the beginning of the string followed by a quote. Try this instead:
var re = new RegExp('\\b[^"]' + word + '[^"]\\b', 'i');
Also, this site is great for debugging regexes: https://regexpal.com
Edit: Because \b also matches between a quotation mark and a word character, this needs to be tweaked further. At the time of writing, JavaScript didn't support lookbehinds (they arrived later, in ES2018), so we have to get a little tricky.
var re = new RegExp('(?:^|[^"\\w])' + word + '(?:$|[^"\\w])','i')
Here is what this does:
(?: don't capture this group
^|[^"\w]) match either the start of the string or any non-word character (word characters being alphanumerics and underscore) that isn't a quote
word match your word here
(?: don't capture this group either
$|[^"\w]) match either the end of the string or, again, any non-word character that isn't a quote
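On engines that do support ES2018 lookbehind, the same idea can be sketched without consuming the neighbouring characters, so the reported position is the start of the word itself. The function name findBareWord is an assumption for illustration:

```javascript
// Hypothetical helper; assumes an engine with ES2018 lookbehind support.
function findBareWord(text, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Not preceded and not followed by a word character or a double quote.
  const re = new RegExp('(?<![\\w"])' + escaped + '(?![\\w"])', "i");
  return text.search(re);
}

const quotedSample = 'word1, word2, word22, "word3" and word4';
console.log(findBareWord(quotedSample, "word2")); // 7
console.log(findBareWord(quotedSample, "word3")); // -1
```

Like the pattern above, this only rejects a word that directly touches a quote; it does not detect a word sitting in the middle of a longer quoted phrase.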

Non-capturing group matching whitespace boundaries in JavaScript regex

I have this function that finds whole words and should replace them. It identifies spaces but should not replace them, i.e., it should not capture them.
function asd (sentence, word) {
str = sentence.replace(new RegExp('(?:^|\\s)' + word + '(?:$|\\s)'), "*****");
return str;
};
Then I have the following strings:
var sentence = "ich mag Äpfel";
var word = "Äpfel";
The result should be something like:
"ich mag *****"
and NOT:
"ich mag*****"
I'm getting the latter.
How can I make it so that it identifies the space but ignores it when replacing the word?
At first this may seem like a duplicate but I did not find an answer to this question, that's why I'm asking it.
Thank you
Put the matched whitespace back by using a capturing group (rather than a non-capturing one) and referencing it in the replacement pattern. You can also use a lookahead for the right-hand whitespace boundary, which is handy in case of consecutive matches:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word + '(?=$|\\s)'), "$1*****");
return str;
};
var sentence = "ich mag Äpfel";
var word = "Äpfel";
console.log(asd(sentence, word));
Details
(^|\s) - Group 1 (later referred to with the help of a $1 placeholder in the replacement pattern): a capturing group that matches either start of string or a whitespace
Äpfel - a search word
(?=$|\s) - a positive lookahead that requires the end of string or whitespace immediately to the right of the current location.
NOTE: If the word can contain special regex metacharacters, escape them:
function asd (sentence, word) {
var str = sentence.replace(new RegExp('(^|\\s)' + word.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + '(?=$|\\s)'), "$1*****");
return str;
};
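The lookahead matters most once every occurrence is replaced. Here is a small sketch with the g flag added; the flag and the name maskWord are assumptions for illustration, not part of the original function:

```javascript
// Hypothetical helper, for illustration only.
function maskWord(sentence, word) {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Group 1 keeps the leading whitespace; the lookahead leaves the trailing
  // whitespace unconsumed so that it can start the next match.
  const re = new RegExp("(^|\\s)" + escaped + "(?=$|\\s)", "g");
  return sentence.replace(re, "$1*****");
}

console.log(maskWord("ich mag Äpfel", "Äpfel")); // ich mag *****
console.log(maskWord("Äpfel Äpfel", "Äpfel"));   // ***** *****
console.log(maskWord("Äpfelsaft", "Äpfel"));     // Äpfelsaft
```

If the right-hand boundary consumed the space instead, the second Äpfel in "Äpfel Äpfel" would have no leading whitespace left to match against.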

javascript - regexp exec internal index doesn't progress if first char is not a match

I need to match numbers that are not preceded by "/" in a group.
In order to do this I made the following regex:
var reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;
The first part matches the start of the string or any character except "," and "/"; the second part matches a number. The regex itself works (it matches what I need). I use https://regex101.com/ for testing. Example here: https://regex101.com/r/7UwEUn/1
The problem is that when I use it in js (script below) it goes into an infinite loop if first character of the string is not a number. At a closer look it seems to keep matching the start of the string, never progressing further.
var reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;
var text = "a 1 b";
while (match = reg.exec(text)) {
if (typeof match[2] != 'undefined' && match[2] != '') {
numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
}
If the string starts with a number ("1 a b") all is fine.
The problem appears to be here (^|[^,/]) - removing ^| will fix the issue with infinite loop but it will not match what I need in strings starting with numbers.
Any idea why the internal index is not progressing?
The infinite loop is caused by the fact that your regex can match an empty string. You are unlikely to need empty matches (judging by your code), so make it match at least one digit by replacing the last * with +:
var reg = /(^|[^,\/])([0-9]*\.?[0-9]+)/g;
var text = "a 1 b a 2 ana 1/2 are mere (55";
var numbers=[];
while (match = reg.exec(text)) {
numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
console.log(numbers);
Note that this regex will not match numbers written like "34." (with a trailing dot); in that case you may use /(^|[^,\/])([0-9]*\.?[0-9]+|[0-9]*\.)/g.
Alternatively, you may use another "trick": advance the regex lastIndex manually when the match is empty:
var reg = /(^|[^,\/])([0-9]*\.?[0-9]+)/g;
var text = "a 1 b a 2 ana 1/2 are mere (55";
var numbers=[];
while (match = reg.exec(text)) {
if (match.index === reg.lastIndex) {
reg.lastIndex++;
}
if (match[2]) numbers.push({'index': match.index + match[1].length, 'value': match[2]});
}
console.log(numbers);
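The lastIndex guard only comes into play when the pattern can match an empty string, so the sketch below applies it to the original pattern from the question (with the trailing *). The wrapper name extractNumbers is an assumption for illustration:

```javascript
// Hypothetical wrapper, for illustration only.
function extractNumbers(text) {
  // The pattern from the question: group 2 may be empty.
  const reg = /(^|[^,\/])([0-9]*\.?[0-9]*)/g;
  const numbers = [];
  let match;
  while ((match = reg.exec(text)) !== null) {
    // An empty match leaves lastIndex where it was; step past it manually.
    if (match.index === reg.lastIndex) {
      reg.lastIndex++;
    }
    if (match[2]) {
      numbers.push({ index: match.index + match[1].length, value: match[2] });
    }
  }
  return numbers;
}

console.log(JSON.stringify(extractNumbers("a 1 b"))); // [{"index":2,"value":"1"}]
console.log(JSON.stringify(extractNumbers("1 a b"))); // [{"index":0,"value":"1"}]
```

For "a 1 b" the first exec call returns an empty match at index 0; without the guard every following call would return that same match again.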

JavaScript RegEx: match all minuses that are not at start or end of string for currency input

For a currency input I want to remove every minus sign except one at the start of the string and one at the end when it is preceded by a comma.
In the input event I'm already calling a replace with a simple regex for some other invalid input:
input.replace(/[^0-9\.\,\-]/g, '')
.replace('.', ',');
It would be great if I could extend this regex to also strip the invalid minuses.
Some examples of desired behavior:
50-50 -> 5050
50,00- -> 50,00
-5-0,- -> -50,-
Edit: double minus at the end or start should also be stripped.
--50,00 -> -50,00
50,-- -> 50,-
I figured I could start with a positive lookahead -(?=.), but that still matches the first character.
Additionally, I found this post that does pretty much the opposite (minuses are not allowed at start and end), but that would still match the whole string, not the separate minuses.
Any help would be appreciated.
Use the following approach with specific regex pattern:
var replaceHyphen = function (str) {
return str.replace(/(\d)-|(-)-/g, '$1$2');
};
console.log(replaceHyphen('50-50'));
console.log(replaceHyphen('50,00-'));
console.log(replaceHyphen('-5-0,-'));
console.log(replaceHyphen('--50,00'));
console.log(replaceHyphen('50,--'));
Is a function ok? This should do the trick:
function removeMinus(str) {
var prefix = str.startsWith("-") ? "-" : "";
var postfix = str.endsWith(",-") ? "-" : "";
return prefix + str.split("-").join("") + postfix
}
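A possible variation on this function that also covers the doubled trailing minus from the question's edit, checked against all five examples. The name normalizeMinus and the /,-+$/ test are assumptions for illustration:

```javascript
// Hypothetical helper, for illustration only.
function normalizeMinus(str) {
  // Keep a single leading minus if the input starts with one.
  const leading = str.startsWith("-") ? "-" : "";
  // Keep a single trailing minus if the input ends with a comma plus one or more minuses.
  const trailing = /,-+$/.test(str) ? "-" : "";
  return leading + str.replace(/-/g, "") + trailing;
}

console.log(normalizeMinus("50-50"));   // 5050
console.log(normalizeMinus("50,00-"));  // 50,00
console.log(normalizeMinus("-5-0,-"));  // -50,-
console.log(normalizeMinus("--50,00")); // -50,00
console.log(normalizeMinus("50,--"));   // 50,-
```

The difference from removeMinus is the trailing check: endsWith(",-") is false for "50,--", so removeMinus returns "50," for that input.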
You could use word boundary \b to do that.
RegExp Boundaries
\b
Matches a word boundary. This is the position where a word character is not followed or preceded by another word-character, such as between a letter and a space...
https://regex101.com/r/YzCiEx/1
var regex = /\b-+\b/g;
console.log("50-50".replace(regex, ''))
console.log("50,00".replace(regex, ''))
console.log("-5-0,-".replace(regex, ''))
console.log("-5------6-".replace(regex, ''))
console.log("-6--66-6,-".replace(regex, ''))

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: would be nice if the result would be without the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric or underscore characters
(?:(?: +\w+)*#)? - zero or one occurrence of:
  (?: +\w+)* - zero or more occurrences of one or more spaces followed by one or more word characters, followed by
  # - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Get rid of the trailing # (ES6 arrow function)
m = m.map(s => s.replace(/#$/g, ''));
// ES5 equivalent:
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/
document.body.innerHTML = "<pre>" + JSON.stringify(m, 0, 4) + "</pre>";
}
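For the "extra" part of the question (results without the #), a small sketch that strips the hash from both ends of each match. The function name extractTags is an assumption for illustration:

```javascript
// Hypothetical helper, for illustration only.
function extractTags(str) {
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = str.match(/#[^ #]+(?:(?: +[^ #]+)*#)?/g) || [];
  // Drop the leading # and, if present, the closing #.
  return matches.map(s => s.replace(/^#|#$/g, ""));
}

const tagSample = "#The test# rain in #SPAIN stays mainly in the #plain";
console.log(JSON.stringify(extractTags(tagSample))); // ["The test","SPAIN","plain"]
```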
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
Use it with the g (global) flag.
