Extract all the numbers from a string in JavaScript - javascript

I want to extract all the standalone natural numbers from a given string:
var a = "#1234abc 12 34 5 67 sta5ck over # numbrs ."
numbers = a.match(/\d+/gi)
In the above string I should match only the numbers 12, 34, 5 and 67, not the 1234 in the first word, the 5 in "sta5ck", etc.
So numbers should be equal to [12,34,5,67].

Use word boundaries,
> var a = "#1234abc 12 34 5 67 sta5ck over # numbrs ."
undefined
> numbers = a.match(/\b\d+\b/g)
[ '12', '34', '5', '67' ]
Explanation:
\b Word boundary, which matches between a word character (\w) and a non-word character (\W).
\d+ One or more digits.
\b Word boundary, which matches between a word character and a non-word character.
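As a minimal sketch of the word-boundary approach, the snippet below also converts the matches to numbers, since match() returns strings. The variable names are illustrative, and the second input is an assumed example showing a caveat: \b treats punctuation such as . as a boundary, so a decimal like 3.5 is split into two matches.

```javascript
// Input string taken from the question.
const input = "#1234abc 12 34 5 67 sta5ck over # numbrs .";

// match() returns null when nothing matches, so fall back to an empty array.
const numbers = (input.match(/\b\d+\b/g) || []).map(Number);
console.log(numbers); // [ 12, 34, 5, 67 ]

// Caveat (assumed example): "#" and "." are non-word characters,
// so digits next to them still sit on a word boundary.
const punctuated = "#12 3.5";
const punctuatedMatches = punctuated.match(/\b\d+\b/g);
console.log(punctuatedMatches); // [ '12', '3', '5' ]
```

If decimals or punctuation-adjacent digits must be excluded, a whitespace-based pattern may be a better fit than \b.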
OR
> var myString = '#1234abc 12 34 5 67 sta5ck over # numbrs .';
undefined
> var myRegEx = /(?:^| )(\d+)(?= |$)/g;
undefined
> function getMatches(string, regex, index) {
... index || (index = 1); // default to the first capturing group
... var matches = [];
... var match;
... while (match = regex.exec(string)) {
..... matches.push(match[index]);
..... }
... return matches;
... }
undefined
> var matches = getMatches(myString, myRegEx, 1);
undefined
> matches
[ '12', '34', '5', '67' ]
Code stolen from here.

If anyone is interested in a proper regex solution to match digits surrounded by space characters, it is simple for languages that support lookbehind (like Perl and Python, but not JavaScript at the time of writing):
(?<=^|\s)\d+(?=\s|$)
As illustrated in the accepted answer, in languages that don't support lookbehind it is necessary to use a workaround, e.g. to include the leading space in the match while keeping the important part in a capturing group:
(?:^|\s)(\d+)(?=\s|$)
Then you just need to extract that capturing group from the matches, see e.g. this answer to How do you access the matched groups in a JavaScript regular expression?
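A minimal sketch of extracting that capturing group, assuming an engine with String.prototype.matchAll (ES2020). The second helper is an assumption for engines that implement ES2018 lookbehind, which newer JavaScript engines do. Both helper names are hypothetical.

```javascript
const text = "#1234abc 12 34 5 67 sta5ck over # numbrs .";

// Capturing-group workaround: the leading space is consumed by the match,
// and group 1 holds only the digits.
function spaceDelimitedNumbers(str) {
  return Array.from(str.matchAll(/(?:^|\s)(\d+)(?=\s|$)/g), (m) => m[1]);
}

// Lookaround variant (needs lookbehind support): no non-whitespace
// character may sit directly before or after the run of digits.
function spaceDelimitedNumbersLookaround(str) {
  return str.match(/(?<!\S)\d+(?!\S)/g) || [];
}

console.log(spaceDelimitedNumbers(text)); // [ '12', '34', '5', '67' ]
console.log(spaceDelimitedNumbersLookaround(text)); // [ '12', '34', '5', '67' ]
```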

Related

How to change given string to regex modified string using javascript

Example strings :
2222
333333
12345
111
123456789
12345678
Expected result:
2#222
333#333
12#345
111
123#456#789
12#345#678
i.e. '#' should be inserted at every 4th position (4th, 8th, 12th, etc.) counting from the end of the output string.
I believe this can be done using replace and some other methods in JavaScript.
For validation of the output string I have made this regex:
^(\d{1,3})(\.\d{3})*?$
You can use this regular expression:
/(\d)(\d{3})$/
This matches one digit (\d) and captures it in the first group, then captures the last three digits (\d{3}) in the second group. You can then reference the captured groups in the replacement string using $1 and $2.
See example below:
const transform = str => str.replace(/(\d)(\d{3})$/, '$1#$2');
console.log(transform("2222")); // 2#222
console.log(transform("333333")); // 333#333
console.log(transform("12345")); // 12#345
console.log(transform("111")); // 111
For larger strings of size N, you could use other methods such as .match() and reverse the string like so:
const reverse = str => Array.from(str).reverse().join('');
const transform = str => {
  return reverse(reverse(str).match(/(\d{1,3})/g).join('#'));
}
console.log(transform("2222")); // 2#222
console.log(transform("333333")); // 333#333
console.log(transform("12345")); // 12#345
console.log(transform("111")); // 111
console.log(transform("123456789")); // 123#456#789
console.log(transform("12345678")); // 12#345#678
var test = [
'111',
'2222',
'333333',
'12345',
'123456789',
'1234567890123456'
];
console.log(test.map(function (a) {
  return a.replace(/(?=(?:\B\d{3})+$)/g, '#');
}));
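You could match positions instead of digits. A positive lookahead finds every position that is followed by one or more groups of three digits up to the end of the string, and the replacement inserts a # at each of those positions.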
(?=(?:\B\d{3})+$)
(?= Positive lookahead, what is on the right is
(?:\B\d{3})+ Repeat 1+ times not a word boundary and 3 digits
$ Assert end of string
) Close lookahead
const regex = /^\d+$/;
["2222",
"333333",
"12345",
"111",
"123456789",
"12345678"
].forEach(s => console.log(
s.replace(/(?=(?:\B\d{3})+$)/g, "#")
));
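The lookahead can be wrapped in a small helper. This is only a sketch: groupDigits and isGrouped are hypothetical names, and the validation pattern is an assumed adaptation of the question's regex that uses # instead of a dot.

```javascript
// Insert "#" at every position that is followed by one or more complete
// groups of three digits up to the end of the string. \B keeps the
// insertion between two digits, so nothing is added at the very start.
const groupDigits = (digits) => digits.replace(/(?=(?:\B\d{3})+$)/g, "#");

// Assumed validator for the output: 1 to 3 digits, then any number of "#ddd" groups.
const isGrouped = (s) => /^\d{1,3}(?:#\d{3})*$/.test(s);

for (const s of ["111", "2222", "12345", "123456789"]) {
  const grouped = groupDigits(s);
  console.log(s, "=>", grouped, isGrouped(grouped));
}
```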

Select the first character before a forward slash but after any space

I have the following string patterns which I need to match as described.
In each of the following examples I need only the first character or digit that comes before the '/' and after any space:
12/5 <-- match on 1
x23/4.5 <-- match on x
234.5/7 <-- match on 2
2 - 012.3/4 <-- match on 0
A regex like the following is obviously not enough:
\d(?=\d\/))
To clarify:
I'm actually using the regex with JS split, in a mapping function that takes each of the strings and splits it on the match. So, for example, 2 - 012.3/4 would be split into [2 - 0, 12.3/4], 12/5 into [1, 2/5], and so on.
See an example (with a non-working regex) here:
https://regex101.com/r/N1RbGp/1
Try a regular expression like this:
(?<=^|\s)[a-zA-Z0-9](?=\S*[/])
Breaking it down:
(?<=^|\s) is a zero-width (non-capturing) positive lookbehind that ensures
that the match will begin only immediately following start-of-text or a
whitespace character.
[a-zA-Z0-9] matches a single letter or digit.
(?=\S*[/]) is a zero-width (non-capturing) positive lookahead that requires
the matched letter or digit to be followed by zero or more non-whitespace characters and a solidus ('/') character.
Here's the code:
const texts = [
'12/5',
'x23/4.5',
'234.5/7',
'2 - 012.3/4',
];
texts.push( texts.join(', ') );
for (const text of texts) {
  const rx = /(?<=^|\s)[a-zA-Z0-9](?=\S*[/])/g;
  console.log('');
  console.group(`text: '${text}'`);
  for (let m = rx.exec(text); m; m = rx.exec(text)) {
    console.log(`matched '${m[0]}' at offset ${m.index} in text.`);
  }
  console.groupEnd();
}
This is the output:
text: '12/5'
matched '1' at offset 0 in text.
text: 'x23/4.5'
matched 'x' at offset 0 in text.
text: '234.5/7'
matched '2' at offset 0 in text.
text: '2 - 012.3/4'
matched '0' at offset 4 in text.
text: '12/5, x23/4.5, 234.5/7, 2 - 012.3/4'
matched '1' at offset 0 in text.
matched 'x' at offset 6 in text.
matched '2' at offset 15 in text.
matched '0' at offset 28 in text.
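Since the question mentions using the regex with split, here is a sketch of that use. It assumes lookbehind support and uses hypothetical names (splitPoint, splitAfterFirstChar). The pattern is zero-width, so it splits right after the matched character instead of consuming it.

```javascript
// Split position: directly after a non-space character that starts a token
// (preceded by start-of-text or whitespace), provided the rest of that
// token contains a "/".
const splitPoint = /(?<=(?:^|\s)\S)(?=\S*\/)/;

const splitAfterFirstChar = (text) => text.split(splitPoint);

console.log(splitAfterFirstChar("12/5")); // [ '1', '2/5' ]
console.log(splitAfterFirstChar("2 - 012.3/4")); // [ '2 - 0', '12.3/4' ]
```

The group inside the lookbehind is non-capturing on purpose: split() would otherwise add the captured text to the result array.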
The first group in this regex matches the character you're asking for:
([^\s])[^\s]*/
You could also just use:
[^\s]+/
And use the first character of the match (or perhaps you need the rest anyway).
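A short sketch of the capturing-group approach, which needs no lookbehind. firstCharBeforeSlash is a hypothetical helper name.

```javascript
// Group 1 is the first character of the first whitespace-free run
// that contains a "/".
function firstCharBeforeSlash(text) {
  const m = text.match(/([^\s])[^\s]*\//);
  return m ? m[1] : null;
}

for (const s of ["12/5", "x23/4.5", "234.5/7", "2 - 012.3/4"]) {
  console.log(s, "=>", firstCharBeforeSlash(s));
}
```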
If you want to be able to scan the whole document:
/(?<=(^|\s))\S(?=\S*\/)/g
https://regex101.com/r/rN08sP/1
s = `12/5
x23/4.5
234.5/2
534/5.6
- 49.55/6.5
234.5/7`;
console.log(s.match(/(?<=(^|\s))\S(?=\S*\/)/g));
But if you want to extract that character from a short string (did you mean that there is a space in front of it?), it would be /\s(\S)\S*\//:
const arr = [
" 12/5",
" x23/4.5",
" 234.5/7",
" 2 - 012.3/4"
];
arr.forEach(s => {
  let result = s.match(/\s(\S)\S*\//);
  if (result)
    console.log("For", s, "result: ", result[1])
});
But if the beginning of the line is also acceptable, so that no space is needed in front, then use /(^|\s)(\S)\S*\//:
const arr = [
"12/5",
"x23/4.5",
"234.5/7",
"2 - 012.3/4"
];
arr.forEach(s => {
  let result = s.match(/(^|\s)(\S)\S*\//);
  if (result)
    console.log("For", s, "result: ", result[2])
});
But actually, if you don't mean literally a space but just a word boundary in general:
const arr = [
"12/5",
"x23/4.5",
"234.5/7",
"2 - 012.3/4"
];
arr.forEach(s => {
  let result = s.match(/\b(\S)\S*\//);
  if (result)
    console.log("For", s, "result: ", result[1])
});

Regex: How to match digits not followed by any characters but allow characters after space?

I need help making a regex that matches these criteria (pardon my possibly confusing phrasing):
1. Only match if the string starts with a number or a dot
2. Match numbers, dots, and whitespace in between
3. Match until the first space if non-digits follow that space
4. If only numbers follow the space, then match them too
5. If any character except a dot, whitespace, or a number directly follows a number, then return null.
So far I've gotten this, but it still allows special characters to follow the numbers after.
/^[0-9\.][0-9\.\s]+(?!\w)/
Sample results
"1500" should return "1500"
"1500 0" should return "1500 0"
"1500 a" should return "1500"
"1500&" SHOULD return null, but so far returns "1500"
"1500a" should return null, as it should.
You may use
/^[\d.][\d\s.]*(?!\S)/
Details
^ - start of string
[\d.] - a digit or a dot
[\d\s.]* - 0 or more digits, whitespaces, dots, as many as possible
(?!\S) - not followed by a non-whitespace character, i.e. followed by whitespace or the end of the string.
JS demo:
var strs = ['1500 0', '1500 a', '1500&', '1500a'];
var rx = /^[\d.][\d\s.]*(?!\S)/;
for (var i = 0; i < strs.length; i++) {
  var m = strs[i].match(rx);
  if (m) {
    console.log(strs[i], "=>", m[0]);
  } else {
    console.log(strs[i], "=> NO MATCH");
  }
}
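Wrapped as a function, the pattern can be checked against all the samples from the question. This is a sketch: leadingNumber is a hypothetical name, and trimming trailing whitespace from the match is an added assumption, not part of the answer.

```javascript
function leadingNumber(input) {
  const m = input.match(/^[\d.][\d\s.]*(?!\S)/);
  // [\d\s.]* can also consume trailing whitespace at the end of the
  // string, so trim it off before returning.
  return m ? m[0].trimEnd() : null;
}

for (const s of ["1500", "1500 0", "1500 a", "1500&", "1500a"]) {
  console.log(JSON.stringify(s), "=>", leadingNumber(s));
}
```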
You can try the following regex
^[\d\.][0-9\.]+((\s(?=\w)\d*)|$)
Explanation:
^ start of the string
[\d\.] match a single digit or dot
[0-9\.]+ match one or more digits or dots
(\s(?=\w)\d*) match a whitespace character that is followed by an alphanumeric character (lookahead), then zero or more digits
|$ or the end of the string, if condition no. 4 does not apply.
JS Example:
let match = null, pattern = /^[\d\.][0-9\.]+((\s(?=\w)\d*)|$)/;
match = '1500 0'.match(pattern) || [null]
console.log(match[0])
match = '1500'.match(pattern) || [null]
console.log(match[0])
match = '1500&'.match(pattern) || [null]
console.log(match[0])
match = '1500 a'.match(pattern) || [null]
console.log(match[0])
match = '1500a'.match(pattern) || [null]
console.log(match[0])

Recursive pattern in JS

I want to check that a text satisfies three rules.
1º: The whole string should be a sequence of numbers between 0 and 31, separated by dots.
Example: 1.23.5.12
2º: The string can't begin or end with a dot.
So this is not allowed:
.1.23.5.12.
3º: You can write a maximum of 51 digits (following the previous rules).
I tried to write a pattern for my JS function, but it doesn't work.
This is my function:
var str = document.getElementById("numero").value;
var patt1 = /^[0-9]+\./g;
var result = str.match(patt1);
document.getElementById("demo").innerHTML = result;
What is wrong in the pattern?
You may use
/^(?!(?:\D*\d){52})(?:[12]?\d|3[01])(?:\.(?:[12]?\d|3[01]))*$/
Details
^ - start of string
(?!(?:\D*\d){52}) - fail if there are 52 or more digits separated with any 0+ non-digits
(?:[12]?\d|3[01]) - 1 or 2 (optional) followed with any single digit or 3 followed with 0 or 1 (0 - 31)
(?:\.(?:[12]?\d|3[01]))* - zero or more consecutive repetitions of
\. - dot
(?:[12]?\d|3[01]) - see above (0 - 31)
$ - end of string.
Use it with test:
if (/^(?!(?:\D*\d){52})(?:[12]?\d|3[01])(?:\.(?:[12]?\d|3[01]))*$/.test(str)) {
// Valid!
}
Test:
var rx = /^(?!(?:\D*\d){52})(?:[12]?\d|3[01])(?:\.(?:[12]?\d|3[01]))*$/;
var strs = [".12", "123", "1.23.5.12", "12345678"];
for (var s of strs) {
  console.log(s, "=>", rx.test(s));
}
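The three rules can also be checked step by step without one large regex, which may be easier to read and adjust. This is a sketch under assumptions: isValidSequence is a hypothetical name, and leading zeros such as 01 are rejected, which matches the behaviour of the pattern above.

```javascript
function isValidSequence(text, maxDigits = 51) {
  // Rule 3: count every digit in the string.
  const digitCount = text.replace(/\D/g, "").length;
  if (digitCount > maxDigits) return false;

  // Rules 1 and 2: every dot-separated part must be a number from 0 to 31.
  // A leading or trailing dot produces an empty part, which fails the check.
  return text.split(".").every(
    (part) =>
      /^\d{1,2}$/.test(part) &&
      Number(part) <= 31 &&
      String(Number(part)) === part // rejects leading zeros such as "01"
  );
}

for (const s of [".12", "123", "1.23.5.12", "12345678", "32.1"]) {
  console.log(s, "=>", isValidSequence(s));
}
```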
The regex ^[0-9]+\. matches from the start of the string ^ one or more digits [0-9]+ followed by a dot \.
You might use:
^(?!(\.?\d){52})(?:[0-9]|[12][0-9]|3[01])(?:\.(?:[0-9]|[12][0-9]|3[01]))+$
Explanation
^ Assert the start of the line
(?!(\.?\d){52}) Negative lookahead asserting that what follows is not 52 repetitions of an optional dot followed by a digit
(?:[0-9]|[12][0-9]|3[01]) Match a number 0 - 31
(?:\.(?:[0-9]|[12][0-9]|3[01]))+ Group matching a dot followed by a number 0 - 31, repeated one or more times so that a single number without a dot does not match
$ Assert the end of the string
const strings = [
'1.23.5.12',
'1.23.5.12.',
'.1.23.5.12.',
'1.23.5.12',
'1',
'1.23.5.12.1.23.5.1.23.5.12.1.23.5.1.23.5.12.1.23.5.1.23.5.12.1.23.5.1.23.5.12.1.23.5.2',
'1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12.1.23.5.12'
];
let pattern = /^(?!(\.?\d){52})(?:[0-9]|[12][0-9]|3[01])(?:\.(?:[0-9]|[12][0-9]|3[01]))+$/;
strings.forEach((s) => {
  console.log(s + " ==> " + pattern.test(s));
});

Splitting words in a variety of ways with Regex

Working on something similar to Solr's WordDelimiterFilter, but not in Java.
Want to split words into tokens like this:
P90X = P, 90, X (split on word/number boundary)
TotallyCromulentWord = Totally, Cromulent, Word (split on lowercase/uppercase boundary)
TransAM = Trans, AM
Looking for a general solution, not specific to the above examples. Preferably in a regex flavour that doesn't support lookbehind, but I can use PL/perl if necessary, which can do lookbehind.
Found a few answers on SO, but they all seemed to use lookbehind.
Things to split on:
1. Transition from lowercase letter to uppercase letter
2. Transition from letter to number or number to letter
3. (Optional) split on a few other characters (- _)
My main concern is 1 and 2.
That's not something I'd like to do without lookbehind, but for the challenge, here is a javascript solution that you should be able to easily convert into whatever language:
function split(s) {
  var match;
  var result = [];
  while (Boolean(match = s.match(/([A-Z]+|[A-Z]?[a-z]+|[0-9]+|([^a-zA-Z0-9])+)$/))) {
    if (!match[2]) {
      // don't return non-alphanumeric tokens
      result.unshift(match[1]);
    }
    s = s.substring(0, s.length - match[1].length);
  }
  return result;
}
Demo:
P90X [ 'P', '90', 'X' ]
TotallyCromulentWord [ 'Totally', 'Cromulent', 'Word' ]
TransAM [ 'Trans', 'AM' ]
URLConverter [ 'URL', 'Converter' ]
Abc.DEF$012 [ 'Abc', 'DEF', '012' ]
This regex should split all the words in a paragraph or string into tokens.
It even works for the simple cases in your example.
Match globally. Other specific delimiters can be added as well.
# /(?:[A-Z]?[a-z]+(?=[A-Z\d]|[^a-zA-Z\d]|$)|[A-Z]+(?=[a-z\d]|[^a-zA-Z\d]|$)|\d+(?=[a-zA-Z]|[^a-zA-Z\d]|$))[^a-zA-Z\d]*|[^a-zA-Z\d]+/
(?:
[A-Z]? [a-z]+
(?= [A-Z\d] | [^a-zA-Z\d] | $ )
|
[A-Z]+
(?= [a-z\d] | [^a-zA-Z\d] | $ )
|
\d+
(?= [a-zA-Z] | [^a-zA-Z\d] | $ )
)
[^a-zA-Z\d]*
|
[^a-zA-Z\d]+
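JavaScript has no free-spacing mode, so a sketch of using this pattern there needs the compact form with the g flag. The tokenize helper is a hypothetical name, and stripping the delimiters that the pattern includes at the end of each token is an added assumption.

```javascript
const tokenPattern =
  /(?:[A-Z]?[a-z]+(?=[A-Z\d]|[^a-zA-Z\d]|$)|[A-Z]+(?=[a-z\d]|[^a-zA-Z\d]|$)|\d+(?=[a-zA-Z]|[^a-zA-Z\d]|$))[^a-zA-Z\d]*|[^a-zA-Z\d]+/g;

function tokenize(text) {
  return (text.match(tokenPattern) || [])
    // Each match may carry trailing delimiters; drop them.
    .map((token) => token.replace(/[^a-zA-Z\d]+/g, ""))
    // Matches made only of delimiters are now empty; discard them.
    .filter((token) => token.length > 0);
}

for (const word of ["P90X", "TotallyCromulentWord", "TransAM", "Abc.DEF$012"]) {
  console.log(word, tokenize(word));
}
```

Note that with this pattern an uppercase run followed by lowercase letters, such as URLConverter, comes out as URLC and onverter, unlike the loop-based split function shown earlier.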
