Regex for first number after a number and period - javascript

I have an ordered list with names and addresses that is structured like:
1. Last, First 123 Main St buncha
buncha buncha
2. Lasta, Firsta 234 Lane St etc etc
So I need a regex that finds the number that immediately follows the number with a period. So in this case an array containing [123, 234]. I have a couple of patterns I've tried. The one that I think is the closest is
/(?![0-9]+\.)[0-9]+/gim;
Unfortunately this just returns every number, but I think it's in the right area. Any help would be appreciated.

Use a positive lookbehind to explicitly match the number, period, and the text between that and the number in the address.
const string = `1. Last, First 123 Main St buncha
buncha buncha
2. Lasta, Firsta 234 Lane St etc etc`;
let regex = /(?<=^\d+\.\D*)\d+/gm;
console.log(string.match(regex));

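Something like this (note that the lookbehind is not anchored to the start of the line, so it also matches any later numbers on the same line):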
const source =
`1. Last, First 123 Main St buncha
buncha buncha
2. Lasta, Firsta 234 Lane St etc etc`;
const result = source.match(/(?<=(\d+\..+))\d+/gm);
console.log(result);
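To see how the two lookbehind patterns above differ, here is a small sketch on a hypothetical address line that contains a second number (the `Apt 5` input is made up for illustration). The anchored pattern stops at the first number on the line, while the unanchored one also picks up later numbers.

```javascript
// Hypothetical input: one list entry whose address contains a second number.
const line = "1. Last, First 123 Main St Apt 5";

// Anchored: only the first run of digits after the "N." marker at line start.
const anchored = line.match(/(?<=^\d+\.\D*)\d+/gm);

// Unanchored: any digits that have "N." plus at least one character before them.
const unanchored = line.match(/(?<=(\d+\..+))\d+/gm);

console.log(anchored);   // ["123"]
console.log(unanchored); // ["123", "5"]
```

Both patterns rely on lookbehind, so they need an ECMAScript 2018+ engine.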

You could also use a capturing group
^\d+\.\D*(\d+)
Explanation
^ Start of the line (because the m flag is used in the code below)
\d+\. Match 1+ digits and .
\D* Match 0+ chars other than a digit
(\d+) Capture group 1, match 1+ digits
const regex = /^\d+\.\D*(\d+)/gm;
const str = `1. Last, First 123 Main St buncha
buncha buncha
2. Lasta, Firsta 234 Lane St etc etc`;
let res = Array.from(str.matchAll(regex), m => m[1]);
console.log(res);
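The question asks for an array of numbers, [123, 234], whereas match() and matchAll() return strings. A minimal sketch of the conversion, assuming the capturing-group pattern above; the helper name `firstNumbers` is hypothetical.

```javascript
// Hypothetical helper: returns the first number after each "N." list marker.
function firstNumbers(text) {
  const pattern = /^\d+\.\D*(\d+)/gm;
  // matchAll yields one match per list entry; m[1] holds the captured digits.
  return Array.from(text.matchAll(pattern), (m) => Number(m[1]));
}

const list = `1. Last, First 123 Main St buncha
buncha buncha
2. Lasta, Firsta 234 Lane St etc etc`;

console.log(firstNumbers(list)); // [123, 234]
```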

Related

Regex return only last number after special char

I have 5 inputs in total, like this:
A-01
AB - A2-01:xyz word (*)
AB - A3-02a:xyz word (*):xyz word (*)
AB - A-01:xyz word (*)
xyz-word (*)
I am trying to get only the last number after the special character -.
Expected result after applying regex:
01
01
02
01
empty(just return empty)
Regex:
const input = 'AB - A2-01:xyz word (*)';
console.log(input.match("[^-]+$")[0]);
input.match("[^-]+$")[0] regex works for first input A-01. It returns 01.
For second input it prints as,
01:xyz word (*).
After the number, I don't need the rest of the characters.
I am trying this in Angular TypeScript using regex.
This will capture the first number after the last -:
const inputs = ['A-01',
'AB - A2-01:xyz word (*)',
'AB - A3-02a:xyz word (*):xyz word (*)',
'AB - A-01:xyz word (*)',
'xyz-word (*)'];
for (const input of inputs) {
  // match() returns null when there is no match, so check before indexing
  const m = input.match(/^.*-[^\d]*(\d+)/);
  if (m) {
    console.log(m[1]);
  }
}
The regex explained:
^ - start of string anchor
.* - any character 0 or more times (greedy)
- - a literal -
[^\d]* - any non-digit, 0 or more times (greedy)
( - start of capture group
\d+ - any digit, 1 or more times
) - end of capture group
You'll then find the captured number in m[1] (if any - check m first).
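If you instead want an array where the non-matching entries are "empty" (undefined), you could map the inputs using an arrow function expression and add an empty () alternative. That alternative always matches, so match() never returns null, and group 1 is simply undefined for the non-matches.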
const inputs = ['A-01',
'AB - A2-01:xyz word (*)',
'AB - A3-02a:xyz word (*):xyz word (*)',
'AB - A-01:xyz word (*)',
'xyz-word (*)'];
const a = inputs.map(input => input.match(/^.*-[^\d]*(\d+)|()/)[1]);
console.log(a);
Disclaimer: This is my first JavaScript ever, so I may have done it in a cumbersome, non-idiomatic way.
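Since the question asks for an empty result when there is no number, here is a hedged sketch that returns an empty string instead of undefined. It assumes an ES2020 environment (optional chaining `?.` and nullish coalescing `??`); the helper name `lastNumberAfterDash` is hypothetical.

```javascript
// Hypothetical helper: first number after the last "-", or "" when there is none.
function lastNumberAfterDash(input) {
  const m = input.match(/^.*-\D*(\d+)/);
  // match() returns null when nothing matches; ?. and ?? turn that into "".
  return m?.[1] ?? "";
}

const samples = [
  'A-01',
  'AB - A2-01:xyz word (*)',
  'AB - A3-02a:xyz word (*):xyz word (*)',
  'AB - A-01:xyz word (*)',
  'xyz-word (*)',
];

console.log(samples.map(lastNumberAfterDash));
// ["01", "01", "02", "01", ""]
```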

Regex, Find any number except for the number preceded by a letter

I want to find all numbers except those preceded by an English letter.
Example one: test123. I don't want 123 to match.
Example two: another 123. I want 123 to match.
Example three: try other solutions 123. I want 123 to match.
I tried many patterns and none gave the desired result. The last one was
let reg = /((?<![a-zA-Z])[0-9]){1,}/g;
but it only skips the first digit, and I want to skip the whole number.
Example: for test123 it skips 1 but still matches 23; the desired result is to skip 123 entirely.
The result must ignore every digit of a number that follows an English letter.
You can use
const reg = /(?<![a-zA-Z\d]|\d\.)\d+(?:\.\d+)?/g;
Details:
(?<![a-zA-Z\d]|\d\.) - a negative lookbehind that fails the match if there is a letter or digit, or a digit followed by a dot, immediately to the left of the current location
\d+(?:\.\d+)? - one or more digits followed by an optional sequence of a . and one or more digits.
JavaScript demo:
const text = "test123\ntest 456\ntest123.5\ntest 456.5";
const reg = /(?<![a-zA-Z\d]|\d\.)\d+(?:\.\d+)?/g;
console.log(text.match(reg)); // => ["456","456.5"]
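Why the pattern from the question lets 23 through: its lookbehind is evaluated for each digit separately, and only the first digit has a letter directly to its left. The sketch below contrasts it with a simplified integer-only variant of the fix (no decimal handling), where the lookbehind also rejects a preceding digit.

```javascript
// The asker's pattern: the lookbehind is checked once per digit.
const perDigit = /((?<![a-zA-Z])[0-9]){1,}/g;

// Simplified variant of the fix: a preceding digit also blocks the match,
// so the engine cannot start in the middle of a number.
const wholeNumber = /(?<![a-zA-Z\d])\d+/g;

console.log("test123".match(perDigit));        // ["23"]
console.log("test123".match(wholeNumber));     // null
console.log("another 123".match(wholeNumber)); // ["123"]
```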
For environments not supporting ECMAScript 2018+ standard:
var text = "test123\ntest 456\ntest123.5\ntest 456.5";
var reg = /([a-zA-Z])?\d+(?:\.\d+)?/g;
var results = [], m;
while ((m = reg.exec(text)) !== null) {
  if (m[1] === undefined) {
    results.push(m[0]);
  }
}
console.log(results); // => ["456","456.5"]

Regex to extract search terms is not working as expected

I have the test string
ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY
and want the output as
["ti: harry Potter OR kw:", "kw: magic AND sprint:", "sprint: title OR ti:", "ti: HARRY"]
but the output I am getting is
["ti: harry Potter OR kw:", "kw: magic AND sprint:", "nt: title OR ti:", "ti: HARRY"]
It is taking only 2 characters before the colon
The regex I am using is
const match = /[a-z0-9]{2}:.*?($|[a-z0-9]{2}:)/g;
and I am extracting it and putting it in an array
I tried replacing it with /[a-z0-9]+:.*?($|[a-z0-9]+:)/g; but when I increase index and add the strings to parsed, it does it weirdly (This is included in code as well)
I tried changing the {2} to n and that is also not working as expected.
const parsed = [];
const match = /[a-z0-9]{2}:.*?($|[a-z0-9]{2}:)/g;
const message = "ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY";
let next = match.exec(message);
while (next) {
parsed.push(next[0]);
match.lastIndex = next.index + 1;
next = match.exec(message);
console.log("next again", next);
}
console.log("parsed", parsed);
https://codesandbox.io/s/regex-forked-6op514?file=/src/index.js
For the desired matches, you might use a pattern that also matches AND or OR before the next key, and take the match from capture group 1, which is denoted by m[1] in the example code.
\b(?=([a-z0-9]+:.*?(?: (?:AND|OR) [a-z0-9]+:|$)))
In parts, the pattern matches:
\b A word boundary to prevent a partial match
(?= Positive lookahead to assert what is on the right is
( Capture group 1
[a-z0-9]+: Match 1+ chars a-z0-9 followed by :
.*? Match any char except a newline, as few as possible
(?: Non capture group
(?:AND|OR) [a-z0-9]+: Match either AND or OR (preceded by a space in the pattern), followed by a space, 1+ chars a-z0-9 and :
| Or
$ Assert the end of the string
) Close non capture group
) Close group 1
) Close the lookahead
const regex = /\b(?=([a-z0-9]+:.*?(?: (?:AND|OR) [a-z0-9]+:|$)))/gm;
const str = `ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY`;
const result = Array.from(str.matchAll(regex), m => m[1]);
console.log(result);
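As an alternative that keeps the exec loop from the question, here is a sketch under one assumption: instead of resuming at `next.index + 1`, the loop resumes at the start of the trailing key captured in group 1. That way each new match begins on a whole key, and `+` can replace `{2}` without producing partial keys.

```javascript
const parsed = [];
// Same shape as the pattern in the question, with + so keys of any length match.
const pattern = /[a-z0-9]+:.*?($|[a-z0-9]+:)/g;
const message = "ti: harry Potter OR kw: magic AND sprint: title OR ti: HARRY";

let next = pattern.exec(message);
while (next) {
  parsed.push(next[0]);
  // Resume at the start of the trailing key (group 1), not at index + 1.
  // For the last match group 1 is "", so lastIndex lands at the end of the string.
  pattern.lastIndex = next.index + next[0].length - next[1].length;
  next = pattern.exec(message);
}

console.log(parsed);
// ["ti: harry Potter OR kw:", "kw: magic AND sprint:", "sprint: title OR ti:", "ti: HARRY"]
```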

What will be the regular expression for below requirement in javascript

Criteria:
Any word that starts with a and ends with b, with a single digit in the middle. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far I have tried the following:
var text_content = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"; // the given string (third line has a leading space)
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
Getting below output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi only matches at the start of a line: an optional character other than #, then a, a digit, and b. It does not check for whole words, and it cannot match words that appear later in the line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0, $1) {
  if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
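The same pattern can also be used without replace(), which is called above only for its side effect. A minimal sketch using matchAll, assuming an environment that supports it (ES2020):

```javascript
const s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
const re = /^[^\S\r\n]*#.*|\b(a\db)\b/gm;

// A "#" line is consumed whole by the first alternative and leaves group 1
// undefined, so dropping undefined entries keeps only the wanted words.
const words = Array.from(s.matchAll(re), (m) => m[1]).filter((w) => w !== undefined);

console.log(words); // ["a1b", "a2b", "a3b", "a7b", "a8b", "a9b"]
```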
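I would suggest using two regexes.
The first regex fetches the non-hashed lines:
^[^#][a\db\s]+
and the second then fetches the individual words from each line. It must be applied globally and must not be anchored with ^, otherwise only the first word of a line can match:
\ba\db\b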
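A hedged sketch of the two-pass idea. As an assumption of this sketch, the first pass splits on newlines and filters with startsWith("#") rather than using a regex, and the second pass uses the word-boundary pattern /\ba\db\b/gi on each remaining line.

```javascript
const text = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";

const found = text
  .split(/\r?\n/)                           // pass 1: one entry per line
  .filter((line) => !line.startsWith("#"))  // drop lines that start with "#"
  // pass 2: every whole word of the form a<digit>b; match() returns null
  // for a line without any such word, hence the ?? [] fallback.
  .flatMap((line) => line.match(/\ba\db\b/gi) ?? []);

console.log(found); // ["a1b", "a2b", "a3b", "a7b", "a8b", "a9b"]
```

Splitting first keeps each regex small, at the cost of two passes over the text.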

Retrieve BSR and category from string with RegExp

When I parse Amazon products I get this kind of string:
"#19 in Home Improvements (See top 100)"
I figured out how to retrieve the BSR number, which is /#\d*/
But I have no idea how to retrieve the category, which comes after "in" and ends before the bracketed part (See top 100).
I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
  console.log(m[1]);
  console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before the first...
\s*\( - 0+ whitespaces and a literal (.
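The pattern can be wrapped in a small function that returns both values at once. This is only a sketch: the name `parseBsr` and the shape of the returned object are hypothetical, and the rank is converted to a number as an extra convenience.

```javascript
// Hypothetical helper: extracts the rank and category, or null if the
// string does not have the "#<rank> in <category> (" shape.
function parseBsr(text) {
  const m = /#(\d+)\s+in\s+([^(]+?)\s*\(/.exec(text);
  if (!m) return null;
  return { rank: Number(m[1]), category: m[2] };
}

console.log(parseBsr('#19 in Home Improvements (See top 100)'));
// { rank: 19, category: 'Home Improvements' }
console.log(parseBsr('no rank here')); // null
```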
