I'm trying to create a regex that matches the numbers (with or without their commas; I can trim the commas later if that is easier) that are not followed by a parenthesis. The numbers inside the parentheses should not be matched either.
It will be used with JavaScript's String.match method.
Example strings:
9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
What I have so far:
/((^\d+[^\(])|(,\d+,)|(,*\d+$))/gm
I tried this in regex101 and underlined the numbers I would like to match, and put an x on the ones that should not match.
You could start with a substitution that removes all the unwanted parts (regex101 substitution notation; the replacement is empty):
/\d*\(.*?\),?//gm
This leaves you with
5,10
10,2,5,
10,7,2,4
which makes the matching pretty straightforward:
/(\d+)/gm
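In JavaScript, the two steps above might look like the following minimal sketch. It reuses the sample text from the question; the variable names `stripped` and `numbers` are only illustrative.

```javascript
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;

// Step 1: remove every "number(...)" group together with an optional trailing comma.
const stripped = text.replace(/\d*\(.*?\),?/g, '');

// Step 2: whatever digits remain are the numbers outside the parentheses.
// match() returns null when nothing matches, hence the fallback to [].
const numbers = stripped.match(/\d+/g) || [];

console.log(numbers);
```

Doing the removal first keeps each regex simple, at the cost of a second pass over the string.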
If you want it as a single match expression you could use a negative lookbehind:
/(?<!\([\d,]*)(\d+)(?:,|$)/gm
Here's the same matching expression as runnable JavaScript (skeleton code borrowed from Wiktor's answer):
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/(?<!\([\d,]*)(\d+)(?:,|$)/gm), x=>x[1])
console.log(matches);
Here, I'd recommend the so-called "best regex trick ever": match what you do not need (the negative contexts), match and capture what you do need, and then keep only the captured items.
If you want to match integers that are not part of a \d+\([^()]*\) match (a number followed by a parenthetical substring), alternate that pattern with a capturing (\d+), which matches one or more digits, and then simply grab the Group 1 values from the matches:
const text = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4`;
const matches = Array.from(text.matchAll(/\d+\([^()]*\)|(\d+)/g), x=> x[1] ?? "").filter(Boolean)
console.log(matches);
Details:
text.matchAll(/\d+\([^()]*\)|(\d+)/g) - matches one or more digits (\d+), then ( (with \(), then zero or more chars other than ( and ) (with [^()]*), then ) (with \)); or (|) one or more digits captured into Group 1 ((\d+))
Array.from(..., x=> x[1] ?? "") - gets the Group 1 value or, if the group did not participate in the match, an empty string
.filter(Boolean) - removes empty strings.
Using several replacement regexes
var textA = `9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`
console.log('A', textA)
var textB = textA.replace(/\(.*?\),?/g, ';')
console.log('B', textB)
var textC = textB.replace(/^\d+|\d+$|\d*;\d*/gm, '')
console.log('C', textC)
var textD = textC.replace(/,+/g, ' ').trim() // trim() takes no arguments; it strips the surrounding whitespace
console.log('D', textD)
With a loop
Here is a solution which splits the lines on comma and loops over the pieces:
var inside = false;
var result = [];
`9(296,178),5,3(123),10
10,9(296,178),2,5,3(123),3(124,125)
10,7,5(296,293,444,1255),3(218),2,4
`.split("\n").forEach(line => {
  let pieceArray = line.split(",")
  pieceArray.forEach((piece, k) => {
    if (piece.includes('(')) {
      inside = true
    } else if (piece.includes(')')) {
      inside = false
    } else if (!inside && k > 0 && k < pieceArray.length-1 && !pieceArray[k-1].includes(')')) {
      result.push(piece)
    }
  })
})
console.log(result)
It does print the expected result: ["5", "7"]
I'm trying to make the code a lot cleaner and more concise. The main goal is to transform the string so that it meets the requirements below.
Requirements
I want to remove any empty lines (like the one between the two sentences below)
I want to remove the * in front of each sentence, if there is one.
I want to make the first letter of each word capital and the rest lowercase (except words that have $ in front of them)
This is what I've done so far:
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`
const nostar = string.replace(/\*/g, ''); // gets rid of the * of each line
const noemptylines = nostar.replace(/^\s*[\r\n]/gm, ''); //gets rid of empty blank lines
const lowercasestring = noemptylines.toLowerCase(); //turns it to lower case
const tweets = lowercasestring.replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()); //makes first letter of each word capital
console.log(tweets)
I've done most of the code; however, I want words that have $ in front of them to stay uppercase, and I don't know how to do that.
Furthermore, I was wondering whether it's possible to combine the regex expressions so the code is even shorter and more concise.
You could make use of capture groups and the callback function of replace.
^(\*|[\r\n]+)|\$\S*|(\S+)
^ Start of a line (the m flag is set)
(\*|[\r\n]+) Capture group 1, match either * or 1 or more newlines
| Or
\$\S* Match $ followed by optional non whitespace chars (which will be returned unmodified in the code)
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /^(\*|[\r\n]+)|\$\S*|(\S+)/gm;
const string =
`*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS
$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`;
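const res = string.replace(regex, (m, g1, g2) => {
  if (g1) return ""; // drop a leading * or the newlines of blank lines
  if (g2) {
    // lowercase the word, then capitalize its first character
    g2 = g2.toLowerCase();
    return g2.charAt(0).toUpperCase() + g2.slice(1);
  }
  return m; // $-prefixed words are returned unmodified
});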
console.log(res);
Making it readable is more important than making it short.
const tweets = string
.replace(/\*/g, '') // gets rid of the * of each line
.replace(/^\s*[\r\n]/gm, '') //gets rid of empty blank lines
.toLowerCase() //turns it to lower case
.replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()) //makes first letter of each word capital
.replace(/\B\$(\w+)\b/g, match => match.toUpperCase()); //keep words that have $ in front of it, capital
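As a runnable sketch, the chain above can be wrapped in a function and applied to the sample from the question. The function name `formatTweets`, the variable `sample`, and the blank line inside the sample are assumptions made for illustration.

```javascript
// Hypothetical helper wrapping the chained replacements shown above.
function formatTweets(input) {
  return input
    .replace(/\*/g, '')           // remove every *
    .replace(/^\s*[\r\n]/gm, '')  // remove blank lines
    .toLowerCase()                // lowercase everything first
    .replace(/(^\w{1})|(\s{1}\w{1})/g, match => match.toUpperCase()) // capitalize each word
    .replace(/\B\$(\w+)\b/g, match => match.toUpperCase());          // restore $-prefixed words
}

const sample = `*SQUARE HAS ‘NO PLANS’ TO BUY MORE BITCOIN: FINANCIAL NEWS

$SQ
*$SQ UPGRADED TO OUTPERFORM FROM PERFORM AT OPPENHEIMER, PT $185`;

console.log(formatTweets(sample));
```

Because \w does not match a character such as ‘, a word that starts with one is left in lowercase by the capitalization step.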
I have a problem with a simple regex. I have example strings like:
Something1\sth2\n649 sth\n670 sth x
Sth1\n\something2\n42 036 sth\n42 896 sth y
I want to extract these numbers from the strings. From the first example I need two groups: 649 and 670. From the second example: 42 036 and 42 896. Then I will remove the space.
Currently I have something like this:
\d+ ?\d+
But it is not a good solution.
You can use
\n\d+(?: \d+)?
\n - Match a newline
\d+ - Match a digit (0 to 9) one or more times
(?: \d+)? - Match a space followed by one or more digits (the trailing ? makes the group optional)
let strs = ["Something1\sth2\n649 sth\n670 sth x","Sth1\n\something2\n42 036 sth\n42 896 sth y"]
let extractNumbers = str => {
return str.match(/\n\d+(?: \d+)?/g).map(m => m.replace(/\s+/g,''))
}
strs.forEach(str=> console.log(extractNumbers(str)))
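String.match returns null when nothing matches, so the snippet above throws on input without any numbers. A possible variant, sketched here with matchAll and two capture groups, returns an empty array instead; the name `extractNumbersSafe` is hypothetical.

```javascript
const samples = [
  "Something1\sth2\n649 sth\n670 sth x",
  "Sth1\n\something2\n42 036 sth\n42 896 sth y",
];

// Hypothetical helper: returns [] instead of throwing when nothing matches.
const extractNumbersSafe = str =>
  Array.from(
    str.matchAll(/\n(\d+)(?: (\d+))?/g),
    m => m[1] + (m[2] ?? '') // join both digit groups, which drops the space
  );

samples.forEach(str => console.log(extractNumbersSafe(str)));
```

Capturing the digit groups separately also avoids the second replace call that strips the newline and the space.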
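If you need to remove the spaces, the easiest way would be to remove them first and then extract the numbers, using two different regexes.
str.replace(/\s+/g, '').match(/\\n(\d+)/g)
First you remove the spaces using the \s token with a + quantifier in replace. The g flag is needed so that every run of whitespace is removed, not only the first one.
Then you match the numbers using \\n(\d+). Note that this assumes the text contains a literal backslash followed by n rather than an actual newline character, which \s+ would already have removed.
The first part of the regex makes sure we do not match numbers that do not follow a \n, using \ to escape the \ of \n.
The second part, (\d+), is the capture group that holds the number. Because match with the g flag returns whole matches, each result still includes the \n prefix.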
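A minimal sketch of this two-step idea follows. It assumes the input really contains the two characters backslash and n (which is what the pattern \\n looks for), so the samples are built with String.raw; the helper name `numbersAfterLiteralNewline` is hypothetical.

```javascript
// String.raw keeps the backslashes, so each "\n" below is two literal characters.
const raw1 = String.raw`Something1\sth2\n649 sth\n670 sth x`;
const raw2 = String.raw`Sth1\n\something2\n42 036 sth\n42 896 sth y`;

// Hypothetical helper: strip all whitespace, then collect the digits after each literal \n.
function numbersAfterLiteralNewline(str) {
  const noSpaces = str.replace(/\s+/g, ''); // g flag: remove every run of whitespace
  // matchAll exposes the capture group, so the "\n" prefix is not part of the result.
  return Array.from(noSpaces.matchAll(/\\n(\d+)/g), m => m[1]);
}

console.log(numbersAfterLiteralNewline(raw1));
console.log(numbersAfterLiteralNewline(raw2));
```

If the strings hold real newline characters instead, removing whitespace first would also remove the newlines, and this pattern would find nothing.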
var str1 = "Something1\sth2\n649 sth\n670 sth x";
var str2 = "Sth1\n\something2\n42 036 sth\n42 896 sth y";
var reg = /(?<=\n)(\d+)(?: (\d+))?/g;
var d;
while(d = reg.exec(str1)){
console.log(d[2] ? d[1]+d[2] : d[1]);
}
console.log("****************************");
while(d = reg.exec(str2)){
console.log(d[2] ? d[1]+d[2] : d[1]);
}
I'm working on a lab assignment for a web applications class and am stuck on implementing a word counter for a basic HTML webpage. The setup of the tests and HTML are already done for us. I simply need to write a function called countWords that takes a string and returns the number of words. It works differently from your traditional word counter though. A word is defined as anything A-Z. Everything else is considered not part of a word. So, if the string is just "234##$^" then the word count is 0. So, I'm not just counting white space like most word counters. All the answers I've found on StackOverflow to similar questions try to just count white space and don't work for my situation. Hence why I made a new question.
My idea was to have a return statement that matches any grouping of a-z using a regular expression and return the length. Then, have a conditional to check for the empty string or string with no letters a-z.
function countWords(s) {
  if(s === "" || s === "%$#^23#") {
    return 0
  }
  return s.match(/[^a-z]/gi).length
}
Right now the if statement is just matching the two test cases so that I can pass my tests. I'm not sure how to go about writing another match regular expression to check for no letters in the string or the empty string. Any help is appreciated! Been stuck for a while.
const str1 = '%$#^23#';
const str2 = 'String with ___ special characters and #$&# white spaces !!!';
const str3 = 'Special &$%# characters --> and %$#^5# connected,words but our <++##||++> function,still_works!';
const wordCount = (str) => str.replace(/[\W_\d]/g,' ').split(' ').filter(Boolean).length;
console.log(wordCount(str1)); // 0
console.log(wordCount(str2)); // 7
console.log(wordCount(str3)); // 10
use a regex to replace every special character, underscore, and digit with a space (whitespace is matched by \W as well)
--> replace(/[\W_\d]/g,' ')
convert the string into an array
--> .split(' ')
use filter to remove all empty string(s) in the array
--> .filter(Boolean)
then, get the word count with "length"
--> .length
You first need to filter the string, replacing every run of special characters and numbers with a single space:
var filtered_test = my_text.replace(/[^a-zA-Z]+/g, ' ').trim();
then do a normal split and count (the empty string is handled separately, because ''.split(' ') returns ['']):
var words = filtered_test ? filtered_test.split(' ') : [];
console.log(words.length); //prints out the count of words
You can use a functional replace method to chunk all of the "words" into an array, then simply return the array length. This has the added benefit of providing a 0 count:
explanatory version:
function countWords(str, words = []) {
  str.replace(/[A-Z]+/gi, (m) => words.push(m));
  return words.length;
}
minimal version:
let countWords = (str, words = []) =>
( str.replace(/[A-Z]+/gi, (m) => words.push(m)), words.length );
let countWords = (str, words = []) => (str.replace(/[A-Z]+/gi, (m) => words.push(m)), words.length);
console.log( "##asdfadf###asfadf: " + countWords("##asdfadf###asfadf") )
console.log("##13424#$#$#$%: " + countWords("##13424#$#$#$%"));
How about this regular expression: /.*?[a-z]+.*?(\s|$)/gi
Use return s.match(/.*?[a-z]+.*?(\s|$)/gi).length
Anything with at least 1 letter in it is counted. Then the phrase O##ne two $#!+ ##%Three four^&&$ five would count as 5 words.
Edit: match returns null when there are 0 matches, so to pass your test cases in that situation use (s.match(/.*?[a-z]+.*?(\s|$)/gi) || []).length
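The same null fallback also works with a plain letter-run pattern, which follows the question's definition (a word is any run of A-Z, everything else is a separator) more literally than the pattern above. This is only a sketch, and the name `countLetterRuns` is hypothetical.

```javascript
// Hypothetical helper: counts maximal runs of the letters a-z, ignoring case.
function countLetterRuns(s) {
  // match() returns null when there is no match, so fall back to an empty array.
  return (s.match(/[a-z]+/gi) || []).length;
}

console.log(countLetterRuns(""));          // 0
console.log(countLetterRuns("%$#^23#"));   // 0
console.log(countLetterRuns("two words")); // 2
console.log(countLetterRuns("O##ne two")); // 3
```

Note the difference in the last call: under this definition O##ne counts as two words, whereas the pattern above treats it as one.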
How can I get the string between the last 2 slashes with a regex in JavaScript?
For example:
stackoverflow.com/questions/ask/index.html => "ask"
http://regexr.com/foo.html?q=bar => "regexr.com"
https://www.w3schools.com/icons/default.asp => "icons"
You can use /\/([^/]+)\/[^/]*$/; [^/]*$ matches everything after the last slash, \/([^/]+)\/ matches the last two slashes, then you can capture what is in between and extract it:
var samples = ["stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"]
console.log(
samples.map(s => s.match(/\/([^/]+)\/[^/]*$/)[1])
)
You can solve this by using split().
let a = 'stackoverflow.com/questions/ask/index.html';
let b = 'http://regexr.com/foo.html?q=bar';
let c = 'https://www.w3schools.com/icons/default.asp';
a = a.split('/')
b = b.split('/')
c = c.split('/')
Then index into the arrays returned by split():
console.log(a[a.length-2])
console.log(b[b.length-2])
console.log(c[c.length-2])
I personally do not recommend using regex, because it is hard to maintain.
I believe that will do:
[^\/]+(?=\/[^\/]*$)
[^\/]+ matches one or more chars other than /. The lookahead (?=\/[^\/]*$) that follows it requires the match to be immediately followed by the last / in the string.
var urls = [
"stackoverflow.com/questions/ask/index.html",
"http://regexr.com/foo.html?q=bar",
"https://www.w3schools.com/icons/default.asp"
];
urls.forEach(url => console.log(url.match(/[^\/]+(?=\/[^\/]*$)/)[0]));
You can use (?=[^/]*\/[^/]*$)(.*?)(?=\/[^/]*$). You can test it here: https://www.regexpal.com/
The format of the regex is: (positive lookahead asserting that exactly one slash remains before the end, which first holds right after the second last slash)(.*?)(positive lookahead for last slash).
The (.*?) is a lazy match for what's between the slashes.
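A short sketch of how this pattern could be used in JavaScript, run against the sample URLs from the question. The names `re` and `parts` are illustrative, and the null check is an addition for strings that contain no slash.

```javascript
const urls = [
  "stackoverflow.com/questions/ask/index.html",
  "http://regexr.com/foo.html?q=bar",
  "https://www.w3schools.com/icons/default.asp",
];

// Group 1 holds the text between the second last and the last slash.
const re = /(?=[^/]*\/[^/]*$)(.*?)(?=\/[^/]*$)/;

const parts = urls.map(url => {
  const m = url.match(re);
  return m ? m[1] : null; // null when the string contains no slash at all
});

console.log(parts);
```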
references:
Replace second to last "/" character in URL with a '#'
RegEx that will match the last occurrence of dot in a string