Using regex to split string in javascript

I'd like to split my string so that "Hello the cost 12.50 Hello this item is 7.30" would become ["Hello the cost is 12.50", "Hello this item is 7.30"]. I started off by first finding in the string what matches the 12.50 and 7.30 (floats), but can't seem to figure out how to split it by that number.

Use the regex pattern (.*? \d+(?:\.\d+)?)\s*, and find all matches:
var re = /(.*? \d+(?:\.\d+)?)\s*/g;
var s = 'Hello the cost 12.50 Hello this item is 7.30';
var m;
do {
  m = re.exec(s);
  if (m) {
    console.log(m[1]);
  }
} while (m);
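If only an array of the pieces is needed, the same idea can be sketched with String.prototype.match and the g flag. This is a minimal sketch rather than the answer's own code: the helper name splitAfterNumbers is made up for illustration, and it assumes every piece ends in an integer or a decimal number.

```javascript
// Hypothetical helper: cut a string into pieces that each end with a number.
function splitAfterNumbers(text) {
  // Lazily consume text up to and including the next integer or decimal.
  // match() returns null when nothing matches, so fall back to an empty array.
  const pieces = text.match(/.*?\d+(?:\.\d+)?/g) || [];
  // Every piece after the first begins with the separating space, so trim it.
  return pieces.map((piece) => piece.trim());
}

const parts = splitAfterNumbers('Hello the cost 12.50 Hello this item is 7.30');
console.log(parts);
```

Any text that follows the last number is dropped by this pattern, which may or may not be what you want.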

This might be a RegExp you are looking for:
'The price is 9.50. Another price is 22.74'.match(/(?<=^|\.\s)\D+\d+(\.\d+)?(?=\.|$)/gmu)
What this tells the JS RegExp engine:
Dear Engine,
Please find something directly preceded by the start of the string, or by a dot and a space, without including it in the result.
After that there should be one or more non-digits.
Then there should be one or more digits that might be followed by a dot and one or more digits.
Finally, all that should be located just before the end of the string or a dot. Please include neither in the result.
Search for this pattern globally, in multiline mode, and be aware of any Unicode characters should there be any in the search string.

Related

I need some help for a specific regex in javascript

I am trying to write a correct regex in my javascript code, but I'm a bit confused by this. My goal is to find any occurrence of "rotate" in a string. This should be simple, but in fact I'm lost, as my "rotate" can have multiple endings! Here are some examples of what I want to find with the regex:
rotate5
rotate180
rotate-1
rotate-270
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
Can someone help me please?
EDIT: What I tried so far (probably missing some of them):
/\wrotate.*/
/rotate.\w*/
/rotate.\d/
/\Srotate*/
I'm not fully understanding the regex mechanic yet.
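Try this regex as a start. It returns all occurrences of "rotate" together with the number (positive or negative) that follows it. Because of the *, it also matches a bare "rotate" with no number; use [0-9]+ instead if digits are required.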
/(rotate)([-]?[0-9]*)/g
Here is sample code
var aString = ["rotate5", "rotate180", "rotate-1", "some text rotate-270 rotate-1 more text rotate180"];
for (var x = 0; x < aString.length; x++) {
  var match;
  var regex = /(rotate)([-]?[0-9]*)/g;
  // exec returns null once there are no more matches
  while ((match = regex.exec(aString[x])) !== null) {
    console.log(match);
  }
}
In this example,
match[0] gives the whole match (e.g. rotate5)
match[1] gives the text "rotate"
match[2] gives the numerical text immediately after the word "rotate"
If there are multiple rotate strings in the string, this will return them all.
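Since the question mentions search-and-replace, here is a minimal sketch of how a capture group can feed a replacement callback. The helper name renameRotate and the replacement word "turn" are invented for illustration, and the pattern assumes that digits are required after "rotate".

```javascript
// Hypothetical helper: rename every "rotate<number>" token to "turn<number>".
function renameRotate(text) {
  // \b stops words such as "autorotate5" from matching; -?\d+ requires digits.
  return text.replace(/\brotate(-?\d+)\b/g, (whole, angle) => 'turn' + angle);
}

const renamed = renameRotate('some text rotate-270 rotate-1 more text rotate180');
console.log(renamed);
```

The callback receives the whole match first and then each captured group, so the number is carried over into the replacement unchanged.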
If you just need to know whether the 'word' is in the string, /rotate/ alone will be OK.
But if you want to match what comes before or after it, the answer from @mseifert will be good.
If you just want to replace the word rotate with another one,
you can use the string method String.prototype.replace, like this: var str = "i am rotating with rotate-90"; str = str.replace('rotate', 'turning'); Note that a string pattern replaces only the first occurrence.
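Why doesn't your regex work?
/\wrotate.*/
means that one word character [a-zA-Z0-9_] must come immediately before rotate, which is then followed by any number of arbitrary characters. It therefore fails when rotate is at the start of the string or is preceded by a space.
/rotate.\w*/
means rotate must be followed by any one character and then zero or more word characters.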
...............
Using your description:
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
This regex should do the work:
const regex = /(^rotate|rotate$|\ {1}rotate\ {1})/gm;
You can learn more about regular expressions with these sites:
http://www.regular-expressions.info
regex101.com and btw here is an example using your requirements.

Capturing parentheses - /(\d)/ ? or /\s*;\s*/?

I am reading about split and below is a variable looking at the string values. However I do not understand what the symbols are looking for.
According to the page: If separator contains capturing parentheses, matched results are returned in the array.
var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);
console.log(splits);
// Results
[ "Hello ", "1", " word. Sentence number ", "2", "." ]
My question is, what is happening here? The parentheses "(" and ")" are not part of the string. Why are the space and "." separated in some places and not in others?
Another one is /\s*;\s*/
The page states that it removes the semicolon along with zero or more spaces before and after it. Does this mean \s* looks for spaces and removes them, and ';' is the separator in this case?
var names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
console.log(names);
var re = /\s*;\s*/;
var nameList = names.split(re);
console.log(nameList);
// Results
["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
If so, why doesn't /\s*^\s*/ remove the spaces before and after the ^ symbol if my string looks like this?
var names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
console.log(names);
var re = /\s*^\s*/;
var nameList = names.split(re);
console.log(nameList);
I would like to know what the symbols mean and why they are in a certain order. Thank you.
It seems you got your examples from here.
First let's look at this one /(\d)/.
Working inside out, recognize that \d matches any single digit.
Now, from the article, wrapping the parentheses around the escape tells the split method to keep the delimiter (which in this case is any digit) in the returned array. Notice that without the parentheses, the returned array wouldn't have numeric elements (as strings of course). Lastly, it is wrapped in slashes (//) to create a regular expression. Basically this case says: split the string by digits and keep the digits in the returned array.
The second case /\s*;\s*/ is a little more complicated and will take some understanding of regular expressions. First note that \s matches any whitespace character (a space, a tab, a newline and so on). In regular expressions, a character c followed by a * says 'look for 0 or more of c, in consecutive order'. So this regular expression matches strings like ' ; ', ';', etc (I added the single quotes to show the spaces). Note that in this case, we don't have parentheses, so the semicolons will be excluded from the returned array.
If you're still stuck, I'd suggest reading about regular expressions and practicing writing them. This website is great; just be wary of the fact that regular expressions on that site may differ slightly in syntax from those used in javascript.
The 1st example below splits the input string at any digit, keeping the delimiter (i.e. the digit) in the final array.
The 2nd example below shows that leaving the parentheses out still splits the array at any digit, but those digit delimiters are not included in the final array.
The 3rd example below splits the input string any time the following pattern is encountered: as many consecutive spaces as possible (including none) immediately followed by a semi-colon immediately followed by as many consecutive spaces as possible (including none).
The 4th example below shows that you can indeed split a similar input string as in the 3rd example but with "^" replacing ";". However, because the "^" by itself means "the start of the string" you have to tell JavaScript to find the actual "^" by putting a backslash (i.e. a special indicator designated for this purpose) right in front of it, i.e. "\^".
const show = (msg) => {console.log(JSON.stringify(msg));};
var myString = 'Hello 1 word. Sentence number 2.';
var splits1 = myString.split(/(\d)/);
show(splits1);
var splits2 = myString.split(/\d/);
show(splits2);
var names1 = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
var nameList1 = names1.split(/\s*;\s*/);
show(nameList1);
var names2 = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
var nameList2 = names2.split(/\s*\^\s*/);
show(nameList2);
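For comparison, a short sketch that puts the unescaped and the escaped pattern side by side on the question's string. Calling trim() before splitting is an addition that is not in the examples above; it is only there to remove the space left after the last name.

```javascript
const names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';

// Unescaped ^ is the start-of-string anchor, so there is nothing to split on
// and the whole string comes back as a single element.
const unescaped = names.split(/\s*^\s*/);

// Escaped \^ is a literal caret; trim() drops the trailing space beforehand.
const escaped = names.trim().split(/\s*\^\s*/);

console.log(unescaped.length, escaped);
```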

Check if sentence contains a phrase

Sentences:
Hey checkout Hello World <- SHOULD BE INCLUDED
hello world is nice! <- SHOULD BE INCLUDED
Hhello World should not work <- SHOULD NOT BE INCLUDED
This too Hhhello World <- SHOULD NOT BE INCLUDED
var phraseToSearch = "Hello World";
Do note: sentence.toLowerCase().indexOf(phraseToSearch.toLowerCase()) would not work, as it would match all of the above sentences, while the result should only include sentences 1 and 2.
You can use regular expression to match a character pattern with a string.
The regular expression simply looks for Hello World, the exact letters you are looking for, with \b (a word boundary) on each side and the i (case-insensitive) modifier.
RegExp has a method test that runs the regular expression on the given string. It returns true if the regular expression matched.
const phraseToSearch = /\bhello world\b/i
const str1 = 'Hey checkout Hello World'
const str2 = 'hello world is nice!'
const str3 = 'Hhello World should not work'
const str4 = 'This too Hhhello World'
console.log(
  phraseToSearch.test(str1),
  phraseToSearch.test(str2),
  phraseToSearch.test(str3),
  phraseToSearch.test(str4)
)
You probably want to use a regular expression. Here are the things you want to match
Text (with spaces surrounding it)
... Text (with space on one side, and end of text on the other)
Text ... (with space on one side, and start of text on the other)
Text (just the string, on its own)
One way to do it, without a regular expression, is just to write 4 conditions (one for each bullet point above) and join them with ||, but that would lead to messy code.
Another option is to split both strings by spaces and check whether one array is a subarray of the other.
However, my solution uses a regular expression - which is a pattern you can test on a string.
Our pattern should
Look for a space/start of string
Check for the string
Look for a space/end of string
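\b, according to this, matches the position between a word character and a non-word character (such as a space or punctuation), as well as the start or end of the string next to a word character. These positions are called word boundaries.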
Here is the code:
function doesContain(str, query) { // is query in str?
  // "\\b" is required: inside a string literal, "\b" is a backspace character
  return new RegExp("\\b" + query + "\\b", "i").test(str);
}
The i makes the match case insensitive.
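One caveat when building a pattern from a string: any regex metacharacters in the query (such as . or ?) are interpreted as pattern syntax. Below is a minimal sketch that escapes them first. The names escapeRegExp and containsPhrase are illustrative, not built-ins, and the sketch assumes the phrase starts and ends with a word character so that \b can apply.

```javascript
// Hypothetical helper: put a backslash before every regex metacharacter.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: is `phrase` present in `sentence` as whole words?
function containsPhrase(sentence, phrase) {
  const pattern = new RegExp('\\b' + escapeRegExp(phrase) + '\\b', 'i');
  return pattern.test(sentence);
}

const sentences = [
  'Hey checkout Hello World',
  'hello world is nice!',
  'Hhello World should not work',
  'This too Hhhello World',
];
const matching = sentences.filter((s) => containsPhrase(s, 'Hello World'));
console.log(matching);
```

Only the first two sentences pass the filter, which is the result the question asks for.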

Javascript RegExp replace. How to carry unknown characters into replacement?

I'm trying to get a much deeper understanding of JS RegExp for a project I'm working on.
So if I were checking for all strings containing foo and then a character that is not a number, I would use /foo[^0-9]/. However, let's say I want to change all strings matching that pattern to foobar and then the original characters, how would I go about that?
str = "foozip";
newStr = str.replace(/foo[^0-9]/, "foobar");
console.log(newStr);
// returns foobarip. Note the lack of a z.
str = "foozip";
newStr = str.replace(/foo/, "foobar");
console.log(newStr);
// returns foobarzip, but this pattern also matches foo6zip, which is no good
Do I have to run a separate check to do this? Is there a way to carry unknown characters from one side of a replace to the other?
You have two options:
Use lookahead:
str.replace(/foo(?=[^0-9])/, "foobar")
Use capture groups:
str.replace(/foo([^0-9])/, "foobar$1")
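A quick sketch of both options on the question's input, plus a negative-lookahead variant that is not part of the answer above and is included only as a possible alternative for the case where foo sits at the very end of the string.

```javascript
// Option 1: the lookahead checks the next character without consuming it.
const withLookahead = 'foozip'.replace(/foo(?=[^0-9])/, 'foobar');

// Option 2: the capture group consumes the character, and $1 puts it back.
const withCapture = 'foozip'.replace(/foo([^0-9])/, 'foobar$1');

// Both leave "foo6zip" alone because a digit follows foo.
const untouched = 'foo6zip'.replace(/foo(?=[^0-9])/, 'foobar');

// Variant (an addition): "not followed by a digit" also matches at end of string.
const atEnd = 'foo'.replace(/foo(?![0-9])/, 'foobar');

console.log(withLookahead, withCapture, untouched, atEnd);
```

Both original options require some character after foo, so neither changes a string that ends in foo; the negative lookahead does.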

Create a permalink with JavaScript

I have a textbox where a user puts a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?
Try this:
function doDashes(str) {
  var re = /[^a-z0-9]+/gi; // global and case insensitive matching of non-char/non-numeric
  var re2 = /^-*|-*$/g; // get rid of any leading/trailing dashes
  str = str.replace(re, '-'); // perform the 1st regexp
  return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-i-think-that-i-am-awesome-yes-i-am
As for the greek characters, yeah I can't think of anything else than some sort of lookup table used by another regexp.
Edit, here's the oneliner version:
Edit, added toLowerCase():
Edit, embarrassing fix to the trailing regexp:
function doDashes2(str) {
  return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple regex for this job matches every run of characters outside a-z and replaces it with a -. Convert the string to lowercase before applying the regex. This alone is not foolproof, since a dash may be left at the start or the end.
[^a-z]+
Thus, after the replacement; you can trim the dashes (from the front and the back) using this regex:
^-+|-+$
You'd have to create the Greek-to-Latin glyph translation yourself; regex can't help you there. Using a translation array is a good idea.
I can't really say for Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/g
will do the trick when you use it as your pattern (the g flag makes replace act on every match) and replace the matches with a "-".
As for the Greek characters, I'd suggest using an array with all the "character translations", and then adding its values to the regular expression.
To roughly build the url you would need something like this.
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
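var url = textbox.toLowerCase().replace(/[^a-z\s]/g, '').replace(/\s+/g, ' ').replace(/\s/g, '-');
It removes every character that is neither a letter nor whitespace, collapses runs of whitespace into a single space, and then replaces each space with a dash. The g flag is needed on each pattern; without it, replace changes only the first match.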
You could use another regular expression to replace the greek characters with english characters.
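The lookup-table idea suggested in the answers can be sketched as follows. The table is a simplified assumption written for this example, not a standard transliteration scheme, and the names GREEK_TO_LATIN and toPermalink are invented. Accents are removed with String.prototype.normalize before the lookup.

```javascript
// Simplified Greek-to-Latin table (an assumption for illustration only).
const GREEK_TO_LATIN = {
  'α': 'a', 'β': 'v', 'γ': 'g', 'δ': 'd', 'ε': 'e', 'ζ': 'z', 'η': 'i',
  'θ': 'th', 'ι': 'i', 'κ': 'k', 'λ': 'l', 'μ': 'm', 'ν': 'n', 'ξ': 'x',
  'ο': 'o', 'π': 'p', 'ρ': 'r', 'σ': 's', 'ς': 's', 'τ': 't', 'υ': 'u',
  'φ': 'f', 'χ': 'ch', 'ψ': 'ps', 'ω': 'o',
};

// Hypothetical helper: turn free text into a dash-separated permalink.
function toPermalink(text) {
  return text
    .normalize('NFD')                  // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining accent marks
    .toLowerCase()
    .replace(/[\u03b1-\u03c9]/g, (ch) => GREEK_TO_LATIN[ch] || ch) // α..ω
    .replace(/[^a-z0-9]+/g, '-')       // anything else becomes a dash
    .replace(/^-+|-+$/g, '');          // trim leading/trailing dashes
}

console.log(toPermalink('Γεια σου κόσμε'));
console.log(toPermalink('hello world! I think that __i__ am awesome (yes I am!)'));
```

Characters outside the table (other scripts, Greek letters with marks that do not decompose) simply turn into dashes, so a real project would need a fuller table.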
