Match string in between two strings [duplicate] - javascript

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 3 years ago.
If I have a string like this:
var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
I want to get the strings between each of the substrings "play" and "in", so basically an array with "the Ukulele" and "the Guitar".
Right now I'm doing:
var test = str.match("play(.*)in");
But that's returning the string between the first "play" and the last "in", so I get "the Ukulele in Lebanon. play the Guitar" instead of 2 separate strings. Does anyone know how to globally search a string for all occurrences of a substring between a starting and ending string?

You can use the regex
play\s*(.*?)\s*in
Use the / as delimiters for regex literal syntax
Use the lazy group to match minimal possible
Demo:
var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
var regex = /play\s*(.*?)\s*in/g;
var matches = [];
var m; // declared so the loop does not create an implicit global
while ((m = regex.exec(str)) !== null) {
matches.push(m[1]);
}
document.body.innerHTML = '<pre>' + JSON.stringify(matches, 0, 4) + '</pre>';
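On engines that support String.prototype.matchAll (ES2020), the same collection can be written without the explicit exec loop. This is only a sketch of an alternative, not part of the original answer; it reuses the question's string and the regex above.

```javascript
// Sketch: collect every capture between "play" and "in" with matchAll.
const str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
const regex = /play\s*(.*?)\s*in/g; // matchAll requires the g flag

// Each item yielded by matchAll is a match array; index 1 is the capture group.
const matches = Array.from(str.matchAll(regex), m => m[1]);

console.log(matches); // [ 'the Ukulele', 'the Guitar' ]
```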

You are so close to the right answer. There are a few things you may be overlooking:
You need your match to be non-greedy; this can be accomplished by adding the ? operator after the quantifier
Do not use String.match() with a global regex here: with the g flag it returns only the full matches and drops the capturing groups, which is not what you would expect. An alternative is to use RegExp.exec() or String.replace(), but using replace would require a little more work, so stick to building your own array with exec
var str = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";
var re = /\bplay (.+?) in\b/g;
var matches = [];
var match;
while ( match = re.exec(str) ){
matches[ matches.length ] = match[1];
}
document.getElementById('demo').innerHTML = JSON.stringify( matches );
<pre id="demo"></pre>

/\bplay\s+(.+?)\s+in\b/ig might be more specific and might work better for you.
I believe there may be some issues with the regexes offered previously. For instance, /play\s*(.*?)\s*in/g will find a match within "displaying photographs in sequence". Of course this is not what you want. One of the problems is that there is nothing specifying that "play" should be a discrete word. It needs a word boundary before it and at least one instance of white space after it (it can't be optional). Similarly, the white space after the capture group should not be optional.
The other expression offered at the time I added this, /play (.+?) in/g, lacks the word boundary token before "play" and after "in", so it will contain a match in "display blue ink". This is not what you want.
As to your expression, it was missing the word boundary and white space tokens as well. But as another mentioned, it also needed the wildcard to be lazy. Otherwise, given your example string, your match would start with the first instance of "play" and end with the 2nd instance of "in".
If issues with my offered expression are found, would appreciate feedback.
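To make the difference concrete, here is a small check of the looser expression against the stricter one, using the two counterexamples mentioned above plus one intended input. The variable names are only for illustration.

```javascript
// Compare the loose pattern with the word-boundary version on a few inputs.
const loose = /play\s*(.*?)\s*in/;
const strict = /\bplay\s+(.+?)\s+in\b/i;

const samples = [
  "displaying photographs in sequence", // "play" + "in" hidden inside "displaying"
  "display blue ink",                   // "play" inside "display", "in" inside "ink"
  "play the Guitar in Lebanon",         // the case we actually want
];

const report = samples.map(s => ({
  text: s,
  loose: loose.test(s),
  strict: strict.test(s),
}));

console.log(report);
// loose matches all three samples; strict matches only the last one
```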

A victim of greedy matching.
.* finds the longest possible match,
while .*? finds the shortest possible match.
For the example given str will be an array of 3 strings containing:
the Ukulele
the Guitar
Lebanon
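A quick way to see greedy versus lazy matching side by side, using the question's string (a sketch; no g flag, so match() returns the capture group at index 1):

```javascript
const text = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// Greedy: .* runs to the end and backtracks to the LAST "in".
const greedy = text.match(/play(.*)in/)[1];

// Lazy: .*? stops at the FIRST "in" that lets the pattern succeed.
const lazy = text.match(/play(.*?)in/)[1];

console.log(JSON.stringify(greedy)); // " the Ukulele in Lebanon. play the Guitar "
console.log(JSON.stringify(lazy));   // " the Ukulele "
```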

Related

I need some help for a specific regex in javascript

I'm trying to write a correct regex in my JavaScript code, but I'm a bit confused. My goal is to find any occurrence of "rotate" in a string. This should be simple, but in fact I'm lost, as my "rotate" can have multiple endings! Here are some examples of what I want to find with the regex:
rotate5
rotate180
rotate-1
rotate-270
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
Can someone help me please?
EDIT: What I tried so far (probably missing some of them):
/\wrotate.*/
/rotate.\w*/
/rotate.\d/
/\Srotate*/
I'm not fully understanding the regex mechanic yet.
Try this regex as a start. It will return all occurrences of a "rotate" string where a number (positive or negative) follows the "rotate".
/(rotate)([-]?[0-9]*)/g
Here is sample code
var aString = ["rotate5","rotate180","rotate-1","some text rotate-270 rotate-1 more text rotate180"];
for (var x = 0; x < aString.length; x++){
var match;
var regex = /(rotate)([-]?[0-9]*)/g;
while (match = regex.exec(aString[x])){
console.log(match);
}
}
In this example,
match[0] gives the whole match (e.g. rotate5)
match[1] gives the text "rotate"
match[2] gives the numerical text immediately after the word "rotate"
If there are multiple rotate strings in the string, this will return them all
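Since the question mentions a search-and-replace function, here is a sketch of how the same idea could feed String.replace with a callback. The output format "rotate(Ndeg)" is purely hypothetical, and this variant requires at least one digit (\d+), unlike the [0-9]* above, so a bare "rotate" is left untouched.

```javascript
// Hypothetical replacement: turn each "rotateN" token into "rotate(Ndeg)".
const input = "some text rotate-270 rotate-1 more text rotate180";

const output = input.replace(
  /\brotate(-?\d+)\b/g,                     // capture the signed number after "rotate"
  (whole, degrees) => `rotate(${degrees}deg)`
);

console.log(output);
// some text rotate(-270deg) rotate(-1deg) more text rotate(180deg)
```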
If you just need to know whether the word is in the string, /rotate/ is enough.
If you also need to match what comes before or after it, @mseifert's answer is a good choice.
If you just want to replace the word rotate with another one,
you can use the string method String.replace, like this: var str = "i am rotating with rotate-90"; str.replace('rotate','turning') (with a string pattern, only the first occurrence is replaced)
Why doesn't your regex work?
/\wrotate.*/
means that "rotate" must be immediately preceded by a word character [a-zA-Z0-9_] and may be followed by any number of other characters, so a "rotate5" at the very start of the string does not match
/rotate.\w*/
means that "rotate" must be followed by one character of any kind and then by zero or more word characters
...............
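The four attempts from the question can be checked directly against one of the target strings. This is only a sketch to illustrate the explanation; the last pattern is an assumed alternative, not something the asker wrote.

```javascript
// The asker's attempts, tested against the simplest target.
const attempts = [/\wrotate.*/, /rotate.\w*/, /rotate.\d/, /\Srotate*/];
const results = attempts.map(re => re.test("rotate5"));

console.log(results); // [ false, true, false, false ]

// /\wrotate.*/ needs a word character before "rotate".
// /rotate.\d/  needs two characters after "rotate", the second a digit.
// /\Srotate*/  needs a non-space character before "rotat" (the * applies only to the final "e").

// An assumed alternative that accepts an optional minus sign and one or more digits:
const candidate = /rotate-?\d+/;
const targets = ["rotate5", "rotate180", "rotate-1", "rotate-270"];
console.log(targets.every(t => candidate.test(t))); // true
```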
Using your description:
The "rotate" word can be at the beginning of my string or at the end, or even in the middle separated by spaces from other words. The regex will be used in a search-and-replace function.
This regex should do the work:
const regex = /(^rotate|rotate$|\ {1}rotate\ {1})/gm;
You can learn more about regular expressions with these sites:
http://www.regular-expressions.info
regex101.com and btw here is an example using your requirements.

Match and replace a substring while ignoring special characters

I am currently looking for a way to turn matching text into a bold html line. I have it partially working except for special characters giving me problems because I desire to maintain the original string, but not compare the original string.
Example:
Given the original string:
Taco John's is my favorite place to eat.
And wanting to match:
is my 'favorite'
To get the desired result:
Taco John's <b>is my favorite</b> place to eat.
The way I'm currently getting around the extra quotes in the matching string is by replacing them
let regex = new RegExp('('+escapeRegexCharacters(matching_text.replace(/[^a-z 0-9]/gi,''))+')',"gi")
let html = full_text.replace(/[^a-z 0-9]/gi,'').replace(regex, "<b>$1</b>")
This almost works, except that I lose all punctuation:
Taco Johns <b>is my favorite</b> place to eat
Is there any way to use regex, or another method, to add tags surrounding a matching phrase while ignoring both case and special characters during the matching process?
UPDATE #1:
It seems that I am being unclear. I need the original string's punctuation to remain in the end result's html. And I need the matching text logic to ignore all special characters and capitalization. So is my favorite, is My favorite and is my 'favorite' should all trigger a match.
Instead of removing the special characters from the string being searched, you could inject in your regular expression a pattern between each character-to-match that will skip any special characters that might occur. That way you build a regular expression that can be applied directly to the string being searched, and the replacing operation will thus not touch the special characters outside of the matches:
let escapeRegexCharacters =
s => s.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"),
full_text = "Taco John's is My favorite place to eat.",
matching_text = "is my 'favorite'",
// split('') yields one entry per character; the joiner skips special characters between them
// (backslashes are doubled because the pattern is built from a string)
regex = new RegExp(matching_text.replace(/[^a-z\s\d]/gi, '')
.split('').map(escapeRegexCharacters).join('[^a-z\\s\\d]*'), "gi"),
html = full_text.replace(regex, "<b>$&</b>");
console.log(html);
Regexps are useful where there is a pattern, but, in this case you have a direct match, so, the good approach is using a String.prototype.replace:
function wrap(source, part, tagName) {
return source
.replace(part,
`<${tagName}>${part}</${tagName}>`
)
;
}
At least, if there is a pattern, you should edit your question and provide it.
As an option, for single occurrence case - use String.split
Example replacing '###' with '###' :
let inputString = '1234###5678'
const chunks = inputString.split('###')
inputString = `${chunks[0]}###${chunks[1]}`
It's possible to avoid using a capture group with the $& replacement string, which means "entire matched substring":
var phrase = "Taco John's is my favorite place to eat."
var matchingText = "is my favorite"
var re = new RegExp(escapeRegexCharacters(matchingText), "ig");
phrase.replace(re, "<b>$&</b>");
(Code based on obarakon's answer.)
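The snippet above relies on an escapeRegexCharacters helper that is not shown in this answer. A self-contained sketch follows, assuming a common escaping helper; the sample phrase is made up to show why escaping matters when the search text contains regex metacharacters.

```javascript
// Assumed helper: escape regex metacharacters so the text is matched literally.
const escapeRegexCharacters = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const phrase = "Is it 1+1 (or 2)? Yes, 1+1 it is.";
const re = new RegExp(escapeRegexCharacters("1+1"), "g");

// $& inserts the entire matched substring, so no capture group is needed.
const html = phrase.replace(re, "<b>$&</b>");

console.log(html); // Is it <b>1+1</b> (or 2)? Yes, <b>1+1</b> it is.
```

Without the escaping step, "1+1" would be read as "one or more 1s followed by 1" and would not match the literal text.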
Generalizing, the regex you could use is /is my \w*/. You can use that in a replacer function so that you can manipulate the resulting text in JavaScript:
var str = "Taco John's is my favorite place to eat.";
var html = str.replace(/is my \w*/, function (x) {
return "<b>" + x + "</b>";
} );
console.log(html);

RegEx - Get All Characters After Last Slash in URL

I'm working with a Google API that returns IDs in the below format, which I've saved as a string. How can I write a Regular Expression in javascript to trim the string to only the characters after the last slash in the URL.
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9'
Don't write a regex! This is trivial to do with string functions instead:
var final = id.substr(id.lastIndexOf('/') + 1);
It's even easier if you know that the final part will always be 16 characters:
var final = id.substr(-16);
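String.prototype.substr is marked as deprecated in current documentation, so the same idea can also be sketched with slice, which takes a start index in the same way:

```javascript
const id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// Everything after the last slash. If there is no slash at all,
// lastIndexOf returns -1, so slice(0) returns the whole string.
const lastSegment = id.slice(id.lastIndexOf('/') + 1);

console.log(lastSegment); // nabb80191e23b7d9
```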
A slightly different regex approach:
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
Breaking down this regex:
\/ match a slash
( start of a captured group within the match
[^\/] match a non-slash character
+ match one or more of the non-slash characters
) end of the captured group
\/? allow one optional / at the end of the string
$ match to the end of the string
The [1] then retrieves the first captured group within the match
Working snippet:
var id = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';
var afterSlashChars = id.match(/\/([^\/]+)\/?$/)[1];
// display result
document.write(afterSlashChars);
Just in case someone else comes across this thread and is looking for a simple JS solution:
id.split('/').pop()
This one is easy to understand: (?!.*/).+
Let me explain.
First, let's match everything that ends with a slash, OK?
That's the part we don't want:
.*/ matches everything up to the last slash
Then we wrap it in a negative lookahead (?!...) to say "I don't want this, discard it":
(?!.*/) is the negative lookahead
Now we can happily take whatever comes right after the part we don't want with this:
.+
In a JavaScript regex literal the / has to be escaped, so it becomes:
(?!.*\/).+
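Applied to the id from the question, the lookahead version can be tried like this (a sketch; the variable names are illustrative):

```javascript
const url = 'http://www.google.com/m8/feeds/contacts/myemail%40gmail.com/base/nabb80191e23b7d9';

// The lookahead fails at every position that still has a slash ahead of it,
// so the first position where .+ may start is right after the last slash.
const tail = url.match(/(?!.*\/).+/)[0];

console.log(tail); // nabb80191e23b7d9
```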
this regexp: [^\/]+$ - works like a champ:
var id = ".../base/nabb80191e23b7d9"
result = id.match(/[^\/]+$/)[0];
// results -> "nabb80191e23b7d9"
This should work:
last = id.match(/\/([^/]*)$/)[1];
//=> nabb80191e23b7d9
Don't know JS, using others examples (and a guess) -
id = id.match(/[^\/]*$/)[0]; // [0] is needed: match() returns an array, and index 0 is the matched text
Why not use replace?
"http://google.com/aaa".replace(/(.*\/)*/,"")
yields "aaa"

Regex trying to match characters before and after symbol

I'm trying to match characters before and after a symbol, in a string.
string: budgets-closed
To match the characters before the sign -, I do: ^[a-z]+
And to match the other characters, I try: \-(\w+) but, the problem is that my result is: -closed instead of closed.
Any ideas, how to fix it?
Update
This is the piece of code, where I was trying to apply the regex http://jsfiddle.net/trDFh/1/
I repeat: It's not that I don't want to use split; it's just I was really curious, and wanted to see, how can it be done the regex way. Hacking into things spirit
Update2
Well, using substring is a solution as well: http://jsfiddle.net/trDFh/2/ and it is the one I chose to use, since the if in question is actually an else if in a more complex if statement, and the chosen solution seems to be the best fit for now.
Use exec():
var result=/([^-]+)-([^-]+)/.exec(string);
result is an array, with result[1] being the first captured string and result[2] being the second captured string.
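The "-closed" in the question is the whole match (index 0); the capture group at index 1 already holds "closed". A short sketch showing both that and the two-group exec approach:

```javascript
const input = "budgets-closed";

// Index 0 is the full match, index 1 is the first capture group.
const m = input.match(/-(\w+)/);
console.log(m[0]); // -closed
console.log(m[1]); // closed

// Both sides at once; "|| []" keeps the destructuring safe when nothing matches.
const [, before, after] = /([^-]+)-([^-]+)/.exec(input) || [];
console.log(before, after); // budgets closed
```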
Live demo: http://jsfiddle.net/Pqntk/
I think you'll have to match that. You can use grouping to get what you need, though.
var str = 'budgets-closed';
var matches = str.match( /([a-z]+)-([a-z]+)/ );
var before = matches[1];
var after = matches[2];
For that specific string, you could also use
var str = 'budgets-closed';
var before = str.match( /^\b[a-z]+/ )[0];
var after = str.match( /\b[a-z]+$/ )[0];
I'm sure there are better ways, but the above methods do work.
If the symbol is specifically -, then this should work:
\b([^-]+)-([^-]+)\b
You match a boundary, any "not -" characters, a - and then more "not -" characters until the next word boundary.
Also, there is no need to escape a hyphen: it only holds special properties when it sits between two other characters inside a character class.
edit: And here is a jsfiddle that demonstrates it does work.
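The point about the hyphen can be checked with three small patterns (a sketch; the names are illustrative):

```javascript
const range = /[a-c]/;      // hyphen between two characters: the range a..c
const literalEnd = /[ac-]/; // hyphen last in the class: a literal "-"
const outside = /a-c/;      // outside a class: always a literal "-"

const checks = {
  rangeMatchesB: range.test("b"),              // true
  rangeMatchesHyphen: range.test("-"),         // false
  literalMatchesHyphen: literalEnd.test("-"),  // true
  literalMatchesB: literalEnd.test("b"),       // false
  outsideMatches: outside.test("a-c"),         // true
};

console.log(checks);
```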

JavaScript RegEx Match Failing

I am having issues matching a string using regex in javascript. I am trying to get everything up to the word "at". I am using the following and while it doesn't return any errors, it also doesn't do anything either.
var str = "Team A at Team B";
var matches = str.match(/(.*?)(?=at|$)/);
I tried multiple regex patterns before coming across this SO post, Regex to capture everything before first optional string, but it doesn't seem to return what I want.
Remove the ? at your first capturing group, and |$ from your second, and add ^ to mark beginning of string:
str.match(/^(.*)(?=at)/)
Alternatively (I personally find below easier to read, but your call):
str.substr(0, str.search(/\bat\b/))
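One thing to watch with the search-based version: when "at" is absent, search returns -1 and the result is an empty string. A hedged sketch of a small helper (the name beforeAt is hypothetical) that falls back to the whole string and trims the trailing space:

```javascript
// Hypothetical helper: text before the standalone word "at", or the whole string if absent.
function beforeAt(s) {
  const i = s.search(/\bat\b/); // \b keeps "at" inside words like "Bat" from matching
  return i === -1 ? s : s.slice(0, i).trim();
}

console.log(beforeAt("Team A at Team B")); // Team A
console.log(beforeAt("Bat Team at Home")); // Bat Team
console.log(beforeAt("No separator"));     // No separator
```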
