Extract the last few words in a sentence using regex? - javascript

I am trying to write a regex to extract the last few words from a sentence. However, I can't seem to make it work.
var str = "Tell me about robin hood";
const TELL_ME_ABOUT_REGEX = /^Tell me about (\w+\s*)*/
if( str.match(TELL_ME_ABOUT_REGEX) ){
var matches = TELL_ME_ABOUT_REGEX.exec( str );
console.log( "user wants to know about ",matches);
}
I am trying to get the words "robin hood", but I only end up with "hood".
[ 'Tell me about robin hood',
'hood',
index: 0,
input: 'Tell me about robin hood' ]
What do I change in my regex?

Why do you need a regex for this? You can do it without one, like this:
var str = "Tell me about robin hood";
str = str.split(" ").slice(3).join(" "); // drop the first three words ("Tell me about")
alert(str);
http://codepen.io/anon/pen/aNvrYz
Edit : Regex Solution
var str = "Tell me about robin hood";
var match = str.match(/^Tell me about (.*)/i);
var textInDe = match[1];
alert(textInDe);
http://codepen.io/anon/pen/BKoerY

Here is a correct regex:
^Tell me about ((?:\w+\s*)*)
Compare it with your original one, which is
^Tell me about (\w+\s*)*
You were very close. The point is to use a non-capturing group for the inner parentheses and a capturing group for the outer ones.
Note that with (\w+\s*)* from your regex, the group first captures robin into \1 and then overwrites it with hood on the next iteration. A repeated capturing group keeps only its last iteration, which is how the regex engine works here.
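The overwriting behaviour described above can be checked directly. This is a minimal sketch that runs both patterns from this answer against the question's string; the variable names are only illustrative.

```javascript
// Both patterns come from the thread; only the variable names are new.
const sentence = "Tell me about robin hood";

// Repeated capturing group: each iteration overwrites group 1.
const repeatedGroup = /^Tell me about (\w+\s*)*/.exec(sentence);

// Non-capturing group repeated inside one capturing group:
// group 1 spans all iterations.
const wrappedGroup = /^Tell me about ((?:\w+\s*)*)/.exec(sentence);

console.log(repeatedGroup[1]); // "hood"
console.log(wrappedGroup[1]);  // "robin hood"
```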

The problem with /^Tell me about (\w+\s*)*/ is that there are multiple possible matches for the group in "Tell me about robin hood": robin is a possible match, as is robin\s, and so forth. Repetition operators can be confusing, but it becomes clearer once you think through all the match possibilities.
To match everything that comes after Tell me about, you can simply capture it all at once, with only one possible match: (.*).

Try this
Tell me about ([\w\s]+)
(\w+\s*)* will capture 'robin' then 'hood' but only keep the last iteration => 'hood'

Related

Is there an easy way to return all words from string?

Is there a way to return all words in a given string? The best solution I have currently found is using the match method and returning any string with at least one non-whitespace char (/\S+/g).
The issue with this method is that it includes a comma, period, etc. in the word. If I try using a RegExp with \w, then it doesn't include periods and commas, but it makes "don't" two words because of the '.
Is there any true and easy solution to this issue?
For example: "I don't want to go, mom". This should return the words [I, don't, want, to, go, mom]
Would this work?
mystr.replace(".","").split(/\s/g);
I would have commented, but I don't have 50 rep
Use word boundaries in regex and match function
const matches = "I don't want to go, mom.".match(/(\b[^\s]+\b)/g);
console.log(matches);
I don't know if this is exactly what you are looking for, but using just the example string you've provided, this worked:
var myString = "I don't want to go, mom";
myString = myString.replace(',', ''); // a string pattern removes only the first comma
var wordsArray = myString.split(' ');
console.log({ wordsArray });
But beware that there are many additional cases:
"two-handed": is that one word or two? ['two', 'handed'], ['two-handed'] or ['twohanded']?
"Mrs. Foo", "Dr. bar"...: do you expect ['Mrs', 'Foo'], ['Mrs. Foo'], ['Mrs.Foo'] or ['MrsFoo']?
I'll appreciate any feedback.
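One way to handle the apostrophe and hyphen cases raised in this thread is to match letters and allow a single ' or - only between letters. This is a sketch under stated assumptions, not a complete tokenizer: the function name extractWords is hypothetical, the pattern covers ASCII letters only, and it treats "two-handed" as one word and drops the period from "Mrs.".

```javascript
// Hypothetical helper: returns the words of a sentence, keeping
// apostrophes and hyphens that sit between letters (ASCII only).
function extractWords(text) {
  // [A-Za-z]+            a run of letters
  // (?:['-][A-Za-z]+)*   optionally followed by ' or - plus more letters
  const matches = text.match(/[A-Za-z]+(?:['-][A-Za-z]+)*/g);
  return matches || []; // match() returns null when nothing is found
}

console.log(extractWords("I don't want to go, mom"));
// [ 'I', "don't", 'want', 'to', 'go', 'mom' ]
console.log(extractWords("a two-handed sword")); // [ 'a', 'two-handed', 'sword' ]
```

Whether "two-handed" should count as one word is a design choice; removing the hyphen from the character class would split it in two.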

Ignore pattern with specific word in it

I've got the following code:
var one = "What a lovely puppy! Is it a beagle?",
two = "You've got a lovely funny beagle!",
three = "Where's my lovely dog? Where's my little beagle?";
var target = (".* lovely(?!funny).* beagle.*");
What I need is to "catch" the words between lovely and beagle. If the word funny appears between those boundaries (as in var two), match() should ignore it.
In other words, it should match every phrase from lovely to beagle that does not contain the word funny.
My var target seems to be written incorrectly.
I expect:
one.match(target) //return: "puppy! Is it a"
two.match(target) //must be ignored, because of word "funny"
three.match(target) //return: "dog? Where's my little"
Any help will be appreciated!
You'll have to join the repeated test for funny and the .*. Try
lovely(?:(?!funny).)*beagle
Check it out here at regex101.
Regards
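The pattern from this answer can be tried against the three strings from the question. This is a sketch only: the capturing group, the trim() call and the helper name wordsBetween are additions for illustration, and the first string is assumed to be on a single line, since . does not match line breaks by default.

```javascript
// Pattern from the answer, with a capturing group added around the
// "any character that does not start 'funny'" repetition.
const lovelyToBeagle = /lovely((?:(?!funny).)*)beagle/;

// Hypothetical helper: text between "lovely" and "beagle", or null.
function wordsBetween(text) {
  const m = lovelyToBeagle.exec(text);
  return m ? m[1].trim() : null;
}

console.log(wordsBetween("What a lovely puppy! Is it a beagle?"));
// "puppy! Is it a"
console.log(wordsBetween("You've got a lovely funny beagle!"));
// null
console.log(wordsBetween("Where's my lovely dog? Where's my little beagle?"));
// "dog? Where's my little"
```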
I suggest making a per-character check: accept any character that is not f, and when the current letter is an initial f, check that it does not start the word 'funny':
var regex = /\blovely\b([^f]|\Bf|f(?!unny\b))+\bbeagle\b/;

Match string in between two strings [duplicate]

If I have a string like this:
var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
I want to get the strings between each pair of the substrings "play" and "in", so basically an array with "the Ukulele" and "the Guitar".
Right now I'm doing:
var test = str.match("play(.*)in");
But that returns the string between the first "play" and the last "in", so I get "the Ukulele in Lebanon. play the Guitar" instead of 2 separate strings. Does anyone know how to globally search a string for all occurrences of a substring between a starting and an ending string?
You can use the regex
play\s*(.*?)\s*in
Use / as the delimiter for regex literal syntax
Use a lazy quantifier (.*?) so the group matches as little as possible
Demo:
var str = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";
var regex = /play\s*(.*?)\s*in/g;
var matches = [];
var m; // declared explicitly to avoid an implicit global
while ((m = regex.exec(str)) !== null) {
  matches.push(m[1]);
}
document.body.innerHTML = '<pre>' + JSON.stringify(matches, null, 4) + '</pre>';
You are so close to the right answer. There are a few things you may be overlooking:
You need your match to be non-greedy; this can be accomplished by adding ? after the quantifier
Do not use String.match() with the g flag here: it returns only the full matches and drops the capturing groups. An alternative is to use RegExp.exec() or String.replace(), but using replace would require a little more work, so stick to building your own array with exec
var str = "display the Ukulele in Lebanon. play the Guitar in Lebanon.";
var re = /\bplay (.+?) in\b/g;
var matches = [];
var match;
while ( match = re.exec(str) ){
matches[ matches.length ] = match[1];
}
document.getElementById('demo').innerHTML = JSON.stringify( matches );
<pre id="demo"></pre>
/\bplay\s+(.+?)\s+in\b/ig might be more specific and might work better for you.
I believe there may be some issues with the regexes offered previously. For instance, /play\s*(.*?)\s*in/g will find a match within "displaying photographs in sequence". Of course this is not what you want. One of the problems is that there is nothing specifying that "play" should be a discrete word. It needs a word boundary before it and at least one instance of white space after it (it can't be optional). Similarly, the white space after the capture group should not be optional.
The other expression offered at the time I added this, /play (.+?) in/g, lacks the word boundary token before "play" and after "in", so it will contain a match in "display blue ink". This is not what you want.
As to your expression, it was missing the word boundary and white space tokens as well. But as another mentioned, it also needed the wildcard to be lazy. Otherwise, given your example string, your match would start with the first instance of "play" and end with the 2nd instance of "in".
If you find issues with the expression I offered, I would appreciate feedback.
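The false positives discussed in this answer can be checked with a short script. This sketch uses the /\bplay\s+(.+?)\s+in\b/ig pattern from above together with String.prototype.matchAll (ES2020) instead of an exec loop; the helper name phrasesBetween is hypothetical.

```javascript
// Hypothetical helper: every phrase between the word "play" and the word "in".
function phrasesBetween(text) {
  const re = /\bplay\s+(.+?)\s+in\b/gi;
  // matchAll yields one match object per occurrence, including groups.
  return Array.from(text.matchAll(re), (m) => m[1]);
}

console.log(phrasesBetween("play the Ukulele in Lebanon. play the Guitar in Lebanon."));
// [ 'the Ukulele', 'the Guitar' ]
console.log(phrasesBetween("displaying photographs in sequence")); // []
console.log(phrasesBetween("display blue ink"));                   // []
```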
A victim of greedy matching.
.* finds the longest possible match,
while .*? finds the shortest possible match.
For the example given, str will be an array of 3 strings containing:
the Ukulele
the Guitar
Lebanon
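The difference between .* and .*? is easy to see on the question's own string. This is a small sketch with illustrative variable names; it compares only the first match of each pattern.

```javascript
const lyrics = "play the Ukulele in Lebanon. play the Guitar in Lebanon.";

// Greedy: .* runs to the end, then backtracks to the LAST "in".
const greedy = /play(.*)in/.exec(lyrics);

// Lazy: .*? stops at the FIRST "in" that lets the pattern succeed.
const lazy = /play(.*?)in/.exec(lyrics);

console.log(greedy[1].trim()); // "the Ukulele in Lebanon. play the Guitar"
console.log(lazy[1].trim());   // "the Ukulele"
```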

Regex acronym matching and typo correction

I'm trying to fix some typos, and one common one is a missing space between sentences: "This is a sentence.Here is another sentence." I want to match that and add a space, so I wrote this regular expression:
var re = /\.(?=[A-Z]|\()/g;
var res = str.replace(re, '. ');
That covers the squished-together sentences, as well as another typo involving parentheses, which is not important for this question.
The problem is that acronyms also show up, and they are matched and (incorrectly) replaced. Example: "The U.S. is a country" is changed to "The U. S. is a country". I'm trying to prevent these acronyms from being matched. I think maybe what I want is a "lookbehind", but JavaScript doesn't support that.
Any idea how to solve this?
You could try:
\.(?=[A-Z]|\()(?![A-Z]\.)
This ensures that the characters following the "." are not a capital letter followed by another ".".
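The suggested pattern can be wrapped in a small function and tried on both examples from the question. This is a sketch; the function name addMissingSpaces is hypothetical. One limitation to keep in mind: an acronym written without its final period (for example "U.S.A") can still be split, because the last letter is not followed by a ".".

```javascript
// Hypothetical helper using the pattern suggested above.
function addMissingSpaces(text) {
  // A period followed by a capital letter or "(",
  // but not by a capital letter plus another period (acronym).
  const re = /\.(?=[A-Z]|\()(?![A-Z]\.)/g;
  return text.replace(re, ". ");
}

console.log(addMissingSpaces("This is a sentence.Here is another sentence."));
// "This is a sentence. Here is another sentence."
console.log(addMissingSpaces("The U.S. is a country"));
// "The U.S. is a country"
```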
This seems to work:
var str = "A sentance.Another sentance with an A.C.R.O.N.Y.M.Yet another sentence."
var re = /\.(?=[A-Z][^.]|\()/g;
var res = str.replace(re, '. ');
res // => "A sentance. Another sentance with an A.C.R.O.N.Y.M. Yet another sentence."

Regex trying to match characters before and after symbol

I'm trying to match characters before and after a symbol, in a string.
string: budgets-closed
To match the characters before the sign -, I do: ^[a-z]+
And to match the other characters, I try: \-(\w+) but, the problem is that my result is: -closed instead of closed.
Any ideas, how to fix it?
Update
This is the piece of code, where I was trying to apply the regex http://jsfiddle.net/trDFh/1/
I repeat: it's not that I don't want to use split; I was just really curious and wanted to see how it can be done the regex way, in the spirit of hacking on things.
Update2
Well, using substring is a solution as well: http://jsfiddle.net/trDFh/2/ and it is the one I chose to use, since the if in question is actually an else if in a more complex if statement, and that solution seems to be the best fit for now.
Use exec():
var result=/([^-]+)-([^-]+)/.exec(string);
result is an array, with result[1] being the first captured string and result[2] being the second captured string.
Live demo: http://jsfiddle.net/Pqntk/
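The "-closed" result in the question comes from reading index 0 of the match, which is always the whole match; the captured text is at index 1. The sketch below shows that, and then a variant of the exec() approach that also handles input without a hyphen. The helper name splitAtHyphen and the ^...$ anchors are assumptions added for illustration.

```javascript
// Index 0 is the full match, index 1 is the first capturing group.
const partial = "budgets-closed".match(/-(\w+)/);
console.log(partial[0]); // "-closed"
console.log(partial[1]); // "closed"

// Hypothetical helper: both sides of a single hyphen, or null.
function splitAtHyphen(text) {
  const m = /^([^-]+)-([^-]+)$/.exec(text);
  if (m === null) return null; // no hyphen, or more than one
  const [, before, after] = m; // skip index 0 (the full match)
  return { before, after };
}

console.log(splitAtHyphen("budgets-closed")); // { before: 'budgets', after: 'closed' }
console.log(splitAtHyphen("budgets"));        // null
```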
I think you'll have to match the hyphen as well. You can use grouping to get what you need, though.
var str = 'budgets-closed';
var matches = str.match( /([a-z]+)-([a-z]+)/ );
var before = matches[1];
var after = matches[2];
For that specific string, you could also use
var str = 'budgets-closed';
var before = str.match( /^\b[a-z]+/ )[0];
var after = str.match( /\b[a-z]+$/ )[0];
I'm sure there are better ways, but the above methods do work.
If the symbol is specifically -, then this should work:
\b([^-]+)-([^-]+)\b
You match a boundary, any "not -" characters, a - and then more "not -" characters until the next word boundary.
Also, there is no need to escape a hyphen; it is only special when it sits between two other characters inside a character class.
edit: And here is a jsfiddle that demonstrates it does work.
