JavaScript regexp get a word between texts

In JavaScript I need to get the name from a custom tag. For example:
[tag_retweet attr="val" attr2="val"]
In this case I need only the word "retweet", skipping all the other text. Another example:
[tag_share]
Here I need the word "share".
What regexp would get the tag name in these cases?

Something like /\[tag_([a-z0-9_]+)(?:\s+|\])/
var tag = '[tag_retweet attr="val" attr2="val"]';
var match = tag.match(/\[tag_([a-z0-9_]+)(?:\s+|\])/);
window.alert(match[1]); // alerts "retweet"

The regex to capture it would be:
/.*\[tag_(.*?)\W.*/
This matches any characters up to and including [tag_, then lazily captures characters until it encounters a non-word character, and finally matches the rest of the string. The first capture group contains only the relevant part, the tag name.
Use it like:
myString.match(/.*\[tag_(.*?)\W.*/)[1]

Basically, you're looking for what comes after [tag_, up until the next space (or the end of the tag)
So:
var tag = '[tag_retweet attr="val" attr2="val"]';
// or var tag = '[tag_share]';
var match = tag.match(/\[tag_(.*?)[\] ]/)[1];
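The answers above index straight into the match result, which throws if the input contains no tag. A minimal sketch of a reusable helper that guards against that case follows; the name getTagName is hypothetical, and it assumes tag names consist only of letters, digits and underscores.

```javascript
// Hypothetical helper: returns the name after "[tag_", or null when the
// input contains no such tag. Assumes names use only word characters.
function getTagName(text) {
  var match = /\[tag_(\w+)/.exec(text);
  return match ? match[1] : null;
}

console.log(getTagName('[tag_retweet attr="val" attr2="val"]')); // "retweet"
console.log(getTagName('[tag_share]'));                           // "share"
console.log(getTagName('no tag here'));                           // null
```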

Related

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding strings that start and end with :.
For example, in "hello :smile: :sleeping:" I need to find the substrings that start and end with the : character. I tried the expression below, but it didn't work:
^:.*\:$

My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g;
function replaceEmojiTags(text) {
    return text.replace(emojiPattern, function (tag, emotion) {
        // The emotion will be the captured word between your tags,
        // so either "smile" or "sleeping" in your example
        //
        // In this function you would take that emotion and return
        // whatever you want based on the input parameter and the
        // whole tag would be replaced
        //
        // As an example, let's say you had a bunch of GIF images
        // for the different emotions:
        return '<img src="/img/emoji/' + emotion + '.gif" />';
    });
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to change, since it only looks for word characters. Presumably a hyphen must join two words, so "-big-smile" and "big-smile-" should not be accepted. For that, change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".

The ^ and $ are anchors (start and end respectively). They force your regex to match only an entire string that starts with :, has anything in between, and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
Putting it all together gives a regex like /:([a-z0-9]+):/gmi
It matches a string beginning with :, followed by one or more alphanumeric characters, and ending in :. The modifiers are g (global), m (multi-line, which has no effect here since the pattern contains no ^ or $) and i (case insensitive, for things like :FacePalm:).
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
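With the g flag, match returns the full matches including the colons. If only the names between the colons are wanted, one option is matchAll, which keeps the capture groups for every match. This is a sketch only; the function name emojiNames is hypothetical, and matchAll needs an ES2020-capable environment.

```javascript
// Hypothetical helper: collects the text between each pair of colons.
function emojiNames(text) {
  // matchAll requires the g flag and yields one match object per hit,
  // each with its capture groups intact.
  return Array.from(text.matchAll(/:([a-z0-9]+):/gi), function (m) {
    return m[1];
  });
}

console.log(emojiNames('Hello :smile: and jolly :wave:')); // ["smile", "wave"]
```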

Why do these JavaScript regular expression capture parenthesis snag entire line instead of the suffixes appended to a word?

Can someone please tell me WHY my simple expression doesn't capture the optional arbitrary length .suffix fragments following hello, matching complete lines?
Instead, it matches the ENTIRE LINE (hello.aa.b goodbye) instead of the contents of the capture parenthesis.
Using this code (see JSFIDDLE):
//var line = "hello goodbye"; // desired: suffix null
//var line = "hello.aa goodbye"; // desired: suffix[0]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[0]=.aa suffix[1]=.b
var suffix = line.match(/^hello(\.[^\.]*)*\sgoodbye$/g);
I've been working on this simple expression for OVER three hours and I'm beginning to believe I have a fundamental misunderstanding of how capturing works: isn't there a "cursor" gobbling up each line character-by-character and capturing content inside the parenthesis ()?

I originally started from Perl and then PHP. When I started with JavaScript, I got stuck with this situation once myself.
In JavaScript, a GLOBAL match does NOT produce a multidimensional array. In other words, with the g flag the returned array contains only the full matches (no sub-patterns).
Note that in your code suffix[0] is the whole matched string.
Try this:
//var line = "hello goodbye"; // desired: suffix undefined
//var line = "hello.aa goodbye"; // desired: suffix[1]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[1]=.aa suffix[2]=.b
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
If you have to use a global match, then you have to capture the whole strings first, then run a second RegEx to get the sub-patterns.
Good luck
:)
Update: Further Explanation
If each string only has ONE matchable pattern (like var line = "hello.aa.b goodbye";)
then you can use the pattern I posted above (without the GLOBAL modifier)
If a string has more than ONE matchable pattern, then look at the following:
// modifier g means it will match more than once in the string
// ^ at the start means "starting with", for when you want the match to start from the beginning of the string
// $ means the end of the string
// if you have ^.....$ it means the whole string should be ONE match
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/g);
var line = 'hello.aa goodbye and more hello.aa.b goodbye and some more hello.cc.dd goodbye';
// no match here since the whole of the string doesn't match the RegEx
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
// one match here, only the first one since it is not a GLOBAL match (hello.aa goodbye)
// suffix[0] = hello.aa goodbye
// suffix[1] = .aa
// suffix[2] = undefined
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/);
// 3 matches here (but no sub-patterns), only a one dimensional array with GLOBAL match in JavaScript
// suffix[0] = hello.aa goodbye
// suffix[1] = hello.aa.b goodbye
// suffix[2] = hello.cc.dd goodbye
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/g);
I hope that helps.
:)

Inside the parentheses, do not look for a dot followed by whitespace. Instead, look for a dot followed by some characters, and look for the space outside the parentheses.

A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations.
// no g flag on the first match: with g, match() returns no capture groups
var suffix = line.match(/^hello((\.[^\.]*)*)\sgoodbye$/);
if (suffix !== null)
    suffix = suffix[1].match(/(\.[^\.\s]*)/g);
I also recommend the regex101 site.

Using the global flag with the match method doesn't return any capturing groups. See the specification.
Although you use ()* it's only one capturing group. The * only specifies that the group's content has to be matched 0 or more times before the space comes.
As @EveryEvery has pointed out, you can use a two-step approach.
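The two-step approach can be sketched as a small function covering all three sample lines from the question. The name getSuffixes is hypothetical, and the sketch assumes a suffix is a dot followed by one or more characters that are neither dots nor whitespace.

```javascript
// Hypothetical helper: returns an array of ".suffix" fragments, an empty
// array when there are none, or null when the line does not match at all.
function getSuffixes(line) {
  // Step 1: no g flag, so the capture group is available. The outer
  // group holds every suffix as one string, e.g. ".aa.b".
  var whole = line.match(/^hello((?:\.[^.\s]+)*)\sgoodbye$/);
  if (whole === null) return null;

  // Step 2: split that string into individual suffixes. match() with g
  // returns null when nothing matches, hence the fallback to [].
  return whole[1].match(/\.[^.\s]+/g) || [];
}

console.log(getSuffixes('hello goodbye'));      // []
console.log(getSuffixes('hello.aa goodbye'));   // [".aa"]
console.log(getSuffixes('hello.aa.b goodbye')); // [".aa", ".b"]
```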

Ignore Word List

I have a list of words to ignore. However, when I call it, it replaces every instance of it even when it's inside a string.
For example: "he" ends up turning "the" into "t".
How can I have it just remove the words when they're on their own?
Here's the code:
var commonWords=/and|a|an|has|he|to|was|in|were|are|is|will|as|it|if|
with|at|its|it's|be|by|on|that|from|the|about|again|all|almost|also|although|
always|among|another|any|be|because|been|before|being|between|both|by|can|could|
did|do|does|doesn't|'|done|due|during|each|either|enough|from|had|has|have|having|
here|i|if|into|is|isn't|itself|just|may|might|most|mostly|must|nor|no|neither|nearly|
of|often|on|our|ours|his|hers|he's|he|she|she's|overall|perhaps|quite|rather|really|
regarding|seem|seems|seen|several|should|show|showewd|shown|shows|significant|
significantly|since|so|some|such|than|that|then|their|theirs|there's|therefore|these|
they|this|those|through|thus|to|upon|use|used|using|various|very|was|we|were|what|when|
which|while|with|within|without|would|however|or|for|the|but|etc|yet|/g;
commonWords.ignoreCase;
var w = w.replace(commonWords, '');

You don't want to replace every instance in the string; you want to replace whole words. Look for word boundaries using the \b anchor.
For example...
var commonWords = /\b(and|a|an|has|he|she)\b/g;
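For a long word list it may be easier to build the pattern from an array than to maintain one huge literal. The sketch below is illustrative only: the function name removeCommonWords and the shortened list are assumptions, and the i flag replaces the commonWords.ignoreCase line from the question, which only reads a property and has no effect.

```javascript
// Short sample list; the full list from the question would go here.
var words = ['and', 'a', 'an', 'has', 'he', 'she', 'the'];

// \b on both sides makes each alternative match only as a whole word.
var commonWords = new RegExp('\\b(?:' + words.join('|') + ')\\b', 'gi');

// Hypothetical helper: strips the listed words and tidies up the spacing.
function removeCommonWords(text) {
  // Remove the words, then collapse the leftover runs of whitespace.
  return text.replace(commonWords, '').replace(/\s+/g, ' ').trim();
}

console.log(removeCommonWords('The theme and he has a plan')); // "theme plan"
```

Note that \b treats an apostrophe as a non-word character, so an entry such as "it" would match inside "it's" and leave "'s" behind; listing the longer form first in the alternation avoids that.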

Split multiple parts of a string in an HTML div with id

Hi, I need to extract part of a variable's value.
In my HTML file I get a dynamic value, something like this:
product/roe_anythin_anything-1.jpg
product/soe_anything_anything-2.jpg
I need to remove the part before the slash (/) and everything from the first underscore (_) onward, which should leave the roe or soe part.
I have used this function:
<script>
function splitSize() {
    $('#splitSize').each(function (index) {
        var mystr = $(this).html();
        var mystr1 = /product\/(.*)-.*/.exec(mystr);
        $(this).html(mystr1[1]);
        //$(this).html(mystr1[0]);
    });
}
splitSize();
</script>
With it I successfully get roe_anythin_anything; now I just need to remove everything from the first _ onward.
Please suggest how I can do this.

This does what you asked using split. You could also use a RegEx to make it simpler.
var myStr = 'product/roe-1.jpg' ;
myStr = myStr.split('/')[1];
myStr = myStr.split('-')[0];

Use regex group capture
var myStr = 'product/roe-1.jpg';
var result = /product\/(.*)-.*/.exec(myStr)[1];
Breakdown:
/product\/ matches the initial product string and the / character (escaped so it's not interpreted as the end of the regex).
(.*) matches your roe characters and keeps them in a 'capture group': everything up to but not including the last hyphen.
Then the hyphen is matched, then anything else.
exec returns a 2-element array. Item 0 is the whole matched string, item 1 is the contents of the capture group.
See How do you access the matched groups in a JavaScript regular expression? for more details
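For the file names in the question, which contain underscores, the capture group can be limited to the text before the first underscore. This is a sketch under that assumption; the function name sizePrefix is hypothetical.

```javascript
// Hypothetical helper: returns the text between "product/" and the first
// underscore, or null when the string does not have that shape.
function sizePrefix(path) {
  // [^_]+ stops at the first underscore, so only "roe" or "soe" is captured.
  var match = /product\/([^_]+)_/.exec(path);
  return match ? match[1] : null;
}

console.log(sizePrefix('product/roe_anythin_anything-1.jpg'));  // "roe"
console.log(sizePrefix('product/soe_anything_anything-2.jpg')); // "soe"
```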

Remove string after predefined string

I am pulling content from an RSS feed, before using jquery to format and edit the rss feed (string) that is returned. I am using replace to replace strings and characters like so:
var spanish = $("#wod a").text();
var newspan = spanish.replace("=","-");
$("#wod a").text(newspan);
This works great. I am also trying to remove all text after a certain point. Similar to truncation, I would like to hide all text starting from the word "Example".
In this particular RSS feed, the word "Example" appears in every item. I would like to hide "Example" and all text that follows that word. How can I accomplish this?

You don't even need jQuery to remove everything after a certain word in a given string. The first approach is to use substring:
var new_str = str.substring(0, str.indexOf("Example"));
The second is a trick with split:
var new_str = str.split("Example")[0];
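The two approaches differ when the word is missing: indexOf returns -1, so substring(0, -1) yields an empty string, while split returns the whole string unchanged. A minimal sketch that guards the substring version follows; the function name cutBefore and the sample strings are assumptions for illustration.

```javascript
// Hypothetical helper: keeps everything before the first occurrence of
// word, or returns the string unchanged when the word is absent.
function cutBefore(str, word) {
  var index = str.indexOf(word);
  // indexOf returns -1 when the word is absent; keep the string as-is then.
  return index === -1 ? str : str.substring(0, index);
}

console.log(cutBefore('hola - hello Example: hola amigo', 'Example')); // "hola - hello "
console.log(cutBefore('no marker here', 'Example'));                   // "no marker here"
```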

If you also want to keep "Example" and just remove everything after that particular word, you can do:
var str = "aaaa1111?bbb&222:Example=123456",
newStr = str.substring(0, str.indexOf('Example') + 'Example'.length);
// will output: aaaa1111?bbb&222:Example

jQuery isn't intended for string manipulation, you should use Vanilla JS for that:
newspan = newspan.replace(/example.*$/i, "");
The .replace() method accepts a regular expression, so in this case I've used /example.*$/i which does a case-insensitive match against the word "example" followed by zero or more of any other characters to the end of the string and replaces them with an empty string.

I would like to hide all text starting from the word "Example"
Here is a solution that uses replace with a backreference to "hide" everything starting from the word "example" while keeping the text before it.
var str = "my house example is bad"
str.replace(/(.*?) example.*/i, "$1") // returns "my house"
// case insensitive. also note the space before example because you
// probably want to throw that out.

We Keep Coding

JavaScript is the programming language of the Web.
