regex not being called repeatedly for multiple matches (isn't global) - javascript

I have this regex /#[a-zA-Z0-9_]+$/g to do a global look up of all user names that are mentioned.
Here is some sample code.
var userRegex = /#[a-zA-Z0-9_]+$/g;
var text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.replace(userRegex, function(match, text, urlId) {
console.log(match);
});
So basically that console.log only gets called once, in this case it'll just show #Stuff3. I'm not sure why it isn't searching globally. If someone can help fix up that regex for me, that'd be awesome!

$ means "Assert the position at the end of the string (or before a line break at the end of the string, if any)". But you don't seem to want that.
So remove the $ and use /#[a-zA-Z0-9_]+/g instead.
var userRegex = /#[a-zA-Z0-9_]+/g,
text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.match(userRegex); // [ "#Stuff", "#Stuff2", "#Stuff3" ]

It isn't doing a global search throughout the entire context simply because of the end of string $ anchor (which only asserts at the end of string position). You can use the following here:
var results = text.match(/#\w+/g) //=> [ '#Stuff', '#Stuff2', '#Stuff3' ]
Note: \w is shorthand for matching any word character.
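To tie this back to the replace callback in the question, here is a minimal sketch showing that the callback runs once per match when the $ anchor is removed. The mentions array is a name invented for this illustration.

```javascript
var userRegex = /#\w+/g;
var text = "This is some sample text #Stuff #Stuff2 #Stuff3";
var mentions = []; // hypothetical collector, not part of the original code

text.replace(userRegex, function (match) {
  mentions.push(match); // called once for every match because of the g flag
  return match;         // return the match so the text itself is unchanged
});

console.log(mentions);
```

If only the list of matches is needed, text.match(userRegex) as shown above is the shorter route; the callback form is useful when each match should also be rewritten.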

Adding to @Oriol's answer: you can add a word boundary to be more specific.
#([a-zA-Z0-9_]+)\b
The \b will cause the username to match only if it is followed by a non-word character or the end of the string.
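Because this pattern wraps the name in a capturing group, the names can be collected without the leading #. A minimal sketch, assuming an exec loop is acceptable (the variable names are made up for the example):

```javascript
var mentionRegex = /#([a-zA-Z0-9_]+)\b/g;
var sample = "This is some sample text #Stuff #Stuff2 #Stuff3";
var names = [];
var m;

// With the g flag, each exec call resumes where the previous match ended.
while ((m = mentionRegex.exec(sample)) !== null) {
  names.push(m[1]); // m[0] is "#Stuff", m[1] is the captured name
}

console.log(names);
```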


REGEX - after bracket get data until end bracket

I have a string like the following:
SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)
Question
I am trying to make a regex to get just the information between the round brackets, for example the end string would look like:
BI1 BI17 BI1234
I have found this example on stackoverflow which will get the first value BI1, but will ignore the rest after.
Get text between two rounded brackets
this is the REGEX I created from the above link: /\(([^)]+)\)/g but it includes the brackets, I want to remove these.
I am using this website to attempt to solve this query which has a testing window to see if the regex entered works:
http://www.regexr.com
Additional Information
There can be any number of digits, which is why I have given 3 different examples.
This is a continuous string, not on separate lines.
Thanks for any help on this matter.
While this isn't possible using just regexes, you can do it with string#split and the following regex:
\).*?\(|^.*?\(|\).*?$
Yielding code that looks a bit like this:
function getBracketed(str) {
return str.split(/\).*?\(|^.*?\(|\).*?$/).filter(Boolean);
}
(You need to filter out the empty strings that'll appear at the beginning and end if you do it this way - hence the extra operation).
If you need to keep all inside parentheses and remove everything else, you might use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var result = str.replace(/.*?\(([^()]*)\)/g, " $1").trim();
console.log(result);
If you need to get only the BI+digits pattern inside parentheses, use
/.*?\((BI\d+)\)/g
Details:
.*? - match any 0+ chars other than linebreak symbols
\( - match a (
(BI\d+) - Group 1 capturing BI + 1 or more digits (\d+) (or [^()]* - zero or more chars other than ( and ))
\) - a closing ).
To get all the values as array (say, for later joining), use
var str = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";
var re = /\((BI\d+)\)/g;
var res = str.match(re).map(function (s) { return s.substring(1, s.length - 1); });
console.log(res);
console.log(res.join(" "));
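As an alternative to trimming the parentheses with substring, the capture group can be read directly. This is only a sketch and assumes an engine that supports String.prototype.matchAll (ES2020); the names input and codes are illustrative.

```javascript
const input = "SOME TEXT (BI1) SOME MORE TEXT (BI17) SOME FINAL TEXT (BI1234)";

// matchAll yields one match object per hit; index 1 holds capture group 1.
const codes = Array.from(input.matchAll(/\((BI\d+)\)/g), (hit) => hit[1]);

console.log(codes);
console.log(codes.join(" "));
```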

regex encapsulation

I've got a question concerning regex.
I was wondering how one could replace an encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the entity will still remain encapsulated, but with something else.
I will do this in JS, but the regex is what is important, I have been searching for a while, probably searching for the wrong terms, I should probably learn more about regex, generally.
In any case, is there someone who knows how to perform this operation with simplicity?
Thanks!
It's important that you find {key:23} in your text first, and then replace it with your wanted syntax, this way you avoid replacing {key:'sometext'} with that syntax which is unwanted.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:[\d]+\}/gi, function myFunction(x){return x.replace(/\{key:/,'<span>').replace(/\}/, '</span>');});
This way only {key:AnyNumber} gets replaced, and {key:AnythingOtherThanNumbers} is left untouched.
It seems you are new to regex. You need to learn more about character classes and capturing groups and backreferences.
The regex is somewhat basic in your case if you do not need any nested encapsulated text support.
Let's start:
The beginning is {key: - it will match the substring literally. Note that { can be a special character (denoting the start of a limiting quantifier), so it is a good idea to escape it: \{key:.
([^}]+) - This is a bit more interesting: the round brackets around are a capturing group that let us later back-reference the matched text. The [^}]+ means 1 or more characters (due to +) other than } (as [^}] is a negated character class where ^ means not)
} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
{key:([^}]+)}
Code snippet:
var re = /{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"/>
If you want to use a different behavior based on the value of key, then you'll need to adjust the regex to either match digits only (with \d+) or letters only (say, with [a-zA-Z] for English), or other shorthand classes, ranges (= character classes), or their combinations.
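One way to vary the behavior by value without maintaining several regexes is a replacer function. The sketch below is an assumption-laden illustration: the helper name highlightKeys and the extra "numeric" class are invented for the example and are not part of the question.

```javascript
// Hypothetical helper: wraps every {key:...} value in a span and adds an
// extra (made-up) "numeric" class when the value consists of digits only.
function highlightKeys(source) {
  return source.replace(/\{key:([^}]+)\}/g, function (whole, value) {
    var cssClass = /^\d+$/.test(value) ? "highlightable numeric" : "highlightable";
    return '<span class="' + cssClass + '">' + value + "</span>";
  });
}

console.log(highlightKeys("some text {key:23} more text {key:name}"));
```

The replacer receives the whole match first and then each captured group, so the decision can be made per occurrence.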
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");

Replace words of text area

I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
var text = document.getElementById("eC").value;
var newText = text.replace(/hello/g, '<b>hello</b>');
document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replace the words, but it replaces it with <b>hello</b>, but I want to have it bold. Is there a solution?
Update:
In response to your edit, about your wanting to see the word "hello" show up in bold. The short answer to that is: it can't be done. Not in a simple textarea, at least. You're probably looking for something more like an online WYSIWYG editor, or at least an RTE (Rich Text Editor). There are a couple of them out there, like tinyMCE, for example, which is a decent WYSIWYG editor. A list of RTEs and HTML editors can be found here.
First off: As others have already pointed out: a textarea element's contents are available through its value property, not innerText. You get the contents alright, but you're trying to update them through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello" like "The company is called hellonearth?" Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(/(?!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
How it works:
(?!>): a negative look-ahead. Note that a look-ahead inspects what follows the current position, so placed here it only checks that the next character is not >. To reject a match that is preceded by > (or, more specifically, by <b>), a negative look-behind such as (?<!>) or (?<!<b>) is needed, which requires an ES2018 or later engine
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided there is a word boundary on both sides
(?!<): a negative look-ahead after the word: don't match if it is followed by <, so a word already followed by a closing </b> is skipped. You can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, ie: h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
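To make the look-around and word-boundary points concrete, here is a minimal sketch. It assumes an engine with look-behind support (ES2018 or later), and the helper name boldHello is invented for the example.

```javascript
// Hypothetical helper: wraps standalone "hello" (any case) in <b>...</b>.
function boldHello(input) {
  // (?<!<b>)  : skip a word that is directly preceded by "<b>" (look-behind)
  // \b...\b   : skip "hello" inside longer words such as "hellonearth"
  // (?!<\/b>) : skip a word that is directly followed by "</b>" (look-ahead)
  return input.replace(/(?<!<b>)\b(hello)\b(?!<\/b>)/gi, '<b>$1</b>');
}

var once = boldHello("Hello hellonearth <b>hello</b>");
var twice = boldHello(once); // running it again should change nothing

console.log(once);
console.log(once === twice);
```

Because already wrapped words are skipped, the function can be run repeatedly on the same text without nesting the tags.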
If <textarea>, then you need to use .value property.
document.getElementById("eC").value = newText;
And, as Barmar mentioned, replace() with a string pattern replaces only the first occurrence. To replace every occurrence, you need to use a simple regex. Note that I removed the quotes. /g means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you want to really bold your text, you need to use content editable div, not text area:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
var text = document.getElementById("eC").innerHTML;
var newText = text.replace(/hello/g, '<b>hello</b>');
newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
newText = newText.replace(/<\/b><\/b>/g,"</b>");
document.getElementById("eC").innerHTML = newText;
}

Why do these JavaScript regular expression capture parenthesis snag entire line instead of the suffixes appended to a word?

Can someone please tell me WHY my simple expression doesn't capture the optional arbitrary length .suffix fragments following hello, matching complete lines?
Instead, it matches the ENTIRE LINE (hello.aa.b goodbye) instead of the contents of the capture parenthesis.
Using this code (see JSFIDDLE):
//var line = "hello goodbye"; // desired: suffix null
//var line = "hello.aa goodbye"; // desired: suffix[0]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[0]=.aa suffix[1]=.b
var suffix = line.match(/^hello(\.[^\.]*)*\sgoodbye$/g);
I've been working on this simple expression for OVER three hours and I'm beginning to believe I have a fundamental misunderstanding of how capturing works: isn't there a "cursor" gobbling up each line character-by-character and capturing content inside the parenthesis ()?
I originally started from Perl and then PHP. When I started with JavaScript, I got stuck with this situation once myself.
In JavaScript, the GLOBAL match does NOT produce a multidimensional array. In other words, in GLOBAL match there is only match[0] (no sub-patterns).
Please note that suffix[0] matches the whole string.
Try this:
//var line = "hello goodbye"; // desired: suffix undefined
//var line = "hello.aa goodbye"; // desired: suffix[1]=.aa
var line = "hello.aa.b goodbye"; // desired: suffix[1]=.aa suffix[2]=.b
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
If you have to use a global match, then you have to capture the whole strings first, then run a second RegEx to get the sub-patterns.
Good luck
:)
Update: Further Explanation
If each string only has ONE matchable pattern (like var line = "hello.aa.b goodbye";)
then you can use the pattern I posted above (without the GLOBAL modifier)
If a string has more than ONE matchable pattern, then look at the following:
// modifier g means it will match more than once in the string
// ^ at the start means "starting with", for when you want the match to start from the beginning of the string
// $ means the end of the string
// if you have ^.....$ it means the whole string should be a ONE match
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/g);
var line = 'hello.aa goodbye and more hello.aa.b goodbye and some more hello.cc.dd goodbye';
// no match here since the whole of the string doesn't match the RegEx
var suffix = line.match(/^hello(\.[^.]+)?(\.[^.]+)?\s+goodbye$/);
// one match here, only the first one since it is not a GLOBAL match (hello.aa goodbye)
// suffix[0] = hello.aa goodbye
// suffix[1] = .aa
// suffix[2] = undefined
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/);
// 3 matches here (but no sub-patterns), only a one dimensional array with GLOBAL match in JavaScript
// suffix[0] = hello.aa goodbye
// suffix[1] = hello.aa.b goodbye
// suffix[2] = hello.cc.dd goodbye
var suffix = line.match(/hello(\.[^.]+)?(\.[^.]+)?\s+goodbye/g);
I hope that helps.
:)
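The two-step idea can be sketched with an exec loop, which does expose capture groups even when the g flag is set. The pattern below is a variation written for this illustration (one outer group around all the suffixes), and the variable names are assumptions.

```javascript
var haystack = 'hello.aa goodbye and more hello.aa.b goodbye and some more hello.cc.dd goodbye';

// Step 1: one capturing group collects the whole run of suffixes per match.
var outer = /hello((?:\.[^.\s]+)*)\s+goodbye/g;
var allSuffixes = [];
var found;

while ((found = outer.exec(haystack)) !== null) {
  // Step 2: split the captured run into individual ".xx" pieces.
  // match returns null for an empty run, hence the fallback to [].
  allSuffixes.push(found[1].match(/\.[^.]+/g) || []);
}

console.log(allSuffixes);
```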
Inside the parentheses, do not look for a . followed by whitespace; instead look for a . followed by some characters, and match the whitespace outside the parentheses.
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations.
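// No g flag here: a global match returns only whole matches, so suffix[1] would not be the captured group.
var suffix = line.match(/^hello((\.[^\.]*)*)\sgoodbye$/);
if (suffix !== null)
suffix = suffix[1].match(/(\.[^\.\s]*)/g);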
I also recommend the regex101 site.
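The "last iteration only" behavior is easy to check. A small sketch, with variable names chosen for the example:

```javascript
var sampleLine = "hello.aa.b goodbye";

// The repeated group is overwritten on every iteration, so only ".b" survives.
var lastOnly = sampleLine.match(/^hello(\.[^.\s]*)*\sgoodbye$/);

// An outer group around the repetition keeps the whole run, ".aa.b".
var wrapped = sampleLine.match(/^hello((?:\.[^.\s]*)*)\sgoodbye$/);

console.log(lastOnly[1]);
console.log(wrapped[1]);
```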
Using the global flag with the match method doesn't return any capturing groups. See the specification.
Although you use ()* it's only one capturing group. The * only defines that the content has to be matched 0 or more time before the space comes.
As @EveryEvery has pointed out, you can use a two-step approach.

Javascript RegExp Matching weirdness

I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern, dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being
var result;
while( result = pat.exec(someText) ) {
// do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.
Well, first of all you're specifying case-insensitivity, and secondly, you are matching the letter I as one of your matchable strings.
Champion would match pio and i, because they both match /.?I.?/gi
It however doesn't match /.?Champions,.?/gi because of the trailing comma.
Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
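For the hit-highlighting goal, one possible direction is to match each term only as a whole word instead of padding it with .?. This is a sketch under assumptions: it presumes whole-word matching is what is wanted, it relies on look-behind support (ES2018 or later), and the helper names escapeRegExp and buildHitPattern are invented for the example.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: a term only counts when it is not glued to a word
// character on either side, so the "i" inside "Champion" is ignored.
function buildHitPattern(terms) {
  var body = terms.map(escapeRegExp).join("|");
  return new RegExp("(?<!\\w)(?:" + body + ")(?!\\w)", "gi");
}

var hitPattern = buildHitPattern(
  ["NCAA", "Division", "I", "Basketball", "Champions,", "1939-2011"]
);

console.log("Champion".match(hitPattern));
console.log("NCAA Division I Basketball Champions, 1939-2011".match(hitPattern));
```

Look-arounds are used rather than \b because a term such as "Champions," ends in a non-word character, where \b would behave differently.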
