Get all words starting with X and ending with Y - javascript

I have got a textarea with keyup=validate()
I need a javascript function that gets all words starting with # and ending with a character that is not A-Za-z0-9
For example:
This is a text #user1 this is more text #user2. And this is even more #user3!
The function gives an array:
Array("#user1","#user2","#user3");
I am sure there must be a way to do this written somewhere on the internet, but I have no idea what to search for. I am very new to regular expressions.
Thank you very much!

The regular expression you want is:
/#[a-z\d]+/ig
This matches # followed by a sequence of letters and numbers. The i modifier makes it case-insensitive, so you don't have to put A-Z in the character class, and g makes it find all the matches.
var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var matches = str.match(/#[a-z\d]+/ig);
console.log(matches);
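One thing to watch for if you call this from validate(): match() returns null rather than an empty array when the text contains no tags. A minimal sketch that guards against that (the helper name extractTags is made up for illustration, it is not from the question):

```javascript
// Hypothetical helper; the name extractTags is an assumption for this sketch.
function extractTags(text) {
  // match() with the g flag returns null when nothing matches,
  // so fall back to an empty array to keep callers simple
  return text.match(/#[a-z\d]+/ig) || [];
}

var tags = extractTags("This is a text #user1 this is more text #user2. And this is even more #user3!");
console.log(tags);

var none = extractTags("no tags in this text");
console.log(none.length);
```

This way the caller can always loop over the result without a null check.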

JS
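var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var textArr = str.split(" ");
for (var i = 0; i < textArr.length; i++) {
  var test = textArr[i];
  // match() returns null for words that do not fit the pattern,
  // including "#user2." and "#user3!" because they end in punctuation
  var matches = test.match(/^#.*.[A-Za-z0-9]$/);
  console.log(matches);
}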
Explanation:
You should also read about regex (http://www.w3schools.com/jsref/jsref_obj_regexp.asp) and match (http://www.w3schools.com/jsref/jsref_match.asp) to get an idea of how they work.
Basically, ^# means the match must start with #, $ anchors the end of the string, and .* matches any characters in between.
To Test: http://www.regular-expressions.info/javascriptexample.html
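If you want the split-per-word approach to also pick up tags that are followed by punctuation, one option is to drop the $ anchor and only match the alphanumeric run after the #. This is a sketch of that variation, not the code from the answer above; the variable names are assumptions:

```javascript
var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var tags = [];

// Split on any whitespace, then test each word on its own
str.split(/\s+/).forEach(function (word) {
  // ^# anchors the start; [A-Za-z0-9]+ stops at the first other character,
  // so trailing punctuation such as "." or "!" is left out of the match
  var m = word.match(/^#[A-Za-z0-9]+/);
  if (m) {
    tags.push(m[0]);
  }
});

console.log(tags);
```

Words without a leading # give null from match() and are simply skipped.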

Thanks for the replies above, they helped me. I've written this method, which hopefully answers the question about checking for both a start and an end pattern.
In this example it looks for ##_ at the start and _## at the end,
e.g. ##_anyTokenYouNeedToFind_##.
Code:
const tokenSearchHelper = (inputText) => {
let matches = inputText.match(/##_[a-zA-Z0-9_\d]+_##/ig);
return matches;
}
const out = tokenSearchHelper("Hello ##_World_##");
console.log(out);
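The same idea can be generalised to arbitrary start and end markers by building the regex at runtime. The sketch below is only an illustration: escapeRegExp and findDelimited are hypothetical names, and it assumes both markers are non-empty strings.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the text found between each start/end pair.
// Assumes start and end are non-empty strings.
function findDelimited(text, start, end) {
  // (.*?) is lazy, so each match stops at the nearest end marker
  var re = new RegExp(escapeRegExp(start) + "(.*?)" + escapeRegExp(end), "g");
  var tokens = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    tokens.push(m[1]); // group 1 holds the text without the markers
  }
  return tokens;
}

var found = findDelimited("Hello ##_World_## and ##_Mars_##", "##_", "_##");
console.log(found);
```

Escaping matters as soon as a marker contains characters like * or ( that would otherwise be read as regex syntax.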

Related

Regex Match Punctuation Space but Retain Punctuation

I have a large paragraph string which I'm trying to split into sentences using JavaScript's .split() method. I need a regex that will match a period or a question-mark [?.] followed by a space. However, I need to retain the period/question-mark in the resulting array. How can I do this without positive lookbehinds in JS?
Edit: Example input:
"This is sentence 1. This is sentence 2? This is sentence 3."
Example output:
["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
This regex will work
([^?.]+[?.])(?:\s|$)
var str = 'This is sentence 1. This is sentence 2? This is sentence 3.';
var regex = /([^?.]+[?.])(?:\s|$)/gm;
var m;
while ((m = regex.exec(str)) !== null) {
document.writeln(m[1] + '<br>');
}
Forget about split(). You want match()
var text = "This is an example paragragh. Oh and it has a question? Ok it's followed by some other random stuff. Bye.";
var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g);
alert(matches);
The generated matches array contains each sentence:
Array[4]
0:"This is an example paragragh. "
1:"Oh and it has a question? "
2:"Ok it's followed by some other random stuff. "
3:"Bye."
Here is the fiddle with it for further testing: https://jsfiddle.net/uds4cww3/
Edited to match end of line too.
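Note that the matches keep the whitespace that separates the sentences. If you want clean items, you can trim each one. A minimal sketch along those lines, using a simpler pattern than the one above (splitSentences is a made-up name, and text that does not end in . or ? is dropped by this pattern):

```javascript
// Hypothetical helper; only "." and "?" are treated as sentence endings here
function splitSentences(text) {
  // a run of non-punctuation followed by one or more "." or "?"
  var parts = text.match(/[^.?]+[.?]+/g) || [];
  // remove the whitespace left over between sentences
  return parts.map(function (s) {
    return s.trim();
  });
}

var sentences = splitSentences("This is sentence 1. This is sentence 2? This is sentence 3.");
console.log(sentences);
```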
Maybe this one produces your array items:
\b.*?[?\.](?=\s|$)
This is tacky, but it works:
var breakIntoSentences = function(s) {
var l = [];
s.replace(/[^.?]+.?/g, a => l.push(a));
return l;
}
breakIntoSentences("how? who cares.")
["how?", " who cares."]
(Really how it works: the RE matches a string of not-punctuation, followed by something. Since the match is greedy, that something is either punctuation or the end-of-string.)
This will only capture the first in a series of punctuation, so breakIntoSentences("how???? who cares...") also returns ["how?", " who cares."]. If you want to capture all the punctuation, use /[^.?]+[.?]*/g as the RE instead.
Edit: Hahaha: Wavvves teaches me about match(), which is what the replace/push does. You learn something new every goddamn day.
In its minimal form, supporting three punctuation marks, and using ES6 syntax, we get:
const breakIntoSentences = s => s.match(/[^.?,]+[.?,]*/g)
I guess .match will do it:
(?:\s?)(.*?[.?])
I.e.:
sentence = "This is sentence 1. This is sentence 2? This is sentence 3.";
result = sentence.match(/(?:\s?)(.*?[.?])/ig);
for (var i = 0; i < result.length; i++) {
document.write(result[i]+"<br>");
}
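For completeness: lookbehind assertions were added to JavaScript in ES2018, so in engines that support them the split() from the question can work directly. This is a sketch assuming such an engine; it throws a SyntaxError in older ones, where the match() answers above are the way to go.

```javascript
var str = "This is sentence 1. This is sentence 2? This is sentence 3.";

// (?<=[.?]) asserts that the whitespace is preceded by "." or "?"
// without consuming it, so the punctuation stays with its sentence
var sentences = str.split(/(?<=[.?])\s+/);

console.log(sentences);
```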

Retrieving several capturing groups recursively with RegExp

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There might be unlimited #-separated words, but definitely the whole string begins with #
I have written the following regexp. It matches the string, but I cannot get each #-separated word; all I get is the last repetition of the group (as well as the whole string). How can I get an array with each word as a separate element?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this with regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.
You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With';
var tok = [];
if (s.substr(0, 1) == "#")
  tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]
You could try this regex also,
((?:#|#)\w+)
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
// split() yields an empty first element because the string starts with "#"; slice(1) drops it
var strSplit = str.split("#").slice(1);
for (var i = 0; i < strSplit.length; i++) {
  strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
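Since the question asks for a RegExp solution: a repeated capturing group such as (\#\w*)+ only keeps its last iteration, which is why the original attempt ends up with just "#With". A global match on a single word pattern gives every word instead. A small sketch (the variable names are mine):

```javascript
var s = "#someID#tn#company#somethingNew#classing#somethingElse#With";

// A repeated group only remembers its final iteration
var lastOnly = /^(#\w+)+$/.exec(s)[1];
console.log(lastOnly);

// With the g flag, match() returns every occurrence of the pattern
var words = s.match(/#\w+/g) || [];
console.log(words);
```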

Javascript regex: get text at a particular line and character #

Given a chunk of text (imagine a page from a book), how can I get the word at a particular line and character #?
Find and return the word at Ln # 3, Ch # 7 "just".
var text = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";
JSFiddle to try code on: http://jsfiddle.net/xa9xS/709/
You can use something like (?:.*\n){2}.{6}\s+(\w+), which gets the word on line 2+1 starting at character 6+1.
Edit: Figured I'd robustify it a bit. The above fails to match anything if you provide a character index in the middle of a word. The following will skip ahead until the start of a word before it starts capturing: (?:.*\n){2}.{6}.*?\b(\w+)\b.
PS: Regex in JavaScript didn't support lookbehind when this was written (it was only added in ES2018), so skipping back to the start of the word is quite a bit trickier.
Edit2: Making the string.replace work requires us to capture the other parts of the string. This seems to do the trick: text.replace(/((?:.*\n){2}(?:.{6}.*?))\b(\w+)\b((?:.*\n?)*)/g, "$1[the-replacement]$3") but it does complicate things. It might be better to use the more direct approach in this case. Simplicity is king!
window.example_text = "Lorem ispum dolar\n\
Si emit I dont know latin\n\
Really just making this up as I go\n\
Ok this should be enough for us to work on.\n";
var lineNumber = 3;
var charNumber = 7;
var match = (example_text.split("\n")[lineNumber - 1]).substr(charNumber).split(/\s/)[0];
console.log(match);
http://jsfiddle.net/2DFhM/1/
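If the character index can land in the middle of a word and you want the whole word back, plain index walking is one way to do the "skip back" part without lookbehind. A rough sketch, assuming 1-based line and character numbers and \w as the definition of a word character (wordAt is a made-up name):

```javascript
// Hypothetical helper: returns the word at (or right after) the given position,
// or "" when the line does not exist or no word is found on it
function wordAt(text, lineNumber, charNumber) {
  var line = text.split("\n")[lineNumber - 1];
  if (line === undefined) return "";
  var idx = charNumber - 1; // convert to a 0-based index
  if (idx < 0 || idx >= line.length) return "";
  // if we landed on a non-word character, move forward to the next word
  while (idx < line.length && !/\w/.test(line[idx])) idx++;
  if (idx >= line.length) return "";
  // walk back to the start of the word, then forward to its end
  var start = idx;
  while (start > 0 && /\w/.test(line[start - 1])) start--;
  var end = idx;
  while (end < line.length && /\w/.test(line[end])) end++;
  return line.slice(start, end);
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

console.log(wordAt(sample, 3, 7));
console.log(wordAt(sample, 3, 9));
```

Both calls give the same word, since character 9 sits inside it.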
Use this regex:
^(?:.*(?:\r?\n)*){2}.{6}\W+(\w+)
Explanation
The ^ anchor asserts that we are at the beginning of the string
To get to line 3, we need to skip two lines
Our line skipper is (?:.*(?:\r?\n)*){2}, matching any chars that are not line breaks, then line breaks
.{6} eats up the first six chars
There is no word starting at character 7, so we are going to match the next word:
\W+ matches any non-word chars
(\w+) captures word chars to Group 1
we retrieve the match from Group 1
In JS:
var myregex = /^(?:.*[\r\n]*){2}.{6}\W+(\w+)/;
var matchArray = myregex.exec(yourString);
var thematch = "";
if (matchArray != null) {
  thematch = matchArray[1];
}
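To avoid hard-coding the 2 and the 6, the pattern can be assembled from the line and character numbers. This is only a sketch of that idea: wordAtRegex is a hypothetical name, it assumes "\n" line breaks and positive 1-based integers, and like the regex above it returns only the rest of a word if the position falls inside one.

```javascript
// Hypothetical helper that builds the pattern from 1-based line/char numbers
function wordAtRegex(text, lineNumber, charNumber) {
  var re = new RegExp(
    "^(?:.*\\n){" + (lineNumber - 1) + "}" + // skip the preceding lines
    ".{" + (charNumber - 1) + "}" +           // skip the preceding characters
    "[^\\w\\n]*(\\w+)"                        // skip non-word chars on this line, capture the word
  );
  var m = re.exec(text);
  return m ? m[1] : "";
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";

console.log(wordAtRegex(sample, 3, 7));
```

Using [^\w\n]* instead of \W+ keeps the match from spilling over onto the next line.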
Probably too late now lol, lots of good answers but here goes for the sake of completeness:
made this regexp here: http://regex101.com/r/nF2vX8/1
(?:.*\n.*){2}^(?:.{7})(\w*\W)
and here's a solution in javascript:
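// Assumed inputs: example_text as defined in the answer above, line 3 and character 7
var line_number = 3, char_number = 7;
var index_left = 0, index_right = 0, stringy = "";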
for (; line_number-- > 0;){
index_left = index_right;
index_right = example_text.indexOf("\n", index_right) + 1;
}
stringy = example_text.substring(index_left, index_right-1);
index_left = 0;
index_left = stringy.indexOf(" ", char_number+1);
stringy = stringy.substring(0, index_left);
index_left = stringy.lastIndexOf(" ", index_left);
stringy = stringy.substring(index_left+1);
console.log(stringy);
and the fiddle for the js: http://jsfiddle.net/xa9xS/714/
it mangles line_number but it's easy to fix by copying the value and i'm too bored to do it now :P

Match everything between certain characters

Wow, I suck at regex
http://regex101.com/r/lM8oX3
([*][.]+[*])
I'm trying to match text such as this:
*hello*
Just try with following regex:
(\*[^*]+\*)
In your regex, [.] searches for literal dots, because inside [] the dot loses its special meaning and is treated as a normal character. You could use .+ instead, but it would also match * characters, so use the solution above.
This will capture
var text = "asdfasdf *hello*";
console.log( text.match(/([*][^*]+[*])/)[1]);
But that only grabs the first match.
If you want all matches:
var text = "asdfasdf *hello* asdffdsa *asdf*";
var matches = text.match(/([*][^*]+[*])/g);
// with the g flag, match() returns every full match starting at index 0, or null if there are none
if (matches) {
  for (var i = 0; i < matches.length; i++) {
    console.log(matches[i]);
  }
}
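With the g flag, match() gives back the full matches including the asterisks and drops the capturing groups. If you only want the text between the asterisks, an exec() loop keeps access to group 1. A short sketch (variable names are mine):

```javascript
var text = "asdfasdf *hello* asdffdsa *asdf*";
var re = /\*([^*]+)\*/g;
var inner = [];
var m;

// exec() advances through the string on each call because of the g flag
while ((m = re.exec(text)) !== null) {
  inner.push(m[1]); // group 1 is the text without the surrounding asterisks
}

console.log(inner);
```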

Javascript RegExp match & Multiple backreferences

I'm having trouble using multiple back references in a JavaScript match. So far I've got:
function newIlluminate() {
var string = "the time is a quarter to two";
var param = "time";
var re = new RegExp("(" + param + ")", "i");
var test = new RegExp("(time)(quarter)(the)", "i");
var matches = string.match(test);
$("#debug").text(matches[1]);
}
newIlluminate();
When matching against the regex re, #debug shows 'time', which is the value of param.
I've seen match examples where multiple back references are used by wrapping each part in parentheses, but my match for (time)(quarter)... returns null.
Where am I going wrong? Any help would be greatly appreciated!
Your regex is literally looking for timequarterthe and splitting the match (if it finds one) into the three backreferences.
I think you mean this:
var test = /time|quarter|the/ig;
Your regex test simply doesn't match the string (as it does not contain the substring timequarterthe). I guess you want alternation:
var test = /time|quarter|the/ig; // does not even need a capturing group
var matches = string.match(test);
$("#debug").text(matches!=null ? matches.join(", ") : "did not match");
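If the goal is to highlight the words rather than just list them, the alternation can be wrapped in one capturing group and referenced as $1 in replace(). This is a guess at the use case based on the function name, so treat it as a sketch: illuminate, escapeRegExp and the <b> markup are all assumptions, not part of the question.

```javascript
// Hypothetical helper: escape regex metacharacters in a search word
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every whole-word occurrence of the given words in <b> tags
function illuminate(text, words) {
  if (words.length === 0) return text; // nothing to highlight
  // one capturing group containing the alternation, e.g. \b(time|quarter|the)\b
  var re = new RegExp("\\b(" + words.map(escapeRegExp).join("|") + ")\\b", "ig");
  // $1 is the back reference to whatever the group matched
  return text.replace(re, "<b>$1</b>");
}

var highlighted = illuminate("the time is a quarter to two", ["time", "quarter", "the"]);
console.log(highlighted);
```

A single group is enough here because each match hits exactly one of the alternatives.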
