Regex Match Punctuation Space but Retain Punctuation - javascript

I have a large paragraph string which I'm trying to split into sentences using JavaScript's .split() method. I need a regex that will match a period or a question-mark [?.] followed by a space. However, I need to retain the period/question-mark in the resulting array. How can I do this without positive lookbehinds in JS?
Edit: Example input:
"This is sentence 1. This is sentence 2? This is sentence 3."
Example output:
["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]

This regex will work
([^?.]+[?.])(?:\s|$)
var str = 'This is sentence 1. This is sentence 2? This is sentence 3.';
var regex = /([^?.]+[?.])(?:\s|$)/gm;
var m;
while ((m = regex.exec(str)) !== null) {
document.writeln(m[1] + '<br>');
}
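If you want the array from the question rather than printed lines, the same loop can push capture group 1 into a list. This is a minimal sketch; the function name splitSentences is made up for illustration.

```javascript
// Collect capture group 1 (the sentence including its . or ?) from every match.
function splitSentences(str) {
  var regex = /([^?.]+[?.])(?:\s|$)/g;
  var sentences = [];
  var m;
  while ((m = regex.exec(str)) !== null) {
    sentences.push(m[1]);
  }
  return sentences;
}

console.log(splitSentences('This is sentence 1. This is sentence 2? This is sentence 3.'));
// ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
```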

Forget about split(). You want match()
var text = "This is an example paragragh. Oh and it has a question? Ok it's followed by some other random stuff. Bye.";
var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g);
alert(matches);
The generated matches array contains each sentence:
Array[4]
0:"This is an example paragragh. "
1:"Oh and it has a question? "
2:"Ok it's followed by some other random stuff. "
3:"Bye."
Here is the fiddle with it for further testing: https://jsfiddle.net/uds4cww3/
Edited to match end of line too.
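Because the pattern consumes the whitespace after each sentence, most matches keep a trailing space. A small sketch of one way to clean that up, assuming you want trimmed sentences and an empty array when nothing matches (sentencesTrimmed is a hypothetical name):

```javascript
function sentencesTrimmed(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  var matches = text.match(/[\w\s'\";\(\)\,]+(\.|\?)(\s|$)/g) || [];
  // Drop the whitespace that the (\s|$) group pulled into each match.
  return matches.map(function (s) { return s.trim(); });
}

console.log(sentencesTrimmed("One. Two? Three."));
// ["One.", "Two?", "Three."]
```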

Maybe this one will produce your array items:
\b.*?[?\.](?=\s|$)

This is tacky, but it works:
var breakIntoSentences = function(s) {
var l = [];
s.replace(/[^.?]+.?/g, a => l.push(a));
return l;
}
breakIntoSentences("how? who cares.")
["how?", " who cares."]
(Really how it works: the RE matches a string of not-punctuation, followed by something. Since the match is greedy, that something is either punctuation or the end-of-string.)
This will only capture the first in a series of punctuation, so breakIntoSentences("how???? who cares...") also returns ["how?", " who cares."]. If you want to capture all the punctuation, use /[^.?]+[.?]*/g as the RE instead.
Edit: Hahaha: Wavvves teaches me about match(), which is what the replace/push does. You learn something new every goddamn day.
In its minimal form, supporting three punctuation marks, and using ES6 syntax, we get:
const breakIntoSentences = s => s.match(/[^.?,]+[.?,]*/g)
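To see the difference between the two patterns discussed above side by side, here is a quick sketch using match(). The names firstMarkOnly and allMarks are made up for the comparison.

```javascript
// .? takes at most one character after the run of non-punctuation.
const firstMarkOnly = s => s.match(/[^.?]+.?/g) || [];
// [.?]* takes the whole run of trailing punctuation.
const allMarks = s => s.match(/[^.?]+[.?]*/g) || [];

console.log(firstMarkOnly("how???? who cares...")); // ["how?", " who cares."]
console.log(allMarks("how???? who cares..."));      // ["how????", " who cares..."]
```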

I guess .match will do it:
(?:\s?)(.*?[.?])
I.e.:
sentence = "This is sentence 1. This is sentence 2? This is sentence 3.";
result = sentence.match(/(?:\s?)(.*?[.?])/ig);
for (var i = 0; i < result.length; i++) {
document.write(result[i]+"<br>");
}
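For what it's worth, the question rules out lookbehind because JavaScript lacked it at the time. Lookbehind was added in ES2018, so if your runtime supports it (an assumption you should check for your targets), split() itself can do the job. A sketch, with splitOnSentenceEnd as a hypothetical name:

```javascript
function splitOnSentenceEnd(str) {
  // (?<=[?.]) asserts that the whitespace is preceded by . or ? without consuming it,
  // so the punctuation stays attached to the sentence on the left.
  return str.split(/(?<=[?.])\s+/);
}

console.log(splitOnSentenceEnd("This is sentence 1. This is sentence 2? This is sentence 3."));
// ["This is sentence 1.", "This is sentence 2?", "This is sentence 3."]
```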

Related

Regex - to extract text before the last hyphen/dash

Example data                    Expected output
sds-rwewr-dddd-cash0-bbb        cash0
rrse-cash1-nonre                cash1
loan-snk-cash2-ssdd             cash2
garb-cash3-dfgfd                cash3
loan-unwan-cash4-something      cash4
The common pattern is that I need to extract the few characters just before the last hyphen of the given string.
var regex1 = /.*(?=(?:-[^-]*){1}$)/g; // yields "sds-rwewr-dddd-cash0" from "sds-rwewr-dddd-cash0-bbb"
var regex2 = /\w[^-]*$/g; // yields "cash0" from "sds-rwewr-dddd-cash0"
var res = regex2.exec(regex1.exec("sds-rwewr-dddd-cash0-bbb")); // res[0] is "cash0"
Although the nested regex above works as expected, it may not be optimal. Any help with a more efficient regex would be appreciated.
You can use
/\w+(?=-[^-]*$)/
If the part before the last hyphen can contain chars other than word chars, keep using \w[^-]*: /\w[^-]*(?=-[^-]*$)/. If you do not need to check the first char of your match, simply use /[^-]+(?=-[^-]*$)/.
Details:
\w+ - one or more word chars
(?=-[^-]*$) - a positive lookahead requiring that the match be immediately followed by - and then zero or more chars other than - up to the end of the string.
JavaScript demo
const texts = ['sds-rwewr-dddd-cash0-bbb','rrse-cash1-nonre','loan-snk-cash2-ssdd','garb-cash3-dfgfd','loan-unwan-cash4-something'];
const regex = /\w+(?=-[^-]*$)/;
for (var text of texts) {
console.log(text, '=>', text.match(regex)?.[0]);
}
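If you would rather wrap this in a helper that returns null when there is no hyphen, something like the sketch below could work. Both function names are hypothetical. The second one is a non-regex alternative using split(); note that the two can differ on inputs with empty segments such as "a--b".

```javascript
// Regex version: the last run of non-hyphen chars that is followed by "-" and a final segment.
function beforeLastHyphen(text) {
  const m = text.match(/[^-]+(?=-[^-]*$)/);
  return m ? m[0] : null;
}

// split() version: take the second-to-last segment, if there is one.
function beforeLastHyphenSplit(text) {
  const parts = text.split('-');
  return parts.length >= 2 ? parts[parts.length - 2] : null;
}

console.log(beforeLastHyphen('sds-rwewr-dddd-cash0-bbb'));      // "cash0"
console.log(beforeLastHyphenSplit('sds-rwewr-dddd-cash0-bbb')); // "cash0"
```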

limit character number for specific words starting with # in JavaScript

I have an issue: I need to limit the number of characters (to 10) in specific words that start with a special character.
example in a textarea :
The #dog is here, I need a #rest and this is not #availableeeeeeeee for now
the word "availableeeeeeeee" needs to be cut when I reach 10 characters
Desired results
The #dog is here, I need a #rest and this is not #availablee for now
My question is: how can I limit the characters of each word that contains a hashtag?
Thanks
1. Regex Solution:
You can use the .replace() method with the following regex, /(#\w{10})[\w\d]+/g, which removes the extra characters:
str = str.replace(/(#\w{10})[\w\d]+/g, '$1');
Demo:
var str = "The #dog is here, I need a #rest and this is not #availableeeeeeeee for now";
str = str.replace(/(#\w{10})[\w\d]+/g, '$1');
console.log(str);
Note:
This regex matches words starting with # and uses a capturing group to keep only the # plus the first 10 characters.
Full match: #availableeeeeeeee
Group 1: #availablee
The .replace() call keeps only the captured group and drops the extra characters.
Note that you need to attach this code in the onchange event handler of your textarea.
2. split() Solution:
If you want to go with a solution that doesn't use Regex, you can use .split() method with Array.prototype.map() like this:
str = str.split(" ").map(function(item){
return item.startsWith("#") && item.length > 11 ? item.substr(0,11) : item;
}).join(" ");
Demo:
var str = "The #dog is here, I need a #rest and this is not #availableeeeeeeee for now";
str = str.split(" ").map(function(item){
return item.startsWith("#") && item.length > 11 ? item.substr(0,11) : item;
}).join(" ");
console.log(str);
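If the limit should be configurable, the regex from the first solution can be built from a parameter. This is only a sketch: truncateHashtags and maxLen are names made up here, and it assumes hashtags consist of \w characters.

```javascript
function truncateHashtags(str, maxLen) {
  // Group 1 keeps "#" plus the first maxLen word characters;
  // the trailing \w+ (everything past the limit) is dropped by the replacement.
  var re = new RegExp('(#\\w{' + maxLen + '})\\w+', 'g');
  return str.replace(re, '$1');
}

var input = "The #dog is here, I need a #rest and this is not #availableeeeeeeee for now";
console.log(truncateHashtags(input, 10));
// "The #dog is here, I need a #rest and this is not #availablee for now"
```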
A simple JavaScript solution could be to split the textarea's text into an array of words, iterate over it, and validate each word's length.
var value = $('#text').val();
var maxSize = 10;
// Collapse runs of whitespace so split(' ') yields clean words
// (assumed: the `regex` variable in the original was never defined)
var words = value.trim().replace(/\s+/g, ' ').split(' ');
for (var i = 0; i < words.length; i++)
{
  // Compare the word's length, not the word itself, against the limit
  if (words[i].length > maxSize)
  {
    alert('size exceeds max allowed');
  }
}
You could also try preventing further typing once a word reaches 10 characters, using inline regular-expression validation directly in the HTML.
Well, I think you can try the following.
The split() method cuts the string into words; then, for each word that starts with '#', we use substr to keep at most 10 + 1 characters (the # plus 10). Finally, join the words back together to obtain the result :).
string="The #dog is here, I need a #rest and this is not #availableeeeeeeee for now"
var result = []
string.split(" ").forEach(function(item){
if (item.startsWith("#")){
result.push(item.substr(0,11));
} else result.push(item);
});
console.log(result.join(" "));

Retrieving several capturing groups recursively with RegExp

I have a string with this format:
#someID#tn#company#somethingNew#classing#somethingElse#With
There might be unlimited #-separated words, but definitely the whole string begins with #
I have written the following regexp. It matches the string, but I cannot get each #-separated word; what I get is only the last repetition of the group (as well as the whole string). How can I get an array with every word as a separate element?
(?:^\#\w*)(?:(\#\w*)+) //I know I have ruled out second capturing group with ?: , though doesn't make much difference.
And here is my Javascript code:
var reg = /(?:^\#\w*)(?:(\#\w*)+)/g;
var x = null;
while(x = reg.exec("#someID#tn#company#somethingNew#classing#somethingElse#With"))
{
console.log(x);
}
And here is the result (Firebug, console):
["#someID#tn#company#somet...sing#somethingElse#With", "#With"]
0
"#someID#tn#company#somet...sing#somethingElse#With"
1
"#With"
index
0
input
"#someID#tn#company#somet...sing#somethingElse#With"
EDIT :
I want an output like this with regular expression if possible:
["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
NOTE that I want a RegExp solution. I know about String.split() and String operations.
You can use:
var s = '#someID#tn#company#somethingNew#classing#somethingElse#With'
if (s.substr(0, 1) == "#")
tok = s.substr(1).split('#');
//=> ["someID", "tn", "company", "somethingNew", "classing", "somethingElse", "With"]
You could try this regex also,
((?:#|#)\w+)
Explanation:
() Capturing groups. Anything inside this capturing group would be captured.
(?:) It just matches the strings but won't capture anything.
#|# Literal # or # symbol.
\w+ Followed by one or more word characters.
OR
> "#someID#tn#company#somethingNew#classing#somethingElse#With".split(/\b(?=#|#)/g);
[ '#someID',
'#tn',
'#company',
'#somethingNew',
'#classing',
'#somethingElse',
'#With' ]
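For reference, a repeated capturing group such as (\#\w*)+ only keeps its last iteration, which is why the original attempt returned just "#With". Matching one word at a time with the g flag avoids that. A minimal sketch using String.prototype.match (the simplified pattern /#\w+/g is my own shorthand for the one above):

```javascript
var s = "#someID#tn#company#somethingNew#classing#somethingElse#With";
// With the g flag, match() returns every full match instead of capture groups.
var words = s.match(/#\w+/g) || [];
console.log(words);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]
```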
It will be easier without regExp:
var str = "#someID#tn#company#somethingNew#classing#somethingElse#With";
var strSplit = str.split("#");
for(var i = 1; i < strSplit.length; i++) {
strSplit[i] = "#" + strSplit[i];
}
console.log(strSplit);
// ["#someID", "#tn", "#company", "#somethingNew", "#classing", "#somethingElse", "#With"]

Javascript regex: get text at a particular line and character #

Given a chunk of text (imagine a page from a book), how can I get the word at a particular line and character #?
Find and return the word at Ln # 3, Ch # 7 "just".
var text = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";
JSFiddle to try code on: http://jsfiddle.net/xa9xS/709/
You can use something like this (?:.*\n){2}.{6}\s+(\w+) Where this would get word of line 2+1 starting at character 6+1.
Edit: Figured I'd make it a bit more robust. The above fails to match anything if you provide a character index in the middle of a word. The following will skip ahead until the start of a word before it starts capturing: (?:.*\n){2}.{6}.*?\b(\w+)\b.
PS: Regex in JavaScript didn't support lookbehind when this was written (it was added in ES2018), so skipping back to the start of the word is quite a bit trickier.
Edit2: Making the string.replace work requires us to capture the other parts of the string. This seems to do the trick: text.replace(/((?:.*\n){2}(?:.{6}.*?))\b(\w+)\b((?:.*\n?)*)/g, "$1[the-replacement]$3") but it does complicate things. It might be better to use the more direct approach in this case. Simplicity is king!
window.example_text = "Lorem ispum dolar\n\
Si emit I dont know latin\n\
Really just making this up as I go\n\
Ok this should be enough for us to work on.\n";
var lineNumber = 3;
var charNumber = 7;
var match = (example_text.split("\n")[lineNumber - 1]).substr(charNumber).split(/\s/)[0];
console.log(match);
http://jsfiddle.net/2DFhM/1/
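Building on the direct approach above, here is a sketch that also handles a column in the middle of a word by returning the whole word rather than its tail. wordAt is a hypothetical helper, and it assumes 1-based line and column numbers and that "word" means a run of \w characters.

```javascript
function wordAt(text, lineNumber, charNumber) {
  var line = text.split("\n")[lineNumber - 1];
  if (line === undefined) return null; // no such line
  var re = /\w+/g;
  var m;
  while ((m = re.exec(line)) !== null) {
    // m.index + m[0].length is the 1-based column of the word's last character,
    // so this picks the first word that covers or follows the requested column.
    if (m.index + m[0].length >= charNumber) return m[0];
  }
  return null; // no word at or after that column
}

var sample = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n" +
  "Ok this should be enough for us to work on.\n";
console.log(wordAt(sample, 3, 7)); // "just"
console.log(wordAt(sample, 3, 9)); // "just" (column 9 is inside the word)
```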
Use this regex:
^(?:.*(?:\r?\n)*){2}.{6}\W+(\w+)
Explanation
The ^ anchor asserts that we are at the beginning of the string
To get to line 3, we need to skip two lines
Our line skipper is (?:.*(?:\r?\n)*){2}, matching any chars that are not line breaks, then line breaks
.{6} eats up the first six chars
There is no word starting at character 7, so we are going to match the next word:
\W+ matches any non-word chars
(\w+) captures word chars to Group 1
we retrieve the match from Group 1
In JS:
var myregex = /^(?:.*[\r\n]*){2}.{6}\W+(\w+)/;
var matchArray = myregex.exec(yourString);
if (matchArray != null) {
thematch = matchArray[1];
} else {
thematch = "";
}
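The hard-coded {2} and {6} can be generated from the line and column you want. A rough sketch, where buildLineCharRegex is a made-up name; it uses \W* instead of \W+ so that a column pointing exactly at the start of a word still matches. If the column falls in the middle of a word, it captures only the rest of that word.

```javascript
function buildLineCharRegex(lineNumber, charNumber) {
  return new RegExp(
    '^(?:.*\\r?\\n){' + (lineNumber - 1) + '}' + // skip the preceding lines
    '.{' + (charNumber - 1) + '}' +               // skip the preceding characters
    '\\W*(\\w+)'                                  // capture the next word in Group 1
  );
}

var page = "Lorem ispum dolar\n" +
  "Si emit I dont know latin\n" +
  "Really just making this up as I go\n";
var found = buildLineCharRegex(3, 7).exec(page);
console.log(found ? found[1] : ""); // "just"
```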
Probably too late now lol, lots of good answers but here goes for the sake of completeness:
made this regexp here: http://regex101.com/r/nF2vX8/1
(?:.*\n.*){2}^(?:.{7})(\w*\W)
and here's a solution in javascript:
var index_left = 0, index_right = 0, stringy = "";
for (; line_number-- > 0;){
index_left = index_right;
index_right = example_text.indexOf("\n", index_right) + 1;
}
stringy = example_text.substring(index_left, index_right-1);
index_left = 0;
index_left = stringy.indexOf(" ", char_number+1);
stringy = stringy.substring(0, index_left);
index_left = stringy.lastIndexOf(" ", index_left);
stringy = stringy.substring(index_left+1);
console.log(stringy);
and the fiddle for the js: http://jsfiddle.net/xa9xS/714/
it mangles line_number but it's easy to fix by copying the value and i'm too bored to do it now :P

Get all words starting with X and ending with Y

I have got a textarea with keyup=validate()
I need a javascript function that gets all words starting with # and ending with a character that is not A-Za-z0-9
For example:
This is a text #user1 this is more text #user2. And this is even more #user3!
The function gives an array:
Array("#user1","#user2","#user3");
I am sure there must be a way to do this written somewhere on the internet if I just googled it, but I have no idea what to look for. I am very new to regular expressions.
Thank you very much!
The regular expression you want is:
/#[a-z\d]+/ig
This matches # followed by a sequence of letters and numbers. The i modifier makes it case-insensitive, so you don't have to put A-Z in the character class, and g makes it find all the matches.
var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var matches = str.match(/#[a-z\d]+/ig);
console.log(matches);
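One thing to watch for in a keyup handler: match() returns null rather than an empty array when the text contains no hashtags. A small sketch that guards against that (extractMentions is a hypothetical name):

```javascript
function extractMentions(text) {
  // Fall back to [] so callers can always use .length or iterate safely.
  return text.match(/#[a-z\d]+/ig) || [];
}

console.log(extractMentions("This is a text #user1 this is more text #user2. And this is even more #user3!"));
// ["#user1", "#user2", "#user3"]
console.log(extractMentions("no tags here")); // []
```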
JS
var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var textArr = str.split(" ");
for(var i = 0; i < textArr.length; i++) {
var test = textArr[i];
matches = test.match(/^#.*.[A-Za-z0-9]$/);
console.log(matches);
};
Explanation:
You should also read about the regex(http://www.w3schools.com/jsref/jsref_obj_regexp.asp) and match(http://www.w3schools.com/jsref/jsref_match.asp) to get an idea how it works.
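Basically, ^# anchors the match at a leading #, $ anchors it at the end of the word, and .* matches any characters in between. Note that because this pattern requires the last character to be a letter or digit, words with trailing punctuation such as "#user2." return null.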
To Test: http://www.regular-expressions.info/javascriptexample.html
Thanks for the replies above, they've helped me. I've written this method, which hopefully answers the question about checking for both a start and an end pattern.
In this example it looks for ##_ at the start and _## at the end
e.g. ##_ anyTokenYouNeedToFind _##.
Code:
const tokenSearchHelper = (inputText) => {
let matches = inputText.match(/##_[a-zA-Z0-9_\d]+_##/ig);
return matches;
}
const out = tokenSearchHelper("Hello ##_World_##");
console.log(out);
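If you only need the token between the delimiters, a capturing group with matchAll() (ES2020, so check your runtime) can strip the ##_ and _## for you. A sketch, assuming tokens are made of \w characters; extractTokens is a made-up name.

```javascript
const extractTokens = (inputText) =>
  // The lazy \w+? stops at the first "_##", so tokens may themselves contain underscores.
  Array.from(inputText.matchAll(/##_(\w+?)_##/g), (m) => m[1]);

console.log(extractTokens("Hello ##_World_## and ##_foo_bar_##"));
// ["World", "foo_bar"]
```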
