Regex to match string in a sentence

I am trying to find a specific string in a sentence. The task says:
Find the position of the string "ten" within a sentence, without using the exact string directly (this can be avoided in many ways using just a bit of RegEx). Print as many spaces as there were characters in the original sentence before the aforementioned string appeared, and then the string itself in lowercase.
I've gotten this far:
let words = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(){
let regex=/t[a-z]*en/gi;
let output = words.match(regex)
console.log(words)
console.log(output)
}
console.log(findTheNumber())
The result should be:
input = A ton of tunas weighs more than ten kilograms.
output = ten(ENTER)

You could try a regex replacement approach, with the help of a callback function:
var input = "A ton of tunas weighs more than ten kilograms.";
var output = input.replace(/\w+/g, function(match)
{
if (!match.match(/^[t][e][n]$/)) {
return match.replace(/\w/g, " ");
}
else {
return match;
}
});
console.log(input);
console.log(output);
The above logic matches every word in the input sentence, and then selectively replaces every word which is not ten with an equal number of spaces.

You can use
let text = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(words){
console.log( words.replace(/\b(t[e]n)\b|[^.]/g, (x,y) => y ?? " ") )
}
findTheNumber(text)
The \b(t[e]n)\b is basically ten whole word searching pattern.
The \b(t[e]n)\b|[^.] regex will match and capture ten into Group 1 and will match any char but . (as you need to keep it at the end). If Group 1 matches, it is kept (ten remains in the output), else the char matched is replaced with a space.
Depending on what chars you want to keep, you may adjust the [^.] pattern. For example, if you want to keep all non-word chars, you may use \w in its place, so that only word characters are replaced with spaces.
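The task as quoted asks for as many spaces as there are characters before the match, and that count can also be read directly from the match index. A minimal sketch, assuming the hypothetical helper name padToMatch and that returning null when nothing matches is acceptable:

```javascript
// Hypothetical helper (not from the answers above): pads with one space per
// character that precedes the first match, then appends the match in lowercase.
function padToMatch(sentence, pattern) {
  const m = pattern.exec(sentence);
  if (m === null) return null; // assumption: null signals "not found"
  return " ".repeat(m.index) + m[0].toLowerCase();
}

const sentence = "A ton of tunas weighs more than ten kilograms.";
// t[e]n avoids writing the literal string "ten" in the pattern.
const padded = padToMatch(sentence, /\bt[e]n\b/i);
console.log(sentence);
console.log(padded);
```

The pattern is passed without the g flag so that exec() always starts from the beginning of the sentence.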

Related

How can I choose the whole string if there's no number after any math operator with JavaScript RegExp

So I've tried my best with this one:
/\d+([+-/*.])\d{0}/g
Hoping it to match with for example 55- (But when there's no number after any math operator)
but it matched with 55- even though there is some number after the operator. (for example: 55-5 It chose the first three characters but as you can see there's "5" after it.)
If you can help I appreciate it!
Also this is my first question on stackoverflow.

{0} doesn't mean "There should be exactly 0 after this", per regex101
{0} matches the previous token exactly zero times (causes token to be ignored)
You also forgot to escape the - and / in your [ ] character class. Unescaped, +-/ is read as a range from + to /.
This is your original regex: https://regex101.com/r/JGnJe7/1/
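Both problems can be checked in a few lines. This is only a sketch; the variable names are made up for illustration:

```javascript
// Built with the RegExp constructor so the class contents are explicit.
const unescapedClass = new RegExp("^[+-/*.]$"); // "+-/" is read as a range
const escapedClass = new RegExp("^[+\\-/*.]$"); // "-" is a literal hyphen

console.log(unescapedClass.test(",")); // true: "," lies between "+" and "/"
console.log(escapedClass.test(","));   // false

// \d{0} matches the empty string, so it never rejects a following digit.
const zeroTimes = /\d+[+\-\/*.]\d{0}/.exec("55-5");
console.log(zeroTimes[0]); // "55-"

// A negative lookahead does reject it.
const lookahead = /\d+[+\-\/*.](?!\d)/.test("55-5");
console.log(lookahead); // false
```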
Use this regex for the whole string /^\d+([+\-\/*.])$/.
const matchString = str => str.match(/^\d+([+\-\/*.])$/);
const fiftyfivedash = "55-";
const fiftyfivedashfive = "55-5";
console.log(matchString(fiftyfivedash));
console.log(matchString(fiftyfivedashfive));
/^\d+([+\-\/*.])$/ means: https://regex101.com/r/SDrDJx/1
Match must start at index 0 (^)
One or more numbers
One of the symbols
Match must end at the last index ($)
Some alternatives if you don't want the whole string:
const matchString = str => str.match(/\d+[+\-\/*.](?!\d)/g);
const matchStringOther = str => str.match(/\d+[+\-\/*.](?=\D|$)/g);
const fiftyfivedash = "hello 55- 4+ 66*";
const fiftyfivedashfive = "55-5 45-2 456+2";
console.log(matchString(fiftyfivedash));
console.log(matchString(fiftyfivedashfive));
console.log(matchStringOther(fiftyfivedash));
console.log(matchStringOther(fiftyfivedashfive));
/\d+[+\-\/*.](?!\d)/g means: https://regex101.com/r/sDgGTV/1
One or more digits
One of the symbols
Negative lookahead (the (?!\d)): the next character must not be a digit; the lookahead itself consumes nothing
Search multiple times and return all matches (the g)
/\d+[+\-\/*.](?=\D|$)/g means: https://regex101.com/r/HZayTg/1
One or more digits
One of the symbols
Positive lookahead (the (?=\D|$)): either the next character is not a digit, or there are no more characters (the $ means end of string)
Search multiple times and return all matches (the g)
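If you also need to know where each dangling operator sits in the string, matchAll() exposes the index of every match. A small sketch, assuming the hypothetical function name findDanglingOperators:

```javascript
// Hypothetical helper: lists every "digits + operator" run that is not
// followed by another digit, together with its position in the input.
function findDanglingOperators(str) {
  const pattern = /\d+[+\-\/*.](?!\d)/g; // matchAll requires the g flag
  return Array.from(str.matchAll(pattern), m => ({ text: m[0], index: m.index }));
}

const dangling = findDanglingOperators("hello 55- 4+ 66*");
const complete = findDanglingOperators("55-5 45-2 456+2");
console.log(dangling); // three entries: "55-", "4+", "66*"
console.log(complete); // empty array
```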

RegEx for replacing punctuation excluding negative numbers

Currently, to remove punctuation from a string, I use:
export function scrubPunctuation(text) {
let reg = /\b[-.,()&$#![\]{}"']+\B|\B[-.,()&$#![\]{}"']+\b/g;
return text.replace(reg, "");
}
but this also removes -1, where - is not so much "punctuation" as part of a numerical value.
How do I solve this problem?
Example use case:
I take a string from a user that might look like this:
const userStr = " I want something, sort of, that has at least one property < -1.02 ? "
Currently, my approach is to first trim the string to remove the leading / trailing white space.
Then I "scrub" punctuation from the string.
From the example of userStr above, I might eventually parse out (via some logic unrelated to regex):
const relevant = ["something", "at least one", "<", "-1.02"]
In general, non-numeric punctuation is irrelevant.

Split your first character set. Remove the hyphen from the first set and add a Negative lookahead for the hyphen:
[-]+(?![0-9]) // a hyphen not followed by a digit
And the full expression:
\b[-]+(?![0-9])|[-.,()&$#![\]{}"']+\B|\B[.,()&$#![\]{}"']+\b

If you don't want the minus sign or the dot or comma removed from the digits, one option might be to capture what you want to keep (in this case a number with an optional minus sign and optional decimal part) and match what you want to remove.
(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+
let pattern = /(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+/g;
let str = "This is -4, -55 or -4,00.00 (test) 5,00";
let res = str.replace(pattern, "$1");
console.log(res);
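Applied to the string from the question, the same capture-and-keep pattern could be wrapped like this. It is a sketch only; the name scrubKeepingNumbers is made up, and trimming first is taken from the question's description:

```javascript
// Hypothetical wrapper around the pattern above. Group 1 (a number with an
// optional minus sign) is written back; any other punctuation run is dropped.
function scrubKeepingNumbers(text) {
  const pattern = /(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+/g;
  // "$1" expands to an empty string when group 1 did not take part in the match.
  return text.trim().replace(pattern, "$1");
}

const userStr = " I want something, sort of, that has at least one property < -1.02 ? ";
const scrubbed = scrubKeepingNumbers(userStr);
console.log(scrubbed);
```

Note that < and ? survive because they are not in the character class; add them there if they should be removed too.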

Something like /[,?!.']/g could do the job, and you can add whatever other characters you want to the class:
const text = "bar........,foo,????!-1'poo!!!?'";
const res = text.replace(/[,?!.']/g, "")
console.log(res)

I would split it into two.
First I would remove everything but alphanumeric and -.
/[^a-z0-9\-\s\n]/gi
It is a little more readable than your method and should give the same result unless there is some character you want to keep (like whitespace \s and newline \n).
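To get rid of a "-" that is punctuation rather than a minus sign, I would use a negative lookahead, so that a hyphen directly followed by a digit is kept:
/-(?!\d)/g
So altogether:
export function scrubPunctuation(text) {
// Remove everything except letters, digits, hyphens and whitespace.
let reg = /[^a-z0-9\-\s\n]/gi;
// Remove hyphens that are not directly followed by a digit.
let reg2 = /-(?!\d)/g;
text = text.replace(reg, "");
return text.replace(reg2, "");
}
I haven't tested it thoroughly. Note that the first step also removes the decimal point, so -1.02 becomes -102; add \. to the first character class if periods must survive.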

Capturing parentheses - /(\d)/ ? or /\s*;\s*/?

I am reading about split and below is a variable looking at the string values. However I do not understand what the symbols are looking for.
According to the page: If separator contains capturing parentheses, matched results are returned in the array.
var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);
console.log(splits);
// Results
[ "Hello ", "1", " word. Sentence number ", "2", "." ]
My question is, what is happening here? The parentheses "(" and ")" are not part of the string. Why are the space and "." split off in some places and not in others?
Another one is /\s*;\s*/
The page states that it removes a semicolon together with zero or more spaces before and after it. Does this mean that \s* looks for spaces to remove, and that ';' is the separator in this case?
var names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
console.log(names);
var re = /\s*;\s*/;
var nameList = names.split(re);
console.log(nameList);
// Results
["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
If so, why doesn't /\s*^\s*/ remove the spaces before and after the ^ symbol if my string looked like this?
var names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
console.log(names);
var re = /\s*^\s*/;
var nameList = names.split(re);
console.log(nameList);
I would like to know what the symbols mean and why they are in a certain order. Thank you.

It seems you got your examples from here.
First let's look at this one /(\d)/.
Working inside out, recognize that \d matches any single digit.
Now, from the article, wrapping the parentheses around the escape tells the split method to keep the delimiter (which in this case is any digit) in the returned array. Notice that without the parentheses, the returned array wouldn't have numeric elements (as strings of course). Lastly, it is wrapped in slashes (//) to create a regular expression. Basically this case says: split the string by digits and keep the digits in the returned array.
The second case /\s*;\s*/ is a little more complicated and takes some understanding of regular expressions. First note that \s matches a whitespace character. In regular expressions, a token c followed by a * says 'look for 0 or more consecutive occurrences of c'. So this regular expression matches strings like ' ; ', ';', etc (I added the single quotes to show the spaces). Note that in this case, we don't have parentheses, so the semicolons will be excluded from the returned array.
If you're still stuck, I'd suggest reading about regular expressions and practicing writing them. This website is great, just be wary of the fact that regular expressions on that site may be slightly different from those used in JavaScript in terms of syntax.

The 1st example below splits the input string at any digit, keeping the delimiter (i.e. the digit) in the final array.
The 2nd example below shows that leaving the parentheses out still splits the array at any digit, but those digit delimiters are not included in the final array.
The 3rd example below splits the input string any time the following pattern is encountered: as many consecutive spaces as possible (including none) immediately followed by a semi-colon immediately followed by as many consecutive spaces as possible (including none).
The 4th example below shows that you can indeed split a similar input string as in the 3rd example but with "^" replacing ";". However, because the "^" by itself means "the start of the string" you have to tell JavaScript to find the actual "^" by putting a backslash (i.e. a special indicator designated for this purpose) right in front of it, i.e. "\^".
const show = (msg) => {console.log(JSON.stringify(msg));};
var myString = 'Hello 1 word. Sentence number 2.';
var splits1 = myString.split(/(\d)/);
show(splits1);
var splits2 = myString.split(/\d/);
show(splits2);
var names1 = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
var nameList1 = names1.split(/\s*;\s*/);
show(nameList1);
var names2 = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
var nameList2 = names2.split(/\s*\^\s*/);
show(nameList2);
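For contrast with the 4th example, here is what the unescaped pattern from the question does. This is a sketch; the character-class variant [\^] is an extra alternative that is not part of the answer above:

```javascript
const names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';

// Unescaped ^ is an anchor for the start of the string, so the pattern
// can only match (emptily) at index 0 and nothing is split off.
const unescaped = names.split(/\s*^\s*/);

// Escaping it, or placing it inside a character class, makes it a literal caret.
const escaped = names.split(/\s*\^\s*/);
const viaClass = names.split(/\s*[\^]\s*/);

console.log(JSON.stringify(unescaped)); // one element: the whole string
console.log(JSON.stringify(escaped));   // five names
console.log(JSON.stringify(viaClass));  // same five names
```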

How to store only the nth substring into a variable in Javascript

var a="how are you?";
In the above example I want to store the second word "are" into another variable in a single step.
I don't want to use something like below
var bigArray = a.split(" ");
var secondText = bigArray[1];
as we may need to store the entire paragraph into a big array and consume a lot of memory without any use.
I would like to know if there is some function which works as below
var secondText=specialFunction(a," ",1);
so that we will get the second substring when the paragraph is split by " "

Well, I would spend my time worrying about more important things than the size of some arrays.
Anyway, you could try using a regexp:
var secondText = (a.match(/ (\w+)/) || []) [1];
This reads as "find a space, then capture the following word".
The || [] part is meant to deal with the situation where there is no match (for example, no second word). In that case, the result will be [][1] which is undefined.
This finds only the second word. What about the more general case? Splitting the string on spaces is ruled out, because that would create an array and the OP doesn't want that due to memory concerns, so we will instead build a dynamic regexp. To find the nth word, we want to skip over the first n-1 spaces. Or, to be more precise, we want to skip over the first word, some spaces, then the second word, then some more spaces, etc. So the regexp, with n standing for the number of words to skip, is
/(?:\w+ ){n}(\w+)/
  ^^               NO CAPTURING GROUP
    ^^^^           WORD FOLLOWED BY SPACE
         ^^^       N TIMES
            ^^^^^  CAPTURE FOLLOWING WORD
The ?: is to avoid this being treated as a capturing group. We build the regexp using
function make_nth_word_regexp(n) {
n--;
return new RegExp("(?:\\w+ ){" + n + "}(\\w+)");
}
Now look for your nth word:
var fifth_word = str.match(make_nth_word_regexp(5)) [1];
> "Hey there you".match(make_nth_word_regexp(3))[1]
< "you"
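One caveat: match() returns null when the string has fewer than n words, so indexing with [1] throws. A guarded variant might look like this. It is a sketch; the name nthWord and the ^ anchor (which pins counting to the start of the string) are my additions, not part of the answer above:

```javascript
// Hypothetical wrapper: returns the nth word (1-based) or undefined.
function nthWord(text, n) {
  // Skip n-1 "word + space" groups from the start, then capture the next word.
  const re = new RegExp("^(?:\\w+ ){" + (n - 1) + "}(\\w+)");
  const m = text.match(re);
  return m ? m[1] : undefined;
}

console.log(nthWord("how are you?", 2)); // "are"
console.log(nthWord("how are you?", 4)); // undefined
```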

Alternative to regex is just to use substring(). Something like
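var a="how are you";
var start = a.indexOf(" ") + 1; // index just past the first space
var end = a.indexOf(" ", start); // next space, or -1 if the second word is the last one
alert(a.substring(start, end === -1 ? a.length : end));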
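The indexOf() idea generalizes to the specialFunction(a, " ", 1) shape asked for in the question without building an array. A minimal sketch, assuming the hypothetical name nthField, a zero-based position, and undefined as the "not enough fields" result:

```javascript
// Hypothetical helper: returns the field at zero-based position n when
// text is divided by separator, or undefined if there are too few fields.
function nthField(text, separator, n) {
  let start = 0;
  for (let i = 0; i < n; i++) {
    const next = text.indexOf(separator, start);
    if (next === -1) return undefined; // fewer than n separators
    start = next + separator.length;
  }
  const end = text.indexOf(separator, start);
  return text.substring(start, end === -1 ? text.length : end);
}

const a = "how are you?";
console.log(nthField(a, " ", 1)); // "are"
```

Only one substring is created, at the very end, regardless of how long the paragraph is.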

JavaScript: How can I remove any words containing (or directly preceding) capital letters, numbers, or commas, from a string?

I'm trying to write the code so it removes the "bad" words from the string (the text).
The word is "bad" if it has a comma or any special sign after it. The word is not "bad" if it contains only a to z (small letters).
So, the result I'm trying to achieve is:
<script>
String.prototype.azwords = function() {
return this.replace(/[^a-z]+/g, "0");
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.".azwords();//should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(res);//should alert "good gooood"
</script>

Try this:
return this.replace(/(^|\s+)[a-z]*[^a-z\s]\S*(?!\S)/g, "");
It tries to match a word (that is surrounded by whitespaces / string ends) and contains any (non-whitespace) character but at least one that is not a-z. However, this is quite complicated and unmaintainable. Maybe you should try a more functional approach:
return this.split(/\s+/).filter(function(word) {
return word && !/[^a-z]/.test(word);
}).join(" ");

Okay, first off you probably want to use the word boundary escape \b in your regex. Also, it's a bit tricky if you match the bad words, because a bad word might contain lowercase chars, so your current regex will exclude anything which does have lowercase letters.
I'd be tempted to pick out the good words and put them in a new string. It's a much easier regex.
/\b[a-z]+\b/g
NB: I'm not totally sure that it'll work for the first and last words in the string so you might need to account for that as well. http://www.regextester.com/ is exceptionally useful.
EDIT: as you want punctuation after the word to be 'bad', this will actually do what I was suggesting
(^|\s)[a-z]+(\s|$)
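One thing to watch with this pattern: it consumes the whitespace on both sides, so the second of two adjacent good words has no leading space left to match. Lookarounds avoid that, since they check the neighbours without consuming them. A sketch, assuming an engine that supports lookbehind (ES2018 or later); the variable names are made up:

```javascript
const sample = "good gooood Bad ok";

// Consuming version: the space after "good" is used up, so "gooood" is skipped.
const consuming = sample.match(/(^|\s)[a-z]+(\s|$)/g);

// Lookaround version: "not preceded by a non-space, not followed by a non-space".
const lookaround = sample.match(/(?<!\S)[a-z]+(?!\S)/g);

console.log(JSON.stringify(consuming));  // ["good "," ok"]
console.log(JSON.stringify(lookaround)); // ["good","gooood","ok"]

const text = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
const cleaned = (text.match(/(?<!\S)[a-z]+(?!\S)/g) || []).join(" ");
console.log(cleaned); // "good gooood"
```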

Firstly, I wouldn't recommend changing the prototype of String (or of any native object) if you can avoid it, because you leave yourself open to conflicts with other code that might define the same property in different ways. Much better to put custom methods like this on a namespaced object, though I'm sure some will disagree.
Second, is there any need to use RegEx completely? (Genuine question; not trying to be facetious.)
Here is an example of the function with plain old JS using a little bit of RegEx here and there. Easier to comment, debug, and reuse.
Here is the code:
var azwords = function(str) {
var arr = str.split(/\s+/),
len = arr.length,
i = 0,
res = "";
for (i; i < len; i += 1) {
if (!(arr[i].match(/[^a-z]/))) {
res += (!res) ? arr[i] : " " + arr[i];
}
}
return res;
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."; //should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(azwords(res));//should alert "good gooood";

Try this one:
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
var new_one = res.replace(/\s*\w*[#A-Z0-9,.?\xA1-\xFF]\w*/g,'');
//Output `good gooood`
Description:
\s* # zero-or-more spaces
\w* # zero-or-more alphanumeric characters
[#A-Z0-9,.?\xA1-\xFF] # one character from this set: #, an uppercase letter, a digit, a comma, a period, a question mark, or a character from \xA1 to \xFF
\w* # zero-or-more alphanumeric characters
/g - global (run over all string)

This will find all the words you want /^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g so you could use match.
this.match(/^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g).join(" "); should return the list of valid words.
Note that this took some time to run as a JSFiddle, so it may be more efficient to split and iterate your list.

We Keep Coding

JavaScript is the programming language of the Web.
