Check if sentence contains a phrase - javascript

Sentences:
Hey checkout Hello World <- SHOULD BE INCLUDED
hello world is nice! <- SHOULD BE INCLUDED
Hhello World should not work <- SHOULD NOT BE INCLUDED
This too Hhhello World <- SHOULD NOT BE INCLUDED
var phraseToSearch = "Hello World";
Note: sentence.toLowerCase().indexOf(phraseToSearch.toLowerCase()) would not work, because it would match all of the sentences above, while the result should include only sentences 1 and 2.

You can use a regular expression to match a character pattern against a string.
The regular expression looks for the exact letters of Hello World, with \b (a word boundary) on each side and the i (case-insensitive) flag.
RegExp has a test method that runs the regular expression against the given string and returns true if it matches.
const phraseToSearch = /\bhello world\b/i
const str1 = 'Hey checkout Hello World'
const str2 = 'hello world is nice!'
const str3 = 'Hhello World should not work'
const str4 = 'This too Hhhello World'
console.log(
phraseToSearch.test(str1),
phraseToSearch.test(str2),
phraseToSearch.test(str3),
phraseToSearch.test(str4)
)

You probably want to use a regular expression. Here are the things you want to match
Text (with spaces surrounding it)
... Text (with space on one side, and end of text on the other)
Text ... (with space on one side, and start of text on the other)
Text (just the string, on its own)
One way to do it, without a regular expression, is to write 4 conditions (one for each bullet point above) and join them with ||, but that would lead to messy code.
Another option is to split both strings by spaces and check whether one array is a subarray of the other.
However, my solution uses a regular expression - which is a pattern you can test on a string.
Our pattern should
Look for a space/start of string
Check for the string
Look for a space/end of string
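\b, according to this, matches the position between a word character and a non-word character (such as a space or punctuation), as well as the start or end of the string next to a word character. These positions are called word boundaries.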
Here is the code:
function doesContain(str, query){ // is query in str
return new RegExp("\\b" + query + "\\b", "i").test(str) // "\\b" in a string literal is the regex \b; a bare "\b" would be a backspace character
}
The i makes the match case insensitive.
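One caveat when building the pattern from a string: any regex metacharacters in the query (such as . or +) are interpreted as pattern syntax. A minimal sketch of one way to guard against that is below; escapeRegExp and containsPhrase are hypothetical names, not part of the answer above.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same idea as doesContain, but the query is treated as literal text.
function containsPhrase(str, query) {
  return new RegExp("\\b" + escapeRegExp(query) + "\\b", "i").test(str);
}

console.log(containsPhrase("Hey checkout Hello World", "Hello World"));
console.log(containsPhrase("Hhello World should not work", "Hello World"));
console.log(containsPhrase("Price is 3x50 today", "3.50"));
```

Note that \b only helps when the query starts and ends with a word character; a query such as "C++" would need a different boundary check.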

Related

How to match bold markdown if it isn't preceded with a backslash?

I'm looking to match bolded markdown. Here are some examples:
qwer *asdf* zxcv matches *asdf*
qwer*asdf*zxcv matches *asdf*
qwer \*asdf* zxcv does not match
*qwer* asdf zxcv matches *qwer*
A negative lookbehind like this (?<!\\)\*(.*)\* works.
Except that Firefox has no support for lookbehind, so I cannot use it.
Similarly, I can get very close with (^|[^\\])\*(.*)\*
The issue is that there are two capture groups, and I need the index of the second capture group, but JavaScript only returns the index of the overall match, which is where the first capture group starts. I can work around it in this case by just adding 1, but in other cases this hack will not work.
My reasoning for doing this is that I'm trying to replace a small subset of Markdown with React components. As an example, I'm trying to convert this string:
qwer *asdf* zxcv *123*
Into this array:
[ "qwer ", <strong>asdf</strong>, " zxcv ", <strong>123</strong> ]
Where the second and fourth elements are created via JSX and included as array elements.
You will also need to take into account that when a backslash occurs before an asterisk, it may be one that is itself escaped by a backslash, and in that case the asterisk should be considered the start of bold markup. Except if that one is also preceded by a backslash,...etc.
So I would suggest this regular expression:
((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*
If the purpose is to replace these with tags, like <strong> ... </strong>, then just use JavaScript's replace as follows:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((\\.|[^*])*)\*/g;
let res = s.replace(regex, "$1<strong>$2</strong>");
console.log(res);
If the bolded words should be converted to a React component and stored in an array with the other pieces of plain text, then you could use split and map:
let s = String.raw`now *this is bold*, and \\*this too\\*, but \\\*this\* not`;
console.log(s);
let regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
let res = s.split(regex).map((s, i) =>
i%3 === 2 ? React.createElement("strong", {}, s) : s
);
Since there are two capture groups in the "delimiter" for the split call, one having the preceding character(s) and the second the word itself, every third item in the split result is a word to be bolded, hence the i%3 expression.
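If the preceding character(s) should stay attached to the plain text, as in the array the question asks for, the start of the bold part can be computed as the match index plus the length of the first capture group. The sketch below illustrates that arithmetic; tokenizeBold is a hypothetical name, and it emits plain { bold: ... } objects as stand-ins for the React elements.

```javascript
// Hypothetical helper: returns plain strings and { bold: "..." } tokens.
function tokenizeBold(text) {
  const regex = /((?:^|[^\\])(?:\\.)*)\*((?:\\.|[^*])*)\*/g;
  const tokens = [];
  let last = 0;
  for (const m of text.matchAll(regex)) {
    // m.index is where group 1 starts; the opening * follows group 1.
    const boldStart = m.index + m[1].length;
    const plain = text.slice(last, boldStart);
    if (plain) tokens.push(plain);
    tokens.push({ bold: m[2] });
    last = m.index + m[0].length;
  }
  if (last < text.length) tokens.push(text.slice(last));
  return tokens;
}

console.log(tokenizeBold("qwer *asdf* zxcv *123*"));
```

Each { bold } token could then be mapped to a <strong> element in the same way as in the split version.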
This should do the trick:
/(?:^|[^\\])(\*[^*]+[^\\]\*)/
The only capturing group there is the string surrounded by *'s.

Using regex to split string in javascript

I'd like to split my string so that "Hello the cost 12.50 Hello this item is 7.30" would become ["Hello the cost is 12.50", "Hello this item is 7.30"]. I started off by first finding in the string what matches the 12.50 and 7.30 (floats), but can't seem to figure out how to split it by that number.
Use the regex pattern (.*? \d+(?:\.\d+)?)\s*, and find all matches:
var re = /(.*? \d+(?:\.\d+)?)\s*/g;
var s = 'Hello the cost 12.50 Hello this item is 7.30';
var m;
do {
m = re.exec(s);
if (m) {
console.log(m[1]);
}
} while (m);
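The same matches can also be collected into an array in one expression. This is a sketch assuming an environment with String.prototype.matchAll (ES2020); the variable names are illustrative only.

```javascript
const input = 'Hello the cost 12.50 Hello this item is 7.30';

// Same pattern as above; keep only the first capture group of each match.
const parts = Array.from(
  input.matchAll(/(.*? \d+(?:\.\d+)?)\s*/g),
  (match) => match[1]
);

console.log(parts);
```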
This might be the RegExp you are looking for:
'The price is 9.50. Another price is 22.74'.match(/(?<=^|\.\s)\D+\d+(\.\d+)?(?=\.|$)/gmu)
What this tells the JS RegExp engine:
Dear Engine,
Please, find something directly preceded by start of string or a dot & a space without including it in result.
After that there should be one or more non-numbers.
Then there should be one or more numbers that might be followed by a dot and one or more numbers.
Finally, all that should be located just before end of string or a dot. Please, include neither in the result.
Search for this pattern globally, in multiline mode & be aware of any unicode characters should there be any in the search string.

Replacing characters at the start and end of a certain string

Suppose this string:
b*any string here*
In case this exists, I want to replace the b* at the beginning with <b>, and the * at the end with </b>. (Disregard the backslash; I need it for escaping on the SO site.)
Moreover, there might be more than one match:
b*any string here* and after that b*string b*.
These cases should not be handled:
b*foo bar
foo bar*
bb*foo bar* (b is not after a whitespace or beginning of string).
I've gotten this far:
(?<=b\*)(.*?)(?=\*)
This gives me the string in between, but I'm having difficulty doing the swap.
Use String#replace, you only need to capture the text you want to preserve:
var result = theString.replace(/\bb\*(.*?)\*/g, "<b>$1</b>");
The \b at the beginning of the regex is a word boundary, so that it only matches a b that is not part of a longer word. $1 refers to the first captured group, (.*?).
Example:
var str1 = "b*any string here* and after that b*string b*.";
var str2 = `b*foo bar
foo bar*
bb*foo bar* (b is not after a whitespace or beginning of string).`;
console.log(str1.replace(/\bb\*(.*?)\*/g, "<b>$1</b>"));
console.log(str2.replace(/\bb\*(.*?)\*/g, "<b>$1</b>"));
You could use \b(?:b\*)(.+?)(?:\*) with the g flag so that every occurrence is replaced, so
const result = yourString.replace(/\b(?:b\*)(.+?)(?:\*)/g, "<b>$1</b>");
See the 'Replace' tab https://regexr.com/447cq
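Both patterns above rely on \b, which also matches after punctuation, so something like (b*x*) would be converted too. If the b must strictly follow whitespace or the start of the string, as the question states, one possible variant captures that prefix and puts it back. This is a sketch, and boldify is a hypothetical name.

```javascript
// Hypothetical variant: require start of string or whitespace before "b*".
function boldify(text) {
  // $1 restores the whitespace (or nothing, at the start of the string).
  return text.replace(/(^|\s)b\*(.*?)\*/g, "$1<b>$2</b>");
}

console.log(boldify("b*any string here* and after that b*string b*."));
console.log(boldify("bb*foo bar* stays as it is"));
console.log(boldify("x (b*y*) stays as it is"));
```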

Capturing parentheses - /(\d)/ ? or /\s*;\s*/?

I am reading about split and below is a variable looking at the string values. However I do not understand what the symbols are looking for.
According to the page: If separator contains capturing parentheses, matched results are returned in the array.
var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);
console.log(splits);
// Results
[ "Hello ", "1", " word. Sentence number ", "2", "." ]
My question is, what is happening here? The parentheses "(" and ")" are not part of the string. Why are the space and "." separated for some elements and not others?
Another one is /\s*;\s*/
The page states that it removes zero or more spaces before and after the semicolon. Does this mean that \s* looks for spaces to remove, and that ';' is the separator in this case?
var names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
console.log(names);
var re = /\s*;\s*/;
var nameList = names.split(re);
console.log(nameList);
// Results
["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
If so, why doesn't /\s*^\s*/ remove the spaces before and after the ^ symbol if my string looks like this?
var names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
console.log(names);
var re = /\s*^\s*/;
var nameList = names.split(re);
console.log(nameList);
I would like to know what the symbols mean and why they are in a certain order. Thank you.
It seems you got your examples from here.
First let's look at this one /(\d)/.
Working inside out, recognize that \d matches any single digit.
Now, from the article, wrapping the parentheses around the escape tells the split method to keep the delimiter (which in this case is any digit) in the returned array. Notice that without the parentheses, the returned array wouldn't have numeric elements (as strings of course). Lastly, it is wrapped in slashes (//) to create a regular expression. Basically this case says: split the string by digits and keep the digits in the returned array.
The second case, /\s*;\s*/, is a little more complicated and requires some understanding of regular expressions. First note that \s matches a whitespace character. In regular expressions, an item c followed by a * means 'look for 0 or more consecutive occurrences of c'. So this regular expression matches strings like ' ; ', ';', etc. (I added the single quotes to show the spaces). Note that in this case we don't have parentheses, so the semicolons are excluded from the returned array.
If you're still stuck, I'd suggest reading about regular expressions and practicing writing them. This website is great, just be wary of the fact that regular expressions on that site may differ slightly in syntax from those used in JavaScript.
The 1st example below splits the input string at any digit, keeping the delimiter (i.e. the digit) in the final array.
The 2nd example below shows that leaving the parentheses out still splits the array at any digit, but those digit delimiters are not included in the final array.
The 3rd example below splits the input string any time the following pattern is encountered: as many consecutive spaces as possible (including none) immediately followed by a semi-colon immediately followed by as many consecutive spaces as possible (including none).
The 4th example below shows that you can indeed split a similar input string as in the 3rd example but with "^" replacing ";". However, because the "^" by itself means "the start of the string" you have to tell JavaScript to find the actual "^" by putting a backslash (i.e. a special indicator designated for this purpose) right in front of it, i.e. "\^".
const show = (msg) => {console.log(JSON.stringify(msg));};
var myString = 'Hello 1 word. Sentence number 2.';
var splits1 = myString.split(/(\d)/);
show(splits1);
var splits2 = myString.split(/\d/);
show(splits2);
var names1 = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
var nameList1 = names1.split(/\s*;\s*/);
show(nameList1);
var names2 = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
var nameList2 = names2.split(/\s*\^\s*/);
show(nameList2);
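Two details from the question can be checked directly: the unescaped ^ never splits the string, and the trailing space in "Chris Hand " survives because no separator follows it. A small sketch follows; the variable names are illustrative, and calling trim() first is one possible way to drop the trailing space.

```javascript
const rawNames = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';

// Trim the whole string first so the last name has no trailing space.
const trimmedList = rawNames.trim().split(/\s*;\s*/);
console.log(trimmedList);

// An unescaped ^ only matches at the start of the string, so nothing is split.
const caretNames = 'Harry Trump ^Fred Barney^ Helen Rigby';
const unsplit = caretNames.split(/\s*^\s*/);
console.log(unsplit.length);
```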

Create a permalink with JavaScript

I have a textbox where a user puts a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?
Try this:
function doDashes(str) {
var re = /[^a-z0-9]+/gi; // global and case insensitive matching of non-char/non-numeric
var re2 = /^-*|-*$/g; // get rid of any leading/trailing dashes
str = str.replace(re, '-'); // perform the 1st regexp
return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-i-think-that-i-am-awesome-yes-i-am
As for the greek characters, yeah I can't think of anything else than some sort of lookup table used by another regexp.
Edit: here's the one-liner version, with toLowerCase() added and an embarrassing fix to the trailing regexp:
function doDashes2(str) {
return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple regex for this job matches all "non-word" characters and replaces them with a -. Before applying this regex, convert the string to lowercase. This alone is not foolproof, since a dash may be left at the end.
[^a-z]+
Thus, after the replacement; you can trim the dashes (from the front and the back) using this regex:
^-+|-+$
You'd have to create the Greek-to-Latin glyph translation yourself; regex can't help you there. Using a translation array is a good idea.
I can't really say for Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/
Will do the trick when using it as your pattern, and replacing the matches with a "-"
As for the Greek characters, I'd suggest using an array with all the "character translations", and then adding its values to the regular expression.
To roughly build the url you would need something like this.
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/[^a-z\s]/g, '').replace(/\s+/g, ' ').replace(/\s/g, '-');
It simply removes all non-alpha characters, removes double spacing, and then replaces all space chars with a dash.
You could use another regular expression to replace the greek characters with english characters.
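The lookup-table idea suggested in these answers could be sketched as follows. GREEK_TO_LATIN and slugify are hypothetical names, and the table is a deliberately partial, simplified transliteration chosen only so that the example from the question works; a real mapping would need more care.

```javascript
// Hypothetical, simplified Greek-to-Latin table; extend or adjust as needed.
const GREEK_TO_LATIN = {
  "α": "a", "β": "v", "γ": "g", "δ": "d", "ε": "e", "ζ": "z", "η": "i",
  "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m", "ν": "n", "ξ": "x",
  "ο": "o", "π": "p", "ρ": "r", "σ": "s", "ς": "s", "τ": "t", "υ": "u",
  "φ": "f", "χ": "ch", "ψ": "ps", "ω": "o",
};

function slugify(text) {
  return text
    .toLowerCase()
    .normalize("NFD")                 // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accent marks
    .replace(/[\u03b1-\u03c9]/g, (ch) => GREEK_TO_LATIN[ch] || ch)
    .replace(/[^a-z0-9]+/g, "-")      // every other run of characters becomes a dash
    .replace(/^-+|-+$/g, "");         // trim leading and trailing dashes
}

console.log(slugify("Γεια σου κόσμε"));
console.log(slugify("hello world! I think that __i__ am awesome (yes I am!)"));
```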
