Search for full word instead of part inside of it - javascript

I'm trying to find an exact word in the text a user sends, but message.content.includes() also matches parts of words, which I don't want. Is there a way to search for whole words only?
A few examples: **TexT**, HeLlO, etc.

Yes, you can use a regex like this, assuming wordToFind holds the word you are searching for:
// Create regex from word with \b at each end, which
// means "word boundary"; the 'i' option means
// case-insensitive. The backslash is doubled because a
// bare \b inside a template literal is a backspace character.
const wordSearch = new RegExp(`\\b${wordToFind}\\b`, 'i');
// Use the regular expression to test the content:
const hasWord = wordSearch.test(message.content);
// will be true if whole word is found
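If the word comes from user input, it may contain characters that are special in a regex (such as . or +). Below is a minimal sketch that escapes them first. The helper names escapeRegExp and hasWholeWord are made up for illustration, and \b only behaves as expected when the word starts and ends with a letter, digit, or underscore.

```javascript
// Hypothetical helper: escape characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: case-insensitive whole-word test.
function hasWholeWord(content, wordToFind) {
  const wordSearch = new RegExp(`\\b${escapeRegExp(wordToFind)}\\b`, 'i');
  return wordSearch.test(content);
}

console.log(hasWholeWord('Say HeLlO to everyone', 'hello')); // true
console.log(hasWholeWord('Othello is a play', 'hello'));     // false
```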

Put your content into an array, splitting it into words if it's a piece of text. Array.includes will then match whole words only.
So either use [content].includes(word) if content is a single word, or content.split(' ').includes(word) if content is multiple words. Note that includes compares case-sensitively, so lowercase both sides first if case should not matter.
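A rough sketch of the split approach that also ignores case and punctuation. The function name includesWord is hypothetical, and \W treats only ASCII letters, digits, and underscore as word characters, so this assumes plain English text.

```javascript
// Hypothetical helper: split on runs of non-word characters and
// compare lowercased words, so "HeLlO," still counts as "hello".
function includesWord(content, word) {
  const words = content.toLowerCase().split(/\W+/);
  return words.includes(word.toLowerCase());
}

console.log(includesWord('HeLlO, world!', 'hello')); // true
console.log(includesWord('Othello', 'hello'));       // false
```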

Related

Extract text containing match between new line characters

I am trying to extract paragraphs from OCR'd contracts if that paragraph contains key search terms using JS. A user might search for something such as "ship ahead" to find clauses relating to whether a certain customers orders can be shipped early.
I've been banging my head up against a regex wall for quite some time and am clearly just not grasping something.
If I have text like this and I'm searching for the word "match":
let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want."
I would want to extract all the text between the double \n characters and not return the second sentence in that string.
I've been trying some form of:
let string = `[^\n\n]*match[^.]*\n\n`;
let re = new RegExp(string, "gi");
let body = text.match(re);
However that returns null. Oddly if I remove the periods from the string it works (sorta):
[
"This is an example of a paragraph that has the word I'm looking for The word is Match \n" +
'\n'
]
Any help would be awesome.
Extracting some text between identical delimiters containing some specific text is not quite possible without any hacks related to context matching.
Thus, you may simply split the text into paragraphs and get those containing your match:
const results = text.split(/\n{2,}/).filter(x=>/\bmatch\b/i.test(x))
You may remove word boundaries if you do not need a whole word match.
See the JavaScript demo:
let text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";
console.log(text.split(/\n{2,}/).filter(x=>/\bmatch\b/i.test(x)));
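For a search term typed by a user (such as "ship ahead"), the same split-and-filter idea could be wrapped in a function. This is only a sketch: findParagraphs is a made-up name, the sample contract text is invented, and allowing any whitespace between the words of the term is an assumption about how OCR line breaks should be handled.

```javascript
// Hypothetical helper: return the paragraphs that contain the search term.
function findParagraphs(text, term) {
  // Escape regex metacharacters in the user-supplied term.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Let any whitespace run (including one line break) separate the words.
  const pattern = new RegExp(`\\b${escaped.replace(/\s+/g, '\\s+')}\\b`, 'i');
  return text
    .split(/\n{2,}/)
    .map((paragraph) => paragraph.trim())
    .filter((paragraph) => pattern.test(paragraph));
}

const contract = 'Clause 1: orders may ship\nahead of schedule.\n\nClause 2: payment terms.';
console.log(findParagraphs(contract, 'ship ahead'));
// [ 'Clause 1: orders may ship\nahead of schedule.' ]
```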
That's pretty easy if you use the fact that a . matches all characters except newline by default. Use regex /.*match.*/ with a greedy .* on both sides:
const text = 'aaaa\n\nbbb match ccc\n\nddd';
const regex = /.*match.*/;
console.log(text.match(regex).toString());
Output:
bbb match ccc
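Applied to the text from the question, the pattern needs the i flag (the text says "Match") and the g flag to collect every matching line. A small sketch follows; keep in mind that this works per line, so a paragraph containing single line breaks would only be returned in part.

```javascript
const text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want.";

// g: collect all matching lines, i: ignore case.
const lines = text.match(/.*match.*/gi);
console.log(lines.length); // 1
console.log(lines[0]);
```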
Here are two ways to do it. I am not sure why you need a regular expression, though. Split seems much easier, doesn't it?
const text = "\n\nThis is an example of a paragraph that has the word I'm looking for The word is Match. \n\nThis paragraph does not have the word I want."
// regular expression one
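function getTextBetweenLinesUsingRegex(text) {
  // [^\n]+ matches one run of non-newline characters; parentheses inside
  // a character class are literal, so they must not be used for grouping
  const regex = /\n\n([^\n]+)\n\n/;
  const arr = regex.exec(text);
  // exec returns null when there is no match
  if (arr !== null) {
    return arr[1];
  }
  return null;
}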
console.log(`getTextBetweenLinesUsingRegex: ${ getTextBetweenLinesUsingRegex(text)}`);
console.log(`simple: ${text.split('\n\n')[1]}`);

regex not being called repeatedly for multiple matches (isn't global)

I have this regex /#[a-zA-Z0-9_]+$/g to do a global look up of all user names that are mentioned.
Here is some sample code.
var userRegex = /#[a-zA-Z0-9_]+$/g;
var text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.replace(userRegex, function(match, text, urlId) {
console.log(match);
});
So basically that console.log only gets called once, in this case it'll just show #Stuff3. I'm not sure why it isn't searching globally. If someone can help fix up that regex for me, that'd be awesome!
$ means "Assert the position at the end of the string (or before a line break at the end of the string, if any)". But you don't seem to want that.
So remove the $ and use /#[a-zA-Z0-9_]+/g instead.
var userRegex = /#[a-zA-Z0-9_]+/g,
text = "This is some sample text #Stuff #Stuff2 #Stuff3";
text.match(userRegex); // [ "#Stuff", "#Stuff2", "#Stuff3" ]
It isn't doing a global search throughout the entire context simply because of the end of string $ anchor (which only asserts at the end of string position). You can use the following here:
var results = text.match(/#\w+/g) //=> [ '#Stuff', '#Stuff2', '#Stuff3' ]
Note: \w is shorthand for matching any word character.
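If only the names are needed, without the leading #, a capture group with matchAll is one option. This is a sketch that assumes an environment where String.prototype.matchAll is available (ES2020); the last line shows why the original pattern with $ found only one name.

```javascript
const text = 'This is some sample text #Stuff #Stuff2 #Stuff3';

// Capture the name after '#' and collect group 1 of every match.
const names = Array.from(text.matchAll(/#(\w+)/g), (m) => m[1]);
console.log(names); // [ 'Stuff', 'Stuff2', 'Stuff3' ]

// With the trailing $ the pattern can only match at the very end.
const anchored = text.match(/#\w+$/g);
console.log(anchored); // [ '#Stuff3' ]
```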
Adding to Oriol's answer: you can add a word boundary to be more specific.
#([a-zA-Z0-9_]+)\b
The \b will cause the username to match only if it is followed by a non-word character or the end of the string.

Replace words of text area

I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
var text = document.getElementById("eC").value;
var newText = text.replace(/hello/g, '<b>hello</b>');
document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replace the words, but it replaces it with <b>hello</b>, but I want to have it bold. Is there a solution?
Update:
In response to your edit about wanting the word "hello" to show up in bold: the short answer is that it can't be done, at least not in a plain textarea. You're probably looking for something more like an online WYSIWYG editor, or at least an RTE (Rich Text Editor). There are a couple of them out there, such as tinyMCE, which is a decent WYSIWYG editor.
First off, as others have already pointed out, a textarea element's contents are available through its value property, not innerText. You get the contents all right, but you're trying to update them through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello" like "The company is called hellonearth?" Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(/(?<!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
How it works:
(?<!>): Only match the rest if it isn't preceded by a > char (be more specific if you want to, and use (?<!<b>)). This is called a negative lookbehind; it needs a JavaScript engine with lookbehind support (ES2018 or later)
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided (as explained above) it is not preceded by a > and there is a word boundary
(?!<): a negative lookahead: only match if the word isn't followed by a < char, because we don't want to find a closing </b>; you can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, i.e. h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
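To see the capture group and the guards against double wrapping working together, here is a small sketch. It uses a negative lookbehind, (?<!>), to check the character before the word, which assumes an engine with ES2018 lookbehind support; boldHello is a hypothetical name.

```javascript
// Hypothetical helper: wrap whole-word "hello" in <b> tags, keeping the
// user's capitalisation and skipping words that are already wrapped.
function boldHello(input) {
  return input.replace(/(?<!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
}

const once = boldHello('Hello from hellonearth, hello!');
console.log(once); // <b>Hello</b> from hellonearth, <b>hello</b>!

// Running it again changes nothing, because each match is now
// preceded by '>' and followed by '<'.
const twice = boldHello(once);
console.log(twice === once); // true
```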
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
For a <textarea>, you need to use the .value property:
document.getElementById("eC").value = newText;
And, as Barmar mentioned, replace() with a string pattern replaces only the first occurrence. To replace every occurrence, use a simple regex. Note that I removed the quotes; the g flag means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you really want the text to appear bold, you need to use a contenteditable div, not a textarea:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
var text = document.getElementById("eC").innerHTML;
var newText = text.replace(/hello/g, '<b>hello</b>');
newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
newText = newText.replace(/<\/b><\/b>/g,"</b>");
document.getElementById("eC").innerHTML = newText;
}

Remove string after predefined string

I am pulling content from an RSS feed, before using jquery to format and edit the rss feed (string) that is returned. I am using replace to replace strings and characters like so:
var spanish = $("#wod a").text();
var newspan = spanish.replace("=","-");
$("#wod a").text(newspan);
This works great. I am also trying to remove all text after a certain point. Similar to truncation, I would like to hide all text starting from the word "Example".
In this particular RSS feed, the word example is in every feed. I would like to hide "example" and all text the follows that word. How can I accomplish this?
You don't need jQuery to remove everything after a certain word in a given string. The first approach is to use substring:
var new_str = str.substring(0, str.indexOf("Example"));
The second is a trick with split:
var new_str = str.split("Example")[0];
If you also want to keep "Example" and just remove everything after that particular word, you can do:
var str = "aaaa1111?bbb&222:Example=123456",
newStr = str.substring(0, str.indexOf('Example') + 'Example'.length);
// will output: aaaa1111?bbb&222:Example
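One thing to watch with the substring approach: indexOf returns -1 when the word is missing, and substring(0, -1) yields an empty string. A minimal guard might look like this (truncateBefore is a made-up name):

```javascript
// Hypothetical helper: cut the string at the marker, or return it
// unchanged when the marker does not occur.
function truncateBefore(str, marker) {
  const index = str.indexOf(marker);
  return index === -1 ? str : str.substring(0, index);
}

console.log(truncateBefore('aaaa1111?bbb&222:Example=123456', 'Example')); // aaaa1111?bbb&222:
console.log(truncateBefore('no marker here', 'Example'));                  // no marker here
```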
jQuery isn't intended for string manipulation; use vanilla JS for that:
newspan = newspan.replace(/example.*$/i, "");
The .replace() method accepts a regular expression, so in this case I've used /example.*$/i which does a case-insensitive match against the word "example" followed by zero or more of any other characters to the end of the string and replaces them with an empty string.
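One caveat, in case the feed text spans several lines: . does not match line breaks, so .*$ only reaches the end of the string when no newline follows the word. A sketch of the difference, using [\s\S]* to match any character (the sample feedText is invented for illustration):

```javascript
const feedText = 'Palabra del día\nExample: first line\nsecond line';

// '.' stops at the newline and '$' needs the end of the string,
// so this pattern finds no match and removes nothing.
const dotVersion = feedText.replace(/example.*$/i, '');
console.log(dotVersion === feedText); // true

// [\s\S] matches any character, including line breaks.
const anyCharVersion = feedText.replace(/example[\s\S]*$/i, '');
console.log(JSON.stringify(anyCharVersion)); // "Palabra del día\n"
```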
I would like to hide all text starting from the word "Example"
A solution that uses the simpler replace WITH backreferences so as to "hide" everything starting with the word Example but keeping the stuff before it.
var str = "my house example is bad"
str.replace(/(.*?) example.*/i, "$1") // returns "my house"
// case insensitive. also note the space before example because you
// probably want to throw that out.

Javascript RegExp Matching weirdness

I have a RegExp:
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi
and some text "Champion"
somehow, this is coming back as a match, am I crazy?
0: "pio"
1: "i"
index: 4
input: "Champion"
length: 2
the loop is here:
// construct the pattern, dynamically
var someText = "Champion";
var phrase = ".?(NCAA|Division|I|Basketball|Champions,|1939-2011).?";
var pat = new RegExp(phrase, "gi"); // <- ends up being
var result;
while( result = pat.exec(someText) ) {
// do stuff!
}
There has to be something wrong with my RegExp, right?
EDIT:
The .? thing was just a quick and dirty attempt to say that I'd like to match one of those words AND/OR one of those words with a single char on either side. ex:
\sNCAA\s
NCAA
NCAA\s
\sNCAA
GOAL:
I'm trying to do some simple hit highlighting based on some search words. I've got a function that gets all of the text nodes on a page, and I'd like to go through them all and highlight any matches to any of the terms in my phrase variable.
I think that I just need to rework how I am building my RegExp.
Well, first of all you're specifying case-insensitivity, and secondly, you are matching the letter I as one of your matchable strings.
Champion would match pio and i, because they both match /.?I.?/gi
It however doesn't match /.?Champions,.?/gi because of the trailing comma.
Add start (^) and end ($) anchors to the regexp.
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
Without the anchors, the regexp's match can start and end anywhere in the string, which is why
/.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
can match pio and i: because it's actually matching around the (case-insensitive) I. If you leave the anchors off, but remove the ...|I|..., the regex won't match 'Champion':
> /.?(NCAA|Division|Basketball|Champions,|1939-2011).?/gi.exec('Champion')
null
Champion matches /.?I.?/i.
Your own output notes that it's matching the substring "pio".
Perhaps you meant to bound the expression to the start and end of the input, with ^ and $ respectively:
/^.?(NCAA|Division|I|Basketball|Champions,|1939-2011).?$/gi
I know you said to ignore the .?, but I can't: it's most likely wrong, and it's most likely going to continue to cause you problems. Explain why they're there and we can tell you how to do it properly. :)
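For the stated goal of highlighting whole search terms, one possible direction is to drop the .? and wrap the alternation in word boundaries. This is only a sketch: buildHighlightPattern is a hypothetical name, and the trailing comma of "Champions," is assumed to have been stripped first, since \b does not work after punctuation.

```javascript
// Hypothetical helper: build one whole-word, case-insensitive pattern
// from a list of search terms.
function buildHighlightPattern(terms) {
  const escaped = terms.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  return new RegExp(`\\b(?:${escaped.join('|')})\\b`, 'gi');
}

const terms = ['NCAA', 'Division', 'I', 'Basketball', 'Champions', '1939-2011'];

// The lone "i" inside "Champion" has no word boundary around it.
console.log('Champion'.match(buildHighlightPattern(terms))); // null

const hits = 'NCAA Division I Basketball Champions, 1939-2011'.match(buildHighlightPattern(terms));
console.log(hits.length); // 6
```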
