Replace words in text - javascript

I'm programming part of a web application in which I replace words in a text. I used the replace function, but it also replaced text I did not want replaced (example below). So I implemented a function that splits the text into words, but that doesn't work when I want to replace two contiguous words in the text.
The first option:
var str = "iRobot Roomba balbalblablbalbla";
str.replace(/robot/gi, 'Robota');
output -> iRobota Roomba ........(fail !)
Second code:
var patterns = [
  {
    match: 'robot',
    replacement: 'Robota'
  }, {
    match: 'ipad',
    replacement: 'tablet'
  }
  // ......... more
];

// Wrapper function (name assumed) so that the final return is valid
function replaceWords(str) {
  var temp = str.split(' ');
  var newStr = temp.map(function(el) {
    patterns.forEach(function(item) {
      // Replace the token only if the whole token equals the pattern (case-insensitive)
      if (el.search(new RegExp('^' + item.match + '$', 'i')) > -1) {
        el = item.replacement;
      }
    });
    return el;
  });
  return newStr.join(' ');
}
With this last version, a two-word phrase is never replaced, because each check only looks at one word. I have searched the Internet for a solution and have not found anything similar.
The only idea I have is to split each pattern (item.match) into words and, if it has more than one, build a temporary variable and check the contiguous elements, but I suspect that hurts performance, and I don't know whether there is a better and easier option.
Can anyone think of a better option?
Thanks !

As I understand it, you only want to match whole words, not substrings.
The solution is to add word boundaries to your regex:
str.replace(/\brobot\b/gi, 'Robota');
This will only match whole "robot" words.
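Word boundaries also cover the two-word case from the question, as long as all patterns go into one regex instead of being checked word by word. The sketch below assumes the question's {match, replacement} array shape; the two-word entry 'ipad pro' and the helper names escapeRegExp, normalize and replaceWholeWords are hypothetical. It also assumes every pattern starts and ends with a word character, since \b only marks a transition between word and non-word characters.

```javascript
// Hypothetical patterns in the question's shape; 'ipad pro' is a two-word entry.
const patterns = [
  { match: 'robot', replacement: 'Robota' },
  { match: 'ipad', replacement: 'tablet' },
  { match: 'ipad pro', replacement: 'big tablet' }
];

// Escape regex metacharacters so each pattern is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Lowercase and collapse inner whitespace so lookups ignore both.
function normalize(s) {
  return s.toLowerCase().replace(/\s+/g, ' ');
}

function replaceWholeWords(str, patterns) {
  // Longest first, so "ipad pro" is tried before "ipad".
  const sorted = [...patterns].sort((a, b) => b.match.length - a.match.length);
  const lookup = new Map(sorted.map(p => [normalize(p.match), p.replacement]));
  // One alternation between word boundaries; a space inside a pattern
  // matches any run of whitespace in the text.
  const alternation = sorted
    .map(p => escapeRegExp(p.match).replace(/\s+/g, '\\s+'))
    .join('|');
  const re = new RegExp('\\b(?:' + alternation + ')\\b', 'gi');
  return str.replace(re, m => lookup.get(normalize(m)));
}

console.log(replaceWholeWords('iRobot Roomba, a robot and my iPad Pro', patterns));
// iRobot Roomba, a Robota and my big tablet
```

Sorting longest-first matters because a regex alternation takes the first alternative that succeeds at a given position, so 'ipad' listed before 'ipad pro' would win and leave " Pro" behind.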

Related

Is there a way for content.replace to split them into more words than these?

const filter = ["bad1", "bad2"];
client.on("message", message => {
  var content = message.content;
  // Note: stringToCheck is computed but never used below
  var stringToCheck = content.replace(/\s+/g, '').toLowerCase();
  for (var i = 0; i < filter.length; i++) {
    if (content.includes(filter[i])) {
      message.delete();
      break;
    }
  }
});
My code above is a Discord bot that deletes a message when someone writes "bad1" or "bad2"
(I'm going to add more filtered bad words), and luckily it runs without any errors.
But right now the bot only deletes these words when they are written in lowercase, with no spaces or special characters in between.
I think I have found a solution, but I can't fit it into my code. I tried different ways, but it either deleted only lowercase words or didn't react at all, and instead I got errors like "cannot read property of undefined".
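var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
bot.on('message', message => {
  // match() returns null when nothing matches (e.g. an empty message), hence the || []
  var words = message.content.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g) || [];
  var containsBadWord = words.some(word => {
    // words are lowercased, so compare against lowercased bad words
    return badWords.some(bad => bad.toLowerCase() === word);
  });
  // ...
});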
This is what I am looking at: the var words line, specifically /\w+|\s+|[^\s\w]+/g.
Is there any way to implement that in my const filter code (above), or is there a different approach?
Thanks in advance.
Well, I'm not sure what you're trying to do with .match(/\w+|\s+|[^\s\w]+/g). That's an unnecessarily complicated regex just to get an array of words, spaces, and punctuation runs. And it won't even work if someone splits their bad word up, like "t h i s".
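To see why, here is what that token list looks like for a spaced-out word. This is a quick check using the regex from the question; the sample message and the name tokenize are made up for the demonstration.

```javascript
// Tokenizer from the question; match() returns null for an empty string, hence || [].
const tokenize = s => s.toLowerCase().trim().match(/\w+|\s+|[^\s\w]+/g) || [];

const badWords = ['bad1', 'bad2'];
const tokens = tokenize('Hey, b a d1!');

console.log(JSON.stringify(tokens));
// ["hey",","," ","b"," ","a"," ","d1","!"]
console.log(tokens.some(t => badWords.includes(t)));
// false: no single token equals "bad1"
```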
If you want your filter to be case-insensitive and to account for spaces and special characters, a better solution would probably need more than one regex: one check for the normal bad word and a separate one for spaced-out letters. You also need to make sure the spaced-letter check is accurate; otherwise something like "wash it" might be treated as a bad word despite the space between the words.
A Solution
So here's a possible solution. Note that it is just one solution, and far from the only one. I'm going to use hard-coded example strings instead of message.content so that this can run as a working snippet:
//Our array of bad words
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
//A function that tests if a given string contains a bad word
function testProfanity(string) {
  //Removes all non-letter, non-digit, and non-space chars
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  //Replaces all non-letter, non-digit chars with spaces
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  //Checks if a condition is true for at least one element in badWords
  return badWords.some(swear => {
    //Removes any non-letter, non-digit chars from the bad word (for normal)
    var filtered = swear.replace(/\W/g, "");
    //Splits the bad word into a 's p a c e d' word (for spaced)
    var spaced = filtered.split("").join(" ");
    //Two different regexes for normal and spaced bad word checks
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    //If the normal or spaced checks are true in the string, return true
    //so that '.some()' will return true for satisfying the condition
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
var result;
//Includes one banned word; expected result: true
var test1 = "I am a bannedWord1";
result = testProfanity(test1);
console.log(result);
//Includes one banned word; expected result: true
var test2 = "I am a b a N_N e d w o r d 2";
result = testProfanity(test2);
console.log(result);
//Includes one banned word; expected result: true
var test3 = "A bann_eD%word4, I am";
result = testProfanity(test3);
console.log(result);
//Includes no banned words; expected result: false
var test4 = "No banned words here";
result = testProfanity(test4);
console.log(result);
//This is a tricky one. 'bannedWord2' is technically present in this string,
//but is 'bannedWord22' really the same? This prevents something like
//"wash it" from being labeled a bad word; expected result: false
var test5 = "Banned word 22 isn't technically on the list of bad words...";
result = testProfanity(test5);
console.log(result);
I've commented each line thoroughly so that you can follow what each one does. Here it is again, without the comments or the tests:
var badWords = [
  'bannedWord1',
  'bannedWord2',
  'bannedWord3',
  'bannedWord4'
];
function testProfanity(string) {
  var normalString = string.replace(/[^a-zA-Z0-9 ]/g, "");
  var spacerString = string.replace(/[^a-zA-Z0-9]/g, " ");
  return badWords.some(swear => {
    var filtered = swear.replace(/\W/g, "");
    var spaced = filtered.split("").join(" ");
    var checks = {
      spaced: new RegExp(`\\b${spaced}\\b`, "gi"),
      normal: new RegExp(`\\b${filtered}\\b`, "gi")
    };
    return spacerString.match(checks.spaced) || normalString.match(checks.normal);
  });
}
Explanation
As you can see, this filter is able to deal with all sorts of punctuation, capitalization, and even single spaces/symbols in between the letters of a bad word. However, note that in order to avoid the "wash it" scenario I described (potentially resulting in the unintentional deletion of a clean message), I made it so that something like "bannedWord22" would not be treated the same as "bannedWord2". If you want it to do the opposite (therefore treating "bannedWord22" the same as "bannedWord2"), you must remove both of the \\b phrases in the normal check's regex.
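If you want to switch between the two behaviors without editing the regex each time, one option is to make the boundaries on the normal check optional. This is a sketch built on testProfanity above; the function name testProfanityWith and the loose option are hypothetical. It uses .test() with only the i flag, since a yes/no check does not need g.

```javascript
const badWords = ['bannedWord1', 'bannedWord2', 'bannedWord3', 'bannedWord4'];

// Same normalization and checks as testProfanity; `loose` (hypothetical option)
// drops the \b pair on the normal check, so "bannedWord22" also counts.
function testProfanityWith(string, { loose = false } = {}) {
  const normalString = string.replace(/[^a-zA-Z0-9 ]/g, '');
  const spacerString = string.replace(/[^a-zA-Z0-9]/g, ' ');
  const boundary = loose ? '' : '\\b';
  return badWords.some(swear => {
    const filtered = swear.replace(/\W/g, '');
    const spaced = filtered.split('').join(' ');
    const spacedCheck = new RegExp('\\b' + spaced + '\\b', 'i');
    const normalCheck = new RegExp(boundary + filtered + boundary, 'i');
    return spacedCheck.test(spacerString) || normalCheck.test(normalString);
  });
}

console.log(testProfanityWith('bannedWord22'));                  // false
console.log(testProfanityWith('bannedWord22', { loose: true })); // true
```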
I'll also explain the regex so you can fully understand what is going on:
[^a-zA-Z0-9 ] means "select any character not in the ranges of a-z, A-Z, 0-9, or space" (meaning all characters not in those specified ranges will be replaced with an empty string, essentially removing them from the string).
\W means "select any character that is not a word character", where "word character" refers to the characters in ranges a-z, A-Z, 0-9, and underscore.
\b means "word boundary": a zero-width position where a word character (a-z, A-Z, 0-9, or underscore) meets a non-word character (such as a space or punctuation) or the start or end of the string. In the template literals, \b is written as \\b because in a JavaScript string "\b" is the backspace character; the extra \ makes the RegExp constructor receive a real \b.
The flags g and i used in both of the regex checks indicate "global" and "case-insensitive", respectively.
Of course, to get this working with your discord bot, all you have to do in your message handler is something like this (and be sure to replace badWords with your filter variable in testProfanity()):
if (testProfanity(message.content)) return message.delete();
If you want to learn more about regex, or if you want to mess around with it and/or test it out, this is a great resource for doing so.

Get all words starting with X and ending with Y

I have a textarea with keyup=validate()
I need a JavaScript function that gets all words starting with # and ending just before a character that is not A-Za-z0-9.
For example:
This is a text #user1 this is more text #user2. And this is even more #user3!
The function gives an array:
Array("#user1","#user2","#user3");
I'm sure this is written somewhere on the internet, but I have no idea what to search for. I am very new to regular expressions.
Thank you very much!
The regular expression you want is:
/#[a-z\d]+/ig
This matches # followed by a sequence of letters and numbers. The i modifier makes it case-insensitive, so you don't have to put A-Z in the character class, and g makes it find all the matches.
var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var matches = str.match(/#[a-z\d]+/ig);
console.log(matches);
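If you only need the names without the leading #, a capture group with String.prototype.matchAll (ES2020) does that. A small sketch; the function name getMentions is made up. Unlike str.match, which returns null when nothing matches, this always returns an array.

```javascript
// Collect the part after '#' for every #word in the text.
function getMentions(str) {
  return Array.from(str.matchAll(/#([a-z\d]+)/gi), m => m[1]);
}

const text = 'This is a text #user1 this is more text #user2. And this is even more #user3!';
console.log(getMentions(text));      // [ 'user1', 'user2', 'user3' ]
console.log(getMentions('no tags')); // []
```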
JS
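var str = "This is a text #user1 this is more text #user2. And this is even more #user3!";
var textArr = str.split(" ");
for (var i = 0; i < textArr.length; i++) {
  var test = textArr[i];
  // The whole token must start with # and end with a letter or digit,
  // so "#user2." and "#user3!" (trailing punctuation) give null here
  var matches = test.match(/^#.*.[A-Za-z0-9]$/);
  console.log(matches);
}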
Explanation:
You should also read about regex (http://www.w3schools.com/jsref/jsref_obj_regexp.asp) and match (http://www.w3schools.com/jsref/jsref_match.asp) to get an idea of how they work.
Basically, ^# means the token must start with #, $ means "ending with" (here, ending with a letter or digit), and .* matches any characters in between.
To Test: http://www.regular-expressions.info/javascriptexample.html
Thanks for the replies above; they helped me write this method, which hopefully answers the question of matching with both a start and an end delimiter.
In this example it looks for ##_ at the start and _## at the end,
e.g. ##_anyTokenYouNeedToFind_##.
Code:
const tokenSearchHelper = (inputText) => {
  let matches = inputText.match(/##_[a-zA-Z0-9_\d]+_##/ig);
  return matches;
}
const out = tokenSearchHelper("Hello ##_World_##");
console.log(out);
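The same idea generalizes to any start and end markers. In the sketch below (findBetween and escapeRegExp are hypothetical helper names), the markers are escaped so characters like $ or ( are taken literally, and a lazy [\s\S]*? makes each match stop at the nearest end marker and return only the inner text, which may contain any characters, including spaces.

```javascript
// Escape regex metacharacters so the markers are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the text between each start...end pair (lazy, so pairs don't merge).
function findBetween(text, start, end) {
  const re = new RegExp(escapeRegExp(start) + '([\\s\\S]*?)' + escapeRegExp(end), 'g');
  return Array.from(text.matchAll(re), m => m[1]);
}

console.log(findBetween('Hello ##_World_## and ##_Again_##', '##_', '_##')); // [ 'World', 'Again' ]
console.log(findBetween('Price: $(10) or $(12)', '$(', ')'));                // [ '10', '12' ]
```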

Find and replace using array of keys to find, and array values to replace

I have, in jQuery, written the below:
$(document).ready(function() {
  var wordlist = new Array();
  wordlist['BioResource'] = 'Bio Resource is a lorem';
  var array_length = wordlist.length;
  for(var key in wordlist) {
    $("p").html(function(index, value) {
      return value.replace(new RegExp("\b(" + key + ")\b", "gi"), '$1');
    });
  }
});
It should (but doesn't) loop through the wordlist array and, for each key, find that word in any paragraph tag and replace it with itself wrapped in an anchor whose title attribute is the array's value for that key.
What am I doing wrong?
The regex itself works if I remove the array and put the key and value in directly, like this:
return value.replace(/\b(BioResource)\b/gi, '$1');
Thanks in advance for your help.
Paul
Change this:
"\b(" + key + ")\b"
To this:
"\\b(" + key + ")\\b"
In a string literal, \b represents the backspace character. And even for escapes with no special meaning, you need to write \\ to get a literal \ in a string; otherwise the \ is silently dropped, or, for some sequences, a syntax error is thrown.
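A quick way to see the difference in a browser console or Node; the key and the sample sentence are just examples.

```javascript
const key = 'BioResource';
const text = 'Read the BioResource notes';

// "\b" in a string literal is the backspace character (U+0008), not a word boundary.
const broken = new RegExp("\b(" + key + ")\b", "gi");
// "\\b" puts a real backslash followed by b into the pattern.
const fixed = new RegExp("\\b(" + key + ")\\b", "gi");

console.log("\b".charCodeAt(0));           // 8
console.log(text.replace(broken, '[$1]')); // Read the BioResource notes (unchanged)
console.log(text.replace(fixed, '[$1]'));  // Read the [BioResource] notes
```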
You could turn it around and replace any word you find. This way you only iterate over each paragraph's text once to perform the actual replacement.
This solution finds every word inside each paragraph using (\w+) and checks whether it is a key in your wordlist object. If it is, the word is replaced; if not, it is left alone.
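$('p').html(function(index, old) {
  return old.replace(/(\w+)/g, function($0, $1) {
    // hasOwnProperty skips inherited names such as "constructor";
    // assumes wordlist is a plain object ({}), not an Array (whose own "length" would match)
    return Object.prototype.hasOwnProperty.call(wordlist, $1) ? wordlist[$1] : $0;
  });
});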

JS Regex to find href of several a tags

I need a regex to find the contents of the hrefs in these a tags:
<p class="bc_shirt_delete">
delete
</p>
Just the urls, not the href/ tags.
I'm parsing a plain text ajax request here, so I need a regex.
You can try this regex:
/href="([^\'\"]+)/g
Example at: http://regexr.com?333d1
Update: or, more simply, use a non-greedy match:
/href="(.*?)"/g
This will do it nicely. http://jsfiddle.net/grantk/cvBae/216/
Regex example: https://regex101.com/r/nLXheV/1
var str = '<p href="missme" class="test">delete</p>';
var patt = /<a[^>]*href=["']([^"']*)["']/g;
var match;
while ((match = patt.exec(str))) {
  alert(match[1]);
}
Here is a more elaborate solution:
let href_regex = /<a([^>]*?)href\s*=\s*(['"])([^\2]*?)\2\1*>/i,
link_text = 'another article link',
href = link_text.replace ( href_regex , '$3' );
What it does:
detects a tags
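lazily skips over any attributes before href and captures them as group (1)
matches the href attribute
takes into account possible whitespace around =
captures the opening quote (' or ") as group (2), so the same quote can be required at the end
lazily matches the value and captures it as group (3); note that [^\2] is not a back-reference inside a character class (JavaScript reads \2 there as the character U+0002), so this is effectively "any character"
matches the closing quote via the back-reference \2
then matches the text of group (1) zero or more times (\1*) followed by >; this is not a general "rest of the tag" match, so when other attributes follow href, group (3) can run past the closing quote and swallow them
uses the i flag to ignore case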
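For comparison, here is a simpler pattern that covers the cases raised in this thread: spaces around =, either quote style, and other attributes before or after href. It is a sketch, not an HTML parser: it assumes no > appears inside attribute values before href and that href is preceded by whitespace, which is also what keeps it from matching data-href. The name extractHrefs is hypothetical.

```javascript
// <a ...> tag, lazily skip attributes, require whitespace before href (so data-href
// is skipped), allow spaces around =, capture the quote, then take text up to the
// same closing quote.
const HREF_RE = /<a\b[^>]*?\shref\s*=\s*(["'])(.*?)\1/gi;

function extractHrefs(html) {
  return Array.from(html.matchAll(HREF_RE), m => m[2]);
}

const sample =
  '<p class="bc_shirt_delete"><a class="x" href = "/delete?id=1">delete</a></p>' +
  "<a data-href='skip' href='/other' title=\"t\">other</a>";

console.log(extractHrefs(sample)); // [ '/delete?id=1', '/other' ]
```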
You may not need a regex to do this.
var o = document.getElementsByTagName('a');
var urls = [];
for (var i = 0; i < o.length; i++) {
  // .href gives the resolved absolute URL; use getAttribute('href') for the raw value
  urls[i] = o[i].href;
}
If it is plain text, you can insert it into an element that is not displayed (e.g. one with display: none) and then work with it the way I described.
It might be easier to use jQuery
var html = '<li><h2 class="saved_shirt_name">new shirt 1</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936923&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 2</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936924&A=Delete">Delete Shirt</button></li><li><h2 class="saved_shirt_name">new shirt 3</h2><button class="edit_shirt">Edit Shirt</button><button class="delete_shirt" data-eq="0" data-href="/CustomContentProcess.aspx?CCID=13524&OID=3936925&A=Delete">Delete Shirt</button></li>';
$(html).find('[data-href]');
and then iterate over each node.
UPDATE (because post updated)
Let html be your raw response
var matches = $(html).find('[href]');
var hrefs = [];
$.each(matches, function(i, el){ hrefs.push($(el).attr('href'));});
//hrefs is an array of matches
I combined a few solutions and came up with this (tested in .NET):
(?<=href=[\'\"])([^\'\"]+)
Explanation:
(?<=...) : lookbehind, so these characters are not included in the match
[\'\"] : match either a single or a double quote
[^...] : match any character except those listed after the ^
+ : one or more occurrences of the preceding token
This works well and does not run greedily past the quote, because it stops matching as soon as it reaches a quote.
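JavaScript engines that implement ES2018 also support lookbehind, so the same pattern works almost unchanged outside .NET; older engines, including older Safari versions, reject it with a SyntaxError. A small check with a made-up sample:

```javascript
const html = '<a href="/first">1</a> <a class="b" href=\'/second\'>2</a>';

// (?<=href=['"]) requires href=" or href=' just before the match without including it;
// [^'"]+ then takes everything up to the closing quote.
const urls = html.match(/(?<=href=['"])[^'"]+/g);
console.log(urls); // [ '/first', '/second' ]
```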
var str = "";
str += "<p class=\"bc_shirt_delete\">";
str += "delete";
str += "</p>";
var matches = [];
str.replace(/href=("|')(.*?)("|')/g, function(a, b, match) {
  matches.push(match);
});
console.log(matches);
or if you don't care about the href:
var matches = str.match(/href=("|')(.*?)("|')/);
console.log(matches);
What about spaces around =?
This code handles them:
var matches = str.match(/href( *)=( *)("|'*)(.*?)("|'*)( |>)/);
console.log(matches);
It's important to be non-greedy, and to handle matching ' or " quotes.
var test = `<a href="#" class="foo bar"> banana
<a href='http://google.de/foo?yes=1&no=2' data-href='foobar'/>`;
test.replace(/href=(?:'.*?'|".*?")/gi, '');
Disclaimer: it does not distinguish HTML5 attributes such as data-href; the href='foobar' part of data-href is matched as well.
In this specific case, this is probably the fastest pattern:
/f="([^"]*)/
It captures all characters (letters, digits, newlines, etc.) from f=" up to, but not including, the next ". Flags such as /i or /s are unnecessary. It returns null if there is no match.
But if the source contains many other links, you need to make sure you match exactly the one you are looking for. You can do that by including more of the surrounding source in the pattern (this of course depends on the site's markup), for example:
/bc_shirt_delete">\s*<a href="([^"]*)

Replacing multiple patterns in a block of data

I need to find the most efficient way of matching multiple regular expressions on a single block of text. To give an example of what I need, consider a block of text:
"Hello World what a beautiful day"
I want to replace "Hello" with "Bye" and "World" with "Universe". I can always do this in a loop, of course, using something like the String.replace functions available in various languages.
However, I could have a huge block of text with many string patterns that I need to match and replace.
I was wondering whether I can use regular expressions to do this efficiently, or whether I have to use a parser such as an LALR parser.
I need to do this in JavaScript, so if anyone knows tools that can get it done, it would be appreciated.
Edit
6 years after my original answer (below) I would solve this problem differently
function mreplace (replacements, str) {
  let result = str;
  for (let [x, y] of replacements)
    result = result.replace(x, y);
  return result;
}
let input = 'Hello World what a beautiful day';
let output = mreplace ([
[/Hello/, 'Bye'],
[/World/, 'Universe']
], input);
console.log(output);
// "Bye Universe what a beautiful day"
This has a tremendous advantage over the previous answer, which required you to write each match twice. It also gives you individual control over each match. For example:
function mreplace (replacements, str) {
  let result = str;
  for (let [x, y] of replacements)
    result = result.replace(x, y);
  return result;
}
let input = 'Hello World what a beautiful day';
let output = mreplace ([
  // replace static strings
  ['day', 'night'],
  // use regexp and flags where you want them: replace all vowels with nothing
  [/[aeiou]/g, ''],
  // use captures and callbacks! replace first capital letter with lowercase
  [/([A-Z])/, $0 => $0.toLowerCase()]
], input);
console.log(output);
// "hll Wrld wht  btfl nght" (two spaces where the word "a" was removed)
Original answer
Andy E's answer can be modified to make adding replacement definitions easier.
var text = "Hello World what a beautiful day";
text.replace(/(Hello|World)/g, function ($0) {
  var index = {
    'Hello': 'Bye',
    'World': 'Universe'
  };
  return index[$0] != undefined ? index[$0] : $0;
});
// "Bye Universe what a beautiful day";
You can pass a function to replace:
var hello = "Hello World what a beautiful day";
// $0 is the whole match; capture groups (none here), the match offset and
// the full string follow as extra arguments.
hello.replace(/Hello|World/g, function ($0) {
  if ($0 == "Hello")
    return "Bye";
  else if ($0 == "World")
    return "Universe";
});
// Output: "Bye Universe what a beautiful day";
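For reference, the callback receives the whole match first, then one argument per capture group (undefined for a group that did not take part in the match), then the match offset and the full string. A quick demonstration with a throwaway pattern:

```javascript
const calls = [];
// One capture group, so each call gets (match, p1, offset, string).
const out = 'Hello World'.replace(/(Hel)lo|World/g, (match, p1, offset, string) => {
  calls.push([match, p1, offset, string]);
  return match.toUpperCase();
});

console.log(out);      // HELLO WORLD
console.log(calls[0]); // [ 'Hello', 'Hel', 0, 'Hello World' ]
console.log(calls[1]); // [ 'World', undefined, 6, 'Hello World' ]
```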
An improved answer:
var index = {
  'Hello': 'Bye',
  'World': 'Universe'
};
var pattern = '';
for (var i in index) {
  if (pattern != '') pattern += '|';
  pattern += i;
}
var text = "Hello World what a beautiful day";
text.replace(new RegExp(pattern, 'g'), function($0) {
  return index[$0] != undefined ? index[$0] : $0;
});
If the question is how to replace multiple generic patterns with corresponding replacements - either strings or functions, it's quite tricky because of special characters, capturing groups and backreference matching.
You can use https://www.npmjs.com/package/union-replacer for this exact purpose. It is basically a string.replace(regexp, string|function) counterpart, which allows multiple replaces to happen in one pass while preserving full power of string.replace(...).
Disclosure: I am the author and the library was developed because we had to support user-configured replaces.
A common task involving replacing a number of patterns is making a user's or other string "safe" for rendering on Web pages, which means preventing HTML tags from being active. This can be done in JavaScript using HTML entities and the forEach function, allowing a set of exceptions (that is, a set of HTML tags that will be allowed to render).
This is a common task, and here is a fairly brief way to accomplish it:
// Make a string safe for rendering or storing on a Web page
function SafeHTML(str)
{
  // Make all HTML tags safe
  let s=str.replace(/</gi,'&lt;');
  // Allow certain safe tags to be rendered
  ['br','strong'].forEach(item=>
  {
    let p=new RegExp('&lt;(/?)'+item+'>','gi');
    s=s.replace(p,'<$1'+item+'>');
  });
  return s;
} // SafeHTML
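SafeHTML only escapes <. That stops tags from forming in element content, but text placed inside attribute values also needs quotes escaped, and escaping & first keeps entity-like text typed by the user from being interpreted. Here is a sketch along the same lines with those additions; escapeHTML, safeHTMLStrict and the allowed parameter are illustrative names, and this is still no substitute for a vetted sanitizer if the allowlist grows beyond simple attribute-free tags.

```javascript
// Escape the five characters that matter in HTML text and attribute values.
// '&' goes first so the entities added afterwards are not escaped again.
function escapeHTML(str) {
  return str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Re-enable a few attribute-free tags, as SafeHTML does.
function safeHTMLStrict(str, allowed = ['br', 'strong']) {
  let s = escapeHTML(str);
  for (const tag of allowed) {
    s = s.replace(new RegExp('&lt;(/?)' + tag + '&gt;', 'gi'), '<$1' + tag + '>');
  }
  return s;
}

console.log(safeHTMLStrict('<strong>hi</strong><script>x</script>'));
// <strong>hi</strong>&lt;script&gt;x&lt;/script&gt;
```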
