Match and replace a substring while ignoring special characters - javascript

I am currently looking for a way to turn matching text into a bold html line. I have it partially working except for special characters giving me problems because I desire to maintain the original string, but not compare the original string.
Example:
Given the original string:
Taco John's is my favorite place to eat.
And wanting to match:
is my 'favorite'
To get the desired result:
Taco John's <b>is my favorite</b> place to eat.
The way I'm currently getting around the extra quotes in the matching string is by replacing them
let regex = new RegExp('('+escapeRegexCharacters(matching_text.replace(/[^a-z 0-9]/gi,''))+')',"gi")
let html= full_text.replace(/[^a-z 0-9]/gi,'').replace(regex, "<b>$1</b>")}}></span>
This almost works, except that I lose all punctuation:
Taco Johns <b>is my favorite</b> place to eat
Is there any way to use regex, or another method, to add tags surrounding a matching phrase while ignoring both case and special characters during the matching process?
UPDATE #1:
It seems that I am being unclear. I need the original string's puncuation to remain in the end result's html. And I need the matching text logic to ignore all special characters and capitalization. So is my favorite is My favorite and is my 'favorite' should all trigger a match.

Instead of removing the special characters from the string being searched, you could inject in your regular expression a pattern between each character-to-match that will skip any special characters that might occur. That way you build a regular expression that can be applied directly to the string being searched, and the replacing operation will thus not touch the special characters outside of the matches:
let escapeRegexCharacters =
s => s.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"),
full_text = "Taco John's is My favorite place to eat.";
matching_text = "is my 'favorite'";
regex = new RegExp(matching_text.replace(/[^a-z\s\d]/gi, '')
.split().map(escapeRegexCharacters).join('[^a-z\s\d]*'), "gi"),
html = full_text.replace(regex, "<b>$&</b>");
console.log(html);

Regexps are useful where there is a pattern, but, in this case you have a direct match, so, the good approach is using a String.prototype.replace:
function wrap(source, part, tagName) {
return source
.replace(part,
`<${tagName}>${part}</${tagName}>`
)
;
}
At least, if there is a pattern, you should edit your question and provide it.

As an option, for single occurrence case - use String.split
Example replacing '###' with '###' :
let inputString = '1234###5678'
const chunks = inputString.split('###')
inputString = `${chunks[0]}###${chunks[1]}`

It's possible to avoid using a capture group with the $& replacement string, which means "entire matched substring":
var phrase = "Taco John's is my favorite place to eat."
var matchingText = "is my favorite"
var re = new RegExp(escapeRegexCharacters(matchingText), "ig");
phrase.replace(re, "<b>$&</b>");
(Code based on obarakon's answer.)

Generalizing, the regex you could use is my /w+. You can use that in a replacer function so that you can javascript manipulate the resultant text:
var str = "Taco John's is my favorite place to eat.";
var html = str.replace(/is my \w*/, function (x) {
return "<b>" + x + "</b>";
} );
console.log(html);

Related

Capitalize the first letter of each word

var name = "AlbERt EINstEiN";
function nameChanger(oldName) {
var finalName = oldName;
// Your code goes here!
finalName = oldName.toLowerCase();
finalName = finalName.replace(finalName.charAt(0), finalName.charAt(0).toUpperCase());
for(i = 0; i < finalName.length; i++) {
if (finalName.charAt(i) === " ")
finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
}
// Don't delete this line!
return finalName;
};
// Did your code work? The line below will tell you!
console.log(nameChanger(name));
My code as is, returns 'Albert einstein'. I'm wondering where I've gone wrong?
If I add in
console.log(finalName.charAt(i+1));
AFTER the if statement, and comment out the rest, it prints 'e', so it recognizes charAt(i+1) like it should... I just cannot get it to capitalize that first letter of the 2nd word.
There are two problems with your code sample. I'll go through them one-by-one.
Strings are immutable
This doesn't work the way you think it does:
finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
You need to change it to:
finalName = finalName.replace(finalName.charAt(i+1), finalName.charAt(i+1).toUpperCase());
In JavaScript, strings are immutable. This means that once a string is created, it can't be changed. That might sound strange since in your code, it seems like you are changing the string finalName throughout the loop with methods like replace().
But in reality, you aren't actually changing it! The replace() function takes an input string, does the replacement, and produces a new output string, since it isn't actually allowed to change the input string (immutability). So, tl;dr, if you don't capture the output of replace() by assigning it to a variable, the replaced string is lost.
Incidentally, it's okay to assign it back to the original variable name, which is why you can do finalName = finalName.replace(...).
Replace is greedy
The other problem you'll run into is when you use replace(), you'll be replacing all of the matching characters in the string, not just the ones at the position you are examining. This is because replace() is greedy - if you tell it to replace 'e' with 'E', it'll replace all of them!
What you need to do, essentially, is:
Find a space character (you've already done this)
Grab all of the string up to and including the space; this "side" of the string is good.
Convert the very next letter to uppercase, but only that letter.
Grab the rest of the string, past the letter you converted.
Put all three pieces together (beginning of string, capitalized letter, end of string).
The slice() method will do what you want:
if (finalName.charAt(i) === " ") {
// Get ONLY the letter after the space
var startLetter = finalName.slice(i+1, i+2);
// Concatenate the string up to the letter + the letter uppercased + the rest of the string
finalName = finalName.slice(0, i+1) + startLetter.toUpperCase() + finalName.slice(i+2);
}
Another option is regular expression (regex), which the other answers mentioned. This is probably a better option, since it's a lot cleaner. But, if you're learning programming for the first time, it's easier to understand this manual string work by writing the raw loops. Later you can mess with the efficient way to do it.
Working jsfiddle: http://jsfiddle.net/9dLw1Lfx/
Further reading:
Are JavaScript strings immutable? Do I need a "string builder" in JavaScript?
slice() method
You can simplify this down a lot if you pass a RegExp /pattern/flags and a function into str.replace instead of using substrings
function nameChanger(oldName) {
var lowerCase = oldName.toLowerCase(),
titleCase = lowerCase.replace(/\b./g, function ($0) {return $0.toUpperCase()});
return titleCase;
};
In this example I've applied the change to any character . after a word boundary \b, but you may want the more specific /(^| )./g
Another good answer to this question is to use RegEx to do this for you.
var re = /(\b[a-z](?!\s))/g;
var s = "fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon";
s = s.replace(re, function(x){return x.toUpperCase();});
console.log(s); // "Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon"
The regular expression being used may need to be changed up slightly, but this should give you an idea of what you can do with regular expressions
Capitalize Letters with JavaScript
The problem is twofold:
1) You need to return a value for finalName.replace, as the method returns an element but doesn't alter the one on which it's predicated.
2) You're not iterating through the string values, so you're only changing the first word. Don't you want to change every word so it's in lower case capitalized?
This code would serve you better:
var name = "AlbERt EINstEiN";
function nameChanger(oldName) {
// Your code goes here!
var finalName = [];
oldName.toLowerCase().split(" ").forEach(function(word) {
newWord = word.replace(word.charAt(0), word.charAt(0).toUpperCase());
finalName.push(newWord);
});
// Don't delete this line!
return finalName.join(" ");
};
// Did your code work? The line below will tell you!
console.log(nameChanger(name));
if (finalName.charAt(i) === " ")
Shouldn't it be
if (finalName.charAt(i) == " ")
Doesn't === check if the object types are equal which should not be since one it a char and the other a string.

Replace words of text area

I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
var text = document.getElementById("eC").value;
var newText = text.replace(/hello/g, '<b>hello</b>');
document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replace the words, but it replaces it with <b>hello</b>, but I want to have it bold. Is there a solution?
Update:
In response to your edit, about your wanting to see the word "hello" show up in bold. The short answer to that is: it can't be done. Not in a simple textarea, at least. You're probably looking for something more like an online WYSIWYG editor, or at least a RTE (Richt Text Editor). There are a couple of them out there, like tinyMCE, for example, which is a decent WYSIWYG editor. A list of RTE's and HTML editors can be found here.
First off: As others have already pointed out: a textarea element's contents is available through its value property, not the innerText. You get the contents alright, but you're trying to update it through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello" like "The company is called hellonearth?" Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(x.value.replace(/(?!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
fiddle
How it works:
(?!>): Only match the rest if it isn't preceded by a > char (be more specific, if you want to and use (?!<b>). This is called a negative look-ahead
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided (as explained above) it is not preceded by a > and there is a word boundary
(?!<): same as above, only now we don't want to find a matching </b>, so you can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, ie: h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
If <textarea>, then you need to use .value property.
document.getElementById("eC").value = newText;
And, as mentioned Barmar, replace() replaces only first word. To replace all word, you need to use simple regex. Note that I removed quotes. /g means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you want to really bold your text, you need to use content editable div, not text area:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
var text = document.getElementById("eC").innerHTML;
var newText = text.replace(/hello/g, '<b>hello</b>');
newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
newText = newText.replace(/<\/b><\/b>/g,"</b>");
document.getElementById("eC").innerHTML = newText;
}

Regex trying to match characters before and after symbol

I'm trying to match characters before and after a symbol, in a string.
string: budgets-closed
To match the characters before the sign -, I do: ^[a-z]+
And to match the other characters, I try: \-(\w+) but, the problem is that my result is: -closed instead of closed.
Any ideas, how to fix it?
Update
This is the piece of code, where I was trying to apply the regex http://jsfiddle.net/trDFh/1/
I repeat: It's not that I don't want to use split; it's just I was really curious, and wanted to see, how can it be done the regex way. Hacking into things spirit
Update2
Well, using substring is a solution as well: http://jsfiddle.net/trDFh/2/ and is the one I chosed to use, since the if in question, is actually an else if in a more complex if syntax, and the chosen solutions seems to be the most fitted for now.
Use exec():
var result=/([^-]+)-([^-]+)/.exec(string);
result is an array, with result[1] being the first captured string and result[2] being the second captured string.
Live demo: http://jsfiddle.net/Pqntk/
I think you'll have to match that. You can use grouping to get what you need, though.
var str = 'budgets-closed';
var matches = str.match( /([a-z]+)-([a-z]+)/ );
var before = matches[1];
var after = matches[2];
For that specific string, you could also use
var str = 'budgets-closed';
var before = str.match( /^\b[a-z]+/ )[0];
var after = str.match( /\b[a-z]+$/ )[0];
I'm sure there are better ways, but the above methods do work.
If the symbol is specifically -, then this should work:
\b([^-]+)-([^-]+)\b
You match a boundry, any "not -" characters, a - and then more "not -" characters until the next word boundry.
Also, there is no need to escape a hyphen, it only holds special properties when between two other characters inside a character class.
edit: And here is a jsfiddle that demonstrates it does work.

Remove string after predefined string

I am pulling content from an RSS feed, before using jquery to format and edit the rss feed (string) that is returned. I am using replace to replace strings and characters like so:
var spanish = $("#wod a").text();
var newspan = spanish.replace("=","-");
$("#wod a").text(newspan);
This works great. I am also trying to remove all text after a certain point. Similar to truncation, I would like to hide all text starting from the word "Example".
In this particular RSS feed, the word example is in every feed. I would like to hide "example" and all text the follows that word. How can I accomplish this?
Though there is not enough jQuery, you even don't need it to remove everything after a certain word in the given string. The first approach is to use substring:
var new_str = str.substring(0, str.indexOf("Example"));
The second is a trick with split:
var new_str = str.split("Example")[0];
If you also want to keep "Example" and just remove everything after that particular word, you can do:
var str = "aaaa1111?bbb&222:Example=123456",
newStr = str.substring(0, str.indexOf('Example') + 'Example'.length);
// will output: aaaa1111?bbb&222:Example
jQuery isn't intended for string manipulation, you should use Vanilla JS for that:
newspan = newspan.replace(/example.*$/i, "");
The .replace() method accepts a regular expression, so in this case I've used /example.*$/i which does a case-insensitive match against the word "example" followed by zero or more of any other characters to the end of the string and replaces them with an empty string.
I would like to hide all text starting from the word "Example"
A solution that uses the simpler replace WITH backreferences so as to "hide" everything starting with the word Example but keeping the stuff before it.
var str = "my house example is bad"
str.replace(/(.*?) example.*/i, "$1") // returns "my house"
// case insensitive. also note the space before example because you
// probably want to throw that out.

Create a permalink with JavaScript

I have a textbox where a user puts a string like this:
"hello world! I think that __i__ am awesome (yes I am!)"
I need to create a correct URL like this:
hello-world-i-think-that-i-am-awesome-yes-i-am
How can it be done using regular expressions?
Also, is it possible to do it with Greek (for example)?
"Γεια σου κόσμε"
turns to
geia-sou-kosme
In other programming languages (Python/Ruby) I am using a translation array. Should I do the same here?
Try this:
function doDashes(str) {
var re = /[^a-z0-9]+/gi; // global and case insensitive matching of non-char/non-numeric
var re2 = /^-*|-*$/g; // get rid of any leading/trailing dashes
str = str.replace(re, '-'); // perform the 1st regexp
return str.replace(re2, '').toLowerCase(); // ..aaand the second + return lowercased result
}
console.log(doDashes("hello world! I think that __i__ am awesome (yes I am!)"));
// => hello-world-I-think-that-i-am-awesome-yes-I-am
As for the greek characters, yeah I can't think of anything else than some sort of lookup table used by another regexp.
Edit, here's the oneliner version:
Edit, added toLowerCase():
Edit, embarrassing fix to the trailing regexp:
function doDashes2(str) {
return str.replace(/[^a-z0-9]+/gi, '-').replace(/^-*|-*$/g, '').toLowerCase();
}
A simple regex for doing this job is matching all "non-word" characters, and replace them with a -. But before matching this regex, convert the string to lowercase. This alone is not fool proof, since a dash on the end may be possible.
[^a-z]+
Thus, after the replacement; you can trim the dashes (from the front and the back) using this regex:
^-+|-+$
You'd have to create greek-to-latin glyps translation yourself, regex can't help you there. Using a translation array is a good idea.
I can't really say for Greek characters, but for the first example, a simple:
/[^a-zA-Z]+/
Will do the trick when using it as your pattern, and replacing the matches with a "-"
As per the Greek characters, I'd suggest using an array with all the "character translations", and then adding it's values to the regular expression.
To roughly build the url you would need something like this.
var textbox = "hello world! I think that __i__ am awesome (yes I am!)";
var url = textbox.toLowerCase().replace(/([^a-z])/, '').replace(/\s+/, " ").replace(/\s/, '-');
It simply removes all non-alpha characters, removes double spacing, and then replaces all space chars with a dash.
You could use another regular expression to replace the greek characters with english characters.

Categories