Regex acronym matching and typo correction - javascript

I'm trying to fix some typos, and one common one is a missing space between sentences: "This is a sentence.Here is another sentence." I want to match these and add a space, so I wrote this regular expression:
var re = /\.(?=[A-Z]|\()/g;
var res = str.replace(re, '. ');
That covers the squished-together sentences, as well as another typo involving parentheses, which is not important for this question.
The problem is that acronyms also show up, and they are matched and (incorrectly) replaced. Example: "The U.S. is a country" becomes "The U. S. is a country". I'm trying to prevent these acronyms from being matched. I think maybe what I want is a "lookbehind", but JavaScript doesn't support that.
Any idea how to solve this?

You could try:
\.(?=[A-Z]|\()(?![A-Z]\.)
This ensures that the characters following the "." are not a capital letter followed by another ".".
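As a quick check of the pattern above, here is a minimal sketch. The helper name addMissingSpaces and the sample strings are made up for illustration and are not part of the original answer.

```javascript
// Pattern from the answer above: a "." followed by a capital letter or "(",
// unless that capital letter is itself followed by another ".".
const sentenceGap = /\.(?=[A-Z]|\()(?![A-Z]\.)/g;

// Hypothetical helper, not part of the original answer.
const addMissingSpaces = (text) => text.replace(sentenceGap, ". ");

const fixedPlain = addMissingSpaces("This is a sentence.Here is another sentence.");
const fixedAcronym = addMissingSpaces("The U.S. is a country.It is large.");

console.log(fixedPlain);   // "This is a sentence. Here is another sentence."
console.log(fixedAcronym); // "The U.S. is a country. It is large."
```

One case this sketch would probably miss is a squished sentence that begins with a single-letter initial (for example "done.J. Smith"), because the negative lookahead treats "J." like part of an acronym.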

This seems to work:
var str = "A sentance.Another sentance with an A.C.R.O.N.Y.M.Yet another sentence."
var re = /\.(?=[A-Z][^.]|\()/g;
var res = str.replace(re, '. ');
res // => "A sentance. Another sentance with an A.C.R.O.N.Y.M. Yet another sentence."

Related

Capturing parentheses - /(\d)/ ? or /\s*;\s*/?

I am reading about split, and below is an example that splits a string. However, I do not understand what the symbols are looking for.
According to the page: If separator contains capturing parentheses, matched results are returned in the array.
var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);
console.log(splits);
// Results
[ "Hello ", "1", " word. Sentence number ", "2", "." ]
My question is, what is happening here? The parentheses "(" and ")" are not part of the string. Why are the space and "." separated in some cases and not in others?
Another one is /\s*;\s*/.
The page states that it removes the spaces before and after the semicolon, if there are zero or more of them. Does this mean that \s* looks for spaces and removes them, and that ';' is the separator in this case?
var names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
console.log(names);
var re = /\s*;\s*/;
var nameList = names.split(re);
console.log(nameList);
// Results
["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
If so, why doesn't /\s*^\s*/ remove the spaces before and after the ^ symbol if my string looks like this?
var names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
console.log(names);
var re = /\s*^\s*/;
var nameList = names.split(re);
console.log(nameList);
I would like to know what the symbols mean and why they are in a certain order. Thank you.
It seems you got your examples from here.
First let's look at this one /(\d)/.
Working inside out, recognize that \d matches any single digit.
Now, from the article, wrapping the parentheses around the escape tells the split method to keep the delimiter (which in this case is any digit) in the returned array. Notice that without the parentheses, the returned array wouldn't have numeric elements (as strings of course). Lastly, it is wrapped in slashes (//) to create a regular expression. Basically this case says: split the string by digits and keep the digits in the returned array.
The second case, /\s*;\s*/, is a little more complicated and takes some understanding of regular expressions. First note that \s matches any whitespace character (a space, a tab, a line break). In regular expressions, a token c followed by a * says 'match zero or more consecutive occurrences of c'. So this regular expression matches strings like ' ; ', ';', etc. (I added the single quotes to show the spaces). Note that in this case there are no parentheses, so the semicolons are excluded from the returned array.
If you're still stuck, I'd suggest reading about regular expressions and practicing writing them. This website is great; just be wary of the fact that regular expressions on that site may differ slightly in syntax from those used in JavaScript.
The 1st example below splits the input string at any digit, keeping the delimiter (i.e. the digit) in the final array.
The 2nd example below shows that leaving the parentheses out still splits the array at any digit, but those digit delimiters are not included in the final array.
The 3rd example below splits the input string any time the following pattern is encountered: as many consecutive spaces as possible (including none) immediately followed by a semi-colon immediately followed by as many consecutive spaces as possible (including none).
The 4th example below shows that you can indeed split a similar input string as in the 3rd example but with "^" replacing ";". However, because the "^" by itself means "the start of the string" you have to tell JavaScript to find the actual "^" by putting a backslash (i.e. a special indicator designated for this purpose) right in front of it, i.e. "\^".
const show = (msg) => {console.log(JSON.stringify(msg));};
var myString = 'Hello 1 word. Sentence number 2.';
var splits1 = myString.split(/(\d)/);
show(splits1);
var splits2 = myString.split(/\d/);
show(splits2);
var names1 = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
var nameList1 = names1.split(/\s*;\s*/);
show(nameList1);
var names2 = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
var nameList2 = names2.split(/\s*\^\s*/);
show(nameList2);
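To make the difference between the anchor and the literal caret concrete, here is a small sketch that runs both patterns on the same input. The variable names and the extra trim() call are additions for illustration only.

```javascript
const caretNames = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';

// Unescaped: ^ is the start-of-input anchor, so no usable separator is found
// and split returns the whole string as a single element.
const unescapedResult = caretNames.split(/\s*^\s*/);

// Escaped: \^ matches a literal caret. trim() (an addition for this sketch)
// removes the trailing space that split alone leaves on the last name.
const escapedResult = caretNames.trim().split(/\s*\^\s*/);

console.log(unescapedResult.length); // 1
console.log(escapedResult);
// ["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand"]
```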

Match and replace a substring while ignoring special characters

I am currently looking for a way to wrap matching text in a bold HTML tag. I have it partially working, but special characters are giving me problems: I want to keep the original string intact in the output, but ignore its special characters when comparing.
Example:
Given the original string:
Taco John's is my favorite place to eat.
And wanting to match:
is my 'favorite'
To get the desired result:
Taco John's <b>is my favorite</b> place to eat.
The way I'm currently getting around the extra quotes in the matching string is by replacing them
let regex = new RegExp('('+escapeRegexCharacters(matching_text.replace(/[^a-z 0-9]/gi,''))+')',"gi")
let html = full_text.replace(/[^a-z 0-9]/gi,'').replace(regex, "<b>$1</b>")
This almost works, except that I lose all punctuation:
Taco Johns <b>is my favorite</b> place to eat
Is there any way to use regex, or another method, to add tags surrounding a matching phrase while ignoring both case and special characters during the matching process?
UPDATE #1:
It seems that I am being unclear. I need the original string's punctuation to remain in the resulting HTML, and I need the matching logic to ignore all special characters and capitalization. So "is my favorite", "is My favorite" and "is my 'favorite'" should all trigger a match.
Instead of removing the special characters from the string being searched, you could inject in your regular expression a pattern between each character-to-match that will skip any special characters that might occur. That way you build a regular expression that can be applied directly to the string being searched, and the replacing operation will thus not touch the special characters outside of the matches:
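let escapeRegexCharacters =
        s => s.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"),
    full_text = "Taco John's is My favorite place to eat.",
    matching_text = "is my 'favorite'",
    // split('') yields single characters; the backslashes in the joined
    // pattern are doubled because it is a string, not a regex literal
    regex = new RegExp(matching_text.replace(/[^a-z\s\d]/gi, '')
        .split('').map(escapeRegexCharacters).join('[^a-z\\s\\d]*'), "gi"),
    html = full_text.replace(regex, "<b>$&</b>");
console.log(html); // "Taco John's <b>is My favorite</b> place to eat."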
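The same idea can be packaged as a reusable function. This is only a sketch: the name boldMatch and its signature are assumptions, and it treats anything other than ASCII letters, digits and whitespace as a "special character".

```javascript
// Hypothetical helper for illustration; the name and signature are assumptions.
function boldMatch(fullText, matchingText) {
  // Keep only letters, digits and whitespace from the search phrase.
  const chars = matchingText.replace(/[^a-z\s\d]/gi, "").split("");
  if (chars.length === 0) return fullText; // nothing left to search for

  // Allow any run of "special" characters between consecutive characters.
  // No regex escaping is needed: only letters, digits and whitespace remain.
  const pattern = chars.join("[^a-z\\s\\d]*");
  return fullText.replace(new RegExp(pattern, "gi"), "<b>$&</b>");
}

const tacoText = "Taco John's is My favorite place to eat.";
console.log(boldMatch(tacoText, "is my 'favorite'"));
// "Taco John's <b>is My favorite</b> place to eat."
console.log(boldMatch(tacoText, "johns"));
// "Taco <b>John's</b> is My favorite place to eat."
```

Because the skipped characters sit inside the match, the apostrophe in "John's" ends up inside the bold tag, and the punctuation outside the match is left untouched.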
Regular expressions are useful where there is a pattern, but in this case you have a direct match, so a good approach is to use String.prototype.replace with a plain string:
function wrap(source, part, tagName) {
    // A string pattern replaces only the first exact, case-sensitive occurrence.
    return source.replace(part, `<${tagName}>${part}</${tagName}>`);
}
At least, if there is a pattern, you should edit your question and provide it.
As an option, for the single-occurrence case, use String.prototype.split.
Example replacing '###' with '###' :
let inputString = '1234###5678'
const chunks = inputString.split('###')
inputString = `${chunks[0]}###${chunks[1]}`
It's possible to avoid using a capture group with the $& replacement string, which means "entire matched substring":
var phrase = "Taco John's is my favorite place to eat."
var matchingText = "is my favorite"
var re = new RegExp(escapeRegexCharacters(matchingText), "ig");
phrase.replace(re, "<b>$&</b>");
(Code based on obarakon's answer.)
Generalizing, the regex you could use is /is my \w*/. You can pass a replacer function along with it so that you can manipulate the matched text in JavaScript:
var str = "Taco John's is my favorite place to eat.";
var html = str.replace(/is my \w*/, function (x) {
return "<b>" + x + "</b>";
} );
console.log(html);

Ignore pattern with specific word in it

I've got the following code:
var one = "What a lovely puppy!
Is it
a beagle?",
two = "You've got a lovely funny beagle!",
three = "Where's my lovely dog? Where's my little beagle?";
var target = (".* lovely(?!funny).* beagle.*");
What I need is to "catch" the words between lovely and beagle. If the word funny appears between my "catch" boundaries (as in var two), the method match() should ignore it.
In other words, it should match every phrase from lovely to beagle that does not contain the word funny.
My var target seems to be written incorrectly.
I expect:
one.match(target) //return: "puppy! Is it a"
two.match(target) //must be ignored, because of word "funny"
three.match(target) //return: "dog? Where's my little"
Any help will be appreciated!
You'll have to combine the test for funny with the .*, so that the test is repeated at every character. Try
lovely(?:(?!funny).)*beagle
Check it out here at regex101.
Regards
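Here is a minimal sketch of that pattern applied to the three strings from the question. The capture group, the helper name wordsBetween, and the use of [\s\S] instead of . (so that line breaks between the two words are allowed) are assumptions added for this example.

```javascript
// Same idea as the pattern above, with two additions for this sketch:
// a capture group around the middle part, and [\s\S] so line breaks match.
const betweenPattern = /lovely((?:(?!funny)[\s\S])*)beagle/;

// Hypothetical helper: returns the trimmed text between the two words, or null.
function wordsBetween(text) {
  const match = text.match(betweenPattern);
  return match ? match[1].trim() : null;
}

const sampleOne = "What a lovely puppy!\nIs it\na beagle?";
const sampleTwo = "You've got a lovely funny beagle!";
const sampleThree = "Where's my lovely dog? Where's my little beagle?";

console.log(wordsBetween(sampleOne));   // "puppy!", "Is it", "a" on three lines
console.log(wordsBetween(sampleTwo));   // null
console.log(wordsBetween(sampleThree)); // "dog? Where's my little"
```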
I suggest making a per-character check on the first letter, then checking for 'funny' only if the current letter is an initial f:
var regex = /\blovely\b([^f]|\Bf|f(?!unny\b))+\bbeagle\b/;

Extract the last few words in a sentence using regex?

I am trying to write a regex to extract the last few words from a sentence. However, I can't seem to make it work.
var str = "Tell me about robin hood";
const TELL_ME_ABOUT_REGEX = /^Tell me about (\w+\s*)*/
if( str.match(TELL_ME_ABOUT_REGEX) ){
var matches = TELL_ME_ABOUT_REGEX.exec( str );
console.log( "user wants to know about ",matches);
}
I am trying to get the word "robin hood". But I only end up with "hood".
[ 'Tell me about robin hood',
'hood',
index: 0,
input: 'Tell me about robin hood' ]
What do I change in my regex?
Why do you need a regex for this? You can do it without one, like this:
var str = "Tell me about robin hood";
var str = str.split(" ").splice(3).join(" ");
alert(str);
http://codepen.io/anon/pen/aNvrYz
Edit : Regex Solution
var str = "Tell me about robin hood";
var match = str.match(/^Tell me about (.*)/i);
var textInDe = match[1];
alert(textInDe);
http://codepen.io/anon/pen/BKoerY
Here is a correct regex:
^Tell me about ((?:\w+\s*)*)
Compare to your original one which is
^Tell me about (\w+\s*)*
That's so close. The point is that you should use a non-capturing group for the inner parentheses and a capturing group for the outer ones.
Note that with (\w+\s*)* from your regex, the engine first captures robin (plus the trailing space) into \1 and then overwrites \1 with hood, because a repeated capturing group keeps only its last iteration.
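The difference between the two patterns can be seen side by side in this short sketch (the variable names are made up for illustration):

```javascript
const request = "Tell me about robin hood";

// Repeated capturing group: group 1 is overwritten on every iteration,
// so only the last iteration survives.
const lastIteration = /^Tell me about (\w+\s*)*/.exec(request);

// Capturing group wrapped around a repeated non-capturing group:
// group 1 covers the whole run of words.
const wholeRun = /^Tell me about ((?:\w+\s*)*)/.exec(request);

console.log(lastIteration[1]); // "hood"
console.log(wholeRun[1]);      // "robin hood"
```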
The problem with /^Tell me about (\w+\s*)*/ is that there are multiple possible matches for the group in "Tell me about robin hood", i.e., robin is a possible match, as well as robin\s, and so forth. Repetition operators can be confusing at times, but it gets clearer once you think through all the possible matches.
To match everything that comes after Tell me about, you can simply capture it all at once, with only one possible match: (.*).
Regex demo
Try this
Tell me about ([\w\s]+)
Regex Demo
(\w+\s*)* will capture 'robin' then 'hood' but only keep the last iteration => 'hood'

Deleting empty spaces in a string

Okay, I have a simple JavaScript problem, and I hope some of you are eager to help me. I realize it's not very difficult, but I've been working the whole day and just can't get my head around it.
Here it goes: I have a sentence in a Textfield form and I need to reprint the content of a sentence but WITHOUT spaces.
For example: "My name is Slavisha" The result: "MynameisSlavisha"
Thank you
You can replace all whitespace characters:
var str = "My name is Slavisha" ;
str = str.replace(/\s+/g, ""); // "MynameisSlavisha"
The /\s+/g regex matches each run of one or more whitespace characters; the g flag is necessary to replace all the occurrences in your string.
Also, as you can see, we need to reassign the str variable because strings are immutable (replace returns a new string rather than changing the original).
Another way to do it:
var str = 'My name is Slavisha'.split(' ').join('');
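The two approaches only agree when the input contains nothing but ordinary space characters. A small sketch with a made-up input that also contains a tab and a line break shows the difference:

```javascript
// Sample input (an assumption for this sketch) with a tab, a double space
// and a trailing line break.
const spaced = "My name\tis  Slavisha\n";

const viaRegex = spaced.replace(/\s+/g, "");  // removes every kind of whitespace
const viaSplit = spaced.split(" ").join("");  // removes only the space character

console.log(JSON.stringify(viaRegex)); // "MynameisSlavisha"
console.log(JSON.stringify(viaSplit)); // "Myname\tisSlavisha\n"
```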
