RegExp "i" case insensitive VS toLowerCase() (javascript)


I'm hoping someone can explain to me why I need to use "toLowerCase()" if I'm already using a regular expression that is case insensitive "i".
The exercise is a pangram that can accept numbers and non-ascii characters, but all letters of the alphabet MUST be present in lower case, upper case, or mixed. I wasn't able to solve this exercise correctly until I added "toLowerCase()". This is one of the javascript exercises from exercism.io. Below is my code:
var Pangram = function (sentence) {
this.sentence = sentence;
};
Pangram.prototype.isPangram = function (){
var alphabet = "abcdefghijklmnopqrstuvwxyz", mustHave = /^[a-z]+$/gi,
x = this.sentence.toLowerCase(), isItValid = mustHave.test(x);
for (var i = 0; i < alphabet.length; i++){
if (x.indexOf(alphabet[i]) === -1 && isItValid === false){
return false;
}
}
return true;
};
module.exports = Pangram;

The regex may not be doing what you think it's doing. Here is your code commented with what's going on:
Pangram.prototype.isPangram = function (){
var alphabet = "abcdefghijklmnopqrstuvwxyz", mustHave = /^[a-z]+$/gi,
x = this.sentence.toLowerCase(), isItValid = mustHave.test(x);
// for every letter in the alphabet
for (var i = 0; i < alphabet.length; i++){
// return false only if both of these conditions hold:
// the letter is missing from the lowercased sentence
// AND the sentence is not made up solely of the letters a-z (anchored start to finish, case insensitive)
if (x.indexOf(alphabet[i]) === -1 && isItValid === false){
return false;
}
}
return true;
}
The logic that is checking whether each letter is present has nothing to do with the regex, the two are serving separate purposes. In fact, based on your description of the problem, the regex will cause your solution to fail in some cases. For example, assume we have the string "abcdefghijklmnopqrstuvwxyz-". In that case your regex will test false even though this sentence should return true.
My advice would be to remove the regex, use toLowerCase on the sentence, and iterate through the alphabet checking if the sentence has each letter, which seems to be the track you were on.
Below is a sample solution with some tests. Happy learning!
function isPangram (str) {
const alphabet = 'abcdefghijklmnopqrstuvwxyz'
const strChars = new Set(str.toLowerCase().split(''))
return alphabet.split('').every(char => strChars.has(char))
}
const tests = [
"abc",
"abcdefghijklmnopqrstuvwxyz",
"abcdefghijklmnopqRstuvwxyz",
"abcdefghijklmnopqRstuvwxyz-",
]
tests.forEach(test => {
console.log(test, isPangram(test))
})

It's because you're manually checking for lowercase letters:
if (x.indexOf(alphabet[i]) === -1)
alphabet[i] will be one character of your alphabet string, which you have defined as lowercase.
It looks like you don't need the regex at all here, or at least it's not doing what you think it's doing. Since your regex only allows for alpha characters, it will fail if your sentence has any spaces.
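To make the distinction concrete, here is a minimal sketch showing that the "i" flag only affects regex matching, while indexOf stays case sensitive. The sample sentence and the helper name isPangramWithRegex are made up for illustration and are not part of the exercise.

```javascript
// The "i" flag changes how a regex matches; it never changes the string itself.
const shout = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";

const regexFindsQ = /q/i.test(shout);                     // case-insensitive regex match
const rawIndexOfQ = shout.indexOf("q");                   // indexOf is case sensitive
const loweredIndexOfQ = shout.toLowerCase().indexOf("q"); // found once the string is lowercased

// Hypothetical alternative: let a case-insensitive regex do the per-letter check,
// so that no toLowerCase() call is needed.
function isPangramWithRegex(text) {
  return [..."abcdefghijklmnopqrstuvwxyz"].every(
    (letter) => new RegExp(letter, "i").test(text)
  );
}

console.log(regexFindsQ, rawIndexOfQ, loweredIndexOfQ); // true -1 4
console.log(isPangramWithRegex(shout));                 // true
console.log(isPangramWithRegex("Hello, world"));        // false
```

Building 26 small regexes is not faster than lowercasing once; the sketch is only meant to show where case insensitivity has to come from in each approach.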

Related

Algorithm - Search and Replace a string

I am working on an algorithm challenge on freeCodeCamp (https://www.freecodecamp.org/learn/javascript-algorithms-and-data-structures/intermediate-algorithm-scripting/search-and-replace).
The task is as below:
Perform a search and replace on the sentence using the arguments provided and return the new sentence.
First argument is the sentence to perform the search and replace on.
Second argument is the word that you will be replacing (before).
Third argument is what you will be replacing the second argument with (after).
Note:
Preserve the case of the first character in the original word when you are replacing it. For example if you mean to replace the word "Book" with the word "dog", it should be replaced as "Dog"
**
myReplace("Let us get back to more Coding", "Coding", "algorithms") should return "Let us get back to more Algorithms".
myReplace("Let us go to the store", "store", "mall") should return "Let us go to the mall".
**
function myReplace(str, before, after) {
//if the before is uppercase, the after should be uppercase also
// str = str.replace(before, after);
var regex = /[A-Z]+/g; //check for uppercase
var newStr = "";
console.log(regex.test(before));
if (regex.test(before)) {
//if uppercase, return true, "after" convert to uppercase
after = after.toUpperCase();
newStr = after[0];
for (var i = 1; i < after.length; i++) {
//start at index=1 letter, all convert to
newStr += after[i].toLowerCase();
}
console.log(newStr);
str = str.replace(before, newStr);
} else {
str = str.replace(before, after);
}
// console.log(newStr);
console.log(str);
return str;
}
I think the code should be OK, but can anyone help me find out why the if statement doesn't work?
Much thanks!

The problem is that you're calling regex.test() multiple times on the same regular expression instance.
[...]
var regex = /[A-Z]+/g; //check for uppercase
var newStr = "";
console.log(regex.test(before));
if (regex.test(before)) {
//if uppercase, return true, "after" convert to uppercase
after = after.toUpperCase();
[...]
If your string is Hello_there, the first regex.test() will return true, because H matched, and the regex's lastIndex is left pointing just past that match. If you call regex.test() again with the same regex instance, it will resume from that position and try to match within ello_there. In this case, it will fail, because ello_there does not contain a capital letter between A and Z.
There are a lot of ways to fix this issue. Perhaps the simplest is to store the result of the first call in a variable, and use it everywhere you're calling regex.test():
[...]
var regex = /[A-Z]+/g; //check for uppercase
var newStr = "";
var upper_check = regex.test(before);
console.log(upper_check);
if (upper_check) {
[...]
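The statefulness comes from the g flag: a global regex keeps its position in the lastIndex property between test() calls. A small sketch of that behaviour, using the word "Coding" from the challenge (the variable names are illustrative only):

```javascript
const globalUpper = /[A-Z]+/g; // with "g", test() remembers where it stopped
const plainUpper = /[A-Z]+/;   // without "g", every test() starts at index 0
const word = "Coding";

const firstGlobal = globalUpper.test(word);     // matches "C"
const indexAfterFirst = globalUpper.lastIndex;  // position just past "C"
const secondGlobal = globalUpper.test(word);    // resumes inside "oding", finds nothing
const indexAfterSecond = globalUpper.lastIndex; // reset after the failed match

const firstPlain = plainUpper.test(word);
const secondPlain = plainUpper.test(word);

console.log(firstGlobal, indexAfterFirst);   // true 1
console.log(secondGlobal, indexAfterSecond); // false 0
console.log(firstPlain, secondPlain);        // true true
```

So besides caching the result, simply dropping the g flag is another way to make repeated test() calls give the same answer.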

It seems overkill to use a regex, when you really need to only check the first character. Your regex will find uppercase letters anywhere...
If the assignment is to only change one occurrence, then a regex is not really the right tool here: it does not really help to improve the code nor the efficiency. Just do:
function myReplace(str, before, after) {
if (before[0] === before[0].toUpperCase()) {
after = after[0].toUpperCase() + after.slice(1);
} else {
after = after[0].toLowerCase() + after.slice(1);
}
return str.replace(before, after);
}

function myReplace(str, before, after) {
var upperRegExp = /[A-Z]/g
var lowerRegExp = /[a-z]/g
var afterCapitalCase = after.replace(/^./, after[0].toUpperCase());
if (before[0].match(upperRegExp)) {
return str.replace(before, afterCapitalCase)
} else if (after[0].match(upperRegExp) && before[0].match(lowerRegExp)) {
return str.replace(before, after.toLowerCase());
} else {
return str.replace(before, after)
}
}

Turn lowercase letters of a string to uppercase and the inverse

let message = "heY, WHAt are you dOING?";
let count_changes = 0;
let isLetter = (letter) => {
if (('a'<=letter && letter >='z') || ('A'<=letter && letter >='Z')) {
return letter;
} else {
return -1;
}
}
for(let i = 0; i <= message.length; i++) {
if (isLetter(i) && message[i].toLowerCase()) {
message[i].toUpperCase();
count_changes++;
console.log(message[i].toLowerCase());
}
else if (isLetter(i) && message[i].toUpperCase()) {
message[i].toLowerCase();
count_changes++;
}
else {
console.error('Bad stirng');
}
}
Hello, I want to use the function isLetter to check every character of the string message. In the for loop, the first if statement should check whether the character is a letter and whether it is lowercase; if so, it should be changed to uppercase and count_changes incremented. The second if statement should do the same for uppercase letters: change them to lowercase and increment count_changes. The final value of count_changes should be my result.
thank you

By default, javascript's comparison of strings is case sensitive, therefore you can check a character's case by comparing it to either an upper or lower case converted value.
If it is the same, then the case is what you checked against, if not, the case is different.
"TRY" == "TrY" would return false, whereas "TRY" == "TRY" would return true;
So, use a variable to indicate the case of the last letter checked, then compare the next letter to the opposite case. If it matches, the case has changed, otherwise it is still the same case.
The isLetter function checks a value to be a single character, and using a regex test ensures that it is a letter - no punctuation or digits etc.
Your loop would always produce an error because you were iterating past the end of the message string: indexes are 0-based, so the last valid index is message.length - 1.
let message = "heY, WHAt are you dOING?";
let count_changes = 0;
let lowerCase = message[0] == message[0].toLowerCase();
let messageLength = message.length;
function isLetter (val) {
// Check val is a letter of the alphabet a - z ignoring case.
return val.length == 1 && val.match(/[a-z]/i);
}
for (let i = 0; i < messageLength; i++) {
var char = message[i];
if (isLetter(char)) {
if(lowerCase) {
// Check to see if the next letter is upper case when the last one was lower case.
if(char == char.toUpperCase()) {
lowerCase = false;
count_changes++;
}
}
else {
// Check to see if the next letter is lower case when the last one was upper case.
if(char == char.toLowerCase()) {
lowerCase = true;
count_changes++;
}
}
}
else {
// Found a non-letter character.
console.error('Not a letter.');
}
}
console.log("Number of times the case changed: " + count_changes);

TL;DR:
let message = "heY, WHAt are you dOING?";
let newMessage = "";
let count_changes = 0;
let isLowerCaseLetter = (letter) => 'a' <= letter && letter <= 'z';
let isUpperCaseLetter = (letter) => 'A' <= letter && letter <= 'Z';
/* Iterate over every character of the message. */
for (let i = 0; i < message.length; i++) {
/* Cache the character at the current index. */
let character = message[i];
/* Check whether the character is a lowercase letter. */
if (isLowerCaseLetter(character)) {
newMessage += character.toUpperCase();
count_changes++;
}
/* Check whether the character is an uppercase letter. */
else if (isUpperCaseLetter(character)) {
newMessage += character.toLowerCase();
count_changes++;
}
/* Otherwise, just add the current character to the new message. */
else newMessage += character;
}
console.log("New Message: ", newMessage);
console.log("Changes: ", count_changes);
Your Mistakes:
The way you're checking if a character is a letter is wrong, due to >='z'. It should be <='z'. The same goes for the check against 'Z'.
Functions that have a Boolean connotation had better return true or false instead of -1 or the character itself as you do.
Inside isLetter you pass the index instead of the character itself. The function call should be isLetter(message[i]) instead of isLetter(i).
The very message you are testing will be deemed a 'bad string', because of the comma and the spaces between the words.
In your loop, the condition should be i < message.length, otherwise, every message will be deemed a 'bad string', because you'll exceed all characters and get an undefined value.
The methods toLowerCase and toUpperCase do not affect the original string but create a new one instead. If you want to assemble the resulting characters together, you have to initialise a newMessage string and append the processed character to it on each iteration.
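The last point is easy to check in isolation. A minimal sketch (the variable names are made up for illustration) showing that the original string is untouched and that the result has to be collected explicitly:

```javascript
const greeting = "heY";

// toUpperCase() returns a new string; greeting itself is not modified.
const returned = greeting.toUpperCase();

// To keep the converted characters, append each one to a new string.
let rebuilt = "";
for (const ch of greeting) {
  rebuilt += ch === ch.toLowerCase() ? ch.toUpperCase() : ch.toLowerCase();
}

console.log(greeting); // heY
console.log(returned); // HEY
console.log(rebuilt);  // HEy
```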
Suggested solution:
Instead of one isLetter function create one checking if a character is a lowercase letter and one checking if it's an uppercase letter. That way you combine your checks and your if clause will be much simpler and more readable.
Ditch the isLetter check and the good string / bad string thing completely, so as not to have problems with in-between characters such as spaces and punctuation.
Attempt to minimise function calls, as for large strings, they will slow down your code a lot. In the code above, only 2 function calls per loop are used, compared to the accepted answer, which makes:
3 function calls per loop plus,
3 function calls when a character is letter (the majority of the time)
3 one-time function calls for from, map and join, which will matter for large strings.
Speedtest:
In a series of 5 tests using a massive string (2,825,856 chars long) the answers stack up as follows:
this answer (jsFiddle used):
[1141.91ms, 1150.93ms, 1093.75ms, 1048.50ms, 1183.03ms]
accepted answer (jsFiddle used):
[2211.30ms, 2985.22ms, 2136.73ms, 2279.26ms, 2482.34ms]

From what I understand, you want to count the number of characters in the string and return a string where all uppercase characters are replaced with lowercase characters and all lowercase characters are replaced with uppercase characters. Additionally, you want to increment countChanges once for every character changed.
This code should do what you want:
let message = "heY, WHAt are you dOING?";
let countChanges = 0;
let isLetter = c => c.toLowerCase() !== c.toUpperCase();
let isLowerCase = c => c.toLowerCase() === c;
let flippedMessage = Array.from(message).map((c)=>{
if(!isLetter(c)){
return c;
}
countChanges++;
// return uppercase character if c is a lowercase char
if(isLowerCase(c)){
return c.toUpperCase();
}
// Here, we know c is an uppercase character, so return the lowercase
return c.toLowerCase();
}).join('');
// flippedMessage is "HEy, whaT ARE YOU Doing?"
// countChanges is 18

replace non matches between delimiters

I have an input string:
12345,3244,654,ffgv,87676,988ff,87657
I'm having difficulty transforming all terms in the string that are not five-digit numbers into the constant 34567 using regular expressions. So, the output would be like this:
12345,34567,34567,34567,87676,34567,87657
For this, I looked at two options:
negated character class: Not useful because it does not execute directly on this expression ,[^\d{5}],
lookahead and lookbehind: Issue here is that it doesn't include non-matched part in the result of this expression ,(?!\d{5}) or (?<!\d{5}), for the purpose of substitution/replace.
Once the desired expression is found, it would give a result so that one can replace non-matched part using tagged regions like \1, \2.
Is there any mechanism in regular expression tools to achieve the output as mentioned in the above example?
Edit: I really appreciate those who have answered non-regex solutions, but I would be more thankful if you provide a regex-based solution.

You don't need regex for this. You can use str.split to split the string at commas first and then for each item check if its length is greater than or equal to 5 and it contains only digits(using str.isdigit). Lastly combine all the items using str.join.
>>> s = '12345,3244,654,ffgv,87676,988ff,87657'
>>> ','.join(x if len(x) >= 5 and x.isdigit() else '34567' for x in s.split(','))
'12345,34567,34567,34567,87676,34567,87657'
Javascript version:
function isdigit(s){
for(var i=0; i <s.length; i++){
if(!(s[i] >= '0' && s[i] <= '9')){
return false;
}
}
return true;
}
var arr = "12345,3244,654,ffgv,87676,988ff,87657".split(",");
for(var i=0; i < arr.length; i++){
if(arr[i].length < 5 || ! isdigit(arr[i])) arr[i] = '34567';
}
var output = arr.join(",");

Try the following: /\b(?!\d{5})[^,]+\b/g
It constrains the expression between word boundaries (\b),
Followed by a negative look-ahead, (?!\d{5}), which rejects any term that starts with five digits,
Followed by one or more characters other than a comma ([^,]+).
const expression = /\b(?!\d{5})[^,]+\b/g;
const input = '12345,3244,654,ffgv,87676,988ff,87657';
const expectedOutput = '12345,34567,34567,34567,87676,34567,87657';
const output = input.replace(expression, '34567');
console.log(output === expectedOutput, expectedOutput, output);

This approach uses /\b(\d{5})|(\w+)\b/g:
we match on boundaries (\b)
our first capture group captures "good strings"
our looser capture group gets the leftovers (bad strings)
our replacer() function knows the difference
const str = '12345,3244,654,ffgv,87676,988ff,87657';
const STAND_IN = '34567';
const massageString = (str) => {
const pattern = /\b(\d{5})|(\w+)\b/g;
const replacer = (match, goodstring, badstring) => {
if (goodstring) {
return goodstring;
} else {
return STAND_IN;
}
}
const r = str.replace(pattern,replacer);
return r;
};
console.log( massageString(str) );

I think the following would work for value no longer than 5 alphanumeric characters:
(,(?!\d{5})\w{1,5})
if longer than 5 alphanumeric characters, then remove 5 in above expression:
(,(?!\d{5})\w{1,})
and you can replace using:
,34567
You can see a demo on regex101. Of course, there might be faster non-regex methods for specific languages as well (python, perl or JS)
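In JavaScript, the comma-prefixed pattern above could be applied as in the sketch below. Because the pattern requires a leading comma, it never inspects the first term. The second function is an assumed variant, not part of the answer, that also checks the first term and accepts only terms of exactly five digits; the function and constant names are illustrative.

```javascript
const STAND_IN = "34567";
const terms = "12345,3244,654,ffgv,87676,988ff,87657";

// Pattern from the answer: a comma followed by a term that does not start with five digits.
function replaceAfterComma(text) {
  return text.replace(/,(?!\d{5})\w+/g, "," + STAND_IN);
}

// Assumed variant: match at the start of the string or after a comma, and treat a
// term as valid only if it is exactly five digits followed by a comma or the end.
function replaceEveryTerm(text) {
  return text.replace(
    /(^|,)(?!\d{5}(?:,|$))[^,]+/g,
    (match, separator) => separator + STAND_IN
  );
}

console.log(replaceAfterComma(terms));              // 12345,34567,34567,34567,87676,34567,87657
console.log(replaceAfterComma("abc,12345"));        // abc,12345 (first term is not checked)
console.log(replaceEveryTerm("abc,123456,12345"));  // 34567,34567,12345
```

A replacer function is used in the variant so that the captured separator can be put back without relying on $1 sitting directly in front of digits in the replacement string.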

Remove all special characters except space from a string using JavaScript

I want to remove all special characters except space from a string using JavaScript.
For example,
abc's test#s
should output as
abcs tests.

You should use the string replace function, with a single regex.
Assuming by special characters, you mean anything that's not letter, here is a solution:
const str = "abc's test#s";
console.log(str.replace(/[^a-zA-Z ]/g, ""));

You can do it specifying the characters you want to remove:
string = string.replace(/[&\/\\#,+()$~%.'":*?<>{}]/g, '');
Alternatively, to change all characters except numbers and letters, try:
string = string.replace(/[^a-zA-Z0-9]/g, '');

The first solution does not work for any UTF-8 alphabet. (It will cut text such as Привіт). I have managed to create a function which does not use RegExp and relies on the good UTF-8 support in the JavaScript engine. The idea is simple: if a symbol is equal in uppercase and lowercase, it is a special character. The only exception is made for whitespace.
function removeSpecials(str) {
var lower = str.toLowerCase();
var upper = str.toUpperCase();
var res = "";
for(var i=0; i<lower.length; ++i) {
if(lower[i] != upper[i] || lower[i].trim() === '')
res += str[i];
}
return res;
}
Update: Please note, that this solution works only for languages where there are small and capital letters. In languages like Chinese, this won't work.
Update 2: I came to the original solution when I was working on a fuzzy search. If you are also trying to remove special characters to implement search functionality, there is a better approach. Use any transliteration library, which will produce a string made only of Latin characters, and then a simple RegExp will do all the magic of removing special characters. (This will work for Chinese too, and you will also get the side benefit of making Tromsø == Tromso.)
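For engines that support Unicode property escapes (the u flag with \p{...}), a regex can also keep letters from any script, including ones without upper and lower case. This is a sketch of a different technique from the one in this answer, and the function name removeSpecialsUnicode is made up for illustration:

```javascript
// Keep letters (\p{L}) and digits (\p{N}) from any script, plus plain spaces.
function removeSpecialsUnicode(text) {
  return text.replace(/[^\p{L}\p{N} ]/gu, "");
}

console.log(removeSpecialsUnicode("abc's test#s"));  // abcs tests
console.log(removeSpecialsUnicode("Привіт, світ!")); // Привіт світ
console.log(removeSpecialsUnicode("你好, 世界 42!")); // 你好 世界 42
```

One caveat: combining marks belong to \p{M}, so text with decomposed accents would lose them unless \p{M} is added to the character class.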

Search for everything that is not (a word character || a space), using the g flag so that every occurrence is replaced:
str.replace(/[^\w ]/g, '')

I don't know JavaScript, but isn't it possible using regex?
Something like [^\w\d\s] will match anything but digits, characters and whitespaces. It would be just a question to find the syntax in JavaScript.

I tried Seagul's very creative solution, but found it treated numbers also as special characters, which did not suit my needs. So here is my (failsafe) tweak of Seagul's solution...
//return true if char is a number
function isNumber (text) {
if(text) {
var reg = new RegExp('[0-9]+$');
return reg.test(text);
}
return false;
}
function removeSpecial (text) {
if(text) {
var lower = text.toLowerCase();
var upper = text.toUpperCase();
var result = "";
for(var i=0; i<lower.length; ++i) {
if(isNumber(text[i]) || (lower[i] != upper[i]) || (lower[i].trim() === '')) {
result += text[i];
}
}
return result;
}
return '';
}

const str = "abc's#thy#^g&test#s";
console.log(str.replace(/[^a-zA-Z ]/g, ""));

Try to use this one
var result= stringToReplace.replace(/[^\w\s]/g, '')
[^...] is for negation, \w for word characters [a-zA-Z0-9_] and \s for whitespace,
/.../g for global

With regular expression
let string = "!#This tool removes $special *characters* /other/ than! digits, characters and spaces!!!$";
var NewString= string.replace(/[^\w\s]/gi, '');
console.log(NewString);
Result //This tool removes special characters other than digits characters and spaces
Live Example : https://helpseotools.com/text-tools/remove-special-characters

dot (.) may not be considered special. I have added an OR condition to Mozfet's & Seagull's answer:
function isNumber (text) {
var reg = new RegExp('[0-9]+$');
if(text) {
return reg.test(text);
}
return false;
}
function removeSpecial (text) {
if(text) {
var lower = text.toLowerCase();
var upper = text.toUpperCase();
var result = "";
for(var i=0; i<lower.length; ++i) {
if(isNumber(text[i]) || (lower[i] != upper[i]) || (lower[i].trim() === '') || (lower[i].trim() === '.')) {
result += text[i];
}
}
return result;
}
return '';
}

Try this:
const strippedString = htmlString.replace(/(<([^>]+)>)/gi, "");
console.log(strippedString);

const input = `#if_1 $(PR_CONTRACT_END_DATE) == '23-09-2019' #
Test27919<alerts#imimobile.com> #elseif_1 $(PR_CONTRACT_START_DATE) == '20-09-2019' #
Sender539<rama.sns#gmail.com> #elseif_1 $(PR_ACCOUNT_ID) == '1234' #
AdestraSID<hello#imimobile.co> #else_1#Test27919<alerts#imimobile.com>#endif_1#`;
const replaceString = input.split('$(').join('->').split(')').join('<-');
console.log(replaceString.match(/(?<=->).*?(?=<-)/g));

Prepare a list of the special characters you want to remove from the string, and then use the JavaScript replace function to remove them. Note that replace with a string pattern only removes the first occurrence of each character.
var str = "abc'de#;:sfjkewr47239847duifyh";
alert(str.replace("'","").replace("#","").replace(";","").replace(":",""));
Or you can loop over the whole string, compare each character's ASCII code against the ranges you want to keep, and build a new string.
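The loop-based approach, comparing each character's code against ASCII ranges, could be sketched like this. The function name and the choice to keep letters, digits and spaces are assumptions made for illustration:

```javascript
// Keep only ASCII letters, ASCII digits and the space character.
function keepAsciiLettersDigitsAndSpaces(text) {
  let result = "";
  for (const ch of text) {
    const code = ch.charCodeAt(0);
    const isDigit = code >= 48 && code <= 57;  // '0'..'9'
    const isUpper = code >= 65 && code <= 90;  // 'A'..'Z'
    const isLower = code >= 97 && code <= 122; // 'a'..'z'
    const isSpace = code === 32;               // ' '
    if (isDigit || isUpper || isLower || isSpace) {
      result += ch;
    }
  }
  return result;
}

console.log(keepAsciiLettersDigitsAndSpaces("abc'de#;:sfjkewr47239847duifyh"));
// abcdesfjkewr47239847duifyh
console.log(keepAsciiLettersDigitsAndSpaces("abc's test#s"));
// abcs tests
```

Unlike chained replace calls with string patterns, this removes every occurrence of an unwanted character, at the cost of dropping all non-ASCII letters.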

Finding uppercase characters within a string

I am trying to write a function that decrypts an encrypted message that has uppercase letters (showing it's a new word) and lowercase characters (which is the word itself). The function needs to search through the encrypted message for all the uppercase letters and then return each uppercase character along with the lowercase characters that follow it. I have been given a function to call on within the decrypt function:
function isUpperCase(aCharacter)
{
return (aCharacter >= 'A') && (aCharacter <= 'Z');
}
I was thinking that I would search through the word for all the uppercase characters first and assign that as a new string. I could then do while loop that will pick up each of the letters in the new string and then search for the lower case characters that are next to it in the old string.
However, I am completely stuck at the first part - I can't even work out the structured English.
The code is:
encryptMessage is a string containing uppercase and lowercase characters
indexCharacter is used at a later date for another function
upperAlphabet - alphabet of uppercase characters - used later
lowerAlphabet - alphabet lowercase characters - used later
The function:
function decryptMessage(encryptMessage, indexCharacter, upperAlphabet, lowerAlphabet)
{
var letter
var word = "";
for (var count = 0; count < encryptMessage.length; count = count +1);
{
letter = encryptMessage.charAt(count)
if (isUpperCase(letter));
{
word = word + letter;
}
document.write(word); //this is just to test to see if it returns the uppercase - I would use the return word
}
The above just doesn't seem to work, so I can't even continue with the rest of the code. Can anyone help me identify where I have gone wrong? Have I gone in completely the wrong direction with this anyway? Reading it back, I don't think it really makes much sense. It's very basic code; I have only learnt for and while loops and if/else statements, and I am just so stuck.
thanks in advance for your advice :-)
Issy

I'm not too sure I follow, but you can strip using the replace method and regular expressions
var str = 'MaEfSdsfSsdfsAdfssdGsdfEsdf';
var newmsg = str.replace(/[a-z]/g, '');
var old = str.replace(/[A-Z]/g, '');
In this case, newmsg = 'MESSAGE'.

A simple condition for checking uppercase characters in a string would be...
var str = 'aBcDeFgHiJkLmN';
var sL = str.length;
var i = 0;
for (; i < sL; i++) {
if (str.charAt(i) === str.charAt(i).toUpperCase()) {
console.log('uppercase:',str.charAt(i));
}
}
/*
uppercase: B
uppercase: D
uppercase: F
uppercase: H
uppercase: J
uppercase: L
uppercase: N
*/

EDIT
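var input = "ThisIsASecretText";
var i = 0;
while (i < input.length)
{
    if (isUpperCase(input.charAt(i)))
    {
        // Start the word with the uppercase letter, then collect
        // the lowercase letters that follow it.
        var nextWord = input.charAt(i);
        i++;
        while (i < input.length && !isUpperCase(input.charAt(i)))
        {
            nextWord += input.charAt(i);
            i++;
        }
        CallSomeFunctionWithTheNextWord(nextWord);
    }
    else
    {
        // Skip anything that comes before the first uppercase letter.
        i++;
    }
}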
The following calls would be made:
CallSomeFunctionWithTheNextWord("This");
CallSomeFunctionWithTheNextWord("Is");
CallSomeFunctionWithTheNextWord("A");
CallSomeFunctionWithTheNextWord("Secret");
CallSomeFunctionWithTheNextWord("Text");
You can do the same thing with much less code using regular expressions, but since you said that you are taking a very basic course on programming, this solution might be more appropriate.
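For comparison, the regular-expression route mentioned above might look like the following sketch. It assumes, as in the question, that words consist of one ASCII uppercase letter followed by ASCII lowercase letters; the function name splitOnCapitals is made up for illustration.

```javascript
// Each match is one uppercase letter followed by zero or more lowercase letters.
function splitOnCapitals(text) {
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/[A-Z][a-z]*/g) || [];
}

console.log(splitOnCapitals("ThisIsASecretText"));
// [ 'This', 'Is', 'A', 'Secret', 'Text' ]
console.log(splitOnCapitals("abc"));
// []
```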

Use Unicode property escapes, in particular the "Lu" (Uppercase_Letter) value of the General_Category property, which matches uppercase letters in any script. There are categories for numbers, punctuation, currency, and just about any other category of character you might be interested in.
In the example below, the "u" flag enables Unicode matching.
"HeLlo WoRld".match(/\p{Lu}/gu) // [ 'H', 'L', 'W', 'R' ]

I would rather use Array.reduce as follows. Say, for example:
let sample = 'SampleStringAsFollows';
let capWord = [...sample].reduce((caps,char) => (char.match(/[A-Z]/)) ? caps + char : caps,'');
console.log(capWord); //SSAF
capWord will be a string of the capital letters only, and this also handles the boundary cases where the string may contain special characters.

Use the code below to print the capital letters in the sentence (spaces are skipped explicitly; digits or punctuation would also pass this check):
Demo Code
var str = 'i am a Web developer Student';
var sL = str.length;
var i = 0;
for (; i < sL; i++) {
if (str.charAt(i) != " ") {
if (str.charAt(i) === str.charAt(i).toUpperCase()){
console.log(str.charAt(i));
}
}
}
