Removing all special characters except "some" apostrophes

I'm trying to create a function that removes all special characters (including periods) except apostrophes when they are naturally part of a word. The regex pattern I've made is supposed to remove anything that doesn't fit the schema of a word optionally followed by an apostrophe ' and/or another word:
function removeSpecialCharacters(str) {
return str.toLowerCase().replace(/[^a-z?'?a-z ]/g, ``)
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
As you can see from the snippet it works well except for removing the rogue apostrophes.
And if I add something like [\s'\s] or ['] to the pattern it breaks it completely. Why is it doing this and what am I missing here?

Alternate the pattern with '\B, which matches (and so removes) any apostrophe that is not followed by a word character, e.g. the one in ab' or ab'#, while preserving strings like ab'c:
function removeSpecialCharacters(str) {
return str.toLowerCase().replace(/'\B|[^a-z' ]/g, ``)
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
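(you can also remove the duplicated characters from the character set: inside the brackets ? is a literal question mark rather than a quantifier, so [^a-z' ] is all you need)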
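To see why the question's character class behaved the way it did, it helps to check that the `?` marks and the second `a-z` add nothing once they sit inside brackets. A minimal sketch (the sample string and variable names are made up for illustration):

```javascript
// Inside [...] a "?" is a literal question mark and a repeated range is redundant,
// so these two classes accept exactly the same characters.
var originalClass = /[^a-z?'?a-z ]/g;
var simplifiedClass = /[^a-z?' ]/g;

var sample = "what's up? it's 'fine'.";
var viaOriginal = sample.replace(originalClass, "");
var viaSimplified = sample.replace(simplifiedClass, "");

console.log(viaOriginal);                   // only the "." is removed
console.log(viaOriginal === viaSimplified); // true
```

Every apostrophe survives because `'` is in the allowed set regardless of where it appears, which is why a separate rule for the dangling ones is needed.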

Not sure what went wrong with yours as I can't see what you attempted. However, I got this to work.
function removeSpecialCharacters(str) {
str = str.toLowerCase();
// reduce duplicate apostrophes to single
str = str.replace(/'+/g,`'`);
// get rid of wacky chars
str = str.replace(/[^a-z'\s]/g,'');
// drop dangling apostrophes but keep the whitespace around them
str = str.replace(/(^|\s)'(?=\s|$)/g, `$1`);
return str;
}
console.log(removeSpecialCharacters(`I'm a string.`))
console.log(removeSpecialCharacters(`I'm a string with random stuff.*/_- '`))
console.log(removeSpecialCharacters(`'''`))
console.log(removeSpecialCharacters(`regex 'til i die`))

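Here's one very easy solution. To remove certain characters from a string, you can run a bunch of if-statements inside a while loop. This allows you to choose exactly which symbols to remove. Strings are immutable in JavaScript, so rather than deleting in place you build up a new string from the characters you keep.
var string = "Hi! I'm a string!";
var result = "";
var increment = 0;
while (increment < string.length)
{
if (string[increment] != "!") // add one check per symbol you want to drop
{
result += string[increment];
}
increment += 1;
}
console.log(result);
That's a simple rundown of what it'll look like, to give you a sense of what you're doing.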

Related

Regex excluding matches wrapped in specific bbcode tags

I'm trying to replace double quotes with curly quotes, except when the text is wrapped in certain tags, like [quote] and [code].
Sample input
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>
Expected output
[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>
I figured how to elegantly achieve what I want in PHP by using (*SKIP)(*F), however my code will be run in javascript, and the javascript solution is less than ideal.
Right now I'm splitting the string at those tags, running the replace, then putting the string together:
var o = 3;
a = a
.split(/(\[(?<first>(?:icode|quote|code))[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(?:\k<first>)\])/i)
.map(function(x,i) {
if (i == o-1 && x) {
x = '';
}
else if (i == o && x)
{
x = x.replace(/(?![^<]*>|[^\[]*\])"([^"]*?)"/gi, '“$1”')
o = o+3;
}
return x;
}).join('');
Javascript Regex Breakdown
Inside split():
(\[(?<first>icode|quote|code)[^\]]*?\](?:.)*?\[\/(\k<first>)\]) - captures the pattern inside parentheses:
\[(?<first>quote|code|icode)[^\]]*?\] - a [quote], [code], or [icode] opening tag, with or without parameters like =html, eg [code=html]
(?:[\s]*?.)*? - any 0+ (as few as possible) occurrences of any char (.), preceded or not by whitespace, so it doesn't break if the opening tag is followed by a line break
[\s]*? - 0+ whitespaces
\[\/(\k<first>)\] - [/quote], [/code], or [/icode] closing tags. Matches the text captured in the (?<first>) group. Eg: if it's a quote opening tag, it'll be a quote closing tag
Inside replace():
(?![^<]*>|[^\[]*\])"([^"]*?)" - captures text inside double quotes:
(?![^<]*>|[^\[]*\]) - negative lookahead, looks for characters (that aren't < or [) followed by either > or ] and discards them, so it won't match anything inside bbcode and html tags. Eg: [spoiler="Name"] or <span style="color: #24c4f9">. Note that matches wrapped in tags are left untouched.
" - literal opening double quotes character.
([^"]*?) - any 0+ character, except double quotes.
" - literal closing double quotes character.
SPLIT() REGEX DEMO: https://regex101.com/r/Ugy3GG/1
That's awful, because the replace is executed multiple times.
Meanwhile, the same result can be achieved with a single PHP regex. The regex I wrote was based on Match regex pattern that isn't within a bbcode tag.
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F)|(?![^<]*>|[^\[]*\])"([^"]*?)"
PHP Regex Breakdown
(\[(?<first>quote|code|icode)[^\]]*?\](?:[\s]*?.)*?[\s]*?\[\/(\k<first>)\])(*SKIP)(*F) - matches the pattern inside capturing parentheses just like javascript split() above, then (*SKIP)(*F) make the regex engine omit the matched text.
| - or
(?![^<]*>|[^\[]*\])"([^"]*?)" - captures text inside double quotes in the same way javascript replace() does
PHP DEMO: https://regex101.com/r/fB0lyI/1
The beauty of this regex is that it only needs to be run once. No splitting and joining of strings. Is there a way to implement it in javascript?

Because JS lacks backtracking verbs such as (*SKIP)(*F), you need to match (consume) those bracketed chunks and then put them back unchanged in the replacement. Taking the second side of the alternation from your own regex, the final regex would be:
\[(quote|i?code)[^\]]*\][\s\S]*?\[\/\1\]|(?![^<]*>|[^\[]*\])"([^"]*)"
The tricky part is using a callback function with the replace() method:
str.replace(regex, function($0, $1, $2) {
return $1 ? $0 : '“' + $2 + '”';
})
The ternary operator above returns $0 (the whole match) if the first capturing group took part in the match; otherwise it wraps the second capturing group's value in curly quotes and returns that.
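Put together, the pattern and the callback can be tried against the sample from the question. This is only a sketch: the wrapper name `curlyQuotes` and the parameter names are hypothetical, while the regex is the one given above.

```javascript
// Hypothetical wrapper around the regex and callback described above.
function curlyQuotes(str) {
  var regex = /\[(quote|i?code)[^\]]*\][\s\S]*?\[\/\1\]|(?![^<]*>|[^\[]*\])"([^"]*)"/g;
  return str.replace(regex, function (match, tag, quoted) {
    // tag is set only when the first alternative (a protected block) matched
    return tag ? match : "“" + quoted + "”";
  });
}

var input = '[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]\n' +
  '<p>"Why no goodbye?" replied [b]Bob[/b]. "It\'s always Hello!"</p>';

console.log(curlyQuotes(input));
```

The protected blocks are still matched, but because the callback hands them back untouched the replace runs in a single pass with no splitting and joining.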
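Note: this may fail on some inputs, for example when a tag is nested inside another tag of the same name, because the lazy match stops at the first closing tag.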

Nested markup is hard to parse with regular expressions, and with JS's RegExp in particular. Complex regular expressions are also hard to read, maintain, and debug. If your needs are simple (a tag content replacement with some banned tags excluded), consider a simple code-based alternative to run-on RegExps:
function curly(str) {
var excludes = {
quote: 1,
code: 1,
icode: 1
},
xpath = [];
return str.split(/(\[[^\]]+\])/) // breakup by tag markup
.map(x => { // for each tag and content:
if (x[0] === "[") { // tag markup:
if (x[1] === "/") { // close tag
xpath.pop(); // remove from current path
} else { // open tag
xpath.push(x.slice(1).split(/\W/)[0]); // add to current path
} //end if open/close tag
} else { // tag content
if (xpath.every(tag =>!excludes[tag])) x = x.replace(/"/g, function repr() {
return (repr.z = !repr.z) ? "“" : "”"; // flip flop return value (naive)
});
} //end if markup or content?
return x;
}) // end term map
.join("");
} /* end curly() */
var input = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>"Why no goodbye?" replied [b]Bob[/b]. "It's always Hello!"</p>`;
var wants = `[quote="Name"][b]Alice[/b] said, "Hello world!"[/quote]
<p>“Why no goodbye?” replied [b]Bob[/b]. “It's always Hello!”</p>`;
curly(input) == wants; // true
To my eyes, even though it's a bit longer, code allows documentation, indentation, and explicit naming that make these sorts of semi-complicated logical operations easier to understand.
If your needs are more complex, use a true BBCode parser for JavaScript and map/filter/reduce its model as needed.

Javascript pig latin regex not working

Trying to make a function that will convert English words into Pig Latin. The problem I have is when I check to see if the first letter is a vowel. I check using a regular expression: if (str[0] === /[aeiou]/i) but it doesn't work. Something is wrong with my regex but I look at references and it seems like that should work. I don't know what's going on. Can someone explain why the regex I am using does not work and what would be a similar working solution? If you run the code below, it doesn't give the right result, just saying beforehand.
function translate(str) {
var tag = "";
if (str[0] === /[aeiou]/i) {
tag = "way";
return str + tag;
}
else {
var count = 0;
for (var i = 0; i< str.length; i++) {
if (str[i] !== /[aeiou]/i)
tag += str[i];
else
break;
count = i;
}
console.log(count);
return str.slice(count + 1) + tag + "ay";
}
}
So when I run, say, translate("overjoyed") it should return "overjoyedway". And if I run translate("glove") it should return "oveglay".

What you have written is not the way you use regular expressions. The code if (str[0] === /[aeiou]/i) tests whether the first element of the str string array is both equal value and equal type as the regular expression: /[aeiou]/i. Characters are not the same type as regular expressions, so such a comparison will evaluate to false.
Think of the regular expression as a tool that can be used to search an entire string array for a match (all of str, not just str[0]). The web has a bunch of great examples, but to get you started, you might try using str.search(regexp) which will return the index of the first match (if found) or -1 (if no match).
Your code then becomes (without too much deviation from the original, and without trying to be clever or optimal):
function translate(str) {
var tag = "";
var pos = str.search(/[aeiou]/i); // This is ONE way to use regular expressions.
if (pos == 0) { // First letter is a vowel.
tag = "way";
return str + tag;
} else if (pos > 0) { // Some letter (not the first) is a vowel.
// Instead of the loop checking each element, we already know where
// the match is found: at position = pos.
console.log(pos); // Log the match position of the first vowel.
tag = str.slice(0, pos); // The string before the first vowel.
return str.slice(pos) + tag + "ay";
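}
return str + "ay"; // Assumption: with no vowel at all there is nothing to move, so just append "ay".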
}
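The difference between comparing against a regular expression and testing with one can be checked directly. A small sketch, with variable names chosen only for illustration:

```javascript
var vowel = /[aeiou]/i;

// A string is never strictly equal to a RegExp object, whatever the pattern says.
var comparedDirectly = ("o" === vowel);

// test() asks whether the pattern matches; search() reports where it first matches.
var startsWithVowel = vowel.test("overjoyed"[0]);
var firstVowelIndex = "glove".search(vowel);

console.log(comparedDirectly, startsWithVowel, firstVowelIndex); // false true 2
```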

This works
function pig(str) {
if (/^[aeiou]/i.test(str)) {
return str + 'way';
}
else {
return str.replace(/^([^aeiou]+)([aeiou].*)$/i, "$2$1ay");
}
}
console.log(pig('overjoyed'));
console.log(pig('glove'));

I know this is old, but I just recently did something similar in Ruby and thought I'd rewrite it in JavaScript and supply it as an answer.
function translate( words )
{
return words.split(' ').map(function(word){
return word.split(/\b([^a,e,i,o,u]{0,}qu|[^a,e,i,o,u]+)?([a,e,i,o,u][a-z]+)?/i).reverse().join('') + 'ay'
}).join(' ')
}
So in mine I take a string words that can be a single word or a full sentence.
I split it up into an array of words by splitting on spaces.
I then used map to run code on each word in that array, in there I have it split the word using my regex (which I'll explain at the end) which splits it into the first sound if it is anything other than a vowel sound and the rest of the word.
On the word quiet that split actually results in the following array: ["","qu","iet",""], and on the word aqua it results in ["",undefined,"aqua",""].
I can ignore those undefined and empty strings though because when we join it back together they get ignored.
So after it is split up I reverse the array and then join it back together as a word (joining it using an empty string '') and then tack on 'ay' to the end of the resulting string.
Now to explain the regex:
\b says we are looking for the start of a word; it could alternatively be ^ for the start of the string.
([^a,e,i,o,u]{0,}qu|[^a,e,i,o,u]+)? is an optional capture group looking for that first consonant sound. Let's break it up further.
Within it we have two alternatives, either [^a,e,i,o,u]{0,}qu or [^a,e,i,o,u]+. The first checks for a first sound containing qu, with or without preceding non-vowel characters (so the qu from quiet or the squ from square gets matched instead of stopping before the u). The second grabs all non-vowel letters before the first vowel when there is no qu at the start.
The final part, ([a,e,i,o,u][a-z]+)?, just grabs everything from that first vowel on as the rest of the match.
(The commas inside the brackets are matched literally and aren't needed; [^aeiou] and [aeiou] are the tighter way to write these classes.)
I hope someone somewhere finds this useful :)

function translatePigLatin(str) {
if(["a","e","i","o","u"].indexOf(str[0])!==-1){
str+="way";
}else{
while(["a","e","i","o","u"].indexOf(str[0])==-1){
str+=str[0];
str=str.slice(1);
}
str+="ay";
}
return str;
}
translatePigLatin("glove");
This works as follows:
1. Check the string's first letter. If it is a vowel, we simply complete the word with "way".
2. If the first letter is not a vowel, move it to the end of the string, and repeat until the "first" letter is a vowel.
3. The string has now been rearranged, so we add another string ("ay") to its end.
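The steps above can be sketched with one extra guard. The function name `translatePigLatinSafe` and the rule for vowel-less words are assumptions, not part of the answer: a word such as "rhythm" would keep the original loop rotating forever, so this version stops after one full rotation.

```javascript
// Hypothetical variant of the loop above that cannot spin forever.
function translatePigLatinSafe(str) {
  var vowels = "aeiou";
  if (vowels.indexOf(str[0]) !== -1) {
    return str + "way"; // step 1: starts with a vowel
  }
  var moved = 0;
  // step 2: rotate leading consonants to the end, at most once per letter
  while (moved < str.length && vowels.indexOf(str[0]) === -1) {
    str = str.slice(1) + str[0];
    moved += 1;
  }
  return str + "ay"; // step 3
}

console.log(translatePigLatinSafe("glove"));     // "oveglay"
console.log(translatePigLatinSafe("overjoyed")); // "overjoyedway"
console.log(translatePigLatinSafe("rhythm"));    // "rhythmay" (assumed behaviour)
```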

JavaScript: How can I remove any words containing (or directly preceding) capital letters, numbers, or commas, from a string?

I'm trying to write the code so it removes the "bad" words from the string (the text).
The word is "bad" if it has comma or any special sign thereafter. The word is not "bad" if it contains only a to z (small letters).
So, the result I'm trying to achieve is:
<script>
String.prototype.azwords = function() {
return this.replace(/[^a-z]+/g, "0");
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.".azwords();//should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(res);//should alert "good gooood"
</script>

Try this:
return this.replace(/(^|\s+)[a-z]*[^a-z\s]\S*(?!\S)/g, "");
It matches each word (a run of non-whitespace bounded by whitespace or the ends of the string), together with the whitespace before it, that contains at least one character outside a-z. However, this is quite complicated and unmaintainable. Maybe you should try a more functional approach:
return this.split(/\s+/).filter(function(word) {
return word && !/[^a-z]/.test(word);
}).join(" ");
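As a standalone check of the split/filter idea, here is a sketch that avoids touching String.prototype. The function name `azWords` is hypothetical; the sample string is the one from the question.

```javascript
// Keep only the words made up entirely of a-z.
function azWords(text) {
  return text.split(/\s+/).filter(function (word) {
    // drop empty pieces and any word containing a character outside a-z
    return word && !/[^a-z]/.test(word);
  }).join(" ");
}

var sample = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
console.log(azWords(sample)); // "good gooood"
```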

Okay, first off you probably want to use the word boundary escape \b in your regex. Also, it's a bit tricky if you match the bad words, because a bad word might contain lowercase chars, so your current regex will exclude anything which does have lowercase letters.
I'd be tempted to pick out the good words and put them in a new string. It's a much easier regex.
/\b[a-z]+\b/g
NB: I'm not totally sure that it'll work for the first and last words in the string so you might need to account for that as well. http://www.regextester.com/ is exceptionally useful.
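EDIT: as you want punctuation after the word to be 'bad', this will actually do what I was suggesting (the trailing lookahead checks for the separator without consuming it, so two good words in a row are both found):
(^|\s)[a-z]+(?=\s|$)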

Firstly I wouldn't recommend changing the prototype of String (or of any native object) if you can avoid it, because you leave yourself open to conflicts with other code that might define the same property in different ways. Much better to put custom methods like this on a namespaced object, though I'm sure some will disagree.
Second, is there any need to use RegEx completely? (Genuine question; not trying to be facetious.)
Here is an example of the function with plain old JS using a little bit of RegEx here and there. Easier to comment, debug, and reuse.
Here is the code:
var azwords = function(str) {
var arr = str.split(/\s+/),
len = arr.length,
i = 0,
res = "";
for (i; i < len; i += 1) {
if (!(arr[i].match(/[^a-z]/))) {
res += (!res) ? arr[i] : " " + arr[i];
}
}
return res;
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."; //should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(azwords(res));//should alert "good gooood";

Try this one:
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
var new_one = res.replace(/\s*\w*[#A-Z0-9,.?\xA1-\xFF]\w*/g,'');
//Output `good gooood`
Description:
\s* # zero-or-more whitespace characters
\w* # zero-or-more word characters (letters, digits, underscore)
[#A-Z0-9,.?\xA1-\xFF] # any one of the listed characters
\w* # zero-or-more word characters
/g - global (run over the whole string)

This will find all the words you want: /^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g, so you could use match.
this.match(/^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g).join(" "); should return the list of valid words (each match still includes its surrounding whitespace, so trim the pieces if that matters).
Note that this was slow when I ran it in a JSFiddle, so it may be more efficient to split and iterate your list.

Javascript Regular Expression for Removing all Spaces except for what between double quotes

I have a string from which I need to strip out all the spaces except those between "". Here is the regex that I am using to strip out spaces.
str.replace(/\s/g, "");
I can't seem to figure out how to get it to ignore spaces between quotes.
Example
str = 'Here is my example "leave spaces here", ok im done'
Output = 'Hereismyexample"leave spaces here",okimdone'

Another way to do it. This has the assumption that no escaping is allowed within double quoted part of the string (e.g. no "leave \" space \" here"), but can be easily modified to allow it.
str.replace(/([^"]+)|("[^"]+")/g, function($0, $1, $2) {
if ($1) {
return $1.replace(/\s/g, '');
} else {
return $2;
}
});
Modified regex to allow escape of " within quoted string:
/([^"]+)|("(?:[^"\\]|\\.)+")/
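Wrapped in a function and run on the example from the question, the first pattern behaves as follows. This is a sketch only; the name `stripSpacesOutsideQuotes` is made up, and it keeps the same assumption that quotes are never escaped.

```javascript
// Hypothetical wrapper: group 1 is text outside quotes, group 2 is a quoted chunk.
function stripSpacesOutsideQuotes(str) {
  return str.replace(/([^"]+)|("[^"]+")/g, function (match, unquoted, quoted) {
    // only the unquoted pieces lose their whitespace
    return unquoted ? unquoted.replace(/\s/g, "") : quoted;
  });
}

var example = 'Here is my example "leave spaces here", ok im done';
console.log(stripSpacesOutsideQuotes(example));
// Hereismyexample"leave spaces here",okimdone
```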

var output = input.split('"').map(function(v,i){
return i%2 ? v : v.replace(/\s/g, "");
}).join('"');
Note that I renamed the variables because I can't write code with a variable whose name starts with an uppercase and especially when it's a standard constructor of the language. I'd suggest you stick with those guidelines when in doubt.

Rob, resurrecting this question because it had a simple solution that only required one replace call, not two. (Found your question while doing some research for a regex bounty quest.)
The regex is quite short:
"[^"]+"|( )
The left side of the alternation matches complete quoted strings. We will ignore these matches. The right side matches and captures spaces to Group 1, and we know they are the right spaces because they were not matched by the expression on the left.
Here is working code:
var subject = 'Here is my example "leave spaces here", ok im done';
var regex = /"[^"]+"|( )/g;
var replaced = subject.replace(regex, function(m, group1) {
// group1 is undefined when the quoted-string alternative matched: keep that text
if (group1 === undefined) return m;
else return "";
});
document.write(replaced);

Replace multiple whitespaces with single whitespace in JavaScript string

I have strings with extra whitespace characters. Each time there's more than one whitespace character in a row, I'd like it to be only one. How can I do this using JavaScript?

Something like this:
var s = " a b c ";
console.log(
s.replace(/\s+/g, ' ')
)

You can augment String to implement these behaviors as methods, as in:
String.prototype.killWhiteSpace = function() {
return this.replace(/\s/g, '');
};
String.prototype.reduceWhiteSpace = function() {
return this.replace(/\s+/g, ' ');
};
This now enables you to use the following elegant forms to produce the strings you want:
"Get rid of my whitespaces.".killWhiteSpace();
"Get rid of my extra whitespaces".reduceWhiteSpace();

Here's a non-regex solution (just for fun):
var s = ' a b word word. word, wordword word ';
// with ES5:
s = s.split(' ').filter(function(n){ return n != '' }).join(' ');
console.log(s); // "a b word word. word, wordword word"
// or ES2015:
s = s.split(' ').filter(n => n).join(' ');
console.log(s); // "a b word word. word, wordword word"
Can even substitute filter(n => n) with .filter(String)
It splits the string on spaces, removes all the empty items from the array (the ones produced by runs of more than a single space), and joins the words again into a string with a single space between them.

using a regular expression with the replace function does the trick:
string.replace(/\s/g, "")

I presume you're looking to strip spaces from the beginning and/or end of the string (rather than removing all spaces)?
If that's the case, you'll need a regex like this:
mystring = mystring.replace(/(^\s+|\s+$)/g,'');
This will remove all spaces from the beginning or end of the string. If you only want to trim spaces from the end, then the regex would look like this instead:
mystring = mystring.replace(/\s+$/g,'');
Hope that helps.

jQuery.trim() works well.
http://api.jquery.com/jQuery.trim/

I know I shouldn't resurrect an old thread, but given the details of the question, I usually expand it to mean:
I want to replace multiple occurrences of whitespace inside the string with a single space
...and... I do not want whitespace at the beginning or end of the string (trim)
For this, I use code like this (the parentheses in the first regexp are there just to make the code a bit more readable ... regexps can be a pain unless you are familiar with them):
s = s.replace(/^(\s*)|(\s*)$/g, '').replace(/\s+/g, ' ');
The reason this works is that the methods on String-object return a string object on which you can invoke another method (just like jQuery & some other libraries). Much more compact way to code if you want to execute multiple methods on a single object in succession.

var x = " Test Test Test ".split(" ").join("");
alert(x);

Try this.
var string = " string 1";
string = string.trim().replace(/\s+/g, ' ');
the result will be
string 1
What happened here is that it will trim the outside spaces first using trim() then trim the inside spaces using .replace(/\s+/g, ' ').

How about this one?
"my test string \t\t with crazy stuff is cool ".replace(/\s{2,9999}|\t/g, ' ')
outputs "my test string with crazy stuff is cool "
This one gets rid of any tabs as well

If you want to prevent the user from typing a blank space in the name, just create an if statement with the right condition, like I did:
$j('#fragment_key').bind({
keypress: function(e){
var key = e.keyCode;
var character = String.fromCharCode(key);
if(character.match(/ /)) { // match a space only; [' '] would also match an apostrophe
alert("Blank space is not allowed in the Name");
return false;
}
}
});
Create a jQuery handler.
It listens for the keypress event.
Initialize a variable with the typed character.
Give the condition that matches the character.
Show an alert message when the condition matches.
