I have a text file containing strings separated by whitespace. The file contains some special characters (Latin, currency, punctuation, etc.) which need to be discarded from the final output. Please note that the legal characters are all Unicode characters except these special characters.
We need to split the text on whitespace and then remove only the leading and trailing special characters. If special characters sit between two legal characters, we don't remove them.
I can easily do it in two phases: split the text on whitespace, then remove the leading and trailing special characters from each string. However, I need to process the string only once. Is there any way this could be achieved in one pass? Note: we can't use RegEx.
For this question assume that these characters are special:
[: , ! . < ; ' " > [ ] { } ` ~ = + - ? / ]
Example:
:!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,
Here output would be an array of valid strings: ["BBM", "IS", "TALKING", "TO", "B!?AM"]
Make a simple state machine (finite automaton):
Walk in a loop through all chars.
At every step, check whether the current char is a letter, a space, or a special char.
Execute some operation (perhaps empty) depending on the state and the char kind.
Change state if needed.
For example, you may stay in the "special" state until a letter is met. Then remember the starting index of the word and switch the state to "inside word". Continue until a special char or space is met (it is still not clear from your question which one ends a word).
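The steps above can be sketched roughly as follows. This is only one possible reading of the question: the names SPECIAL, isWhitespace and splitAndTrim are made up for illustration, "whitespace" is assumed to mean space, tab, or newline, and a word is assumed to end at whitespace only, so special characters survive whenever a legal character follows them within the same word.

```javascript
// The special characters listed in the question (hypothetical constant name).
const SPECIAL = new Set(":,!.<;'\">[]{}`~=+-?/");

// Assumption: only these four characters count as whitespace.
function isWhitespace(ch) {
  return ch === " " || ch === "\t" || ch === "\n" || ch === "\r";
}

function splitAndTrim(text) {
  const words = [];
  let start = -1; // index of the first legal char of the current word, -1 = none yet
  let end = -1;   // index just past the last legal char seen in the current word
  for (let i = 0; i <= text.length; i++) {
    // A sentinel space after the last character flushes the final word.
    const ch = i < text.length ? text[i] : " ";
    if (isWhitespace(ch)) {
      if (start !== -1) words.push(text.slice(start, end));
      start = -1;
    } else if (!SPECIAL.has(ch)) {
      if (start === -1) start = i; // state change: "special" -> "inside word"
      end = i + 1;
    }
    // Special characters need no action: they are kept only if a legal char follows.
  }
  return words;
}

console.log(splitAndTrim(":!/,.<;:.?;,BBM!/,.<;:.?;,` IS TALKING TO `B!?AM!/,.<;:.?;,"));
// [ 'BBM', 'IS', 'TALKING', 'TO', 'B!?AM' ]
```

Tracking only two indices avoids building the word character by character; the single slice() per word copies each character once.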
I have used TypeScript and done it in a single pass.
Please note that the isSpecialCharacterCode(charCode) function simply checks whether the character code of the text character equals that of one of the provided special characters. The same is true for the isWhitespaceCode(charCode) function.
function parseText(text: string): string[] {
  let words: string[] = [];
  let word = "";
  let charCode = 1;
  let haveSeenLegalChar = false; // set once we have encountered a legal character of the current word
  let seenSpecialCharsToInclude = false; // set when special characters are pending after a legal character
  let inBetweenSpecialChars = ""; // special chars which may have to be included in between legal characters
  for (let index = 0; index < text.length; index++) {
    charCode = text.charCodeAt(index);
    let isSpecialChar = isSpecialCharacterCode(charCode);
    let isWhitespace = isWhitespaceCode(charCode);
    if (isSpecialChar && !isWhitespace) {
      // A special character has two cases.
      // First: it can be part of the word (only possible if we have already seen at least one legal character).
      // We are not sure yet whether it will be part of the word, so store it for now.
      // Second: it is a leading special character, which we should not include in the word.
      if (haveSeenLegalChar) {
        inBetweenSpecialChars += text[index];
        seenSpecialCharsToInclude = true;
      }
    } else if (isWhitespace) {
      // Whitespace marks the end of a word (if one was started).
      // If we have encountered any legal char, push the word into the array;
      // the pending special chars were trailing ones, so drop them.
      if (haveSeenLegalChar) {
        words.push(word);
      }
      word = "";
      inBetweenSpecialChars = "";
      seenSpecialCharsToInclude = false;
      haveSeenLegalChar = false;
    } else {
      // Legal character case.
      haveSeenLegalChar = true;
      if (seenSpecialCharsToInclude) {
        word += inBetweenSpecialChars;
        seenSpecialCharsToInclude = false;
        inBetweenSpecialChars = "";
      }
      word += text[index];
    }
  }
  // The text may not end with whitespace, so push the last word as well.
  if (haveSeenLegalChar) {
    words.push(word);
  }
  return words;
}
I have this string: ‘Some string here’. I want to remove these weird characters (‘ and ’) from it. I am currently using the replace() function, but it does not replace them with an empty string. Below is the script. How can I remove them?
for (var i = 0, len = el.length; i < len; i++) {
$(el[i]).text().replace("‘", "");
}
You just have to remove the characters whose character code is greater than 127, that is, keep only the ASCII characters:
var input="‘Some string here’.";
var output = "";
for (var i=0; i<input.length; i++) {
  if (input.charCodeAt(i) <= 127) {
    output += input.charAt(i);
  }
}
alert(output);//Some string here.
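OR
keep your loop and write the result back, because replace() returns a new string and does not change the element by itself:
$(el[i]).text($(el[i]).text().replace("‘","").replace("’",""));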
Those weird characters probably aren't so weird; they're most likely a symptom of a character encoding problem. At a guess, they're smart quotes that aren't showing up correctly.
Rather than try to strip them out of your text, you should update your page so it displays as UTF-8. Add this in your page header:
<meta charset="utf-8" />
So why does this happen? Basically, most character encodings are the same for "simple" text - letters, numbers, some symbols - but have different representations for less common characters (accents, other alphabets, less common symbols, etc). When your browser gets a document without any indication of its character encoding, the browser will make a guess. Sometimes it gets it wrong, and you see weird characters like ‘ instead of what you expected.
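A small sketch of how such a wrong guess produces exactly these characters. It assumes the text was stored as UTF-8 and read as Windows-1252, which is only a guess that happens to match the characters in the question; TextEncoder and TextDecoder are standard globals in browsers and in Node.js builds with full ICU.

```javascript
// The left single quote U+2018 as it travels over the wire: three UTF-8 bytes.
const utf8Bytes = new TextEncoder().encode("\u2018"); // 0xE2 0x80 0x98

// Decoding those bytes with the wrong encoding yields three unrelated characters.
const misread = new TextDecoder("windows-1252").decode(utf8Bytes);
console.log(misread); // ‘

// Decoding the same bytes as UTF-8 restores the original quote.
const correct = new TextDecoder("utf-8").decode(utf8Bytes);
console.log(correct === "\u2018"); // true
```

Nothing in the bytes is damaged; only the interpretation differs, which is why declaring the charset fixes the display without touching the text.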
This code works fine for me:
"‘Some string here’".replace("‘","").replace("’","");
Code Snippet:
var str = "‘Some string hereâ€";
str = str.replace("‘", "");
str = str.replace("â€", "");
alert(str);
.filter('SpecialCharacterToSingleQuote', function() {
return function(text) {
return text ? String(text).replace(/â/g, "'").replace(/|||/g, "") : '';
};
});
I have a 23-character key from an input box, with its groups separated by '-'.
E.g.: XXXXX-XXXXX-XXXXX-XXXXX
This is the expected format: groups of 5 digits separated by - (hyphen).
Problem:
The user can input any data or a wrong format, like XXX-XXXXX-XXXXX-XXXXXXX; in this case the hyphen indices are invalid. How can I validate the indices of the hyphens?
I tried:
if((prd_len==23) && (n!=-1))
{
  var indices = [];
  for(var i=0; i<prd_id.length;i++)
  {
    if (prd_id[i] === "-")
    {
      indices.push(i);
    }
  }
  for(var x=0;x<indices.length;x++)
  {
    if((indices[x]!=5) || (indices[x]!=11) || (indices[x]!=17))
    {
      $('#msgErr1').text('Please enter valid key.');
      flag=1;
    }
  }
}
where prd_len=length of the accepted input from user.
Try regular expressions:
if (input.match(/^(\d{5}-){3}\d{5}$/)) {
  // everything is OK
}
This expression basically reads "five digits and a dash - three times, then five digits". For further reference see
http://www.regular-expressions.info/
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
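As a quick check, the pattern can be run against the two shapes from the question. The sketch assumes each X stands for a digit; the function name isValidKey and the sample digits are made up for illustration.

```javascript
// Hypothetical helper: true when the key is four groups of five digits
// separated by hyphens, e.g. 12345-67890-12345-67890.
function isValidKey(input) {
  return /^(\d{5}-){3}\d{5}$/.test(input);
}

console.log(isValidKey("12345-67890-12345-67890")); // true
console.log(isValidKey("123-45678-90123-4567890")); // false (23 characters, hyphens misplaced)
```

Because the anchors ^ and $ pin the whole string, the length check and the hyphen positions are validated in the same step.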
As thg435 said, but more human-readable :-)
var correct = input.match(/^\d\d\d\d\d-\d\d\d\d\d-\d\d\d\d\d-\d\d\d\d\d$/);
I would like to limit the substr by words and not chars. I am thinking regular expression and spaces but don't know how to pull it off.
Scenario: Limit a paragraph of words to 200 words using javascript/jQuery.
var $postBody = $postBody.substr(' ',200);
This is great but splits words in half :) Thanks ahead of time!
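function trim_words(theString, numWords) {
  // split() takes a limit as its second argument, so at most numWords pieces are returned
  var expString = theString.split(/\s+/, numWords);
  var theNewString = expString.join(" ");
  return theNewString;
}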
If you're satisfied with a not-quite-accurate solution, you could simply keep a running count of the space characters in the text and assume that it equals the number of words.
Otherwise, I would use split() on the string with " " as the delimiter and then count the size of the array that split() returns.
very quick and dirty
$("#textArea").val().split(/\s/).length
I suppose you need to consider punctuation and other non-word, non-whitespace characters as well. You want 200 words, not counting whitespace and non-letter characters.
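var word_count = 0;
var in_word = false;
var limited = text; // stays the whole text if it has fewer than 200 words
for (var x = 0; x < text.length; x++) {
  if (/[a-z]/i.test(text[x])) { // "letter" here means ASCII a-z; widen the class as needed
    if (!in_word) word_count++;
    in_word = true;
  } else {
    in_word = false;
  }
  if (!in_word && word_count >= 200) {
    limited = text.slice(0, x); // cut the string at position x
    break;
  }
}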
You should also decide whether you treat digits as a word, and whether you treat single letters as a word.
I need to find a regex that only allows alphanumeric characters. So far, every one I try only works if the string is alphanumeric in the sense that it contains both a letter and a number. I just want one that would allow either and not require both.
/^[a-z0-9]+$/i
^ Start of string
[a-z0-9] a or b or c or ... z or 0 or 1 or ... 9
+ one or more times (change to * to allow empty string)
$ end of string
/i case-insensitive
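A short usage sketch of the pattern explained above (the variable name alphanumeric is just for illustration):

```javascript
const alphanumeric = /^[a-z0-9]+$/i;

console.log(alphanumeric.test("abc"));    // true  (letters only)
console.log(alphanumeric.test("123"));    // true  (digits only)
console.log(alphanumeric.test("Abc123")); // true  (both)
console.log(alphanumeric.test("abc 123")); // false (space is not allowed)
console.log(alphanumeric.test(""));       // false (+ requires at least one character)
```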
Update (supporting universal characters)
If you need this regexp to support characters beyond ASCII, look up the ranges you need in a list of Unicode characters.
For example: /^([a-zA-Z0-9\u0600-\u06FF\u0660-\u0669\u06F0-\u06F9 _.-]+)$/
This will support Persian.
If you wanted to return a replaced result, then this would work:
var a = 'Test123*** TEST';
var b = a.replace(/[^a-z0-9]/gi, '');
console.log(b);
This would return:
Test123TEST
Note that the gi is necessary because it means global (not just on the first match), and case-insensitive, which is why I have a-z instead of a-zA-Z. And the ^ inside the brackets means "anything not in these brackets".
WARNING: Alphanumeric is great if that's exactly what you want. But if you're using this in an international market on like a person's name or geographical area, then you need to account for unicode characters, which this won't do. For instance, if you have a name like "Âlvarö", it would make it "lvar".
Use the word character class. The following is equivalent to ^[a-zA-Z0-9_]+$:
^\w+$
Explanation:
^ start of string
\w any word character (A-Z, a-z, 0-9, _).
$ end of string
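If you don't want to allow the underscore, use /[^\w]|_/g instead: it matches every character that is not a letter or a digit, so a string is valid when this pattern finds nothing in it (or pass it to replace() to strip those characters).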
/^([a-zA-Z0-9 _-]+)$/
The above regex allows spaces inside a string and rejects special characters. It only allows
a-z, A-Z, 0-9, space, underscore and dash.
^\s*([0-9a-zA-Z]*)\s*$
or, if you want a minimum of one character:
^\s*([0-9a-zA-Z]+)\s*$
Square brackets indicate a set of characters. ^ is start of input. $ is end of input (or newline, depending on your options). \s is whitespace.
The whitespace before and after is optional.
The parentheses are the grouping operator to allow you to extract the information you want.
EDIT: removed my erroneous use of the \w character set.
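A brief sketch of how the capture group is used to pull out the value without the surrounding whitespace (the names trimmedAlnum and extract are made up for illustration):

```javascript
const trimmedAlnum = /^\s*([0-9a-zA-Z]+)\s*$/;

// Returns the alphanumeric part without surrounding whitespace, or null if the input is invalid.
function extract(input) {
  const m = trimmedAlnum.exec(input);
  return m ? m[1] : null; // m[1] is the text captured by the parentheses
}

console.log(extract("  abc123  ")); // "abc123"
console.log(extract("abc 123"));    // null (whitespace in the middle is not allowed)
```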
For multi-language support:
var filtered = 'Hello Привет 你好 123_456'.match(/[\p{L}\p{N}\s]/gu).join('')
console.log(filtered) // --> "Hello Привет 你好 123456"
This matches any letter, number, or space in most languages.
[...] -> Character class: match any one of the listed items
[ab] -> Match 'a' OR 'b'
\p{L} -> Match any letter in any language
\p{N} -> Match any number in any language
\s -> Match a whitespace character
/g -> Don't stop after first match
/u -> Support unicode pattern matching
Ref: https://javascript.info/regexp-unicode
This will work
^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$
It accepts only alphanumeric strings that contain at least one letter and at least one digit.
Test cases passed:
dGgs1s23 - valid
12fUgdf - valid,
121232 - invalid,
abchfe - invalid,
abd()* - invalid,
42232^5$ - invalid
or
You can also try this one. This expression requires at least one digit and at least one letter, and allows no special characters:
^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$
In Angular you can test it like this:
$scope.str = '12fUgdf';
var pattern = new RegExp('^(?=.*[0-9])(?=.*[a-zA-Z])([a-zA-Z0-9]+)$');
$scope.testResult = pattern.test($scope.str);
Reference: Regular expression for alphanumeric in Angularjs
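The listed test cases can be reproduced outside Angular with a plain loop. Note that, because of the two lookaheads, this pattern rejects strings made of letters only or digits only (the variable names are made up for illustration):

```javascript
const letterAndDigit = /^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$/;
const cases = ["dGgs1s23", "12fUgdf", "121232", "abchfe", "abd()*", "42232^5$"];

// Map every test string to "valid" or "invalid".
const results = cases.map((s) => (letterAndDigit.test(s) ? "valid" : "invalid"));
cases.forEach((s, i) => console.log(s + " - " + results[i]));
// dGgs1s23 - valid
// 12fUgdf - valid
// 121232 - invalid
// abchfe - invalid
// abd()* - invalid
// 42232^5$ - invalid
```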
Instead of checking for a valid alphanumeric string, you can achieve this indirectly by checking the string for any invalid characters. Do so by checking for anything that matches the complement of the valid alphanumeric string.
/[^a-z\d]/i
Here is an example:
var alphanumeric = "someStringHere";
var myRegEx = /[^a-z\d]/i;
var isValid = !(myRegEx.test(alphanumeric));
Notice the logical not operator at isValid: the regex tests whether the string contains an invalid character, not whether the string is valid, so the result has to be negated.
I have a string similar to Samsung Galaxy A10s 6.2-Inch (2GB,32GB ROM) Android 9.0, (13MP+2MP)+ 8MP Dual SIM 4000mAh 4G LTE Smartphone - Black (BF19)
Below is what I did:
string.replace(/[^a-zA-Z0-9 ,._-]/g, '').split(',').join('-').split(' ').join('-').toLowerCase()
Notice that I allowed ,._- and then used split() and join() to replace , with - and space with - respectively.
I ended up getting something like this:
samsung-galaxy-a10s-6.2-inch-2gb-32gb-rom-android-9.0-13mp-2mp-8mp-dual-sim-4000mah-4g-lte-smartphone-black-bf19-20, which is what I wanted.
There might be a better solution, but this is what I found works fine for me.
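One possible alternative, sketched under the assumption that runs of disallowed characters should collapse into a single hyphen and that dots should be kept. The function name slugify is made up for illustration, and this is not the code that produced the output quoted above.

```javascript
// Hypothetical helper: keeps letters, digits and dots, turns everything else into single hyphens.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9.]+/g, "-") // any run of other characters becomes one hyphen
    .replace(/^-+|-+$/g, "");     // drop hyphens left over at either end
}

console.log(slugify("Samsung Galaxy A10s 6.2-Inch (2GB,32GB ROM) Android 9.0, (13MP+2MP)+ 8MP Dual SIM 4000mAh 4G LTE Smartphone - Black (BF19)"));
// samsung-galaxy-a10s-6.2-inch-2gb-32gb-rom-android-9.0-13mp-2mp-8mp-dual-sim-4000mah-4g-lte-smartphone-black-bf19
```

Replacing whole runs at once avoids the doubled hyphens that a character-by-character split() and join() would leave around ", (" or " - ".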
Extend the string prototype to use throughout your project
String.prototype.alphaNumeric = function() {
  return this.replace(/[^a-z0-9]/gi,'');
}
Usage:
"I don't know what to say?".alphaNumeric();
//Idontknowwhattosay
Even better than what Gayan Dissanayake pointed out:
/^[-\w\s]+$/
Note that ^[a-zA-Z0-9_]+$ can be represented as ^\w+$, because \w also matches the underscore.
You may want to use \s instead of a literal space. Note that \s matches any whitespace character, not only the space character.
Paste this code into your Scratchpad and see it in action.
var str=String("Blah-Blah1_2,oo0.01&zz%kick").replace(/[^\w-]/ig, '');
JavaScript to accept only NUMBERS, ALPHABETS and SPECIAL CHARACTERS
document.getElementById("onlynumbers").onkeypress = function (e) {
  onlyNumbers(e.key, e)
};
document.getElementById("onlyalpha").onkeypress = function (e) {
  onlyAlpha(e.key, e)
};
document.getElementById("speclchar").onkeypress = function (e) {
  speclChar(e.key, e)
};
function onlyNumbers(key, e) {
  var letters = /^[0-9]/g; //g means global
  if (!(key).match(letters)) e.preventDefault();
}
function onlyAlpha(key, e) {
  var letters = /^[a-z]/gi; //i means ignorecase
  if (!(key).match(letters)) e.preventDefault();
}
function speclChar(key, e) {
  var letters = /^[0-9a-z]/gi;
  if ((key).match(letters)) e.preventDefault();
}
<html>
<head></head>
<body>
Enter Only Numbers:
<input id="onlynumbers" type="text">
<br><br>
Enter Only Alphabets:
<input id="onlyalpha" type="text" >
<br><br>
Enter other than Alphabets and numbers like special characters:
<input id="speclchar" type="text" >
</body>
</html>
A little bit late, but this worked for me:
/[^a-z A-Z 0-9]+/g
a-z : anything from a to z.
A-Z : anything from A to Z (upper case).
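0-9 : any digit from 0 to 9.
Because of the leading ^, the pattern matches any run of characters that are not listed inside the square brackets, so replacing its matches with an empty string keeps only the listed characters (here letters, digits and spaces). If you want to allow other characters as well, for example "/" and "#", the regex would be something like this:
/[^a-z A-Z 0-9 \/ #]+/g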
This site will help you to test your regex before coding.
https://regex101.com/
Feel free to modify and add anything you want into the brackets.
Regards :)
It seems like many users have noticed that these regular expressions will almost certainly fail unless we are working strictly in English. But I think there is an easy way forward that would not be so limited.
make a copy of your string in all UPPERCASE
make a second copy in all lowercase
Any characters that match in those strings are definitely not alphabetic in nature.
let copy1 = originalString.toUpperCase();
let copy2 = originalString.toLowerCase();
for (let i = 0; i < originalString.length; i++) {
  let bIsAlphabetic = (copy1[i] != copy2[i]);
}
Optionally, you can also detect numerics by just looking for digits 0 to 9.
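A per-character variant of the same idea, as a rough sketch. Comparing one character at a time sidesteps the case where upper-casing changes the string length (for example "ß" becomes "SS"), which would misalign the indices of two whole-string copies. The helper names are made up for illustration, and the trick only recognises letters that have distinct upper and lower case, so scripts without case (Chinese, Arabic, and others) are not detected.

```javascript
// Hypothetical helper: a character counts as alphabetic when changing its case changes it.
function isCasedLetter(ch) {
  return ch.toUpperCase() !== ch.toLowerCase();
}

// Keeps cased letters and the digits 0-9, drops everything else.
function keepLettersAndDigits(text) {
  let out = "";
  for (const ch of text) { // for...of walks code points, not UTF-16 units
    if (isCasedLetter(ch) || (ch >= "0" && ch <= "9")) out += ch;
  }
  return out;
}

console.log(keepLettersAndDigits("\u00c2lvar\u00f6 #42!")); // "Âlvarö42"
```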
Try this... Replace #name with your field ID...
a-z(a to z),
A-Z(A to Z),
0-9(0 to 9)
jQuery(document).ready(function($){
  $('#name').keypress(function (e) {
    // a regex literal keeps \s intact; inside a string it would have to be written "\\s"
    var regex = /^[a-zA-Z0-9\s]+$/;
    var str = String.fromCharCode(!e.charCode ? e.which : e.charCode);
    if (regex.test(str)) {
      return true;
    }
    e.preventDefault();
    return false;
  });
});
Save this constant
const letters = /^[a-zA-Z0-9]+$/
Now, for the checking part, use .match()
const string = 'Hey there...' // get the string from a keyup listener
let id = ''
// iterate through each character
for (var i = 0; i < string.length; i++) {
  if (string[i].match(letters)) {
    id += string[i]
  } else {
    // In case you want to replace with something else
    id += '-'
  }
}
console.log(id) // or return id when this runs inside a function
Alphanumeric with case sensitive:
if (/^[a-zA-Z0-9]+$/.test("SoS007")) {
alert("match")
}
Also if you were looking for just Alphabetical characters, you can use the following regular expression:
/[^a-zA-Z]/gi
Sample code in typescript:
let samplestring = "!#!&34!# Alphabet !!535!!! is safe"
let regex = new RegExp(/[^a-zA-Z]/gi);
let res = samplestring.replace(regex,'');
console.log(res);
Note: if you are curious about RegEx syntax, visit regexr and either use the cheat-sheet or play with regular expressions.
Edit: alphanumeric --> alphabetical
Only accept numbers and letters (No Space)
function onlyAlphanumeric(str){
str.value=str.value.replace(/\s/g, "");//No Space
str.value=str.value.replace(/[^a-zA-Z0-9 ]/g, "");
}
<div>Only accept numbers and letters </div>
<input type="text" onKeyUp="onlyAlphanumeric(this);" >
Here is the way to check:
/**
 * Returns true if the string consists only of letters and digits and contains at least one of each; otherwise false.
 * @param string
 * @returns boolean
 */
export const isOnlyAlphaNumeric = (string: string) => {
  return /^(?=.*[a-zA-Z])(?=.*[0-9])[a-zA-Z0-9]+$/.test(string);
}
jQuery to accept only NUMBERS, ALPHABETS and SPECIAL CHARACTERS
<html>
<head>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
</head>
<body>
Enter Only Numbers:
<input type="text" id="onlynumbers">
<br><br>
Enter Only Alphabets:
<input type="text" id="onlyalpha">
<br><br>
Enter other than Alphabets and numbers like special characters:
<input type="text" id="speclchar">
<script>
$('#onlynumbers').keypress(function(e) {
var letters=/^[0-9]/g; //g means global
if(!(e.key).match(letters)) e.preventDefault();
});
$('#onlyalpha').keypress(function(e) {
var letters=/^[a-z]/gi; //i means ignorecase
if(!(e.key).match(letters)) e.preventDefault();
});
$('#speclchar').keypress(function(e) {
var letters=/^[0-9a-z]/gi;
if((e.key).match(letters)) e.preventDefault();
});
</script>
</body>
</html>