Number is different than itself (trimming strange characters)

I've copied the first number from the windows calculator, and typed the second one. In Chrome console I get:
"‭65033‬" == "65033"
//false
65033‬ == 65033
//Uncaught SyntaxError: Invalid or unexpected token
It seems there is an unknown character at the beginning and end of it.
1) Is there a way to trim all "strange" characters without knowing them a priori?
2) Why does the Windows calculator put such characters in the number?
Edit: I was not explicit in the question, but any characters that carry information, such as ã, ü, ç, ¢, £, are also valid. What I don't want is characters that carry no information for the human reader.

Edit: after the edit of the original question, this answer no longer offers a bulletproof solution.
var myNumber = 'foo123bar';
var realNumber = parseInt(myNumber.replace(/\D+/g, ''), 10);
What does this do?
It replaces every non-digit character with the empty string and then parses an integer out of the digits left in the string.

A quick solution for this case:
eval("65033\u202C == 65033".replace(/[^a-zA-Z0-9 =_.-]/g, ''))
You can place your copied text in a string, then remove all unnecessary characters by explicitly listing the ones that should stay.
These may include alphanumeric characters plus hyphen, underscore, equals sign, space, et cetera; the actual characters that need to stay will depend on your choice and needs.
Alternatively, you may try to remove all non-printable characters, as suggested here.
Finally, evaluate the resulting code. Remember that eval is not necessarily the best idea for production code.
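For the edited question (keep ã, ü, ç, ¢, £ but drop characters that carry nothing for a human reader), a whitelist of ASCII characters is too strict. One possible sketch, assuming an engine that supports Unicode property escapes (ES2018) and assuming the pasted text carries bidirectional control characters such as U+202D and U+202C, is to remove everything in the Unicode "format" category (Cf). The function name stripInvisible is made up for this example.

```javascript
// Remove Unicode "format" characters (general category Cf). These are invisible
// to a reader: directional marks and overrides, zero-width characters, BOM, etc.
// Letters with diacritics and symbols such as ã, ü, ç, ¢, £ are left alone.
function stripInvisible(text) {
  return text.replace(/\p{Cf}/gu, '');
}

// U+202D (left-to-right override) and U+202C (pop directional formatting)
// are written as escapes so that they are visible in the source.
const copied = '\u202D65033\u202C';
const cleaned = stripInvisible(copied);

console.log(copied === '65033');           // false
console.log(cleaned === '65033');          // true
console.log(stripInvisible('ã ü ç ¢ £'));  // unchanged
```

Note that the Cf category also contains the zero-width joiner used inside some emoji sequences, so this is only appropriate for plain text such as numbers and ordinary words.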

Related

How to filter out characters that aren't letters, numbers or punctuation

I have a string that will have a lot of formatting things like bullet points or arrows or whatever. I want to clean this string so that it only contains letters, numbers and punctuation. Multiple spaces should be replaced by a single space too.
Allowed punctuation: , . : ; [ ] ( ) / \ ! @ # $ % ^ & * + - _ { } < > = ? ~ | "
Basically anything allowed in this ASCII table.
This is what I have so far:
let asciiOnly = y.replace(/[^a-zA-Z0-9\s]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
Regex101: https://regex101.com/r/0DC1tz/2
I also tried the [:punct:] tag but apparently it's not supported by javascript. Is there a better way I can clean this string other than regex? A library or something maybe (I didn't find any). If not, how would I do this with regex? Would I have to edit the first regex to add every single character of punctuation?
EDIT: I'm trying to paste an example string in the question but SO just removes characters it doesn't recognize, so it looks like a normal string. Here's a paste.
EDIT2: I think this is what I needed:
let asciiOnly = x.replace(/[^\x20-\x7E]+/gm, '')
let withoutSpacing = asciiOnly.replace(/\s{2,}/gm, ' ')
I'm testing it with different cases to make sure.

You can achieve this with the regex below, which matches every character outside printable ASCII (so non-ASCII characters, control characters, and extended ASCII are all matched) and lets you replace it with the empty string.
[^ -~]+
This assumes you want to retain printable ASCII characters only, which range from space (ASCII value 32) to tilde ~ (ASCII value 126), hence the character set [^ -~].
Then replace every run of one or more whitespace characters with a single space.
var str = `Determine the values of P∞ and E∞ for each of the following signals: b.
d.
f.
Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:
b.
Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period.
b.
d.
Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals:
b. c.
d. e. f. Figure 1: Problem Set 1.4
Even and Odd Signals
For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero.
b.
d. -------------------------`;
console.log(str.replace(/[^ -~]+/g,'').replace(/\s+/g, ' '));
Also, if you only want to allow alphanumeric characters and the special characters you listed, you can use this regex, which matches everything that is not in the allowed set:
[^ a-zA-Z0-9,.:;[\]()/\\!@#$%^&*+_{}<>=?~|"-]+
Replace the matches with the empty string, and then replace every run of whitespace with a single space.
var str = `Determine the values of P∞ and E∞ for each of the following signals: b.
d.
f.
Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:
b.
Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period.
b.
d.
Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals:
b. c.
d. e. f. Figure 1: Problem Set 1.4
Even and Odd Signals
For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero.
b.
d. -------------------------`;
console.log(str.replace(/[^ a-zA-Z0-9,.:;[\]()/\\!@#$%^&*+_{}<>=?~|"-]+/g,'').replace(/\s+/g, ' '));
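One detail worth checking with the printable-ASCII approach: newline and tab are outside the space-to-tilde range, so they are deleted by the first replace rather than collapsed by the second, and text on adjacent lines can end up glued together. A possible variant, sketched here with a hypothetical helper name and a made-up sample string, converts whitespace to spaces first:

```javascript
// Keep printable ASCII only, but turn line breaks and tabs into spaces first
// so that words on adjacent lines do not get glued together.
function toPrintableAscii(text) {
  return text
    .replace(/\s+/g, ' ')     // any whitespace run (including newlines) -> one space
    .replace(/[^ -~]+/g, '')  // drop everything outside space..tilde
    .replace(/ {2,}/g, ' ')   // removals can leave double spaces behind
    .trim();
}

const sample = 'P∞ and E∞ for each signal:\nb.\n• d.\t→ f.';
console.log(toPrintableAscii(sample));
// "P and E for each signal: b. d. f."
```

The third replace is needed because deleting a bullet or arrow that sat between two spaces leaves both spaces behind.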

This is how I would do it: remove all the disallowed characters first, and then replace multiple spaces with a single space.
let str = `Determine the values of P∞ and E∞ for each of the following signals: b.
d.
f.
Periodic and aperiodic signals Determine whether or not each of the following signals is periodic:!!!23
b.
Determine whether or not each of the following signals is periodic. If a signal is periodic, specify its fundamental period.
b.
d.
Transformation of Independent variables A continuous-time signal x(t) is shown in Figure 1. Sketch and label carefully each of the following signals:
b. c.
d. e. f. Figure 1: Problem Set 1.4
Even and Odd Signals
For each signal given below, determine all the values of the independent variable at which the even part of the signal is guaranteed to be zero.
b.
d. ------------------------- `
const op = str.replace(/[^\w,.:;\[\]()/\\!@#$%^&*+{}<>=?~|" -]/g, '').replace(/\s+/g, " ")
console.log(op)
EDIT: If you want to keep \n or \t as they are, add \s to the allowed set in the first regex and use (\s)\1+ with the replacement "$1" in the second regex.

There probably isn't a better solution than a regex. The under-the-hood implementation of regex actions is usually well optimized by virtue of age and ubiquity.
You may be able to explicitly tell the regex handler to "compile" the regex. This is usually a good idea if you know the regex is going to be used a lot within a program, and may help with performance here. But I don't know if javascript exposes such an option.
The idea of "normal punctuation" doesn't have an excellent foundation. There are some common marks like "90°" that aren't ASCII, and some ASCII marks like "" () that you almost certainly don't want. I would expect you to find similar edge cases with any pre-made list. In any case, just explicitly listing all the punctuation you want to allow is better in general, because then no one will ever have to look up what's in the list you chose.
You may be able to perform both substitutions in a single pass, but it's unclear whether that will perform better, and it almost certainly won't be clearer to any co-workers (including yourself-from-the-future). There will be a lot of finicky details to work out, such as whether " ° " should be replaced with "", " ", or "  ".

charCodeAt is not behaving as expected

How can this be possible:
var string1 = "🌀", string2 = "🌀🌂";
//comparing the charCode
console.log(string1.charCodeAt(0) === string2.charCodeAt(0)); //true
//comparing the character
console.log(string1 === string2.substring(0,1)); //false
//This is giving me a headache.
http://jsfiddle.net/DerekL/B9Xdk/
If their char codes are the same in both strings, by comparing the character itself should return true. It is true when I put in a and ab. But when I put in these strings, it simply breaks.
Some said that it might be the encoding that is causing the problem. But since it works perfectly fine when there's only one character in the string literal, I assume encoding has nothing to do with it.
(This question addresses the core problem in my previous questions. Don't worry I deleted them already.)

In JavaScript, strings are sequences of 16-bit UTF-16 code units rather than bytes or full characters, so one index corresponds to one character only for characters that fit in a single 16-bit unit.
The majority of characters cause no issues, but these emoji lie outside that range, so each one is stored as a surrogate pair and occupies two positions as far as JavaScript is concerned.
In this case you need to do:
string2.substring(0, 2) // "🌀"
For more information on Unicode quirkiness, see UTF-8 Everywhere.
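To see the effect without hard-coding the substring length, here is a small sketch that compares by code point instead of by code unit. The two emoji from the question are written as escapes (U+1F300 and U+1F302) so the example does not depend on file encoding.

```javascript
const string1 = '\u{1F300}';           // 🌀
const string2 = '\u{1F300}\u{1F302}';  // 🌀🌂

// One visible character, but two UTF-16 code units.
console.log(string1.length); // 2

// Both strings start with the same high surrogate, so this is true.
console.log(string1.charCodeAt(0) === string2.charCodeAt(0)); // true

// substring(0, 1) cuts the surrogate pair in half, so this is false.
console.log(string1 === string2.substring(0, 1)); // false

// codePointAt reads the whole surrogate pair starting at the given index.
console.log(string1.codePointAt(0) === string2.codePointAt(0)); // true

// Spreading a string iterates by code point, not by code unit.
const firstChar = [...string2][0];
console.log(firstChar === string1); // true
```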

The parameters of substring are the index where it starts and the index where it ends, whereas substr takes the index where to start and the number of characters.

You can use the localeCompare method to compare two strings:
string1.localeCompare(string2);

Regex for integer, integer + dot, and decimals

I have searched StackOverflow and I can't find an answer as to how to check for regex of numeric inputs for a calculator app that will check for the following format with every keyup (jquery key up):
Any integer like: 34534
When a dot follows the integer when the user is about to enter a decimal number like this: 34534. Note that a dot can only be entered once.
Any float: 34534.093485
I don't plan to use commas to separate the thousands...but I would welcome if anyone can also provide a regex for that.
Is it possible to check the above conditions with just one regex? Thanks in advance.

Is a lone . a successful match or not? If it is then use:
\d+(\.\d*)?|\.\d*
If not then use:
\d+(\.\d*)?|\.\d+
Rather than incorporating commas into the regexes, I recommend stripping them out first: str = str.replace(/,/g, ''). Then check against the regex.
That wouldn't verify that digits are properly grouped into groups of three, but I don't see much value in such a check. If a user types 1,024 and then decides to add a digit (1,0246), you probably shouldn't force them to move the comma.
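For use in a keyup handler, the two patterns above need to be anchored so that the whole input must match, not just part of it. A minimal sketch, where the function name and the allowLoneDot switch are invented for this example:

```javascript
// Returns true when `value` is an integer, an integer followed by a dot,
// or a decimal number. `allowLoneDot` chooses between the two variants:
// with it, "." on its own is accepted; without it, a leading dot needs digits.
function isNumericInput(value, allowLoneDot = false) {
  const pattern = allowLoneDot
    ? /^(?:\d+(?:\.\d*)?|\.\d*)$/
    : /^(?:\d+(?:\.\d*)?|\.\d+)$/;
  return pattern.test(value);
}

console.log(isNumericInput('34534'));         // true
console.log(isNumericInput('34534.'));        // true
console.log(isNumericInput('34534.093485'));  // true
console.log(isNumericInput('34.5.3'));        // false: only one dot allowed
console.log(isNumericInput('.'));             // false
console.log(isNumericInput('.', true));       // true
```

The non-capturing group around the alternation matters: without it, ^ would apply only to the first alternative and $ only to the second.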

Let's write out your specifications, and develop from that.
Any integer: \d+
A dot, optionally followed by an integer: \.\d*
Combine the two and make the latter optional, and you get:
\d+\.?\d*
As for handling commas, I'd rather not go into it, as it gets very ugly very fast. You should simply strip all commas from input if you still care about them.

you can use in this way:
[/\d+./]
I think this can be used for any of your queries.
Whether it's 12445 or 1244. or 12445.43

I'm going to throw in a potentially downvoted answer here - this is a better solution:
function valid_float (num) {
var num = (num + '').replace(/,/g, ''), // don't care about commas, this turns `num` into a String
float_num = parseFloat(num);
return float_num == num || float_num + '.' == num; // allow for the decimal point, deliberately using == to ignore type as `num` is a String now
}
Any regex that does your job correctly will come with a big asterisk after it saying "probably", and if it's not spot on, it'll be an absolute pig to debug.
Sure, this answer isn't giving you the most awesomely cool one-liner that's going to make you go "Cool!", but in 6 months time when you realise it's going wrong somewhere, or you want to change it to do something slightly different, it's going to be a hell of a lot easier to see where, and to fix.

I'm using ^(\d)+(\.(\d)+)+$ to capture each integer and to allow unlimited length, so long as the string begins and ends with integers and has dots between each integer group. I'm capturing the integer groups so that I can compare them.

regex to validate intl phone number

Can anyone help me write a regex that satisfies these conditions to validate an international phone number:
it must start with +, 00 or 011.
the only allowed characters are [0-9],-,.,space,(,)
length is not important
so these tests should pass:
+1 703 335 65123
001 (703) 332-6261
+1703.338.6512
This is my attempt ^\+?(\d|\s|\(|\)|\.|\-)+$ but it's not working properly.

To clean up the regexp, use square brackets (a character class) to express "one of these characters" instead of chaining single characters with |.
Below is a rewritten version of your regular expression that matches the description you provided.
/^(?:\+|00|011)[0-9 ().-]+$/
What is the use of ?:?
Putting ?: directly after an opening parenthesis tells the regular-expression engine to group the contents without storing the match for later use.
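A quick way to check the rewritten expression against the three numbers from the question, plus one made-up counterexample that lacks the required prefix:

```javascript
const intlPhone = /^(?:\+|00|011)[0-9 ().-]+$/;

const samples = [
  '+1 703 335 65123',
  '001 (703) 332-6261',
  '+1703.338.6512',
  '703 335 6512',      // hypothetical counterexample: no +, 00 or 011 prefix
];

for (const sample of samples) {
  console.log(sample, intlPhone.test(sample));
}
// The first three print true, the last one prints false.
```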

This version allows only a single space between groups and rejects runs of spaces (note the " ?" at the end of the second group):
^(\+|00|011)([\d-.()]+ ?)+$
Possibly faster (I guess) with the non-capturing modifier (?:) added at the beginning of each group:
^(?:\+|00|011)(?:[\d-.()]+ ?)+$
You can use a regex cheat sheet like this one and LINQPad to tune this regex to your needs more quickly.
In case you are not familiar with LINQPad, just copy and paste the next block into it, change the language to C# Statements, and press F5.
string pattern = @"^(?:\+|00|011)(?:[\d-.()]+ ?)+$";
Regex.IsMatch("+1 703 335 65123", pattern).Dump();
Regex.IsMatch("001 (703) 332-6261",pattern).Dump();
Regex.IsMatch("+1703.338.6512",pattern).Dump();

^(?:\+|00|011)[\d. ()-]*$
To specify a length (in case you do care about length later on), use the following:
^(?:\+|00|011)(?:[. ()-]*\d){11,12}[. ()-]*$
And you could obviously change the 11,12 to whatever you want. And just for fun, this also does the same exact thing as the one above:
^(?:\+|00|011)[. ()-]*(?:\d[. ()-]*){11,12}$

I'd go for a completely different route (in fact I had the same problem as you at one point, except I did it in Java).
The plan here is to take the input, make replacements on it, and check that the result is empty:
first substitute \s* with nothing, globally;
then substitute \(\d+\) with nothing, globally;
then substitute ^(\+|00|011)\d+([-.]\d+)*$ with nothing.
After these, if the resulting string is empty, you have a match; otherwise you don't.
That was in Java; I have since found Google's libphonenumber and dropped this approach. But it still works:
fge@erwin ~ $ perl -ne '
> s,\s*,,g;
> s,\(\d+\),,g;
> s,^(\+|00|011)\d+([-.]\d+)*$,,;
> printf("%smatch\n", $_ ? "no " : "");
> '
+1 703 335 65123
match
001 (703) 332-6261
match
+1703.338.6512
match
+33209283892
match
22989018293
no match
Note that a further test is required to see if the input string is at least of length 1.
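The same strip-then-match idea could look roughly like this in JavaScript. This is a sketch with an invented function name; it uses test() for the final step instead of replacing with nothing, which sidesteps the empty-input caveat because an empty string cannot satisfy the final pattern.

```javascript
function matchesIntlNumber(input) {
  const stripped = input
    .replace(/\s+/g, '')       // 1. drop all whitespace
    .replace(/\(\d+\)/g, '');  // 2. drop parenthesised digit groups
  // 3. what is left must be a prefix followed by digit groups separated by - or .
  return /^(?:\+|00|011)\d+(?:[-.]\d+)*$/.test(stripped);
}

console.log(matchesIntlNumber('+1 703 335 65123'));    // true
console.log(matchesIntlNumber('001 (703) 332-6261'));  // true
console.log(matchesIntlNumber('+1703.338.6512'));      // true
console.log(matchesIntlNumber('+33209283892'));        // true
console.log(matchesIntlNumber('22989018293'));         // false
console.log(matchesIntlNumber(''));                    // false
```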

Try this:
^(\([+]?\d{1,3}\)|([+0]?\d{1,3}))?( |-)?(\(\d{1,3}\)|\d{1,3})( |-)?\d{3}( |-)?\d{4}$
It is compatible with the E.164 standard, along with some combinations of brackets, spaces, and hyphens.

Need regex to match unformatted phone number syntax

I need a regex for Javascript that will match a phone number stripped of all characters except numbers and 'x' (for extension). Here are some example formats:
12223334444
2223334444
2223334444x5555
You are guaranteed to always have a minimum of 10 numerical digits, as the leading '1' and extension are optional. There is also no limit on the number of numerical digits that may appear after the 'x'. I want the numbers to be split into the following backreferences:
(1)(222)(333)(4444)x(5555)
The parenthesis above demonstrate how I want the number to be split up into backreferences. The first set of parenthesis would be assigned to backreference $1, for example.
So far, here is what I've come up with for a regex. Keep in mind that I'm not really that great with regex, and regexlib.com hasn't really helped me out in this department.
(\d{3})(\d{3})(\d{4})
The above regex handles the 2nd case in my list of example test cases in my first code snippet above. However, this regex needs to be modified to handle both the optional '1' and extension. Any help on this? Thanks!

The regex option seems perfectly fine to me.
var subject = '2223334444';
// Normalize to a leading 1 followed by the ten digits, keeping any extension.
var result = subject.replace(/^1?(\d{3})(\d{3})(\d{4})(x\d+)?$/mg, "1$1$2$3$4");
alert(result);
// Anchor the check at both ends so trailing garbage is rejected.
if (!result.match(/^\d{11}(?:x\d+)?$/))
  alert('The phone number came out invalid. Perhaps it was entered incorrectly');
This will say 12223334444 when there is no extension.
I expect you will want to tweak this; let me know how it should behave.

If I were you, I would not use a regular expression for this; it would cause more headaches than it solves. I would:
Split the phone number on the "x" and store the last part as the extension.
Check how long the initial part is: 10 or 11 digits.
If it's 11 digits, check that the first is a 1, slice it off, and then continue with the 10-digit process.
If it's 10 digits, split it 3-3-4 into area code, exchange, and number.
Validate the area code and exchange code according to the rules of the NANP.
This validates your phone number, is much easier, and lets you enforce rules like "no X11 area codes" or "no X11 exchange codes" more easily. You would have to do that anyway, and plain string manipulation is probably the simplest way to split the number into substrings.
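The steps above can be sketched roughly as follows, taking a ten-digit national number (3-3-4) optionally preceded by a 1. The function name and the returned object shape are invented for this example, and the NANP check is deliberately simplified to two rules (first digit 2 to 9, no codes ending in 11); real validation has more cases.

```javascript
function parseNanpNumber(raw) {
  // Split off the optional extension after "x".
  const [main, extension = null, ...rest] = raw.split('x');
  if (rest.length > 0) return null;                 // more than one "x"
  if (!/^\d+$/.test(main)) return null;             // digits only
  if (extension !== null && !/^\d+$/.test(extension)) return null;

  // Drop the optional leading 1.
  let national = main;
  if (national.length === 11) {
    if (national[0] !== '1') return null;
    national = national.slice(1);
  }
  if (national.length !== 10) return null;

  // Split 3-3-4.
  const areaCode = national.slice(0, 3);
  const exchange = national.slice(3, 6);
  const number = national.slice(6);

  // Simplified NANP rules (an assumption for this sketch).
  const validCode = (code) => /^[2-9]\d{2}$/.test(code) && !code.endsWith('11');
  if (!validCode(areaCode) || !validCode(exchange)) return null;

  return { areaCode, exchange, number, extension };
}

console.log(parseNanpNumber('12223334444'));
console.log(parseNanpNumber('2223334444x5555'));
console.log(parseNanpNumber('2113334444')); // null: 211 ends in 11
```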

I did a bit more testing and here's a solution I've found. I haven't found a case where this breaks yet, but if someone sees something wrong with it please let me know:
(1)?(\d{3})(\d{3})(\d{4})(?:x(\d+))?
Update:
I've revised the regex above to handle some more edge cases. This new version will fail completely if something unexpected is present.
(^1|^)(\d{3})(\d{3})(\d{4})($|(?:x(\d+))$)
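For reading the pieces back out in JavaScript, here is a sketch using a slightly simplified, fully anchored variant of the pattern (so the group numbering differs from the revised regex above). The function name and the property names are made up for this example.

```javascript
// Groups: 1 = optional leading "1", 2 = area code, 3 = exchange,
// 4 = line number, 5 = optional extension digits after "x".
const phonePattern = /^(1)?(\d{3})(\d{3})(\d{4})(?:x(\d+))?$/;

function splitPhone(raw) {
  const m = phonePattern.exec(raw);
  if (!m) return null;
  const [, country, area, exchange, line, ext] = m;
  // Groups that did not participate in the match are undefined.
  return { country: country || null, area, exchange, line, ext: ext || null };
}

console.log(splitPhone('12223334444'));
console.log(splitPhone('2223334444'));
console.log(splitPhone('2223334444x5555'));
console.log(splitPhone('222333444')); // null: too few digits
```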

My regex is:
/\+?[0-9\-\ \(\)]{10,22}/g
