Numeric Regex Expression for Detection Using Javascript

I am completely new to regex, hence the long question.
I would like to know which regular expressions detect different types of numbers inside an HTML paragraph tag:
1. Integers (e.g. 0, 1,000, 1000, 028, -1)
2. Floating-point numbers (e.g. 2.3, 2.13, 0.18, .18, -1.2)
A regex that combines 1 and 2, matching all integers and floats together, would be even better. I tried some solutions from Stack Overflow, but the results were always undefined/null, or the numbers were not detected at all.
3. Ratios (e.g. 1:3:4, detected as a whole if possible)
4. Fractions (e.g. 0/485, 1/1006, 2b/3)
5. Percentages (e.g. 15.5%, (15.5%), 15%, 0.9%, .9%)
Also, I would like to know whether a regex can detect symbols and numbers together as a whole (15.5%, 1:3:4), or whether they must be split into parts before the numbers can be detected (e.g. 15.5 + %, 1 + : + 3 + : + 4).
These expressions will later be written into JavaScript code as separate cases. I plan to use them the same way as the regex that detects basic integers in the snippet below:
var paragraphText = document.getElementById("detect").innerHTML;
// match() returns an array of all matches, or null when nothing matches
var numbersArray = paragraphText.match(/\d+/g) || [];
for (var i = 0; i < numbersArray.length; i++) {
  // wrap each detected number in a span
  numbersArray[i] = "<span>" + numbersArray[i] + "</span>";
  console.log(numbersArray[i]);
}
Thank you very much for your help!

The following are simple implementations:
'2,13.00'.match(/[.,\d]+/g) // 1 & 2
'1:3:4'.match(/[:\d]+/g) // 3
'0/485'.match(/[\/\d]+/g) // 4
'15.5%'.match(/[.%\d]+/g) // 5
You can loop through the patterns with a for statement: test each one, break as soon as one matches, and continue otherwise.

For decimal numbers:
-> ((?:\d+|)(?:\.|)(?:\d+))
For percentages, it is the same as the decimal pattern followed by the % symbol:
-> ((?:\d+|)(?:\.|)(?:\d+))%
For whole numbers, the following regex excludes decimal numbers and returns just the integers:
-> (^|[^\d.])\b\d+\b(?!\.\d)
For the ratio requirement, I have created a more complicated one, but it returns the entire ratio as a whole:
-> (((?:\d+|)(?:\.|)(?:\d+)):)*((?:\d+|)(?:\.|)(?:\d+))
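The patterns from both answers can also be combined into a single pass. The following is a minimal sketch rather than a definitive solution: the helper name detectNumbers, the group names, and the choice to accept a thousands separator only as a comma followed by exactly three digits are assumptions made for illustration. Longer forms (ratio, fraction, percentage) are tried before plain numbers so that 1:3:4 or 15.5% is reported as a whole. Inputs such as 2b/3 are not handled.

```javascript
// Unsigned number: digits with optional ",ddd" groups and an optional
// decimal part, or a bare decimal part such as ".18".
const num = String.raw`(?:\d+(?:,\d{3})*(?:\.\d+)?|\.\d+)`;

// Alternatives are ordered from most to least specific.
const tokenRe = new RegExp(
  `(?<ratio>${num}(?::${num})+)` +
    `|(?<fraction>${num}/${num})` +
    `|(?<percent>-?${num}%)` +
    `|(?<number>-?${num})`,
  "g"
);

// Hypothetical helper: returns every number-like token with its type.
function detectNumbers(text) {
  const found = [];
  for (const m of text.matchAll(tokenRe)) {
    // Exactly one named group is defined for each match.
    const type = Object.keys(m.groups).find((k) => m.groups[k] !== undefined);
    found.push({ type, text: m[0] });
  }
  return found;
}

console.log(detectNumbers("Mix 1:3:4, add 15.5% and -1.2 of 1/1006"));
// ratio "1:3:4", percent "15.5%", number "-1.2", fraction "1/1006"
```

Because each token carries its type, the separate cases mentioned in the question can be handled with a switch on type instead of running several regexes in sequence.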

Related

How do I get the UTF code value "0041" from "A" by using only Javascript? [duplicate]

I'm using a barcode scanner to read a barcode on my website (the website is made in OpenUI5).
The scanner works like a keyboard that types the characters it reads. At the end and the beginning of the typing it uses a special character. These characters are different for every type of scanner.
Some possible characters are:
█
▄
–
—
In my code I use if (oModelScanner.oData.scanning && oEvent.key == "\u2584") to check if the input from the scanner is ▄.
Is there any way to get the code from that character in the \uHHHH style? (with the HHHH being the hexadecimal code for the character)
I tried charCodeAt, but it returns the decimal code.
The codePointAt examples turn the code I need into a decimal number, so I need the reverse of that.

JavaScript strings have a codePointAt method, which returns the integer value of the Unicode code point. To format that integer as a sequence of four hexadecimal digits, convert it to base 16 (as in the response of Nikolay Spasov) and pad it with zeros.
var hex = "▄".codePointAt(0).toString(16);
var result = "\\u" + "0000".substring(0, 4 - hex.length) + hex;
However, it would probably be easier to check directly whether the code point of your key matches the expected code point:
oEvent.key.codePointAt(0) === '▄'.codePointAt(0);
Note that "symbol equality" can actually be trickier: some symbols are encoded as surrogate pairs (you can think of one as a combination of two halves, each a sequence of four hexadecimal digits).
For this reason I would recommend using a specialized library.
You'll find more details in the very relevant article by Mathias Bynens.

var hex = "▄".charCodeAt(0).toString(16);
var result = "\\u" + "0000".substring(0, 4 - hex.length) + hex;
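Both snippets above pad the hexadecimal string by hand. As a sketch of the same idea, String.prototype.padStart does the padding directly. The helper name toUnicodeEscape is hypothetical, and it only looks at the first UTF-16 code unit, so characters outside the Basic Multilingual Plane would need the \u{...} form instead.

```javascript
// Hypothetical helper: format the first UTF-16 code unit of a string
// in the \uHHHH style (four uppercase hexadecimal digits).
function toUnicodeEscape(ch) {
  const hex = ch.charCodeAt(0).toString(16).toUpperCase();
  return "\\u" + hex.padStart(4, "0");
}

console.log(toUnicodeEscape("A")); // \u0041
console.log(toUnicodeEscape("▄")); // \u2584
```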

If you want to print the multiple code points of a character, e.g., an emoji, you can do this:
const facepalm = "🤦🏼‍♂️";
const codePoints = Array.from(facepalm)
.map((v) => v.codePointAt(0).toString(16))
.map((hex) => "\\u{" + hex + "}");
console.log(codePoints);
["\u{1f926}", "\u{1f3fc}", "\u{200d}", "\u{2642}", "\u{fe0f}"]
If you are wondering about the components and the length of 🤦🏼‍♂️, check out this article.

JavaScript Regex not matching mobile number with international code

I am trying to validate the mobile number 254777123456 against the regex /^((254|255)[0-9]+){9,15}$/. The number should be prefixed with one of the specified country codes, and its total length should not exceed 15 characters. Doing this in JavaScript, I get null. Can anyone point out what I am doing wrong?
PS. I am using many more country codes than the ones shown. I only put those two in as a test before adding the others, since they will all be separated by the pipe.

Your regex ^((254|255)[0-9]+){9,15}$ means: match at least 4 digits (the first 3 of which must be 254 or 255), and repeat that whole group between 9 and 15 times. The shortest string that can match is therefore 36 characters long, which is obviously not what you want. The fix is to move [0-9] out of the repeated group and apply a {9,12} quantifier to it alone. The corrected regex is:
^(?:(?:254|255)[0-9]{9,12})$
This regex matches 254 or 255 once and then restricts the rest of the number to between 9 and 12 digits (you want a maximum total length of 15, and 3 digits are already taken by the country code).
Demo
var nums = ['254777123456','255777123456','255777123456123','2557771234561231'];
// the g flag is not needed for a single anchored test
for (const n of nums) {
  console.log(n + " --> " + /^(?:(?:254|255)[0-9]{9,12})$/.test(n));
}
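Since the question mentions a much longer list of country codes, and codes are not all three digits long, a fixed {9,12} quantifier only fits some of them. One possible sketch, under stated assumptions, is to cap the total length with a lookahead and build the alternation from an array. The helper name buildMobileRegex is hypothetical, and the limits (at most 15 digits in total, at least 9 digits after the code) are taken from the answer above rather than from any numbering standard.

```javascript
// Hypothetical helper: build a validator from a list of country codes.
// Assumptions: digits only, at most 15 digits in total, and at least
// 9 digits after the country code.
function buildMobileRegex(countryCodes) {
  const prefix = countryCodes.join("|");
  // The lookahead caps the total length; the rest of the pattern checks
  // the prefix and the minimum number of digits that follow it.
  return new RegExp(`^(?=[0-9]{1,15}$)(?:${prefix})[0-9]{9,}$`);
}

const mobileRe = buildMobileRegex(["254", "255"]);
console.log(mobileRe.test("254777123456"));     // true
console.log(mobileRe.test("255777123456123"));  // true  (15 digits)
console.log(mobileRe.test("2557771234561231")); // false (16 digits)
console.log(mobileRe.test("256777123456"));     // false (unknown code)
```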

Regex character count, but some count for three

I'm trying to build a regular expression that places a limit on the input length, but not all characters count equal in this length. I'll put the rationale at the bottom of the question. As a simple example, let's limit the maximum length to 12 and allow only a and b, but b counts for 3 characters.
Allowed are:
aa (anything less than 12 is fine).
aaaaaaaaaaaa (exactly 12 is fine).
aaabaaab (6 + 2 * 3 = 12, which is fine).
abaaaaab (still 6 + 2 * 3 = 12).
Disallowed is:
aaaaaaaaaaaaa (13 a's).
bbbba (1 + 4 * 3 = 13, which is too much).
baaaaaaab (7 + 2 * 3 = 13, which is too much).
I've made an attempt that gets fairly close:
^(a{0,3}|b){0,4}$
This matches on up to 4 clusters that may consist of 0-3 a's or one b.
However, it fails to match on my last positive example: abaaaaab, because that forces the first cluster to be the single a at the beginning, consumes a second cluster for the b, then leaves only 2 more clusters for the rest, aaaaab, which is too long.
Constraints
Must run in JavaScript. This regex is supplied to Qt, which apparently uses JavaScript's syntax.
Doesn't really need to be fast. In the end it'll only be applied to strings of up to 40 characters. I hope it validates within 50ms or so, but slightly slower is acceptable.
Rationale
Why do I need to do this with a regular expression?
It's for a user interface in Qt via PyQt and QML. The user can type a name in a text field here for a profile. This profile name is url-encoded (special characters are replaced by %XX), and then saved on the user's file system. We encounter problems when the user types a lot of special characters, such as Chinese, which then encode to a very long file name. Turns out that at somewhere like 17 characters, this file name becomes too long for some file systems. The URL-encoding encodes as UTF-8, which has up to 4 bytes per character, resulting in up to 12 characters in the file name (as each of these gets percent-encoded).
16 characters is too short for profile names. Even some of our default names exceed that. We need a variable limit based on these special characters.
Qt normally allows you to specify a Validator to determine which values are acceptable in a text box. We tried implementing such a validator, but that resulted in a segfault upstream, due to a bug in PyQt. It can't seem to handle custom Validator implementations at the moment. However, PyQt also exposes three built-in validators. Two apply only to numbers. The third is a regex validator that allows you to put a regular expression that matches all valid strings. Hence the need for this regular expression.

There is no real straightforward way to do this, given the limitations of regexp. You're going to have to test for all combinations, such as thirteen b with up to one a, twelve b with up to four a, and so on. We will build a little program to generate these for us. The basic format for testing for up to four a will be
/^(?=([^a]*a){0,4}[^a]*$)/
We'll write a little routine to create these lookaheads for us, given some letter and a minimum and maximum number of occurrences:
function matchLetter(c, m, n) {
return `(?=([^${c}]*${c}){${m},${n}}[^${c}]*$)`;
}
> matchLetter('a', 0, 4)
< "(?=([^a]*a){0,4}[^a]*$)"
We can combine these to test for three b with up to three a:
/^(?=([^b]*b){3}[^b]*$)(?=([^a]*a){0,3}[^a]*$)/
We will write a function to create such combined lookaheads which matches exactly m occurrences of c1 and up to n occurrences of c2:
function matchTwoLetters(c1, m, c2, n) {
return matchLetter(c1, m, m) + matchLetter(c2, 0, n);
}
We can use this to match exactly twelve b and up to four a, for a total of forty or less:
> matchTwoLetters('b', 12, 'a', 4)
< "(?=([^b]*b){12,12}[^b]*$)(?=([^a]*a){0,4}[^a]*$)"
It remains to simply create versions of this for each count of b, and glom them together (for the case of a max count of 12):
function makeRegExp() {
const res = [];
for (let bs = 0; bs <= 4; bs++)
res.push(matchTwoLetters('b', bs, 'a', 12 - bs*3));
return new RegExp(`^(${res.join('|')})`);
}
> makeRegExp()
< "^((?=([^b]*b){0,0}[^b]*$)(?=([^a]*a){0,12}[^a]*$)|(?=([^b]*b){1,1}[^b]*$)(?=([^a]*a){0,9}[^a]*$)|(?=([^b]*b){2,2}[^b]*$)(?=([^a]*a){0,6}[^a]*$)|(?=([^b]*b){3,3}[^b]*$)(?=([^a]*a){0,3}[^a]*$)|(?=([^b]*b){4,4}[^b]*$)(?=([^a]*a){0,0}[^a]*$))"
Now you can do the test with
makeRegExp().test("baabaaa");
For the case of length=40, the regexp is 679 characters long. A very rough benchmark shows that it executes in under a microsecond.
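The following sketch shows the same generator with the limit and the weight of b as parameters, checked against the examples from the question. The parameter names are mine, and the groups are made non-capturing. A trailing [ab]*$ is added so that the pattern consumes the whole string and rejects characters other than a and b, as the question asks. That addition is an assumption, since the original uses lookaheads only.

```javascript
// Lookahead asserting that letter c occurs between m and n times.
function matchLetter(c, m, n) {
  return `(?=(?:[^${c}]*${c}){${m},${n}}[^${c}]*$)`;
}

// Hypothetical generalization: "a" counts as 1, "b" counts as weightOfB,
// and the weighted total may not exceed limit.
function makeRegExp(limit, weightOfB) {
  const alternatives = [];
  for (let bs = 0; bs * weightOfB <= limit; bs++) {
    alternatives.push(
      matchLetter("b", bs, bs) + matchLetter("a", 0, limit - bs * weightOfB)
    );
  }
  return new RegExp(`^(?:${alternatives.join("|")})[ab]*$`);
}

const re = makeRegExp(12, 3);
for (const s of ["aa", "aaaaaaaaaaaa", "aaabaaab", "abaaaaab"]) {
  console.log(s, re.test(s)); // true for each allowed example
}
for (const s of ["aaaaaaaaaaaaa", "bbbba", "baaaaaaab"]) {
  console.log(s, re.test(s)); // false for each disallowed example
}
```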

If you want to count bytes when multibyte encoding is present, you can use this function:
function bytesLength(str) {
var s = str.length;
for (var i = s-1; i > -1; i--) {
var code = str.charCodeAt(i);
if (code > 0x7f && code <= 0x7ff) {s++;}
else if (code > 0x7ff && code <= 0xffff) {s+=2;}
if (code >= 0xDC00 && code <= 0xDFFF) {i--;}
}
return s;
}
console.log(bytesLength('敗')); // length 3
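Where the runtime provides them, the built-in TextEncoder and encodeURIComponent give the same counts without manual surrogate handling. This is a sketch under that assumption. The function names are hypothetical, and encodeURIComponent only approximates whatever URL-encoding the application applies to profile names.

```javascript
// Number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  return new TextEncoder().encode(str).length;
}

// Length of the percent-encoded form; each escaped byte becomes "%XX".
function percentEncodedLength(str) {
  return encodeURIComponent(str).length;
}

console.log(utf8ByteLength("敗"));       // 3
console.log(percentEncodedLength("敗")); // 9, i.e. "%E6%95%97"
console.log(percentEncodedLength("abc")); // 3, nothing is escaped
```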

Try using something like this:
^((a{1,3}|b){1,4}|(a{1,4}|a?b|ba){1,3}|((a{2,3}|b){2}|aaba|abaa){2})$
Example: https://regex101.com/r/yTTiEX/6
This breaks it up into the logical possibilities:
4 parts, each with a value up to 3.
3 parts, each with a value up to 4.
2 parts, each with a value up to 6.
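A hand-built alternation like this is hard to verify by eye, so it may be worth comparing it against a direct weighted count over all short strings. The harness below is only a sketch: weightedLength and findMismatches are hypothetical helper names, and no particular result is claimed here for the regex above.

```javascript
const candidate =
  /^((a{1,3}|b){1,4}|(a{1,4}|a?b|ba){1,3}|((a{2,3}|b){2}|aaba|abaa){2})$/;

// Reference implementation: "a" counts as 1, "b" counts as 3.
function weightedLength(s) {
  let total = 0;
  for (const ch of s) total += ch === "b" ? 3 : 1;
  return total;
}

// Enumerate every non-empty string over {a, b} up to maxChars characters
// and collect the ones where the regex disagrees with the reference.
function findMismatches(regex, maxChars, limit) {
  const mismatches = [];
  let level = [""];
  for (let len = 1; len <= maxChars; len++) {
    level = level.flatMap((s) => [s + "a", s + "b"]);
    for (const s of level) {
      const expected = weightedLength(s) <= limit;
      if (regex.test(s) !== expected) mismatches.push(s);
    }
  }
  return mismatches;
}

// Strings longer than 13 characters always exceed a limit of 12.
console.log(findMismatches(candidate, 13, 12));
```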

Can anyone explain this process with converting decimal numbers to Binary

I have looked around the internet for a way to convert decimal numbers into binary numbers, and I found this piece of code in a forum:
var number = prompt("Type a number!"); // Asks the user to input a number
var converted = []; // Creates an array with nothing in it
while (number >= 1) { // While the number the user typed is greater than or equal to 1, it should loop
  converted.unshift(number % 2); // Takes the "number" and sees if you can divide it by 2; if there's any remainder it puts a "1", otherwise "0"
  number = Math.floor(number / 2); // Divides the number by 2, then starts over again
}
console.log(converted);
I don't understand everything completely, so I added comments describing what I think each piece of code does. Can anyone explain it in more detail, or tell me whether my understanding of the code is correct?

This code is based on a standard technique for converting decimal numbers to binary.
Take a decimal number, divide it by two, and note the remainder, which will be either 0 or 1. Repeat with the quotient until it reaches 0. For example, with 57:
57 / 2 = 28 r 1; 28 / 2 = 14 r 0; 14 / 2 = 7 r 0; 7 / 2 = 3 r 1; 3 / 2 = 1 r 1; 1 / 2 = 0 r 1;
The remainders form the binary number. Sorry if it's a bit hard to read; I definitely recommend writing it out on paper. Read from the last remainder to the first, they look like this: 111001
The remainders are produced in reverse order, so they have to be reversed to get the correct result. array.unshift() does this as you go, because it inserts each new remainder at the front. Alternatively, you could use array.push() and then array.reverse() after the while loop. unshift() is probably the better approach.
57 in decimal is equal to 111001 in binary, which you can check.
BTW, this algorithm works for other bases too, as long as you are converting from decimal. Or at least as far as I know.
I hope this helped.
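The repeated-division procedure can be sketched as a small function with the base as a parameter. The name toBase is hypothetical, and the sketch assumes a non-negative integer and a base between 2 and 10 (larger bases would need letters for digits). The built-in Number.prototype.toString is shown for comparison.

```javascript
// Hypothetical helper: convert a non-negative integer to the given base
// (2 to 10) by repeated division, collecting remainders front-first.
function toBase(n, base = 2) {
  if (n === 0) return "0";
  const digits = [];
  while (n >= 1) {
    digits.unshift(n % base);  // the remainder is the next digit, right to left
    n = Math.floor(n / base);  // continue with the quotient
  }
  return digits.join("");
}

console.log(toBase(57));        // "111001"
console.log(toBase(57, 8));     // "71"
console.log((57).toString(2));  // "111001", built-in equivalent
```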

It seems like you've got the gist of it down.
Let's start with a random number:
6 === 110b
Now let's see what the above method does:
The number is greater than or equal to 1, so let's add the last bit of the number to the output:
6%2 === 0 //output [0]
After dividing by two and rounding down, which is essentially just bit-shifting the whole thing to the right, the number we're working with is now 11b (from the original 110b). 11b === 3, as you'd expect.
You can alternatively think of number % 2 as a bit-wise AND operation (number & 1):
  110
&   1
-----
    0
The rest of the loop simply carries the same operation out as long as needed: find the last bit of the current state, add it to the output, shift the current state.
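The same loop can be written with the bitwise operators directly, as a sketch of the "find the last bit, then shift" description above. The function name toBinaryBitwise is hypothetical, and because JavaScript's bitwise operators work on 32-bit integers, the sketch assumes a non-negative integer below 2^32.

```javascript
// Hypothetical helper: build the binary string using & and >>>.
function toBinaryBitwise(n) {
  let bits = "";
  do {
    bits = (n & 1) + bits; // last bit of the current state
    n = n >>> 1;           // shift the current state one bit to the right
  } while (n > 0);
  return bits;
}

console.log(toBinaryBitwise(6));  // "110"
console.log(toBinaryBitwise(57)); // "111001"
```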

From character to binary (adding the left 0)

You can convert from char to binary in JS using this code:
var txt = "H";
bits = txt.charCodeAt(0).toString(2); //bits=1001000
The result is mathematically correct but not literally a full byte: a 0 is missing on the left. That is correct for a number, but I wonder if there is a way to keep the leading zeros.

You need a byte? Try this:
var txt = "H",
bits = txt.charCodeAt(0).toString(2),
aByte = new Array(9 - bits.length).join('0') + bits;
The snippet creates a new array whose length is the number of missing bits plus one, then joins it with '0' as the separator, which yields exactly the number of zeroes needed. 9 is the wanted "byte length" + 1.
However, this is a relatively slow method. If you have a time-critical task, I'd suggest using a while or for loop instead.

charCodeAt() returns the code of a character, which is just a number. A number itself does not have any kind of preferred alignment, and by convention numbers are printed without leading zeros.
In fact, charCodeAt() returns a Unicode character code, which in general can take more than 8 bits to store. Therefore this behaviour is correct.

Try this:
var txt = "H";
var bits = txt.charCodeAt(0).toString(2);
// number of zeros needed to reach 8 bits (assumes the code fits in one byte)
var padding = 8 - bits.length;
var res = [];
res.push(new Array(padding + 1).join('0') + bits);
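In current JavaScript engines, String.prototype.padStart does the same left-padding without building an intermediate array. A minimal sketch follows; the helper name toBits and its width parameter are hypothetical. padStart never truncates, so character codes above 255 simply produce a string longer than 8 bits.

```javascript
// Hypothetical helper: binary representation of the first UTF-16 code unit,
// left-padded with zeros to at least `width` digits.
function toBits(ch, width = 8) {
  return ch.charCodeAt(0).toString(2).padStart(width, "0");
}

console.log(toBits("H"));     // "01001000"
console.log(toBits("H", 16)); // "0000000001001000"
```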
