How can I "count characters" in a regex?


I'm not sure "count" is the right word, because it doesn't really matter to me how many there are, but let me explain. My data will be formatted like this: (hi,(1,2),hey),(yo,(3,(rawr),4),howdy) and I have no control over how many dimensions there are. I want to grab the lowest groups, ["hi", Array[], "hey"] and ["yo", Array[], "howdy"]. If there were a way to "count", I could count the open parentheses, then count the closed ones, and when the count hits 0, that's where the regex match ends. For example:
(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)
1---2---1----0-1---2--3----2--1------0
Now with that being said, I don't believe counting is possible, so what I want is a substitute solution. This is what I have so far: /\([^\(]*?\)/ but that only returns the highest-level group from each of the low-level groups, i.e. (1,2) and (rawr).

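You can use a counter to track the ( and ): add one on each ( and subtract one on each ).
// Array.reduce(string, ...) was a Firefox-only generic, so split the string into characters first.
'(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)'.split('').reduce(
function(x,y){
// x[0] is the current depth, x[1] is the output built so far
if(y=='(')
return [x[0]+1, x[1]+(x[0]+1)]
else if(y==')')
return [x[0]-1, x[1]+(x[0]-1)]
else
return [x[0], x[1]+'-']
},
[0,'']
)[1]
Try it in the browser console.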
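The same depth-counting idea can also return the groups themselves rather than a string of depth markers. The following is only a sketch: the function name topLevelGroups is made up for illustration, and it assumes the only special characters in the data are ( and ).

```javascript
// Hypothetical helper (not from the original answers): returns each
// outermost "(...)" group of `input` by tracking the nesting depth.
function topLevelGroups(input) {
  var groups = [];
  var depth = 0;   // how many "(" are currently open
  var start = -1;  // index where the current outermost group began
  for (var i = 0; i < input.length; i++) {
    var ch = input.charAt(i);
    if (ch === '(') {
      if (depth === 0) start = i; // a new outermost group opens here
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth < 0) throw new Error('Unbalanced ")" at index ' + i);
      if (depth === 0) groups.push(input.slice(start, i + 1));
    }
  }
  if (depth !== 0) throw new Error('Unbalanced "(" in input');
  return groups;
}

console.log(topLevelGroups('(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)'));
```

Calling the function again on the inside of each returned group (without its outer parentheses) would descend one level at a time, which is one way to build the nested arrays the question asks for.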

This works for the original use case and @Barmar's use case, and it counts the parentheses, if you really want that.
I also added arbitrary spaces all over the data strings, just in case (since you have no control over the incoming data).
var results = [];
var dataString = "(hi, (1,2) , hey), ( yo,( 3, ( rawr ), 4) , howdy )";
//var dataString = "(hi, (1 , 2 ), (3, 4), hey), (yo ,(3,(rawr ), 4), howdy)";
var dataSplit = dataString.split(",");
var trimRegex = /^\s+|\s+$/g;
var openParensRegex = /\(/;
var closeParensRegex = /\)/;
var parensRegex = /\(|\)/;
var parensCount = 0;
for (var x = 0, lenx = dataSplit.length; x < lenx; x++){
var cleanString = dataSplit[x].replace(trimRegex, "");
if (openParensRegex.test(cleanString)){ parensCount++; };
if (parensCount < 2){
results.push(cleanString.replace(parensRegex, "").replace(trimRegex, ""));
};
if (closeParensRegex.test(cleanString)){ parensCount--; };
};
console.log(results);
Hope that helps!

The following script might help; it labels the parenthesis levels, numbering them from the innermost group outward:
var string="(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)",i=0;
while (string.indexOf("(")>=0) {
i++;
string=string.replace(/\(([^()]+)\)/g,"|l"+i+"|$1|l"+i+"|");
}
Result:
|l2|hi,|l1|1,2|l1|,hey|l2|,|l3|yo,|l2|3,|l1|rawr|l1|,4|l2|,howdy|l3|
Fiddle: http://jsfiddle.net/GfUZh/

If you want to get the top-level (outermost) groups, you can probably do it with a regex by finding the point where they meet, in this case ),(. Note that the pattern below assumes exactly two top-level groups.
var str = '(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)';
var re = /(\(.+\)),(\(.+\))/;
var results = re.exec(str);
results.shift(); // remove first item which is useless
console.log(results); //=> ["(hi,(1,2),hey)", "(yo,(3,(rawr),4),howdy)"]
Demo: http://jsfiddle.net/elclanrs/fFmfE/

Related

TypeError: Cannot find function "sort" in "object"... but the said object is supposed to be a string according to google docs

TL;DR
According to the google docs, getResponseText() should return a string... but I get a message that claims it is an object when I try to sort it.. huh?
TypeError: Cannot find function sort in object
I was under the impression that a javascript string sort of works like an array, and it seems to behave like one because string[0] returns the first letter of a string..
DETAILS:
here is the sheet I am working
Hello everyone, I have a very unique situation where I need to update dirty strings (courtesy of an undesirable OCR import).
I have created a function that does the job but needs additional functionality.
Currently, the process goes like this:
enter your desired string
each cell (in your selection) is checked for that string
cells are updated with the desired string if the match is over 50% alike
the check works like this:
compare the first letter of desired string (txtT[0])
against the first letter of target cell (valT[0])
compare additional letters [x] up to the length of the longest string
for example:
desired string = "testing"
target cell = "t3st1ng"
the loop goes like this:
create a point system to do the math
(total points = length of longest string)
compare t and t ... if matching, add one point (+1 in this case because it matches)
compare e and 3 ... if matching, add one point (+0 in this case because it does not match)
compare s and s ... if matching, add one point (+1 in this case because it matches)
compare t and t ... if matching, add one point (+1 in this case because it matches)
compare i and 1 ... if matching, add one point (+0 in this case because it does not match)
compare n and n ... if matching, add one point (+1 in this case because it matches)
compare g and g ... if matching, add one point (+1 in this case because it matches)
points earned/total points = % of alike
The problem with this system is that it is based on the position of the letters in each string.
This causes problems when comparing strings like "testing" and "t est ing"
I tried to update it so that the first thing it does is SORT the string alphabetically, ignoring all special characters and non alphabetical characters.
That's when I came across an error:
TypeError: Cannot find function sort in object testing.
This does not make sense because my desired string is a string. See code where it says "this is where i get my error":
According to the google docs, getResponseText() should return a string... but I cannot call the sort method on the string.. which makes no sense!
function sandboxFunction() {
try {
var ui = SpreadsheetApp.getUi();
var ss = SpreadsheetApp.getActiveSpreadsheet();
var as = ss.getActiveSheet();
var ar = as.getActiveRange();
var sv = ui.prompt('enter desired string');
var txt = sv.getResponseText();
var txtT = txt.trim();
txtT = txtT.replace(/ /g, ''); //this is the trimmed comparison string
txtT = txtT.sort(); //***this is where I get my error***
ui.alert(txtT);
var vals = ar.getValues();
for (var r = 0; r < vals.length; r++) {
var row = vals[r];
for (var c = 0; c < row.length; c++) {
var val = row[c];
var valT = val.trim();
valT = valT.replace(/ /g, ''); // this is the trimmed comparison cell
ui.alert(valT);
//this is where we test the two
//test length
var tl = txtT.length;
var vl = valT.length;
if (vl < tl) {
ui.alert("different lengths.. applying fix");
for (vl; vl < tl; vl++) {
valT = valT.concat("x");
ui.alert(valT);
}
}
else if (tl < vl) {
ui.alert("different lengths.. applying fix");
for (tl; tl < vl; tl++) {
txtT = txtT.concat("x");
ui.alert(txtT);
}
}
if (valT.toUpperCase() == txtT.toUpperCase()) {
ui.alert("your strings match");
}
else {
var total = txtT.length;
var pts = 0;
for (var x = 0; x < total; x++) {
if (valT[x] == txtT[x]) {
pts++;
}
}
if (pts / total >= 0.5) {
ui.alert("at least 50% match, fixing text");
vals[r][c] = txt;
}
}
}
}
ar.setValues(vals);
}
catch (err) {
ui.alert(err);
}
}

You can't sort a string that way; sort() is a method of arrays, not of strings.
Convert the string to an array first, sort the array, and join it back into a string if you need one:
var txtT = "This is a string".trim();
txtT = txtT.replace(/ /g, ''); //this is the trimmed comparison string
var txtArray = txtT.split(''); // Convert to array
var txtSorted = txtArray.sort(); // Use sort method
console.log(txtSorted); // array of sorted characters
console.log(txtSorted.join('')); // back to a string
See the Array.prototype.sort() docs.
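Applied to the question's goal (sort alphabetically, ignoring special and non-alphabetical characters), the normalization step might look like the sketch below. The helper name sortedLetters is hypothetical, and it assumes only the letters a-z matter and that case should be ignored.

```javascript
// Hypothetical helper: lowercases the text, drops everything except a-z,
// and returns the remaining letters in sorted order as a string.
function sortedLetters(text) {
  return String(text)
    .toLowerCase()
    .replace(/[^a-z]/g, '') // strip spaces, digits, punctuation
    .split('')              // string -> array, so that sort() exists
    .sort()
    .join('');              // array -> string again
}

console.log(sortedLetters('testing') === sortedLetters('t est ing'));
```

Both strings are reduced to the same sorted letters, so stray spaces no longer affect the comparison. Characters the OCR replaced with digits are simply dropped, so the point system would still have to account for strings of different lengths.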

Mask a portion of a String using RegExp

I'm trying to mask a portion of a string using JavaScript.
e.g. Mask second and third segment of credit-card number like this using regex:
4567 6365 7987 3783 → 4567 **** **** 3783
3457 732837 82372 → 3457 ****** 82372
I just want to keep the first 4 numbers and the last 5 characters.
This is my first attempt: /(?!^.*)[^a-zA-Z\s](?=.{5})/g
https://regex101.com/r/ZBi54c/2

You can try this:
var cardnumber = '4567 6365 7987 3783';
var first4 = cardnumber.substring(0, 4);
var last5 = cardnumber.substring(cardnumber.length - 5);
var mask = cardnumber.substring(4, cardnumber.length - 5).replace(/\d/g,"*");
console.log(first4 + mask + last5);
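The same idea can be wrapped in a small reusable function. This is a sketch only: maskMiddle and its parameter names are invented here, and it assumes the string is at least keepStart + keepEnd characters long (shorter strings are returned unchanged).

```javascript
// Hypothetical helper: keeps the first `keepStart` and last `keepEnd`
// characters and replaces every digit in between with "*".
function maskMiddle(value, keepStart, keepEnd) {
  if (value.length <= keepStart + keepEnd) return value; // nothing to mask
  var head = value.slice(0, keepStart);
  var tail = value.slice(value.length - keepEnd);
  var middle = value.slice(keepStart, value.length - keepEnd);
  return head + middle.replace(/\d/g, '*') + tail;
}

console.log(maskMiddle('4567 6365 7987 3783', 4, 5)); // 4567 **** **** 3783
console.log(maskMiddle('3457 732837 82372', 4, 5));   // 3457 ****** 82372
```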

You could slice the first four digits and apply a replacement for the rest.
console.log(
['4567 6365 7987 3783', '3457 732837 82372'].map(
s => s.slice(0, 4) + s.slice(4).replace(/\d(?=.* )/g, '*')
)
);

The answer apparently satisfies the OP. Here is another solution using only Regexes:
function starry(match, gr1, gr2, gr3) {
var stars = gr2.replace(/\d/g, '*');
return gr1 + " " + stars + " " + gr3;
}
function ccStarry(str) {
var rex = /(\d{4})\s(\d{4}\s\d{4}|\d{6})\s(\d{4}|\d{5})/;
if (rex.test(str))
return str.replace(rex, starry);
else return "";
}
var s1 = "4567 6365 7987 3783";
var s2 = "3457 732837 82372";
var s3 = "dfdfdf";
console.log(ccStarry(s1));
console.log(ccStarry(s2));
console.log(ccStarry(s3));
This ensures that the pattern matches before trying any replacements. For example, in the third test case, it returns an empty string. The pattern can be updated to match other credit card patterns besides the ones given in the question.

I would like to elaborate on the answer from @Nina Scholz. The sample code below uses .slice() to mask the value in two cases:
A single variable, e.g. var n = '12345567890'
An array of objects
// Single number
var n = '601115558888';
var singleNumber = n.slice(0, 4) + n.slice(4, n.length -4).replace(/\d/g,'*') + n.slice(n.length -4);
console.log(singleNumber);
// array of object
var obj = [{
contacts_name: 'Jason',
contacts_num : '651231239991'
},
{
contacts_name: 'King',
contacts_num : '60101233321'
}];
// Mask for the middle number, showing the first4 number and last4 number
// and replace the rest number with *
var num = obj.map((element, index) =>
element.contacts_num.slice(0,4)
+ element.contacts_num.slice(4, element.contacts_num.length-4).replace(/\d/g, '*')
+ element.contacts_num.slice(element.contacts_num.length -4)
);
console.log(num);

If it's JavaScript doing the regex masking, you've already failed because JS should never need to know the original card number, except when you've just received it from the user and are sending it to the server for the first time, in which case you shouldn't be masking it anyway so the user can check for typos.
I can't really help you there, you've already failed in the worst way.
Server-side, if the number is already broken into spaces*, then one option is: (in PHP but the same idea applies to all)
$parts = explode(" ",$fullnumber);
$first = array_shift($parts);
$last = array_pop($parts);
$middle = implode(" ",$parts);
$mask = preg_replace("/\d/","*",$middle);
$result = "$first $mask $last";
* it shouldn't be

GET item from brackets in array javascript

I need some help. Here I have a string.
n[0] = '3(10)';
The task is to get only 10 from brackets. How to do it in javascript?

You can solve this with a regex:
var a = '3(10)'.match(/\((.*?)\)/);
alert(a[1]); // 10
The captured group is at index 1 of the returned array (index 0 holds the full match).
Regarding your other comment/question :
I have a[0] = '3(10,5) 7(9,4)'; 10 and 9 are chances the task is to
get the number (3 or 7) with a bigger chance (10)
var finalNumber=-1;
var finalChance=-1;
var a = '3(10,5) 7(9,4)';
var m=a.match(/(\d+?)\((\d+?)\,/g);
for (var i=0;i<m.length;i++)
{
var number=m[i].match(/(\d+)\(/)[1]
var chance=m[i].match(/\((\d+)\,/)[1]
if (+chance>+finalChance)
{
finalChance=chance;
finalNumber=number;
}
}
console.log(finalNumber)
Jsbin
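A variation on the loop above reads both capture groups in a single pass with RegExp.prototype.exec, instead of matching each piece a second time. It is only a sketch: the function name numberWithBestChance is made up, and it assumes every entry has the form number(chance,something).

```javascript
// Hypothetical helper: returns the number (as a string) whose first
// bracketed value, its "chance", is the largest, or null if none is found.
function numberWithBestChance(text) {
  var re = /(\d+)\((\d+),/g; // group 1: the number, group 2: its chance
  var best = null;
  var bestChance = -Infinity;
  var m;
  // With the g flag, each exec() call resumes after the previous match.
  while ((m = re.exec(text)) !== null) {
    var chance = Number(m[2]);
    if (chance > bestChance) {
      bestChance = chance;
      best = m[1];
    }
  }
  return best;
}

console.log(numberWithBestChance('3(10,5) 7(9,4)')); // "3"
```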

Use the split() function twice, once for each bracket:
var first_split = n[0].split(')')[0]; //first_split will return "3(10"
var result = first_split.split('(')[1]; //second split will return "10";
//To reduce the code you can do it in 1 line like this
var result = n[0].split(')')[0].split('(')[1]; // result = "10"

Parse semi-structured values

it's my first question here. I tried to find an answer but couldn't, honestly, figure out which terms should I use, so sorry if it has been asked before.
Here it goes:
I have thousands of records in a .txt file, in this format:
(1, 3, 2, 1, 'John (Finances)'),
(2, 7, 2, 1, 'Mary Jane'),
(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),
... and so on. The first value is the PK, the other 3 are Foreign Keys, the 5th is a string.
I need to parse them as JSON (or something) in Javascript, but I'm having troubles because some strings have parentheses+comma (on 3rd record, "Janitor", e.g.), so I can't use substring... maybe trimming the right part, but I was wondering if there is some smarter way to parse it.
Any help would be really appreciated.
Thanks!

You can't (read: probably shouldn't) use a regular expression for this. What if the parentheses contain another pair, or one is mismatched?
The good news is that you can easily construct a tokenizer/parser for this.
The idea is to keep track of your current state and act accordingly.
Here is a sketch of a parser I've just written; the point is to show you the general idea. Let me know if you have any conceptual questions about it.
There is a working demo, but I beg you not to use it in production before understanding and patching it.
How it works
So, how do we build a parser:
var State = { // remember which state the parser is at.
BeforeRecord:0, // at the (
DuringInts:1, // at one of the integers
DuringString:2, // reading the name string
AfterRecord:3 // after the )
};
We'll need to keep track of the output, and the current working object since we'll parse these one at a time.
var records = []; // to contain the results
var state = State.BeforeRecord;
Now, we iterate the string, keep progressing in it and read the next character
for(var i = 0;i < input.length; i++){
if(state === State.BeforeRecord){
// handle logic when in (
}
...
if(state === State.AfterRecord){
// handle that state
}
}
Now, all that's left is to consume it into the object at each state:
If it's at ( we start parsing and skip any whitespaces
Read all the integers and ditch the ,
After four integers, read the string from ' to the next ' reaching the end of it
After the string, read until the ) , store the object, and start the cycle again.
The implementation is not very difficult either.
The parser
var State = { // keep track of the state
BeforeRecord:0,
DuringInts:1,
DuringString:2,
AfterRecord:3
};
var records = []; // to contain the results
var state = State.BeforeRecord;
var input = " (1, 3, 2, 1, 'John (Finances)'), (2, 7, 2, 1, 'Mary Jane'), (3, 7, 3, 2, 'Gerald (Janitor), Broflowski')," // sample input
var workingRecord = {}; // what we're reading into.
for(var i = 0;i < input.length; i++){
var token = input[i]; // read the current input
if(state === State.BeforeRecord){ // before reading a record
if(token === ' ') continue; // ignore whitespaces between records
if(token === '('){ state = State.DuringInts; continue; }
throw new Error("Expected ( before new record");
}
if(state === State.DuringInts){
if(token === ' ') continue; // ignore whitespace
for(var j = 0; j < 4; j++){
if(token === ' ') {token = input[++i]; j--; continue;} // ignore whitespace
var curNum = '';
while(token != ","){
if(!/[0-9]/.test(token)) throw new Error("Expected number, got " + token);
curNum += token;
token = input[++i]; // get the next token
}
workingRecord[j] = Number(curNum); // set the data on the record
token = input[++i]; // remove the comma
}
state = State.DuringString;
continue; // progress the loop
}
if(state === State.DuringString){
if(token === ' ') continue; // skip whitespace
if(token === "'"){
var str = "";
token = input[++i];
var lenGuard = 1000;
while(token !== "'"){
str+=token;
if(lenGuard-- === 0) throw new Error("Error, string length bounded by 1000");
token = input[++i];
}
workingRecord.str = str;
token = input[++i]; // remove )
state = State.AfterRecord;
continue;
}
}
if(state === State.AfterRecord){
if(token === ' ') continue; // ignore whitespace
if(token === ',') { // got the "," between records
state = State.BeforeRecord;
records.push(workingRecord);
workingRecord = {}; // new record;
continue;
}
throw new Error("Invalid token found " + token);
}
}
console.log(records); // logs [Object, Object, Object]
// each object has four numbers and a string, for example
// records[0][0] is 1, records[0][1] is 3 and so on,
// records[0].str is "John (Finances)"

I echo Ben's sentiments about regular expressions usually being bad for this, and I completely agree with him that tokenizers are the best tool here.
However, given a few caveats, you can use a regular expression here. This is because any ambiguities in your (, ), , and ' can be attributed (AFAIK) to your final column; as all of the other columns will always be integers.
So, given:
The input is perfectly formed (with no unexpected (, ), , or ').
Each record is on a new line, per your edit
The only new lines in your input will be to break to the next record
... the following should work (Note "new lines" here are \n. If they're \r\n, change them accordingly):
var input = /* Your input */;
var output = input.split(/\n/g).map(function (cols) {
cols = cols.match(/^\((\d+), (\d+), (\d+), (\d+), '(.*)'\)/).slice(1);
return cols.slice(0, 4).map(Number).concat(cols[4]);
});
The code splits on new lines, then goes through row by row and splits into cells using a regular expression, which greedily attributes as much as it can to the final cell. It then turns the first 4 elements into integers, and sticks the 5th element (the string) onto the end.
This gives you an array of records, where each record is itself an array. The first 4 elements are the PK and the three foreign keys (as integers), and the 5th element is the string.
For example, given your input, use output[2][4] to get "Gerald (Janitor), Broflowski", and output[1][0] to get the PK 2 of the second record (don't forget JavaScript arrays are zero-indexed).
You can see it working here: http://jsfiddle.net/56ThR/
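For reference, here is a self-contained sketch of the same regular-expression approach with the question's three sample records filled in. The name parseRecords is hypothetical, and unlike the snippet above it skips lines that do not match instead of throwing on them, which is an added assumption about how blank or trailing lines should be handled.

```javascript
// Sample input copied from the question, one record per line.
var input = [
  "(1, 3, 2, 1, 'John (Finances)'),",
  "(2, 7, 2, 1, 'Mary Jane'),",
  "(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),"
].join("\n");

// Four integers, then a quoted string that greedily runs to the last "')".
var recordPattern = /^\((\d+), (\d+), (\d+), (\d+), '(.*)'\)/;

// Hypothetical helper: returns one array per record, [pk, fk, fk, fk, string].
function parseRecords(text) {
  var records = [];
  text.split(/\r?\n/).forEach(function (line) {
    var m = line.trim().match(recordPattern);
    if (!m) return; // skip blank or malformed lines
    records.push(m.slice(1, 5).map(Number).concat(m[5]));
  });
  return records;
}

var output = parseRecords(input);
console.log(output[2][4]); // Gerald (Janitor), Broflowski
console.log(output[1][0]); // 2
```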

Another option would be to convert it into something that looks like an Array and eval it. I know it is not recommended to use eval, but it's a cool solution :)
var lines = input.split("\n");
var output = [];
for(var v in lines){
// Remove opening (
lines[v] = lines[v].slice(1);
// Remove closing ) and what is after
lines[v] = lines[v].slice(0, lines[v].lastIndexOf(')'));
output[v] = eval("[" + lines[v] + "]");
}
So, the eval parameter would look like: [1, 3, 2, 1, 'John (Finances)'], which is indeed an Array.
Demo: http://jsfiddle.net/56ThR/3/
And, it can also be written shorter like this:
var lines = input.split("\n");
var output = lines.map( function(el) {
return eval("[" + el.slice(1).slice(0, el.lastIndexOf(')') - 1) + "]");
});
Demo: http://jsfiddle.net/56ThR/4/

You can always do it "manually" :)
var lines = input.split("\n");
var output = [];
for(var v in lines){
output[v] = [];
// Remove opening (
lines[v] = lines[v].slice(1);
// Get integers
for(var i = 0; i < 4; ++i){
var pos = lines[v].indexOf(',');
output[v][i] = parseInt(lines[v].slice(0, pos));
lines[v] = lines[v].slice(pos+1);
}
// Get string between apostrophes
lines[v] = lines[v].slice(lines[v].indexOf("'") + 1);
output[v][4] = lines[v].slice(0, lines[v].indexOf("'"));
}
Demo: http://jsfiddle.net/56ThR/2/

What you have here is basically a CSV (comma-separated values) file which you wish to parse.
The easiest way would be to use an external library that will take care of most of the issues you have.
Example: jquery csv library is a good one. https://code.google.com/p/jquery-csv/

Javascript for Variations with Repetition (combinatorics) of missing string characters

My question is similar to THIS question that hasn't been answered yet.
How can I make my code (or any javascript code that might be suggested?) find all possible solutions of a known string length with multiple missing characters in variation with repetition?
I'm trying to take a string of known character lengths and find missing characters from that string. For example:
var missing_string = "ov!rf!ow"; //where "!" are the missing characters
I'm hoping to run a script with a specific array such as:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
To find all the possible variations with repetition of those missing characters to get a result of:
ovArfAow
ovBrfAow
ovCrfAow
...
ovBrfBow
ovBrfCow
...
etc //ignore the case insensitive, just to emphasize the example
and of course, eventually find ovErfLow within all the variations with repetition.
I've been able to make it work with 1 (single) missing character. However, when I put in 2 missing characters, my code obviously repeats the same array character for both of them, which is GREAT for repetition, but I also need to find the variations without repetition, and I might need to handle 3-4 missing characters as well, which may or may not be repeated. Here's what I have so far:
var r = new Array("A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9);
var missing_string = "he!!ow!r!d";
var bt_lng = missing_string.length;
var bruted="";
for (z=0; z<r.length; z++) {
for(var x=0;x<bt_lng;x++){
for(var y=0;y<r.length;y++){
if(missing_string.charAt(x) == "!"){
bruted += r[z];
break;
}
else if(missing_string.charAt(x) == r[y]){
bruted += r[y];
}
}
}
console.log("br: " + bruted);
bruted="";
}
This works GREAT with just ONE "!":
helloworAd
helloworBd
helloworCd
...
helloworLd
However with 2 or more "!", I get:
heAAowArAd
heBBowBrBd
heCCowCrCd
...
heLLowLrLd
which is good for the repetition part, but I also need to test every character of the array in each missing-character spot.

Maybe the following function in pure javascript is a possible solution for you. It uses Array.prototype.reduce to create the cartesian product c of the given alphabet x, whereby its power n depends on the count of the exclamation marks in your word w.
function combinations(w) {
var x = new Array(
"A","B","C","D","E","F","G","H","I","J","K",
"L","M","N","O","P","Q","R","S","T","U","V",
"W","X","Y","Z",0,1,2,3,4,5,6,7,8,9
),
n = (w.match(/\!/g) || []).length, // match() returns null when there is no "!"
x_n = new Array(),
r = new Array(),
c = null;
for (var i = n; i > 0; i--) {
x_n.push(x);
}
c = x_n.reduce(function(a, b) {
var c = [];
a.forEach(function(a) {
b.forEach(function(b) {
c.push(a.concat([b]));
});
});
return c;
}, [[]]);
for (var i = 0, j = 0; i < c.length; i++, j = 0) {
r.push(w.replace(/\!/g, function(s, k) {
return c[i][j++];
}));
}
return r;
}
Call it like this console.log(combinations("ov!rf!ow")) in your browser console.
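Because the number of candidates grows as alphabet length to the power of the number of "!" marks, building the whole cartesian product in memory can become expensive with 3-4 missing characters. As an alternative sketch (not the answer's method), the candidates can be produced one at a time with an odometer-style counter. The names fillMissing and visit are hypothetical.

```javascript
// Hypothetical helper: calls visit(candidate) once for every way of filling
// the "!" marks in `word` with characters from `alphabet` (with repetition).
function fillMissing(word, alphabet, visit) {
  var slots = (word.match(/!/g) || []).length;
  if (slots > 0 && alphabet.length === 0) return; // nothing to fill with
  var digits = []; // digits[p] is the alphabet index used for the p-th "!"
  for (var s = 0; s < slots; s++) digits.push(0);
  while (true) {
    var k = 0;
    visit(word.replace(/!/g, function () {
      return alphabet[digits[k++]];
    }));
    // Advance the odometer, starting from the rightmost slot.
    var pos = slots - 1;
    while (pos >= 0 && ++digits[pos] === alphabet.length) {
      digits[pos] = 0; // this slot wrapped around, carry to the left
      pos--;
    }
    if (pos < 0) break; // every slot wrapped: all candidates were visited
  }
}

var found = [];
fillMissing('a!b!', ['X', 'Y'], function (candidate) {
  found.push(candidate);
});
console.log(found); // ["aXbX", "aXbY", "aYbX", "aYbY"]
```

Since each candidate is handed to the callback as soon as it is built, the caller can test it (for example against a known target) without storing the full list.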
