Parse semi-structured values


It's my first question here. I tried to find an answer but honestly couldn't figure out which terms I should use, so sorry if this has been asked before.
Here it goes:
I have thousands of records in a .txt file, in this format:
(1, 3, 2, 1, 'John (Finances)'),
(2, 7, 2, 1, 'Mary Jane'),
(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),
... and so on. The first value is the PK, the other 3 are Foreign Keys, the 5th is a string.
I need to parse them as JSON (or something) in JavaScript, but I'm having trouble because some strings contain parentheses followed by a comma (e.g. "Janitor" in the 3rd record), so I can't use substring... Maybe trimming the right part would work, but I was wondering if there is a smarter way to parse it.
Any help would be really appreciated.
Thanks!

You can't (read: probably shouldn't) use a regular expression for this. What if the parentheses contain another pair, or one is mismatched?
The good news is that you can easily construct a tokenizer/parser for this.
The idea is to keep track of your current state and act accordingly.
Here is a sketch of a parser I've just written; the point is to show you the general idea. Let me know if you have any conceptual questions about it.
It works in a demo, but I beg you not to use it in production before understanding and patching it.
How it works
So, how do we build a parser:
var State = { // remember which state the parser is at.
BeforeRecord:0, // at the (
DuringInts:1, // at one of the integers
DuringString:2, // reading the name string
AfterRecord:3 // after the )
};
We'll need to keep track of the output, and the current working object since we'll parse these one at a time.
var records = []; // to contain the results
var state = State.BeforeRecord;
Now we iterate over the string, reading one character at a time and acting on the current state:
for(var i = 0;i < input.length; i++){
if(state === State.BeforeRecord){
// handle logic when in (
}
...
if(state === State.AfterRecord){
// handle that state
}
}
Now, all that's left is to consume it into the object at each state:
If it's at (, we start parsing and skip any whitespace.
Read all the integers and discard the commas.
After four integers, read the string from the opening ' to the closing '.
After the string, read until the ), store the object, and start the cycle again.
The implementation isn't very difficult either.
The parser
var State = { // keep track of the state
BeforeRecord:0,
DuringInts:1,
DuringString:2,
AfterRecord:3
};
var records = []; // to contain the results
var state = State.BeforeRecord;
var input = " (1, 3, 2, 1, 'John (Finances)'),\n(2, 7, 2, 1, 'Mary Jane'),\n(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),"; // sample input, one record per line
var workingRecord = {}; // what we're reading into.
function isWhitespace(ch){ return /\s/.test(ch); } // spaces, tabs and newlines
for(var i = 0;i < input.length; i++){
    var token = input[i]; // read the current input
    if(state === State.BeforeRecord){ // before reading a record
        if(isWhitespace(token)) continue; // ignore whitespace between records
        if(token === '('){ state = State.DuringInts; continue; }
        throw new Error("Expected ( before new record");
    }
    if(state === State.DuringInts){
        if(isWhitespace(token)) continue; // ignore whitespace
        for(var j = 0; j < 4; j++){
            if(isWhitespace(token)) {token = input[++i]; j--; continue;} // ignore whitespace
            var curNum = '';
            while(token != ","){
                if(!/[0-9]/.test(token)) throw new Error("Expected number, got " + token);
                curNum += token;
                token = input[++i]; // get the next token
            }
            workingRecord[j] = Number(curNum); // set the data on the record
            token = input[++i]; // remove the comma
        }
        state = State.DuringString;
        i--; // step back so the loop's i++ lands on the character right after the last comma
        continue; // progress the loop
    }
    if(state === State.DuringString){
        if(isWhitespace(token)) continue; // skip whitespace
        if(token !== "'") throw new Error("Expected ' before the string, got " + token);
        var str = "";
        token = input[++i];
        var lenGuard = 1000;
        while(token !== "'"){
            if(token === undefined) throw new Error("Unterminated string");
            str+=token;
            if(lenGuard-- === 0) throw new Error("Error, string length bounded by 1000");
            token = input[++i];
        }
        workingRecord.str = str;
        token = input[++i]; // move onto the closing )
        if(token !== ')') throw new Error("Expected ) after the string, got " + token);
        state = State.AfterRecord;
        continue;
    }
    if(state === State.AfterRecord){
        if(isWhitespace(token)) continue; // ignore whitespace
        if(token === ',') { // got the "," between records
            state = State.BeforeRecord;
            records.push(workingRecord);
            workingRecord = {}; // new record;
            continue;
        }
        throw new Error("Invalid token found " + token);
    }
}
if(state === State.AfterRecord) records.push(workingRecord); // the last record may have no trailing ","
console.log(records); // logs [Object, Object, Object]
// each object has four numbers and a string, for example
// records[0][0] is 1, records[0][1] is 3 and so on,
// records[0].str is "John (Finances)"

I echo Ben's sentiments about regular expressions usually being bad for this, and I completely agree with him that tokenizers are the best tool here.
However, given a few caveats, you can use a regular expression here. This is because any ambiguity in your (, ), , and ' characters can be attributed (AFAIK) to your final column, as all the other columns will always be integers.
So, given:
The input is perfectly formed (with no unexpected (, ), , or ').
Each record is on a new line, per your edit
The only new lines in your input will be to break to the next record
... the following should work (Note "new lines" here are \n. If they're \r\n, change them accordingly):
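var input = "(1, 3, 2, 1, 'John (Finances)'),\n(2, 7, 2, 1, 'Mary Jane'),\n(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),"; // your input
var output = input.split(/\n/g).filter(function (line) {
    return line.trim() !== ''; // skip blank lines, e.g. after a trailing newline
}).map(function (line) {
    var cols = line.match(/^\((\d+), (\d+), (\d+), (\d+), '(.*)'\)/).slice(1);
    return cols.slice(0, 4).map(Number).concat(cols[4]);
});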
The code splits on new lines, then goes through row by row and splits into cells using a regular expression, which greedily attributes as much as it can to the final cell. It then turns the first 4 elements into integers, and sticks the 5th element (the string) onto the end.
This gives you an array of records, where each record is itself an array. The first 4 elements are your keys (the PK and the three FKs, as integers) and the 5th element is the string.
For example, given your input, use output[2][4] to get "Gerald (Janitor), Broflowski", and output[1][0] to get 2, the PK of the second record (don't forget JavaScript arrays are zero-indexed).
You can see it working here: http://jsfiddle.net/56ThR/
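Since the question asks for JSON, it may help to turn each row into an object rather than an array. Below is a minimal sketch of the same regex approach under the same caveats (well-formed input, one record per line). The function name parseRows and the field names pk, fk1, fk2, fk3 and name are made up for illustration. Splitting on /\r?\n/ means both \n and \r\n line endings work without editing the code.

```javascript
// Hypothetical helper: parses one "(int, int, int, int, 'string')" record per line.
function parseRows(input) {
  // Four integer columns, then a greedy '(.*)' that runs up to the LAST "')" on the
  // line, so "(Janitor), " stays inside the string column.
  var rowRegex = /^\s*\((\d+),\s*(\d+),\s*(\d+),\s*(\d+),\s*'(.*)'\)/;
  return input
    .split(/\r?\n/)                                          // \n or \r\n line endings
    .filter(function (line) { return line.trim() !== ''; }) // skip blank lines
    .map(function (line) {
      var m = line.match(rowRegex);
      if (!m) throw new Error('Unexpected row: ' + line);
      // Field names are illustrative, not taken from the original data.
      return {
        pk: Number(m[1]),
        fk1: Number(m[2]),
        fk2: Number(m[3]),
        fk3: Number(m[4]),
        name: m[5]
      };
    });
}

var sample = "(1, 3, 2, 1, 'John (Finances)'),\r\n" +
             "(2, 7, 2, 1, 'Mary Jane'),\r\n" +
             "(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),";
console.log(JSON.stringify(parseRows(sample)));
```

Throwing on a row that doesn't match keeps a malformed line from silently disappearing; JSON.stringify(parseRows(...)) then gives you the JSON text directly.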

Another option would be to convert it into something that looks like an Array and eval it. I know it is not recommended to use eval, but it's a cool solution :)
var lines = input.split("\n");
var output = [];
for(var v = 0; v < lines.length; v++){ // plain index loop; for...in is meant for object keys, not arrays
// Remove opening (
lines[v] = lines[v].slice(1);
// Remove closing ) and what is after
lines[v] = lines[v].slice(0, lines[v].lastIndexOf(')'));
output[v] = eval("[" + lines[v] + "]");
}
So the string passed to eval looks like [1, 3, 2, 1, 'John (Finances)'], which is a valid array literal.
Demo: http://jsfiddle.net/56ThR/3/
It can also be written more concisely:
var lines = input.split("\n");
var output = lines.map( function(el) {
return eval("[" + el.slice(1, el.lastIndexOf(')')) + "]");
});
Demo: http://jsfiddle.net/56ThR/4/
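If you like the "turn each line into an array literal" idea but would rather not run eval on file contents, one substitute technique (not what this answer does) is to build a JSON string instead and hand it to JSON.parse, which only parses data and never executes code. A sketch, assuming every line is wrapped in ( and ), and that the string column is the only place a ' appears; lineToArray is a hypothetical name.

```javascript
// Hypothetical helper: "(1, 3, 2, 1, 'John (Finances)')," -> [1, 3, 2, 1, "John (Finances)"]
function lineToArray(line) {
  // Drop the outer "(" ... ")" and anything after the last ")" (e.g. a trailing comma).
  var body = line.slice(line.indexOf('(') + 1, line.lastIndexOf(')'));
  var firstQuote = body.indexOf("'");
  var lastQuote = body.lastIndexOf("'");
  var ints = body.slice(0, firstQuote);             // e.g. "1, 3, 2, 1, "
  var text = body.slice(firstQuote + 1, lastQuote); // e.g. "John (Finances)"
  // JSON.stringify escapes quotes and backslashes, so the result is valid JSON.
  return JSON.parse('[' + ints + JSON.stringify(text) + ']');
}

var input = "(1, 3, 2, 1, 'John (Finances)'),\n(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),";
var output = input.split('\n').map(lineToArray);
console.log(output);
```

A malformed line makes JSON.parse throw a SyntaxError instead of running whatever the line happens to contain.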

You can always do it "manually" :)
var lines = input.split("\n");
var output = [];
for(var v = 0; v < lines.length; v++){ // plain index loop; for...in is meant for object keys, not arrays
output[v] = [];
// Remove opening (
lines[v] = lines[v].slice(1);
// Get integers
for(var i = 0; i < 4; ++i){
var pos = lines[v].indexOf(',');
output[v][i] = parseInt(lines[v].slice(0, pos), 10); // always pass the radix
lines[v] = lines[v].slice(pos+1);
}
// Get string between apostrophes
lines[v] = lines[v].slice(lines[v].indexOf("'") + 1);
output[v][4] = lines[v].slice(0, lines[v].indexOf("'"));
}
Demo: http://jsfiddle.net/56ThR/2/

What you have here is basically a CSV (comma-separated values) file that you want to parse.
The easiest way would be to use an external library that takes care of most of the issues you have.
For example, the jQuery CSV library is a good one: https://code.google.com/p/jquery-csv/
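What a CSV parser does for you here boils down to splitting on commas that are not inside quotes. A minimal hand-rolled sketch of that idea (not the jquery-csv API), assuming ' is the only quote character, quotes are never escaped inside a value, and each record is wrapped in ( and ); splitOutsideQuotes is a hypothetical name.

```javascript
// Hypothetical helper: splits a record body on commas that sit outside '...' quotes.
function splitOutsideQuotes(body) {
  var fields = [];
  var current = '';
  var inQuotes = false;
  for (var i = 0; i < body.length; i++) {
    var ch = body[i];
    if (ch === "'") {
      inQuotes = !inQuotes;        // entering or leaving a quoted value; the quote is dropped
    } else if (ch === ',' && !inQuotes) {
      fields.push(current.trim()); // a comma outside quotes ends the field
      current = '';
    } else {
      current += ch;               // includes commas and parentheses inside quotes
    }
  }
  fields.push(current.trim());     // the last field has no trailing comma
  return fields;
}

var line = "(3, 7, 3, 2, 'Gerald (Janitor), Broflowski'),";
// Strip the outer "(" and the trailing ")," before splitting.
var body = line.slice(line.indexOf('(') + 1, line.lastIndexOf(')'));
console.log(splitOutsideQuotes(body));
// 5 fields: "3", "7", "3", "2", "Gerald (Janitor), Broflowski"
```

Note that trim() would also remove spaces at the very start or end of a quoted value; a real CSV library handles such details (and escaped quotes) for you.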

Related

Remove current input if not yyyy/mm/dd format?

I want to remove input data if it does not match the yyyy/mm/dd format! I tried the following, but it only removes letters and special characters...
eg.
20144 -> must remove last 4
2014// -> must remove last /
2014/01/123 -> must remove last 3
$("input").on("keyup", function() {
console.log(this.value);
this.value = this.value.replace(/[^(\d{4})\/(\d{1,2})\/(\d{1,2})]/g, '');
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type="text">

The strategy is actually quite simple, once you actually break down the logic of what you want. So, you want to coerce user input into a YYYY/MM/DD format. We can do this in a step-by-step manner:
Split the input's value by the / character. You now get an array.
We keep the first three elements of the array, which should correspond to YYYY, MM, and DD respectively. We ignore any fragments that come after, since they aren't part of a valid date. This can be done using .slice(0,3) on the array.
Parse each individual part of the array:
At index of 0, you have the year fragment. Use .substring(0,4) so that it is trimmed to 4 characters max
At index of 1 or 2, you have the month/day fragment respectively. Use .substring(0,2) so that it is trimmed to 2 characters max
Join the resulting array back using .join('/').
If the array contains empty elements, you will end up with duplicated slashes (//) in your string. Simply collapse them using a regex: .replace(/\/(\/)+/, '/')
You will notice that my logic does not pad numbers, e.g. converting days from 1 to 01. You cannot do this while the user is typing, because you never know whether the user intends to type one or two digits. If you want this, you will have to reparse the input on blur, because that is when you know the user is done with the input.
See proof-of-concept below:
$('input').on('keyup', function() {
var valueParts = this.value.split('/');
if (!valueParts.length) {
return;
}
// Only keep the first 3 elements of array
valueParts = valueParts.slice(0, 3);
// Substring array (keep first 4 characters for year, and first 2 characters for month/day)
var substringCounts = [4, 2, 2];
substringCounts.forEach(function(substringCount, index) {
// If index does not exist in array, skip it
if (!valueParts[index])
return;
valueParts[index] = valueParts[index].substring(0, substringCount);
});
// Join remaining elements
var parsedString = valueParts.join('/');
// Trim extraneous slashes
parsedString = parsedString.replace(/\/(\/)+/, '/');
this.value = parsedString;
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type="text">
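To check the steps above against the three examples from the question without a browser, the keyup logic can be pulled out into a plain function. coerceDate is a hypothetical name; the body follows the same split, slice, substring, join and trim steps, without jQuery.

```javascript
// Hypothetical helper: applies the keyup steps to a string instead of an <input>.
function coerceDate(value) {
  // Split on "/" and keep at most three fragments (YYYY, MM, DD).
  var parts = value.split('/').slice(0, 3);
  // Trim the year to 4 characters and month/day to 2 characters.
  var maxLengths = [4, 2, 2];
  parts = parts.map(function (part, index) {
    return part.substring(0, maxLengths[index]);
  });
  // Join with "/" and collapse a run of repeated slashes into one.
  return parts.join('/').replace(/\/(\/)+/, '/');
}

console.log(coerceDate('20144'));       // "2014"
console.log(coerceDate('2014//'));      // "2014/"
console.log(coerceDate('2014/01/123')); // "2014/01/12"
```

Keeping the logic in a pure function like this also makes it easy to reuse in both the keyup and blur handlers.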
Note: if you want to pad the numbers, you will have to include the following logic, on top of what has been mentioned above:
// Pad numbers on blur
$('input').on('blur', function() {
var valueParts = this.value.split('/');
if (!valueParts.length) {
return;
}
// Only keep the first 3 elements of array
valueParts = valueParts.slice(0, 3);
// Pad lengths (year to 4 digits, month to 2 digits, day to 2 digits)
var padLengths = [4, 2, 2];
padLengths.forEach(function(padLength, index) {
// If index does not exist in array, skip it
if (!valueParts[index])
return;
valueParts[index] = valueParts[index].padStart(padLength, '0');
});
// Join remaining elements
var parsedString = valueParts.join('/');
// Trim extraneous slashes
parsedString = parsedString.replace(/\/(\/)+/, '/');
this.value = parsedString;
});
With that in mind, if you want to combine the two pieces of logic, you can abstract out the part that joins the remaining elements and trims extraneous slashes. The combined snippet below is verbose and mostly repeats the logic above:
// Helper method: joins array using '/' and trims duplicated joining characters
function joinAndTrimSlashes(valueArray) {
// Join remaining elements
var parsedString = valueArray.join('/');
// Trim extraneous slashes
parsedString = parsedString.replace(/\/(\/)+/, '/');
return parsedString;
}
$('input').on('keyup', function() {
var valueParts = this.value.split('/');
if (!valueParts.length)
return;
// Only keep the first 3 elements of array
valueParts = valueParts.slice(0, 3);
// Substring array (keep first 4 characters for year, and first 2 characters for month/day)
var substringCounts = [4, 2, 2];
substringCounts.forEach(function(substringCount, index) {
// If index does not exist in array, skip it
if (!valueParts[index])
return;
valueParts[index] = valueParts[index].substring(0, substringCount);
});
this.value = joinAndTrimSlashes(valueParts);
});
// Pad numbers on blur
$('input').on('blur', function() {
var valueParts = this.value.split('/');
if (!valueParts.length)
return;
// Only keep the first 3 elements of array
valueParts = valueParts.slice(0, 3);
// Pad lengths (year to 4 digits, month to 2 digits, day to 2 digits)
var padLengths = [4, 2, 2];
padLengths.forEach(function(padLength, index) {
// If index does not exist in array, skip it
if (!valueParts[index])
return;
valueParts[index] = valueParts[index].padStart(padLength, '0');
});
this.value = joinAndTrimSlashes(valueParts);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type="text">

Since I think pure regex is going to be very bad at solving this, you can just do a manual pass, e.g.:
var good = '2013/01/01';
var bad1 = '20123/01/02';
var bad2 = '2011//01/03';
var bad3 = '2010/01/034';
var bad4 = '2009//01/045';
var bad5 = '20083//01/223';
var all = [ good, bad1, bad2, bad3, bad4, bad5 ];
function normalizeDate(dateString) {
var currentValue = dateString.replace(/\/{2,}/g,'/'); //remove repeated /
var parts = currentValue.split('/').map(function (value) {
return value.replace(/\D/g, '0');
});
var newParts = [
parts[0] ? parts[0].padEnd(4,'0').substring(0,4) : '2000' ,
parts[1] ? parts[1].padStart(2, '0').substring(0,2) : '01',
parts[2] ? parts[2].padStart(2, '0').substring(0,2) : '01'
];
return newParts.join('/');
}
for (var i = 0;i < all.length;i++) {
console.log(normalizeDate(all[i]));
}

$("input").on("keyup", function() {
var validationRegex = /^[12]\d{3}\/(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])$/; // ^ and $ force the whole value to match, not just part of it
if(!validationRegex.test(this.value)){
this.value = '';
}
});
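This will match only the format YYYY/MM/DD, with a month from 01 to 12 and a day from 01 to 31. Because it runs on keyup, it also clears any value that is still being typed, so it may fit better in a change or blur handler.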

Thanks all for your answers! Using them as a reference, I was able to solve my problem.
I believe there are much better solutions, so please post here if you have a better one.
var format = ["number","number","number","number","slash","number","number","slash","number","number"];
$("input").on("keyup", function() {
if(this.value.length > format.length) {
this.value = this.value.slice(0,format.length);
return;
}
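for(var i = 0; i < this.value.length; i++) {
    var ch = this.value[i];
    var valid = format[i] == "number" ? /[0-9]/.test(ch) : ch == "/";
    if(!valid) {
        // Drop the first invalid character and everything after it, then stop:
        // the value just got shorter, so later indices no longer exist
        this.value = this.value.slice(0, i);
        break;
    }
}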
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<input type="text">

TypeError: Cannot find function "sort" in "object"... but the said object is supposed to be a string according to google docs

TL;DR
According to the Google docs, getResponseText() should return a string... but I get a message that claims it is an object when I try to sort it. Huh?
TypeError: Cannot find function sort in object
I was under the impression that a JavaScript string sort of works like an array, and it seems to behave like one, because string[0] returns the first letter of the string.
DETAILS:
Here is the sheet I am working on
Hello everyone, I have a very unique situation where I need to update dirty strings (courtesy of an undesirable OCR import).
I have created a function that does the job but needs additional functionality.
Currently, the process goes like this:
enter your desired string
each cell (in your selection) is checked for that string
cells are updated with the desired string if the two are at least 50% alike
the check works like this:
compare the first letter of desired string (txtT[0])
against the first letter of target cell (valT[0])
compare additional letters [x] up to the length of the longest string
for example:
desired string = "testing"
target cell = "t3st1ng"
the loop goes like this:
create a point system to do the math
(total points = length of longest string)
compare t and t ... if matching, add one point (+1 in this case because it matches)
compare e and 3 ... if matching, add one point (+0 in this case because it does not match)
compare s and s ... if matching, add one point (+1 in this case because it matches)
compare t and t ... if matching, add one point (+1 in this case because it matches)
compare i and 1 ... if matching, add one point (+0 in this case because it does not match)
compare n and n ... if matching, add one point (+1 in this case because it matches)
compare g and g ... if matching, add one point (+1 in this case because it matches)
points earned/total points = % of alike
The problem with this system is that it is based on the position of the letters in each string.
This causes problems when comparing strings like "testing" and "t est ing"
I tried to update it so that the first thing it does is SORT the string alphabetically, ignoring all special characters and non alphabetical characters.
That's when I came across an error:
TypeError: Cannot find function sort in object testing.
This does not make sense because my desired string is a string. See code where it says "this is where i get my error":
According to the google docs, getResponseText() should return a string... but I cannot call the sort method on the string.. which makes no sense!
function sandboxFunction() {
try {
var ui = SpreadsheetApp.getUi();
var ss = SpreadsheetApp.getActiveSpreadsheet();
var as = ss.getActiveSheet();
var ar = as.getActiveRange();
var sv = ui.prompt('enter desired string');
var txt = sv.getResponseText();
var txtT = txt.trim();
txtT = txtT.replace(/ /g, ''); //this is the trimmed comparison string
txtT = txtT.sort(); //***this is where I get my error***
ui.alert(txtT);
var vals = ar.getValues();
for (var r = 0; r < vals.length; r++) {
var row = vals[r];
for (var c = 0; c < row.length; c++) {
var val = row[c];
var valT = val.trim();
valT = valT.replace(/ /g, ''); // this is the trimmed comparison cell
ui.alert(valT);
//this is where we test the two
//test length
var tl = txtT.length;
var vl = valT.length;
if (vl < tl) {
ui.alert("different lengths.. applying fix");
for (vl; vl < tl; vl++) {
valT = valT.concat("x");
ui.alert(valT);
}
}
else if (tl < vl) {
ui.alert("different lengths.. applying fix");
for (tl; tl < vl; tl++) {
txtT = txtT.concat("x");
ui.alert(txtT);
}
}
if (valT.toUpperCase() == txtT.toUpperCase()) {
ui.alert("your strings match");
}
else {
var total = txtT.length;
var pts = 0;
for (var x = 0; x < total; x++) {
if (valT[x] == txtT[x]) {
pts++;
}
}
if (pts / total >= 0.5) {
ui.alert("at least 50% match, fixing text");
vals[r][c] = txt;
}
}
}
}
ar.setValues(vals);
}
catch (err) {
ui.alert(err);
}
}

You can't sort a string that way: sort is a method of arrays, not strings.
Convert your string to an array first, and then sort it:
var txtT = "This is a string".trim();
txtT = txtT.replace(/ /g, ''); //this is the trimmed comparison string
var txtArray = txtT.split(''); // Convert to array
var txtSorted = txtArray.sort(); // Use sort method
console.log(txtSorted);
See sort() docs
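Going back to the goal in the question (sorting the letters alphabetically while ignoring spaces, digits and other non-letter characters), the split/sort/join steps can be combined into one helper. A sketch; sortedLetters is a hypothetical name, and lower-casing first is an assumption so that "T" and "t" compare as equal. Only plain JavaScript string and array methods are used, which Apps Script supports.

```javascript
// Hypothetical helper: keeps only the letters, lower-cases them and sorts them.
function sortedLetters(str) {
  return String(str)          // also works if a cell value is a number
    .toLowerCase()            // assumption: comparison is case-insensitive
    .replace(/[^a-z]/g, '')   // drop spaces, digits and special characters
    .split('')                // string -> array of characters
    .sort()                   // arrays have sort(); strings do not
    .join('');                // array -> string again
}

console.log(sortedLetters('testing'));   // "eginstt"
console.log(sortedLetters('t est ing')); // "eginstt"
```

The two sorted strings can then go through the same letter-by-letter point comparison as before.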

Mask a portion of a String using RegExp

I'm trying to mask a portion of a string using JavaScript.
e.g. mask the second and third segments of a credit-card number using a regex, like this:
4567 6365 7987 3783 → 4567 **** **** 3783
3457 732837 82372 → 3457 ****** 82372
I just want to keep the first 4 numbers and the last 5 characters.
This is my first attempt: /(?!^.*)[^a-zA-Z\s](?=.{5})/g
https://regex101.com/r/ZBi54c/2

You can try this:
var cardnumber = '4567 6365 7987 3783';
var first4 = cardnumber.substring(0, 4);
var last5 = cardnumber.substring(cardnumber.length - 5);
var mask = cardnumber.substring(4, cardnumber.length - 5).replace(/\d/g,"*");
console.log(first4 + mask + last5);

You could slice the first four digits and apply a replacement for the rest.
console.log(
['4567 6365 7987 3783', '3457 732837 82372'].map(
s => s.slice(0, 4) + s.slice(4).replace(/\d(?=.* )/g, '*')
)
);

The answer apparently satisfies the OP. Here is another solution using only Regexes:
function starry(match, gr1, gr2, gr3) {
var stars = gr2.replace(/\d/g, '*');
return gr1 + " " + stars + " " + gr3;
}
function ccStarry(str) {
var rex = /(\d{4})\s(\d{4}\s\d{4}|\d{6})\s(\d{4}|\d{5})/;
if (rex.test(str))
return str.replace(rex, starry);
else return "";
}
var s1 = "4567 6365 7987 3783";
var s2 = "3457 732837 82372";
var s3 = "dfdfdf";
console.log(ccStarry(s1));
console.log(ccStarry(s2));
console.log(ccStarry(s3));
This ensures that the pattern matches before trying any replacements. For example, in the third test case, it returns an empty string. The pattern can be updated to match other credit card patterns besides the ones given in the question.

I would like to elaborate on the answer from Nina Scholz. The following sample code uses .slice() to mask the value in two cases:
a single string variable, e.g. var n = '12345567890'
an array of objects
// Single number
var n = '601115558888';
var singleNumber = n.slice(0, 4) + n.slice(4, n.length -4).replace(/\d/g,'*') + n.slice(n.length -4);
console.log(singleNumber);
// array of object
var obj = [{
contacts_name: 'Jason',
contacts_num : '651231239991'
},
{
contacts_name: 'King',
contacts_num : '60101233321'
}];
// Mask for the middle number, showing the first4 number and last4 number
// and replace the rest number with *
var num = obj.map((element, index) =>
element.contacts_num.slice(0,4)
+ element.contacts_num.slice(4, element.contacts_num.length-4).replace(/\d/g, '*')
+ element.contacts_num.slice(element.contacts_num.length -4)
);
console.log(num);

If it's JavaScript doing the regex masking, you've already failed because JS should never need to know the original card number, except when you've just received it from the user and are sending it to the server for the first time, in which case you shouldn't be masking it anyway so the user can check for typos.
I can't really help you there, you've already failed in the worst way.
Server-side, if the number is already broken up by spaces*, then one option is (in PHP, but the same idea applies to any language):
$parts = explode(" ",$fullnumber);
$first = array_shift($parts);
$last = array_pop($parts);
$middle = implode(" ",$parts);
$mask = preg_replace("/\d/","*",$middle);
$result = "$first $mask $last";
* it shouldn't be
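The same explode/shift/pop idea in JavaScript, e.g. for a Node.js server. A sketch assuming the number has at least three space-separated groups; maskMiddleGroups is a hypothetical name.

```javascript
// Hypothetical helper mirroring the PHP above: keep the first and last groups, mask the rest.
function maskMiddleGroups(fullNumber) {
  var parts = fullNumber.split(' ');                // like explode(" ", ...)
  var first = parts.shift();                        // like array_shift
  var last = parts.pop();                           // like array_pop
  var mask = parts.join(' ').replace(/\d/g, '*');   // like implode + preg_replace
  return first + ' ' + mask + ' ' + last;
}

console.log(maskMiddleGroups('4567 6365 7987 3783')); // "4567 **** **** 3783"
console.log(maskMiddleGroups('3457 732837 82372'));   // "3457 ****** 82372"
```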

How can I "count characters" in a regex?

I'm not sure if count is the right word to use because it doesn't really matter to me how many there are, but let me explain. My data will be formatted like this: (hi,(1,2),hey),(yo,(3,(rawr),4),howdy) and I have no control over how many dimensions there are. And I want to grab the lowest groups ["hi", Array[], "hey"] and ["yo", Array[], "howdy"] So if there was a way to "count" I could count the open parenthesis, and then count the closed ones and when it hits 0, that's when the regex ends. For example:
(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)
1---2---1----0-1---2--3----2--1------0
Now, with that being said, I don't believe counting is possible, but what I want is a substitute solution. This is what I have so far: /\([^\(]*?\)/, but that only returns the highest-level group from each of the low-level groups, aka (1,2) and (rawr).

You can use a stack (here just a depth counter) to track the ( and ).
Array.prototype.reduce.call(
'(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)',
function(x,y){
if(y=='(')
return [x[0]+1, x[1]+(x[0]+1)]
else if(y==')')
return [x[0]-1, x[1]+(x[0]-1)]
else
return [x[0], x[1]+'-']
},
[0,'']
)[1]
Try it in your browser's console.
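A regex can't count nesting, but the depth counting above is the core of a small recursive parser that produces the structure asked for in the question, ["hi", [...], "hey"]. A sketch, assuming items are separated by commas, there are no quotes, and every ( has a matching ); parseNested is a hypothetical name, and items are kept as trimmed strings.

```javascript
// Hypothetical recursive-descent parser for input like "(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)".
function parseNested(str) {
  var pos = 0;

  // Parses "( item, item, ... )" starting at str[pos] === "(" and returns an array.
  function parseGroup() {
    pos++; // skip "("
    var items = [];
    var current = '';
    while (pos < str.length) {
      var ch = str[pos];
      if (ch === '(') {
        if (current.trim() !== '') items.push(current.trim()); // text right before a nested group
        current = '';
        items.push(parseGroup()); // a nested group becomes a nested array
        continue;                 // parseGroup already moved pos past its ")"
      }
      if (ch === ',' || ch === ')') {
        if (current.trim() !== '') items.push(current.trim());
        current = '';
        pos++;
        if (ch === ')') return items; // end of this group
        continue;
      }
      current += ch;
      pos++;
    }
    throw new Error('Unbalanced parentheses');
  }

  // Top level: a comma-separated list of groups.
  var groups = [];
  while (pos < str.length) {
    if (str[pos] === '(') groups.push(parseGroup());
    else pos++; // skip commas and whitespace between groups
  }
  return groups;
}

console.log(JSON.stringify(parseNested('(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)')));
// [["hi",["1","2"],"hey"],["yo",["3",["rawr"],"4"],"howdy"]]
```

Each call to parseGroup handles exactly one level of parentheses, so the recursion depth does the "counting" for you, however many dimensions the data has.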

This works for both the original use case and Barmar's use case, and it counts the parentheses, if you really wanted that.
Also, I added arbitrary spaces throughout the data strings, just in case (since you have no control over the incoming data).
var results = [];
var dataString = "(hi, (1,2) , hey), ( yo,( 3, ( rawr ), 4) , howdy )";
//var dataString = "(hi, (1 , 2 ), (3, 4), hey), (yo ,(3,(rawr ), 4), howdy)";
var dataSplit = dataString.split(",");
var trimRegex = /^\s+|\s+$/g;
var openParensRegex = /\(/;
var closeParensRegex = /\)/;
var parensRegex = /\(|\)/;
var parensCount = 0;
for (var x = 0, lenx = dataSplit.length; x < lenx; x++){
var cleanString = dataSplit[x].replace(trimRegex, "");
if (openParensRegex.test(cleanString)){ parensCount++; };
if (parensCount < 2){
results.push(cleanString.replace(parensRegex, "").replace(trimRegex, ""));
};
if (closeParensRegex.test(cleanString)){ parensCount--; };
};
console.log(results);
Hope that helps!

The following script might help; it identifies the parenthesis levels (level 1 is the innermost):
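var string="(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)",i=0,previous;
while (string.indexOf("(")>=0 && string !== previous) { // stop if a pass changes nothing, e.g. unbalanced or empty ()
previous=string;
i++;
string=string.replace(/\(([^()]+)\)/g,"|l"+i+"|$1|l"+i+"|");
}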
Result:
|l2|hi,|l1|1,2|l1|,hey|l2|,|l3|yo,|l2|3,|l1|rawr|l1|,4|l2|,howdy|l3|
Fiddle: http://jsfiddle.net/GfUZh/

If you want to get the top-level groups and there are exactly two of them, you can do it with a regex by finding the intersection point, in this case ),(.
var str = '(hi,(1,2),hey),(yo,(3,(rawr),4),howdy)';
var re = /(\(.+\)),(\(.+\))/;
var results = re.exec(str);
results.shift(); // remove first item which is useless
console.log(results); //=> ["(hi,(1,2),hey)", "(yo,(3,(rawr),4),howdy)"]
Demo: http://jsfiddle.net/elclanrs/fFmfE/

JavaScript filtering an array of <input> values by character count

This should be a quickie, but I'm scratching my head as to why this bit of JavaScript isn't working for me. The goal is to take the value of an input box (string of words separated by spaces), list these words as items in an array, and remove those which are fewer than 3 characters:
var typed = $('input').val();
var query = typed.split(" ");
var i=0;
for (i=0; i<query.length; i++) {
if (query[i].length < 3) {
query.splice(i,1);
}
}
Have this running onkeyup for the input box and it seems to work, but only about 50% of the time (strings of 1 and 2 characters somehow find their way into the array on occasion). Any suggestions would be hugely appreciated.

The problem is that you are iterating while removing the elements. Consider this array:
["he", "l", "lo world"]
Initially your loop starts at index 0 and removes "he" from the array. Now the new array is
["l", "lo world"]
In the next iteration i will be 1, and you will check "lo world"'s length, thus ignoring the "l" string altogether.
Use the filter method in Array to remove the unwanted elements.
var biggerWords = query.filter(function(word) {
return word.length >= 3;
});

Besides the iterating problem, you may also see unexpected entries if you type multiple spaces.
Try:
var query = typed.split(/\s+/);
This way it will split on any number of spaces, instead of each individual one
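Putting the two fixes together (split on runs of whitespace, then filter instead of splicing inside the loop), a sketch with a hypothetical function name longWords:

```javascript
// Hypothetical helper: returns the words that are at least 3 characters long.
function longWords(typed) {
  return typed
    .split(/\s+/)               // any run of whitespace counts as one separator
    .filter(function (word) {
      return word.length >= 3;  // also drops the "" produced by leading/trailing spaces
    });
}

console.log(JSON.stringify(longWords('  he l  lo world '))); // ["world"]
```

Because filter builds a new array, no index ever shifts under the loop, which is exactly the bug in the original code.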

The problem is that you're splicing the array while counting forward. Think about it: if you take an element out of the array, shortening it by one, then incrementing i moves one further than you want, completely missing the next element. Instead, start at query.length-1, decrement i (i--), and make the condition i>=0. For an example of this in action, check it out here:
http://jsfiddle.net/kcwjs/
CSS:
input {
width:300px;
}
HTML:
<input id="textbox" type="text" />
<div id="message"></div>
Javascript:
$(document).ready(function() {
$('#textbox').keyup(checkStrings);
});
function checkStrings(e) {
var typed = $('#textbox').val();
if (typed == "") return false;
var query = typed.split(" ");
var querylen = query.length;
var acceptedWords = '';
var badWords = '';
for (var i = querylen-1; i >= 0; i--) {
if (query[i].length < 3) {
badWords += query[i] + " ";
} else {
acceptedWords += query.splice(i,1) + " ";
}
}
$('#message').html("<div>Bad words are: " + badWords + "</div>" +
"<div>Good words are: " + acceptedWords + "</div>");
}

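Try this code; it gets rid of any words shorter than 3 characters and makes sure no empty array elements are created.
typed = typed.replace(/\b\w{1,2}\b/g, ""); // replace returns a new string, so keep the result
var query = typed.trim().split(/\s+/); // trim first so no empty strings appear at the ends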

Hey, I think you should use a new array for the result: since you are removing elements from the array, its length changes. Here is my solution:
var typed = "dacda cdac cd k foorar";
var query = typed.split(" ");
var i=0;
var result = [];
for (i=0; i<query.length; i++) {
if (query[i].length >= 3) {
result.push(query[i]);
}
}
