Empty strings in array after using the split method with a regexp - javascript

I'm reading through Chapter 5 of Professional JavaScript for Web Developers and came across this example involving the split method and a regular expression. My confusion stems from the output of the variable colors3. Why does the array contain an empty string before and after the commas?
var colorText = "red,blue,green,yellow";
var colors1 = colorText.split(","); // ["red", "blue", "green", "yellow"]
var colors2 = colorText.split(",", 2); // ["red", "blue"]
var colors3 = colorText.split(/[^\,]+/); // ["", ",", ",", ",", ""]

In the last case, you're defining separator as "any run of characters that aren't commas".
Because nothing precedes the first "separator" ("red") and nothing follows the last "separator" ("yellow"). Split presumes that the first separator is preceded by a value, and that the last separator is followed by a value -- as they are, in your first and second examples, and in any normal case such as a line in a CSV file. The only quasi-exception would be if the first (or last) value in the CSV line were an empty string; in that case, what would you see if there were an empty string followed by a separator?
You would see just a seemingly orphaned separator at the beginning of the line (or a separator at the end). It has to be this way because you have to support empty values.
If you preceded "red" with a comma, you would see an initial empty string in the first array, and an initial comma in the last.
I think you're thrown off by the fact that your last regex redefines "separator" as a set of characters normally regarded as data, and redefines "data" as a character normally defined as a separator.
Accept the arbitrariness. Let it flow through you. They're not commas and letters, they're zeroes and ones.
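A quick sketch of the point above, using the question's string plus one hypothetical variant with a leading comma (the variable names are mine, not from the book):

```javascript
const colorText = "red,blue,green,yellow";

// Separator is the comma: the colors are the data.
const byComma = colorText.split(",");
// Separator is "any run of non-commas": the commas become the data,
// and nothing precedes the first separator or follows the last one.
const byNonComma = colorText.split(/[^,]+/);

// Hypothetical variant: the same kind of string with a leading comma.
const leading = ",red,blue";
const leadingByComma = leading.split(",");
const leadingByNonComma = leading.split(/[^,]+/);

console.log(JSON.stringify(byComma));           // ["red","blue","green","yellow"]
console.log(JSON.stringify(byNonComma));        // ["",",",",",",",""]
console.log(JSON.stringify(leadingByComma));    // ["","red","blue"]
console.log(JSON.stringify(leadingByNonComma)); // [",",",",""]
```

With the leading comma, the empty string moves into the first array and the last array starts with a comma, as described.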

Related

Best practice for converting string to object in JavaScript

I am working on a small UI for JSON editing which includes some object and string manipulation. I was able to make it work, but one of the fields is a bit tricky and I would be grateful for some advice.
Initial string:
'localhost=3000,password=12345,ssl=True,isAdmin=False'
Should be converted to this:
{ app_server: 'localhost:3000', app_password:'12345', app_ssl: 'True', app_isAdmin: 'False' }
I was able to do that by first splitting the string with the ',' which returns an array. And then I would loop through the second array and split by '='. In the last step I would simply use forEach to loop through the array and create an object:
const obj = {}
arr2.forEach((item) => (obj[`app_${item[0]}`] = item[1]));
This approach works, but if some of the fields (e.g. the password) contain ',' or '=', my code will break. Any idea how to approach this? Would a more advanced regex be a good idea?
Edit: In order to make things simple, it seems that I have caused an opposite effect, so I apologize for that.
The mentioned string is part of a larger JSON file; it is one of the values. At a high level, I am changing the shape of the object: every value that has the structure I described, 'server='something, password=1234, ssl=True', has to be transformed into separate values which will populate the input fields. After that, the user can modify them or simply download the file (I have separate logic for joining the input fields back into the initial shape).
Observations/limitations of the design that you have:
As per your comment, none of the special characters are escaped in any way, so how should we read the string password=12345,ssl=True? Is it app_password: 12345,ssl=True or app_password: 12345?
Why is localhost=3000 converted into app_server: 'localhost:3000' instead of app_localhost: '3000' like the other keys? Is there a special requirement for this?
You have to design your password field so that it rejects at least the , character, since that is what is used to split the string.
Here you go, assuming the design observations above are addressed:
const str = 'localhost=3000,password=123=45,ssl=True,isAdmin=False';
const splittedStr = str.split(',');
const result = {};
splittedStr.forEach(s => {
  // Split on every '=', then rejoin everything after the first one,
  // so that '=' characters inside the value survive.
  const [key, ...values] = s.split('=');
  const value = values.join('=');
  result[`app_${key}`] = value;
});
console.log(result);
As you can see in the above code snippet, I set the password value to 123=45 and it is parsed as required.
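For comparison, here is a minimal sketch of the same idea that cuts each pair at its first '=' with indexOf instead of splitting and rejoining. The helper name parsePairs and its prefix parameter are hypothetical, and it still assumes that values never contain a comma:

```javascript
// Hypothetical helper (not from the answers above): parse "k=v,k=v" pairs,
// cutting each pair at its FIRST '=' only.
function parsePairs(str, prefix = "app_") {
  return Object.fromEntries(
    str.split(",").map((pair) => {
      const i = pair.indexOf("=");
      // No '=' at all: treat the whole piece as a key with an empty value.
      if (i === -1) return [prefix + pair, ""];
      return [prefix + pair.slice(0, i), pair.slice(i + 1)];
    })
  );
}

const parsed = parsePairs("localhost=3000,password=123=45,ssl=True,isAdmin=False");
console.log(parsed);
```

Object.fromEntries avoids mutating an outer object, but the comma limitation discussed above is unchanged.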
You can use a regular expression that matches key and value in the key=value format, and will capture anything between single quotes when the value happens to start with a single quote:
(\w+)=(?:'(.*?)'(?![^,])|([^,]+))
This assumes that:
The key consists of alphanumerical characters and underscores only
There is no white space around the = (any space that follows it, is considered part of the value)
If the value starts with a single quote, it is considered a delimiter for the whole value, which will be terminated by another quote that must be followed by a comma, or must be the last character in the string.
If the value is not quoted, all characters up to the next comma or end of the string will be part of the value.
As you've explained that the first part does not follow the key=value pattern, but is just a value, we need to deal with this exception. I suggest prefixing the string with server=, so that now also that first part has the key=value pattern.
Furthermore, as this input is part of a value that occurs in JSON, it should be parsed as a JSON string (double quoted), in order to decode any escaped characters that might occur in it, like for instance \n (backslash followed by "n").
Since it was not clarified how quotes would be escaped when they occur in a quoted string, it remains undecided how for instance a password (or any text field) can include a quote. The above regex will require that if there is a character after a quote that is not a comma, the quote will be considered part of the value, as opposed to terminating the string. But this is just shifting the problem, as now it is impossible to encode the sequence ', in a quoted field. If ever this point is clarified, the regex can be adapted accordingly.
Implementation in JavaScript:
// The s (dotAll) flag lets . match line breaks, so a quoted value may contain them
const regex = /(\w+)=(?:'(.*?)'(?![^,])|([^,]+))/gs;
function parse(s) {
  // Decode JSON escapes first, and prefix "server=" so the first part is also key=value
  return Object.fromEntries(Array.from(JSON.parse('"server=' + s + '"').matchAll(regex),
    ([_, key, quoted, value]) => ["app_" + key, quoted ?? (isNaN(value) ? value : +value)]
  ));
}
// demo:
// The password here includes a single quote and a JSON-encoded newline character
const s = "localhost:3000, password='12'\\n345', ssl='True', isAdmin='False'";
console.log(parse(s));

regex with replace() for letters only

I have a string that outputs
20153 Risk
What I am trying to achieve is getting only the letters. I have already managed to get only the numbers using this regular expression:
const cf_regex_number = cf_input.replace(/\D/g, '');
This returns only 20153. But as soon as I tried to get only the letters, it returned the whole string instead of Risk. I have done my research, and the regular expression to get only letters is supposedly /^[a-zA-Z]*$/
This is the line of code I tried in order to get only the letters:
const cf_regex_character = cf_input.replace(/^[a-zA-Z]*$/,'')
But instead of returning Risk, it returns 20153 Risk, which is the whole string.
/[^a-z]+/i
The [ brackets ] define a character class; here, the range a to z.
The i flag makes the match case-insensitive, so that includes A to Z as well.
The caret ^ at the start of the class inverts it; it means any character not in the specified range.
And the + means keep adding characters to the match as long as they are outside that range.
Then stop matching.
In effect this matches everything up to and including the space in 20153 Risk.
Then you replace this match with the empty string '' and what you've got left is Risk.
const string = '20153 Risk';
const result = string.replace(/[^a-z]+/i, '');
console.log(result);
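One caveat worth sketching: without the g flag, replace only removes the first run of non-letters. The input below is a hypothetical variant of the question's string with a trailing number, chosen to make the difference visible:

```javascript
// Hypothetical input: the question's string with an extra trailing number.
const input = "20153 Risk 42";

// No g flag: only the first run of non-letters ("20153 ") is removed.
const firstRunOnly = input.replace(/[^a-z]+/i, "");
// With g: every run of non-letters is removed.
const everyRun = input.replace(/[^a-z]+/gi, "");

console.log(firstRunOnly); // "Risk 42"
console.log(everyRun);     // "Risk"
```

For the exact string in the question both versions give Risk, so the g flag only matters if non-letters can also appear after the word.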
Your first pattern is locating every non-digit and replacing it with nothing.
On the other hand, your second pattern is locating just the first occurrence of a pattern, and the pattern is looking for start of string, followed by letters, followed by end of string. There is no such sequence: if you start from the start of the string, there are exactly zero letters, and then you are left very far from the expected end of the string. Even if that worked, you would be deleting letters, not non-letters.
This pattern is parallel to your first one (delete any occurrence of a non-letter):
const cf_regex_character = cf_input.replace(/[^a-zA-Z]/g,'')
but possibly a better way to go is to extract the desired substring, instead of deleting everything that it is not:
const letters = cf_input.match(/[a-z]+/i)[0];
const numbers = cf_input.match(/\d+/)[0];
(This is if you know there is such a substring; if you are unsure it would be better to code a bit more defensively.)
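A minimal sketch of what "more defensively" could look like: match returns null when nothing is found, so indexing [0] directly would throw. The helper name firstMatch and its fallback parameter are hypothetical:

```javascript
// Hypothetical helper: return the first match of `pattern` in `text`,
// or `fallback` when there is no match at all.
function firstMatch(text, pattern, fallback = "") {
  const m = text.match(pattern); // null when nothing matches
  return m ? m[0] : fallback;
}

const letters = firstMatch("20153 Risk", /[a-z]+/i);
const numbers = firstMatch("20153 Risk", /\d+/);
const none = firstMatch("20153", /[a-z]+/i);

console.log(letters); // "Risk"
console.log(numbers); // "20153"
console.log(none);    // "" (instead of a TypeError)
```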
const cf_input = "20153 Risk";
const cf_regex_character = cf_input.replace(/\d+\s/, '');
console.log(cf_regex_character);
const str = "20153 Risk";
const reg = /[a-z]+/gi;
const res = str.match(reg);
console.log(res[0]);

Capturing parentheses - /(\d)/ ? or /\s*;\s*/?

I am reading about split and below is a variable looking at the string values. However I do not understand what the symbols are looking for.
According to the page: If separator contains capturing parentheses, matched results are returned in the array.
var myString = 'Hello 1 word. Sentence number 2.';
var splits = myString.split(/(\d)/);
console.log(splits);
// Results
[ "Hello ", "1", " word. Sentence number ", "2", "." ]
My question is, what is happening here? The parentheses "(" and ")" are not part of the string. Why is a space or "." separated in some places and not in others?
Another one is /\s*;\s*/
The page states that it removes the semicolon along with zero or more spaces before and after it. Does this mean \s* looks for spaces and removes them, and ';' is the separator in this case?
var names = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
console.log(names);
var re = /\s*;\s*/;
var nameList = names.split(re);
console.log(nameList);
// Results
["Harry Trump", "Fred Barney", "Helen Rigby", "Bill Abel", "Chris Hand "]
If so, why doesn't /\s*^\s*/ remove the spaces before and after the ^ symbol if my string looked like this?
var names = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
console.log(names);
var re = /\s*^\s*/;
var nameList = names.split(re);
console.log(nameList);
I would like to know what the symbols mean and why they are in a certain order. Thank you.
It seems you got your examples from here.
First let's look at this one /(\d)/.
Working inside out, recognize that \d matches any single digit.
Now, from the article, wrapping the parentheses around the escape tells the split method to keep the delimiter (which in this case is any digit) in the returned array. Notice that without the parentheses, the returned array wouldn't have numeric elements (as strings of course). Lastly, it is wrapped in slashes (//) to create a regular expression. Basically this case says: split the string by digits and keep the digits in the returned array.
The second case /\s*;\s*/ is a little more complicated and takes some understanding of regular expressions. First note that \s matches a whitespace character. In regular expressions, a character c followed by a * says 'look for 0 or more of c, in consecutive order'. So this regular expression matches strings like ' ; ', ';', etc. (I added the single quotes to show the spaces). Note that in this case, we don't have parentheses, so the semicolons will be excluded from the returned array.
If you're still stuck, I'd suggest reading about regular expressions and practicing writing them. This website is great; just be wary of the fact that regular expressions on that site may differ slightly in syntax from those used in JavaScript.
The 1st example below splits the input string at any digit, keeping the delimiter (i.e. the digit) in the final array.
The 2nd example below shows that leaving the parentheses out still splits the array at any digit, but those digit delimiters are not included in the final array.
The 3rd example below splits the input string any time the following pattern is encountered: as many consecutive spaces as possible (including none) immediately followed by a semi-colon immediately followed by as many consecutive spaces as possible (including none).
The 4th example below shows that you can indeed split a similar input string as in the 3rd example but with "^" replacing ";". However, because the "^" by itself means "the start of the string" you have to tell JavaScript to find the actual "^" by putting a backslash (i.e. a special indicator designated for this purpose) right in front of it, i.e. "\^".
const show = (msg) => {console.log(JSON.stringify(msg));};
var myString = 'Hello 1 word. Sentence number 2.';
var splits1 = myString.split(/(\d)/);
show(splits1);
var splits2 = myString.split(/\d/);
show(splits2);
var names1 = 'Harry Trump ;Fred Barney; Helen Rigby ; Bill Abel ;Chris Hand ';
var nameList1 = names1.split(/\s*;\s*/);
show(nameList1);
var names2 = 'Harry Trump ^Fred Barney^ Helen Rigby ^ Bill Abel ^Chris Hand ';
var nameList2 = names2.split(/\s*\^\s*/);
show(nameList2);
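To see why the unescaped version from the question appears to do nothing, here is a small sketch on a shortened, hypothetical version of the names string. Without the m flag, a bare ^ only matches at the very start of the input, so split never finds a non-empty separator:

```javascript
// Shortened, hypothetical version of the names string from the question.
const names = "Harry Trump ^Fred Barney^ Helen Rigby";

// Unescaped: ^ is the start-of-input anchor, not a literal caret.
const unescaped = names.split(/\s*^\s*/);
// Escaped: \^ matches the literal caret character.
const escaped = names.split(/\s*\^\s*/);

console.log(JSON.stringify(unescaped)); // ["Harry Trump ^Fred Barney^ Helen Rigby"]
console.log(JSON.stringify(escaped));   // ["Harry Trump","Fred Barney","Helen Rigby"]
```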

javascript split() array contains

While learning JavaScript, I did not understand the output we get when printing the array returned by the String.split() method (with a regular expression as an argument), as shown below.
var colorString = "red,blue,green,yellow";
var colors = colorString.split(/[^\,]+/);
document.write(colors); // this prints seven commas: ,,,,,,,
However when I print individual element of the array colors, it prints an empty string, three commas and an empty string:
document.write(colors[0]); //empty string
document.write(colors[1]); //,
document.write(colors[2]); //,
document.write(colors[3]); //,
document.write(colors[4]); //empty string
document.write(colors[5]); //undefined
document.write(colors[6]); //undefined
Then why does printing the array directly give seven commas?
Though I think it's correct to have three commas in the second output, I do not understand why there is a leading empty string (at index 0) and a trailing one (at index 4).
Please explain; I am confused here.
/[^\,]+/ splits on one or more characters that are not a comma. Thus, JavaScript will split your string on red, blue etc. The resulting leftovers, then, are the empty string at the beginning (the substring from index 0 to 0), the commas, and the empty string at the end. If you go out of bounds of the array you get undefined (as with any array).
red,blue,green,yellow
xxx xxxx xxxxx xxxxxx <-- x is what is being eaten during split, because it's the delimiter
You just want .split(","), which splits on commas, so that the commas are eaten and you are left with the colors.
Now, when you do document.write(someArray), the array is converted into a string so that it can be displayed. This effectively means someArray.join() is called, which by default puts commas in between. So you get commas joined by commas, resulting in even more commas.
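A short sketch of that conversion, assuming only the string from the question (the variable names are mine). Joining with a different separator makes it easier to tell the element commas from the separator commas:

```javascript
const colors = "red,blue,green,yellow".split(/[^,]+/);

// Converting an array to a string is effectively colors.join(),
// which uses "," between elements by default.
const joined = colors.join();
const viaString = String(colors);
// A different separator shows which commas are elements.
const piped = colors.join("|");

console.log(colors.length); // 5
console.log(joined);        // ,,,,,,,
console.log(piped);         // |,|,|,|
```

Three of the seven commas are array elements and the other four are separators inserted by join.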
When you print out the array, the elements of the array are also separated by commas. So your output consists of these 5 array elements:
[empty string],[comma],[comma],[comma],[empty string]
That amounts to 7 commas. The reason you get commas and empty strings instead of colors is that split splits at everything that matches (instead of giving you back everything that matches). So simply don't use a regular expression at all; just split at ,:
var colors = colorString.split(',');
[^\,] <- this means anything BUT commas.
try
var colors = colorString.split(',');

What does this JS do?

var passwordArray = pwd.replace(/\s+/g, '').split(/\s*/);
I found the above line of code in a rather poorly documented JavaScript file, and I don't know exactly what it does. I think it splits a string into an array of characters, similar to PHP's str_split. Am I correct, and if so, is there a better way of doing this?
It removes all whitespace from the password and then splits the password into an array of characters.
It is a bit redundant to convert a string into an array of characters, because you can already access the characters of a string through brackets (though not in older IE :( ) or through the string method "charAt":
var a = "abcdefg";
alert(a[3]);//"d"
alert(a.charAt(1));//"b"
It does much the same as: pwd.split(/\s*/).
pwd.replace(/\s+/g, '').split(/\s*/) removes all whitespace (tabs, spaces, line breaks, etc.) and splits the remainder (the string returned from the replace operation) into an array of characters. The \s* in the split is redundant, because there shouldn't be any whitespace (\s) left after the replace; it only ever matches the empty string between characters.
Hence pwd.split(/\s*/) should be sufficient, as long as pwd has no leading or trailing whitespace (otherwise the result gains an empty string at that end). So:
'hello cruel\nworld\t how are you?'.split(/\s*/)
// prints in alert: h,e,l,l,o,c,r,u,e,l,w,o,r,l,d,h,o,w,a,r,e,y,o,u,?
as will
'hello cruel\nworld\t how are you?'.replace(/\s+/g, '').split(/\s*/)
The replace portion removes all whitespace from the password. The \s+ atom matches a run of one or more whitespace characters. The g flag makes the replacement apply to every such run, and each one is replaced with an empty string.
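As for a "better way", here is a hedged sketch comparing the original line with a spread-based alternative and with split(/\s*/) on its own. The sample password is hypothetical and deliberately has leading and trailing whitespace:

```javascript
// Hypothetical sample with leading, inner and trailing whitespace.
const pwd = " pass word\t1 ";

// The line from the question: strip whitespace, then split into characters.
const original = pwd.replace(/\s+/g, "").split(/\s*/);
// Alternative: strip whitespace, then spread the string into an array.
const spread = [...pwd.replace(/\s+/g, "")];
// split(/\s*/) alone also drops the whitespace, but leading or trailing
// whitespace leaves an empty string at that end of the array.
const splitOnly = pwd.split(/\s*/);

console.log(JSON.stringify(original));
console.log(JSON.stringify(spread));
console.log(JSON.stringify(splitOnly));
```

For this input the first two arrays are identical, while the third has an extra empty string at each end. Note that spread iterates by code point rather than by UTF-16 code unit, which only matters for characters outside the Basic Multilingual Plane.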
