Split using regex creates 2 empty elements in array - javascript

I need to split a string into 2 pieces using regex, so I used the following code:
var str = "This is a test";
var list = str.split(/(test)/);
Required output:
list = ["This is a ", "test"]
Instead of 2 this gives me 3 elements in the array (last one is empty). I understand that regex finds nothing after the 2nd match so it adds an empty (3rd) element. Is there any way that I can modify my code so I get exactly 2 elements thus avoiding the last empty element?
Note: the above code is a simplified version for which we can use other options besides regex but I would have to use regex.

var str = "This is a test";
var list = str.split(/(test)/,2);
list: ["This is a ", "test"]

This may be overkill if you can guarantee an array of length two, but given the nature of the question, a more robust solution is to use Array.filter to remove all empty strings from the array. That includes entries in the middle of the array, which arise when several delimiters appear next to each other in your input string.
var list = str.split(/(test)/).filter(
function(v){ return v!=null && v!='' }
);
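As a quick check of the filter approach, here is a minimal sketch with a made-up input in which the delimiter appears twice in a row and again at the end. The variable names are illustrative only.

```javascript
// Hypothetical input: "test" appears twice in a row and again at the end.
const input = "a testtest b test";

// The capturing group keeps the delimiter in the result, but adjacent
// delimiters and a trailing delimiter produce empty strings.
const raw = input.split(/(test)/);
console.log(raw); // ["a ", "test", "", "test", " b ", "test", ""]

// Dropping the empty strings leaves only the meaningful pieces.
const cleaned = raw.filter(function (v) { return v !== ''; });
console.log(cleaned); // ["a ", "test", "test", " b ", "test"]
```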

You can check whether the last element is empty and drop it if so:
var last = list.pop();
last.length || list.push(last);
or:
list[list.length-1].length || list.pop();
or even shorter:
list.slice(-1)[0].length || list.pop();
To handle an empty first element (when test is at the start of the string, as @Kobi suggested), use:
list[0].length || list.shift();
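The one-liners above assume the array has at least one element. A slightly more defensive sketch wraps both checks in a helper; the name trimEmptyEnds is hypothetical and not part of any library.

```javascript
// Hypothetical helper: returns a copy of the array without an empty
// first or last element. The input array is not modified.
function trimEmptyEnds(parts) {
  const result = parts.slice();
  if (result.length && result[0] === '') result.shift();
  if (result.length && result[result.length - 1] === '') result.pop();
  return result;
}

// Delimiter at the end of the string: the trailing "" is removed.
const endsWithDelimiter = trimEmptyEnds("This is a test".split(/(test)/));
console.log(endsWithDelimiter); // ["This is a ", "test"]

// Delimiter at the start of the string: the leading "" is removed.
const startsWithDelimiter = trimEmptyEnds("test this".split(/(test)/));
console.log(startsWithDelimiter); // ["test", " this"]
```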

This is giving me the results you want:
var str = "This is a test";
var list = str.split(/(?=test)/g);
(?= starts a lookahead. It matches the position just before the word test without consuming it, so the split happens at that position and test stays in the array as part of the following element.
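To compare the two approaches side by side, here is a small sketch. The second input string is made up for illustration.

```javascript
const text = "This is a test";

// Capturing group: the delimiter is consumed and re-inserted into the
// result, which leaves an empty string after it.
const withCapture = text.split(/(test)/);
console.log(withCapture); // ["This is a ", "test", ""]

// Lookahead: the split happens at the position before "test" and nothing
// is consumed, so no trailing empty string appears.
const withLookahead = text.split(/(?=test)/);
console.log(withLookahead); // ["This is a ", "test"]

// With several occurrences, each piece starts with the delimiter
// (except possibly the first one).
const several = "test a test".split(/(?=test)/);
console.log(several); // ["test a ", "test"]
```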

Related

Why is the first match empty when using a split regex? [duplicate]

I don't understand this behaviour:
var string = 'a,b,c,d,e:10.';
var array = string.split ('.');
I expect this:
console.log (array); // ['a,b,c,d,e:10']
console.log (array.length); // 1
but I get this:
console.log (array); // ['a,b,c,d,e:10', '']
console.log (array.length); // 2
Why are two elements returned instead of one? How does split work?
Is there another way to do this?
You could add a filter to exclude the empty string.
var string = 'a,b,c,d,e:10.';
var array = string.split ('.').filter(function(el) {return el.length != 0});
A slightly simpler version of @xdazz's answer for excluding empty strings (using an ES6 arrow function):
var array = string.split('.').filter(x => x);
This is the correct and expected behavior. Given that you've included the separator in the string, the split function (simplified) takes the part to the left of the separator ("a,b,c,d,e:10") as the first element and the part to the right of the separator (an empty string) as the second element.
If you're really curious about how split() works, you can check out pages 148 and 149 of the ECMA spec (ECMA 262) at http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
Use String.split() method with Array.filter() method.
var string = 'a,b,c,d,e:10.';
var array = string.split ('.').filter(item => item);
console.log(array); // ['a,b,c,d,e:10']
console.log (array.length); // 1
https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/split
Trim the trailing period first:
'a,b,c,d,e:10.'.replace(/\.$/g,''); // gives "a,b,c,d,e:10"
Then split the string:
var array = 'a,b,c,d,e:10.'.replace(/\.$/g,'').split('.');
console.log (array.length); // 1
That's because the string ends with the . character - the second item of the array is empty.
If the string won't contain . at all, you will have the desired one item array.
The split() method works like this as far as I can explain in simple words:
Look for the given string to split by in the given string. If not found, return one item array with the whole string.
If found, iterate over the given string taking the characters between each two occurrences of the string to split by.
In case the given string starts with the string to split by, the first item of the result array will be empty.
In case the given string ends with the string to split by, the last item of the result array will be empty.
It's explained more technically here, it's pretty much the same for all browsers.
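The four cases described above can be checked with a short sketch. The inputs are made up for illustration.

```javascript
// Separator not found: one element containing the whole string.
const notFound = 'abc'.split('.');
console.log(notFound); // ["abc"]

// Separator in the middle: the text on each side becomes an element.
const middle = 'a.b'.split('.');
console.log(middle); // ["a", "b"]

// String starts with the separator: the first element is empty.
const leading = '.a'.split('.');
console.log(leading); // ["", "a"]

// String ends with the separator: the last element is empty.
const trailing = 'a.'.split('.');
console.log(trailing); // ["a", ""]
```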
According to MDN web docs:
Note: When the string is empty, split() returns an array containing
one empty string, rather than an empty array. If the string and
separator are both empty strings, an empty array is returned.
const myString = '';
const splits = myString.split();
console.log(splits);
// ↪ [""]
Well, split does what it is made to do: it splits your string. It's just that the second part of the split is empty.
That's because your string is composed of 2 parts:
1 : a,b,c,d,e:10
2 : empty
If you try without the dot at the end :
var string = 'a,b,c:10';
var array = string.split ('.');
output is :
["a,b,c:10"]
You have a string with one "." in it and when you use string.split('.') you receive array containing first element with the string content before "." character and the second element with the content of the string after the "." - which is in this case empty string.
So, this behavior is normal. What did you want to achieve by using this string.split?
Try this. split returns an array with two elements here, so you can check that both are non-empty:
var Val = "abc#gmail.com";
var mail = Val.split('#');
if(mail[0] && mail[1]) { alert('valid'); }
else { alert('Enter valid email id'); valid=0; }
If both elements have a length greater than 0, the condition is true.

splitting string is returning full string

I want to create a small script which determines how many words are in a paragraph and then divides the paragraph depending on a certain length. My approach was to split the paragraph using split(), find out how many elements are in the array and then output some of the elements into one paragraph and the rest into another.
var para = document.getElementById('aboutParagraph').innerHTML;
var paraElements = para.split();
var paraLength = paraElements.length;
if(paraLength >= 500){
}
console.log(paraElements);
When I use this code, paraElements is returned as an array whose first element is the entire string.
So, for example, if the paragraph were "this is a paragraph", paraElements is returned as ["this is a paragraph"], with a length of 1. Shouldn't it be ["this", "is", "a", "paragraph"]?
var str = "this is a paragraph";
var ans = str.split(' ');
console.log(ans);
You need to call split(' ') with a space as the delimiter; note the space between the quotes. You were not passing any parameter to split by.
The split() method splits a string at a delimiter you specify (can be a literal string, a reference to a string or a regular expression) and then returns an array of all the parts. If you want just one part, you must index into the resulting array.
You are not supplying a delimiter to split on, so you are getting the entire string back.
var s = "This is my test string";
var result = s.split(/\s+/); // Split everywhere there is one or more whitespace characters
console.log(result); // The entire resulting array
console.log("There are " + result.length + " words in the string.");
console.log("The first word is: " + result[0]); // Just the first word
You are missing the split delimiter for a space. Try this:
var para = document.getElementById('aboutParagraph').innerHTML;
var paraElements = para.split(' ');
var paraLength = paraElements.length;
if(paraLength >= 500){
}
console.log(paraElements);
split() always returns an array. If you don't pass a delimiter as an argument, it puts the whole string into a single array element.
You can break your words on spaces, but you may want to also consider tabs and newlines. For that reason, you could use some regex /\s+/ which will match on any whitespace character.
The + is used so that all consecutive whitespace characters are treated as one delimiter. Otherwise a string with two spaces, like "foo  bar", would be split into three elements, one of them an empty string: ["foo", "", "bar"] (the plus makes it ["foo", "bar"] as expected).
var para = document.getElementById('aboutParagraph').innerHTML;
var paraElements = para.split(/\s+/); // <-- need to pass in delimiter to split on
var paraLength = paraElements.length;
if (paraLength >= 500) {}
console.log(paraLength, paraElements);
<p id="aboutParagraph">I want to create a small script which determines how many words are in a paragraph and then divides the paragraph depending on a certain length. My approach was to split the paragraph using split(), find out how many elements are in the array and then output some of the elements into one paragraph and the rest into another.</p>
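One caveat worth sketching: text taken from innerHTML may start or end with whitespace, and splitting on /\s+/ then yields empty strings at the ends. The helper below is a hypothetical illustration (countWords is not a built-in) that trims first and treats an empty string as zero words.

```javascript
// Hypothetical helper: counts whitespace-separated words in a string.
function countWords(text) {
  const trimmed = text.trim();
  // An empty string would otherwise split into [""] and count as one word.
  if (trimmed === '') return 0;
  return trimmed.split(/\s+/).length;
}

// Made-up sample with leading, trailing, and repeated whitespace.
const sample = "  foo  bar\n baz ";

// Without trimming, the leading and trailing whitespace produce empty strings.
console.log(sample.split(/\s+/)); // ["", "foo", "bar", "baz", ""]

console.log(countWords(sample)); // 3
console.log(countWords(""));     // 0
```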

Whats wrong with this regex logic

I am trying to fetch the value after the equals sign. It works, but I am getting duplicated values. Any idea what's wrong here?
// Regex for finding a word after "=" sign
var myregexpNew = /=(\S*)/g;
// Regex for finding a word before "=" sign
var mytype = /(\S*)=/g;
//Setting data from Grid Column
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
var newtype = mytype.exec(strNew);
alert(matchNew);
https://jsfiddle.net/6vjjv0hv/
exec returns an array: the first element is the full match and the following ones are the submatches, which is why you get ["=20", "20"] (using console.log here instead of alert would make it clearer what you get).
When looking for submatches and using exec, you're usually interested in the elements starting at index 1.
Regarding the parsing as a whole, there are clearly better solutions, such as using a single regex with two submatches, but it depends on the real goal.
You can try without using Regex like this:
var val = 'QCById=20';
var myString = val.substr(val.indexOf("=") + 1);
alert(myString);
Presently exec is returning you the matched value.
REGEXP.exec(SOMETHING) returns an array (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec).
The first item in the array is the full match and the rest matches the parenthesized substrings.
You do not get duplicated values, you just get an array of a matched value and the captured text #1.
See RegExp#exec() help:
If the match succeeds, the exec() method returns an array and updates properties of the regular expression object. The returned array has the matched text as the first item, and then one item for each capturing parenthesis that matched containing the text that was captured.
Just use the [1] index to get the captured text only.
var myregexpNew = /=(\S*)/g;
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
if (matchNew) {
console.log(matchNew[1]);
}
To get values on both sides of =, you can use /(\S*)=(\S*)/g regex:
var myregexpNew = /(\S*)=(\S*)/g;
var strNew = "QCById=20";
var matchNew = myregexpNew.exec(strNew);
if (matchNew) {
console.log(matchNew[1]);
console.log(matchNew[2]);
}
Also, you may want to add a check that the captured values are not undefined or empty, since \S* may capture an empty string. Alternatively, use the /(\S+)=(\S+)/g regex, which requires at least one non-whitespace character before and after the = sign.
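A minimal sketch of that check, wrapped in a hypothetical helper (parsePair is an illustrative name, not part of the original code). It drops the g flag, since a global regex keeps state in lastIndex between exec calls and only one match is needed here.

```javascript
// Hypothetical helper: returns { key, value } for the first "key=value"
// pair in the input, or null when there is none.
function parsePair(input) {
  // \S+ on both sides requires at least one non-whitespace character.
  const match = /(\S+)=(\S+)/.exec(input);
  return match ? { key: match[1], value: match[2] } : null;
}

console.log(parsePair("QCById=20")); // { key: 'QCById', value: '20' }
console.log(parsePair("QCById="));   // null, nothing after the = sign
```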

extract all text inside braces into array of strings

I have a big string from which I would like to extract all parts that are inside round braces.
Say I have a string like
"this (one) that (one two) is (three )"
I need to write a function that would return an array
["one", "one two", "three "]
I tried to write a regex from some advice found here and failed, since I seem to only get the first element and not a proper array filled with all of them: http://jsfiddle.net/gfQzK/
var match = s.match(/\(([^)]+)\)/);
alert(match[1]);
Could someone point me in the right direction? My solution does not have to be regular expression.
You need a global regex. See if this helps:
var matches = [];
str.replace(/\(([^)]+)\)/g, function(_,m){ matches.push(m) });
console.log(matches); //= ["one", "one two", "three "]
match won't do here, because with a global regex it returns only the full matches and not the capture groups. replace can be used to loop over the matches instead.
You are almost there. You just need to change a few things.
First, add the global attribute to your regex. Now your regex should look like:
/\(([^)]+)\)/g
Then, match.length will provide you with the number of matches. To extract the matches, use indexes such as match[0], match[1], match[2] ... Note that with the global flag each entry is the full match, including the parentheses.
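To see what a global match returns in practice, here is a small sketch using the string from the question. Stripping the surrounding parentheses with slice is one possible follow-up step, not the only one.

```javascript
const s = "this (one) that (one two) is (three )";

// With the g flag, match returns the full matches only; the capture
// groups are not included. It returns null when nothing matches,
// hence the fallback to an empty array.
const fullMatches = s.match(/\(([^)]+)\)/g) || [];
console.log(fullMatches); // ["(one)", "(one two)", "(three )"]

// Remove the first and last character of each full match.
const inner = fullMatches.map(function (m) { return m.slice(1, -1); });
console.log(inner); // ["one", "one two", "three "]
```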
You need to use the global flag and call exec repeatedly until you have all your results in an array. (The multiline flag used below only changes how ^ and $ behave, so it is optional for this pattern; [^)]+ already matches across newlines.)
var s='Russia ignored (demands) by the White House to intercept the N.S.A. leaker and return him to the United States, showing the two countries (still) have a (penchant) for that old rivalry from the Soviet era.';
var re = /\(([^)]+)\)/gm, arr = [], res = [];
while ((arr = re.exec(s)) !== null) {
res.push(arr[1]);
}
alert(res);
For reference check out this mdn article on exec
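As an alternative to the exec loop, newer engines offer String.prototype.matchAll. This is a sketch assuming an environment that supports it (ES2020); it is a different technique from the answers above, shown only for comparison.

```javascript
const source = "this (one) that (one two) is (three )";

// matchAll requires the g flag and yields one match array per occurrence,
// each carrying its capture groups. Array.from collects group 1 of each.
const groups = Array.from(
  source.matchAll(/\(([^)]+)\)/g),
  function (m) { return m[1]; }
);
console.log(groups); // ["one", "one two", "three "]
```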

Javascript Regexp - Match Characters after a certain phrase

I was wondering how to use a regexp to match a phrase that comes after a certain match. Like:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var match = /phrase=.*/;
That will match from the phrase= to the end of the string, but is it possible to get everything after the phrase= without having to modify a string?
You use capture groups (denoted by parentheses).
When you execute the regex via the match or exec function, they return an array containing the full match followed by the substrings captured by the capture groups. You can then access what was captured via that array. E.g.:
var phrase = "yesthisismyphrase=thisiswhatIwantmatched";
var myRegexp = /phrase=(.*)/;
var match = myRegexp.exec(phrase);
alert(match[1]);
or
var arr = phrase.match(/phrase=(.*)/);
if (arr != null) { // Did it match?
alert(arr[1]);
}
phrase.match(/phrase=(.*)/)[1]
returns
"thisiswhatIwantmatched"
The brackets specify a so-called capture group. Contents of capture groups get put into the resulting array, starting from 1 (0 is the whole match).
It is not so hard. Assume your context is:
const context = "https://example.com/pa/GIx89GdmkABJEAAA+AAAA";
We want the part after pa/, so use this code:
const pattern = context.match(/pa\/(.*)/)[1];
The first item of the match array (index 0) includes pa/, while the second item (index 1, the capture group) does not; use whichever you need.
Try this; I hope it works:
var p = /\b([\w|\W]+)\1+(\=)([\w|\W]+)\1+\b/;
console.log(p.test('case1 or AA=AA ilkjoi'));
console.log(p.test('case2 or AA=AB'));
console.log(p.test('case3 or 12=14'));
If you want to get the value that follows the phrase, excluding the phrase itself, use this and read the first capture group:
/(?:phrase=)(.*)/
the result will be
0: "phrase=thisiswhatIwantmatched" //full match
1: "thisiswhatIwantmatched" //matching group
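If the goal is for the full match itself to exclude phrase=, a lookbehind is one option. This sketch assumes an engine with lookbehind support (added in ES2018; older browsers may reject the pattern), so the capture-group versions above remain the more portable choice.

```javascript
const phrase = "yesthisismyphrase=thisiswhatIwantmatched";

// (?<=phrase=) asserts that "phrase=" comes right before the match
// without making it part of the match.
const m = phrase.match(/(?<=phrase=).*/);
const value = m ? m[0] : null;
console.log(value); // "thisiswhatIwantmatched"
```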
