I am using the following JavaScript to read strings out of a text file and process them with a regular expression:
while (!textFile.AtEndOfStream)
{
currLine = textFile.ReadLine();
match = re.exec(currLine);
// do stuff with match
}
The problem I have is that every other time re.exec is called it fails and returns null; so the first row is processed correctly, but the second row results in null, then the third row works, and the fourth row results in null.
I can use the following code to get the result I want
while (!textFile.AtEndOfStream)
{
currLine = textFile.ReadLine();
match = re.exec(currLine);
if (match == null) match = re.exec(currLine);
}
but that seems a bit of a nasty kludge. Can anyone tell me why this happens and what I can do to fix it properly?
Your re is defined with the ‘global’ modifier, e.g. something like /foo/g.
When a RegExp is global, it retains hidden state in the RegExp instance itself to remember the last place it matched. The next time you search, it'll search forward from the index of the end of the last match, and find the next match from there. If you're passing a different string to the one you passed last time, this will give highly unpredictable results!
When you use global regexps, you should exhaust them by calling them repeatedly until you get null. Then the next time you use it you'll be matching from the start of the string again. Alternatively you can explicitly set re.lastIndex to 0 before using one. If you only want to test for the existence of one match, as in this example, simplest is just not to use g.
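A minimal sketch of the behaviour described above. The pattern and the input lines are made up for illustration, since the question does not show the actual `re` or the file contents:

```javascript
// Hypothetical pattern and input; the question's real `re` is not shown.
var re = /\d+/g;
var lines = ["abc 123", "def 456", "ghi 789"];

// With the g flag, exec() resumes from re.lastIndex, which still holds
// the end position of the match found in the previous (different) string.
var naive = lines.map(function (line) {
  var m = re.exec(line);
  return m ? m[0] : null;
});

// Resetting lastIndex before each new string restarts the search at index 0.
var fixed = lines.map(function (line) {
  re.lastIndex = 0;
  var m = re.exec(line);
  return m ? m[0] : null;
});

console.log(naive); // [ '123', null, '789' ]
console.log(fixed); // [ '123', '456', '789' ]
```

Dropping the `g` flag from the pattern should give the same result as the `fixed` version without any manual reset.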
The JS RegExp interface is one of the most confusing, poorly-designed parts of the language. (And this is JavaScript, so that's saying a lot.)
JavaScript regular expressions keep some state between executions and you are probably falling into that trap.
I always use the String.match function and have never been bitten:
while (!textFile.AtEndOfStream)
{
match = textFile.ReadLine ().match (re);
// do stuff with match
}
Related
I'm trying to pull the first occurrence of a regex pattern from a string, all in one statement, to make my code look cleaner. This is what I want to do:
var matchedString = somestring.match(/some regex/g)[0];
I would expect this to be legal but it throws an exception:
Exception: somestring.match(...) is null
It seems like JS is trying to index the array before match is finished, as the array should contain at least one match, so I don't expect it to be null.
I would like some insight in why it happens. Could it be a bug?
My machine is a PC running Arch Linux x86_64. The code is being executed within the scratchpad of firefox 32.0.3.
Thanks for your interest.
If somestring.match() finds no match, then it returns null.
And, null[0] throws an exception.
Since you are getting this exact exception, your regex is not being found in the content. Be very careful using the g flag with match() in this way, as it does not always do what you expect when the regex contains submatches (capture groups). Since it looks like you just want the first match anyway, you should probably remove the g option.
A safer way to code is:
var matches = somestring.match(/some regex/);
if (matches) {
// do something here with matches[0]
}
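To illustrate the caveat about the g flag and submatches, here is a small sketch with a made-up string and pattern (neither comes from the question):

```javascript
// Hypothetical input and pattern, chosen only to show the difference.
var s = "x=1, y=2";

var withG = s.match(/(\w)=(\d)/g);   // every full match, capture groups dropped
var withoutG = s.match(/(\w)=(\d)/); // first match plus its capture groups
var noMatch = "abc".match(/\d/g);    // null, so noMatch[0] would throw

console.log(withG);        // [ 'x=1', 'y=2' ]
console.log(withoutG[0]);  // 'x=1'
console.log(withoutG[1]);  // 'x'
console.log(withoutG[2]);  // '1'
console.log(noMatch);      // null
```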
If you want to do it in one statement (and there's no particularly good reason why that is a good idea, see jfriend000's answer), then:
var firstMatchOrFalse = /pattern/.test(somestring) && somestring.match(/pattern/)[0];
and if you only want the first match, why the g flag?
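If running the regex twice is a concern, one possible variant falls back to a placeholder array when match() returns null. The helper name `firstMatch` is made up for this sketch:

```javascript
// Hypothetical helper: returns the first match of `re` in `s`, or null.
function firstMatch(s, re) {
  // match() returns null when nothing matches, so substitute [null]
  // before indexing; the regex itself runs only once.
  return (s.match(re) || [null])[0];
}

console.log(firstMatch("a1b2", /\d/)); // '1'
console.log(firstMatch("abc", /\d/));  // null
```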
OK, I have a situation where customers of ours can construct a text and, within that text, enter specified short codes to be replaced with data from the database.
For instance:
[f], thanks for supporting our organization. Tap here: [u] to enter your donation.
would be sent as
Jamie, thanks for supporting our organization. Tap here: https://cbo.io/oh98eI to enter your donation.
I need to be able to look for and correct certain errors as they type. Specifically, if they type [u] followed by any character but a space, I want to automatically put a space in there for them. I thought I had it with this.
if (msg.indexOf('[u]') >= 0) {
if (msg.match(/\[u\][^\s]/g).length) {
msg = msg.replace('[u]', '[u] ');
msg_len = msg.length;
$('#message_text').val(msg);
}
}
but the problem I am seeing is that as soon as I change "[u]." to "[u] .", JavaScript throws an "Uncaught TypeError: Cannot read property 'length' of null" because of the second if statement (the msg.match call).
I also have another check similar to this one, and if I use both of them the second one never runs because of the error. How do I write this so that it does not cause this error?
The next problem is that if I use the same short code twice and enter a character after the second one, replace only changes the first occurrence. I end up with [u] followed by a bunch of spaces for the first one, while the second one remains "[u]." with no change. Is there a way to replace the occurrence with the discrepancy rather than the one that is already OK?
There's no need to use .length. match() returns null when there's no match, and returns a non-empty array when there's a match, so just test the return value itself instead of its length. There's also no need to use the g modifier; you only care if there's at least one match, you don't need to return all of them (since you're not actually using the returned value).
if (msg.match(/\[u\]\S/)) {
and [^\s] can be simplified to \S.
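For the second part of the question (only the first occurrence being replaced), one possible approach is to let replace() do both the test and the fix with a global regex and a lookahead. This is a sketch, and the helper name is made up:

```javascript
// Hypothetical helper; inserts a space after every [u] that is
// immediately followed by a non-whitespace character.
function spaceAfterShortCode(msg) {
  // (?=\S) is a lookahead: it requires a non-space character next
  // without consuming it, so only the offending [u] is rewritten.
  // replace() returns the string unchanged when nothing matches,
  // so no null check is needed.
  return msg.replace(/\[u\](?=\S)/g, '[u] ');
}

console.log(spaceAfterShortCode('Tap [u] now, or [u].'));
// 'Tap [u] now, or [u] .'
```

Because the already-correct `[u] ` never matches the lookahead, it is left alone and does not accumulate extra spaces.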
I've been trying to evaluate a string against a regular expression, but I noticed weird behaviour when I test with the same regular expression more than once: the test method alternates between true and false.
Check this codepen --> http://codepen.io/gpincheiraa/pen/BoXrEz?editors=001
var emailRegex = /^([a-zA-Z0-9_\.\-]){0,100}\#(([a-zA-Z0-9\-]){0,100}\.)+([a-zA-Z0-9]{2,4})+$/,
phoneChileanRegex = /^\+56\S*\s*9\S*\s*\d{8}(?!\#\w+)/g,
number = "+56982249953";
if(!number.match(phoneChileanRegex) && !number.match(emailRegex) ) {
console.log("it's not a phone number or email address");
}
//Weird Behaviour
console.log(phoneChileanRegex.test(number)); // --> true
console.log(phoneChileanRegex.test(number)); // --> false
From the MDN documentation:
As with exec() (or in combination with it), test() called multiple times on the same global regular expression instance will advance past the previous match.
So, the second time you call the method, it will look for a match after the first match, i.e. after +56982249953. There is no match because the pattern is anchored to the start of the string (^) (and because there are no characters left), so it returns false.
To make this work you have to remove the g modifier.
That's because the test method, like many RegExp methods, tries to find multiple matches in the same string when used with the g flag. To do this, it keeps track of the position it should search from in the RegExp's lastIndex property. If you don't need to find multiple matches, just don't use the g flag. And if you ever do use it, remember to set regex.lastIndex to 0 before you test a new string.
Read more about lastIndex on MDN
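The alternation can be reproduced with a simplified stand-in for the phone pattern (the shortened regex below is an assumption made for brevity, not the asker's full expression):

```javascript
// Simplified, hypothetical stand-in for the phone pattern in the question.
var globalRe = /^\+56\d{9}/g;
var number = "+56982249953";

var first = globalRe.test(number);        // true
var indexAfterFirst = globalRe.lastIndex; // 12, the end of the match
var second = globalRe.test(number);       // false: search starts at index 12
var indexAfterSecond = globalRe.lastIndex; // 0, reset after the failed search

// Same pattern without g: lastIndex is ignored, so results are stable.
var plainRe = /^\+56\d{9}/;
var third = plainRe.test(number);         // true
var fourth = plainRe.test(number);        // true

console.log(first, indexAfterFirst, second, indexAfterSecond, third, fourth);
```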
I am trying to validate a line of input. It should be one or more integers. The integers can be separated by a single full stop or one or more asterisks. 0 is not valid, but -0 is. A full stop is not valid at the end of the line, but one or more asterisks are. So "2.4*-2.-0**" is valid.
I have tried using:
/^(?:(?:-0)|(?:-?[1-9]\d*)\.|\*+)*(?:(?:-0)|(?:-?[1-9]\d*))\**$/.test(myline)
but this rejects input that has a mix of full stops and asterisks in it.
I do subsequently successfully parse out all the components of the string using:
var regEx = new RegExp("((?:-0)|(?:-?[1-9]\\d*))(\\.|\\*+|$)", "g"), result;
while((result = regEx.exec(myline)) !== null)
I could concatenate the values of result[0] and result[1] from each pass of the while loop and at the end compare that with the original string. I was just hoping for a single test at the start before entering the loop with all its logic.
The answer may be to test for invalid things rather than a completely valid string, such as:
if (/(?:[^-.*0-9])|(?:^[.*0])|(?:\d-)|(?:\.[.*])|(?:\*\.)|(?:[.*]-?0\d)|(?:[-.]$)|(?:-00)|(?:-[^0-9])|(?:[.*]0)|^$/.test(mydata))
I have run a comprehensive set of test cases through this and have not found any problems.
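For comparison, a positive pattern can also be sketched by building it from two named pieces, so that the separator alternation is grouped explicitly. This is one reading of the rules stated in the question, not a verified replacement for the negative test; the names `INT`, `SEP`, and `isValidLine` are made up:

```javascript
// Hypothetical building blocks; one interpretation of the stated rules.
var INT = "(?:-0|-?[1-9]\\d*)"; // -0, or a non-zero integer without leading zeros
var SEP = "(?:\\.|\\*+)";       // a single full stop, or one or more asterisks

// One integer, then any number of (separator, integer) pairs,
// then optional trailing asterisks.
var lineRe = new RegExp("^" + INT + "(?:" + SEP + INT + ")*\\**$");

function isValidLine(line) {
  return lineRe.test(line); // no g flag, so there is no lastIndex state to reset
}

console.log(isValidLine("2.4*-2.-0**")); // true
console.log(isValidLine("0"));           // false
console.log(isValidLine("2."));          // false
```

Any such pattern should still be run against the same test cases used for the negative check before it is relied on.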
I think this is a very basic question, but I really can't understand the concept. I have the following regular expression:
var t = '11:59 am';
t.match(/^(\d+)/);
Now, according to my understanding when I print the value I should just get 11 since I am just checking for digits. However, I get 11,11. I have to use 0th element to pick the required value like t.match(/^(\d+)/)[0].
This is because you are using a capture group, (), around the digits. Try replacing this with:
t.match(/^\d+/);
Note: this will still return an array, because that's just what .match() does.
match() always returns an array if there are any matches. Element [0] is the whole match, and element [1] is what is inside the first set of parentheses.
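A short sketch of what the returned array contains, using the string from the question. The third pattern, with two groups, is added here only for illustration:

```javascript
var t = '11:59 am';

var withGroup = t.match(/^(\d+)/);     // whole match, then group 1
var withoutGroup = t.match(/^\d+/);    // whole match only
var hoursMinutes = t.match(/^(\d+):(\d+)/); // illustrative: two groups

console.log(String(withGroup));    // '11,11'
console.log(withoutGroup.length);  // 1
console.log(withoutGroup[0]);      // '11'
console.log(hoursMinutes[0]);      // '11:59'
console.log(hoursMinutes[1]);      // '11'
console.log(hoursMinutes[2]);      // '59'
```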