JavaScript regular expression to validate input such as 2.4**6-0 - javascript

I am trying to validate a line of input. It should be one or more integers. The integers can be separated by a single full stop or by one or more asterisks. 0 is not valid, but -0 is. A full stop is not valid at the end of the line, but one or more asterisks are. So "2.4*-2.-0**" is valid.
I have tried using:
/^(?:(?:-0)|(?:-?[1-9]\d*)\.|\*+)*(?:(?:-0)|(?:-?[1-9]\d*))\**$/.test(myline)
but this rejects input that has a mix of full stops and asterisks in it.
I do subsequently successfully parse out all the components of the string using:
var regEx = new RegExp("((?:-0)|(?:-?[1-9]\\d*))(\\.|\\*+|$)", "g"), result;
while((result = regEx.exec(myline)) !== null)
I could concatenate the values of result[1] and result[2] from each pass of the while loop and at the end compare that with the original string. I was just hoping for a single test at the start before entering the loop with all its logic.

The answer may be to test for invalid things rather than a completely valid string, such as:
if (/(?:[^-.*0-9])|(?:^[.*0])|(?:\d-)|(?:\.[.*])|(?:\*\.)|(?:[.*]-?0\d)|(?:[-.]$)|(?:-00)|(?:-[^0-9])|(?:[.*]0)|^$/.test(mydata))
I have run a comprehensive set of test cases through this and have not found any problems.
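For comparison, a positive test can be sketched by reading the rules as "one integer, then zero or more separator-plus-integer pairs, then optional trailing asterisks". The original attempt appears to fail because | has the lowest precedence, so the \. attaches only to the -?[1-9]\d* alternative and an asterisk can never follow an integer inside the repeated group. The names INTEGER, SEPARATOR, validLine and isValidLine below are illustrative and not from the post, and the sketch has only been checked against the rules as stated in the question.

```javascript
// One integer: either -0, or an optional minus followed by digits with no leading zero.
const INTEGER = "(?:-0|-?[1-9]\\d*)";
// One separator: a single full stop, or one or more asterisks.
const SEPARATOR = "(?:\\.|\\*+)";
// integer, then (separator integer)*, then optional trailing asterisks.
const validLine = new RegExp(
  "^" + INTEGER + "(?:" + SEPARATOR + INTEGER + ")*\\**$"
);

// Hypothetical helper wrapping the test.
function isValidLine(line) {
  return validLine.test(line);
}

console.log(isValidLine("2.4*-2.-0**")); // true
console.log(isValidLine("2."));          // false (trailing full stop)
console.log(isValidLine("0"));           // false (plain 0 is not allowed)
```

Because the regex has no g flag, it can be reused for repeated calls without carrying state between them.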

Related

What Regex would capture both the beginning and end of a string?

I am trying to edit a DateTime string in typescript file.
The string in question is 02T13:18:43.000Z.
I want to trim the first three characters, including the letter T, from the beginning of the string AND also the last 5 characters from the end of the string, that is .000Z, including the dot character. Essentially I want the result to look like this: 13:18:43.
From what I found the following pattern (^(.*?)T) can accomplish only the first part of the trim I require, that leaves the initial result like this: 13:18:43.000Z.
What kind of Regex pattern must I use to include the second part of the trim I have mentioned? I have tried to include the following block in the same pattern (Z000.)$ but of course it failed.
Thanks.
Any help would be appreciated.
There is no need to use regular expression in order to achieve that. You can simply use:
let value = '02T13:18:43.000Z';
let newValue = value.slice(3, -5);
console.log(newValue);
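This returns 13:18:43, assuming that your string always has the same format. According to the documentation, slice(beginIndex, endIndex) extracts the part of the string from beginIndex up to, but not including, endIndex. endIndex is optional, and a negative index counts back from the end of the string.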
As I see you only need a regex solution, does this pattern work?
(\d{2}:)+\d{2} or simply \d{2}:\d{2}:\d{2}
It matches one or more two-digit groups, each followed by a colon, and then a final pair of digits.
The only disadvantage is that it doesn't check ranges, for example that the minutes are not greater than 59.
I left that check out because I assumed you get your dates from a source where the stored data is already valid, e.g. a database.
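A minimal sketch of how that pattern might be applied to the string from the question, using String#match rather than a replace. The variable names are illustrative.

```javascript
const value = "02T13:18:43.000Z";

// Without the g flag, match() returns an array whose first element is the
// whole match, or null when nothing matches.
const found = value.match(/\d{2}:\d{2}:\d{2}/);
const time = found ? found[0] : null;

console.log(time); // "13:18:43"
```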
Solution
This should suffice to remove both the prefix from beginning to T and postfix from . to end:
/^.*T|\..*$/g
console.log(new Date().toISOString().replace(/^.*T|\..*$/g, ''))
See the visualization on debuggex
Explanation
The section ^.*T removes all characters up to and including the last encountered T in the string.
The section \..*$ removes all characters from the first encountered . to the end of the string.
The | in between coupled with the global g flag allows the regular expression to match both sections in the string, allowing .replace(..., '') to trim both simultaneously.
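To make the role of the g flag concrete, here is a small sketch using the fixed string from the question instead of the current date. The variable names are illustrative.

```javascript
const value = "02T13:18:43.000Z";

// With g, both alternatives get a chance to match: the prefix up to T
// and the suffix starting at the dot.
const trimmed = value.replace(/^.*T|\..*$/g, "");
console.log(trimmed); // "13:18:43"

// Without g, replace() stops after the first match, so only the prefix goes.
const prefixOnly = value.replace(/^.*T|\..*$/, "");
console.log(prefixOnly); // "13:18:43.000Z"
```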

Javascript "".length returning 1 rather than 0

Ok so I am rather stumped by this one.
I get a string value from a javascript library. I call myStringVar = myStringVar.trim() but when I do myStringVar.substring(0,1) it gives me an empty string. When I call var arr = myStringVar.split('') the first element in the array is an empty string, and when I call arr[0].trim().length it returns 1 instead of zero.
Am I missing something?
EDIT
Following the comments and responses I have been able to isolate the problem down to the existence of a non-visual unicode character at the beginning of the string. I will now try to find a way to remove those characters from the string....or better yet extract the portions of the string that are of interest.
Thanks for the help.
The most likely answer for this is that you have some invisible Unicode character in your string (for instance, "⁣", U+2063 INVISIBLE SEPARATOR).
A string containing only such a character would look to a user (or programmer) like an empty string, but would in fact have length 1 since it does contain a character.
One simple way to test whether this is the case is to get the Unicode character code of the first character in the string with string.charCodeAt(0). You can then look this value up in a Unicode table (such as this one), which should tell you if you have an invisible character in your string.
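The check can be sketched as follows, using U+2063 as a stand-in for whatever character the library actually returns (that choice is an assumption). The last step strips leading characters in the Unicode "format" category (\p{Cf}), which is one possible way to remove them and needs an engine that supports Unicode property escapes; other invisible characters may fall in different categories.

```javascript
// Hypothetical input: an invisible U+2063 followed by visible text.
const raw = "\u2063hello";

// trim() only strips whitespace and line terminators, so U+2063 survives.
const trimmed = raw.trim();
console.log(trimmed.length); // 6

// Print the code of the first character in hex so it can be looked up.
const firstCode = trimmed.charCodeAt(0).toString(16);
console.log(firstCode); // "2063"

// Remove any leading format characters.
const cleaned = trimmed.replace(/^\p{Cf}+/u, "");
console.log(cleaned.length); // 5
```

Stripping only at the start of the string is deliberate here: format characters in the middle of text, such as the zero width joiner in some emoji sequences, can be meaningful.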

.trim() and regular expressions producing unexpected results

I wrote a fairly simple regular expression to detect when a string looks like it could be an email:
var looksLikeEmail = /^\S+#\S+\.\S+$/gi;
I'm using Knockout and the string being tested is the value of a textarea.
Essentially, say we have the value of the textarea in a variable text. This value was, for example, the typed in value abc#example.com.
What's odd, is it seems like, even though text === text.trim(), looksLikeEmail.test(text) returns true, but looksLikeEmail.test(text.trim()) returns false.
On the other hand, if I manually create the string var test2 = 'abc#example.com', it does not have this issue.
This seems to indicate to me that the textarea is inserting some odd characters or something... that .trim() is doing something weird with. But text.length === test2.length and text.length === text.trim().length
Does anyone know how to make this behave correctly?
I've written up a jsfiddle to quickly demonstrate the behavior...
If you go to the fiddle and try typing in an email... you will see the problem. Another weird behavior: add a space after the email, then remove it. /confused
Any help is much appreciated. Thanks.
.test(), just like .exec(), remembers the index after the last match (lastIndex) when used with a global regex and tries to match from there onward, so the second call fails. Just remove the /g option from your regex - it doesn't make sense to have /g in a non-multiline regex which matches beginning and end.
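The effect can be reproduced outside Knockout with a small sketch like the one below. It uses the regex and sample value exactly as they appear in the question; the variable names globalRe and plainRe are illustrative.

```javascript
const text = "abc#example.com";

// Same regex as in the question, including the g flag.
const globalRe = /^\S+#\S+\.\S+$/gi;
const first = globalRe.test(text);
const indexAfterFirst = globalRe.lastIndex; // points past the match
const second = globalRe.test(text);         // starts at lastIndex, so ^ cannot match
const indexAfterSecond = globalRe.lastIndex; // reset to 0 after the failure

console.log(first, indexAfterFirst);   // true 15
console.log(second, indexAfterSecond); // false 0

// Without g the regex keeps no position between calls.
const plainRe = /^\S+#\S+\.\S+$/i;
console.log(plainRe.test(text), plainRe.test(text)); // true true
```

This also explains why the results alternate in the fiddle: each failed call resets lastIndex, so the next call succeeds again.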

Negating in /,?(([1-9]-[1-9])|([1-9]))/g

I am trying to match a string containing a mix of digits and hyphenated digits, like a crossword answer specification, for example 1,2-2 or 1-1,3,4,2-2
/,?(([1-9]-[1-9])|([1-9]))/g is what I've come up with to match the string
value = value.replace(/,?(([1-9]-[1-9])|([1-9]))/g, '');
replaces ok, and I've checked it out in an online tester.
What I really need is to negate this, so I can use it on a keyup event, examine the contents of a textarea and remove characters that don't fit, so it only allows through characters as in the example.
I've tried ^ where expected, but it's not doing what I expect. How should I negate the regex so I remove everything that doesn't match?
If there is a better way of doing this I'm open to suggestions too.
var value = 'hello,1,2,3,4-6,1-1,3,test,4,2-2';
var pattern = /,?(([1-9]-[1-9])|([1-9]))/g;
value.replace(pattern, ''); // "hello,test"
You can use String#match. With /g flag, it returns an array of all the matches, then you can use Array#join to join them.
The problem is that String#match returns null when there is no match, so you have to handle that case and use an empty array so that it can join:
(value.match(pattern) || []).join(''); // ",1,2,3,4-6,1-1,3,4,2-2"
Note: It may be better to check them on onblur rather than onkeyup. Messing with the text that the user is currently typing will be annoying. Better to wait for the user to finish typing.
Didn't test it in JS, but this should return the valid string beginning from the left and as long as valid values are encountered (note that I used \d - if you'd like 1-9 only, then use your brackets).
(?:\d(?:-\d)?,)*\d(?:-\d)?
E.g. matching this regular expression with the string "0-1,1,2,3,4-4,2,,1,3--4" will return "0-1,1,2,3,4-4,2" as the first match.
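Since the answer says the pattern was not tested in JS, here is a short sketch that runs it against the example string. The variable names are illustrative, and the pattern is used without the g flag so that only the first match is returned.

```javascript
const listPattern = /(?:\d(?:-\d)?,)*\d(?:-\d)?/;
const input = "0-1,1,2,3,4-4,2,,1,3--4";

// match() returns null when nothing matches, so fall back to an empty string.
const found = input.match(listPattern);
const validPrefix = found ? found[0] : "";

console.log(validPrefix); // "0-1,1,2,3,4-4,2"
```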

Trying to remove trailing text

I have the following code. I want to extract the last text (hello64) from it.
<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*
I used the code below but it removes all the integers
questionText = questionText.replace(/<span\b.*?>/ig, "");
questionText=questionText.replace(/<\/span>/ig, "");
questionText = questionText.replace(/\d+/g,"");
questionText = questionText.replace("*","");
questionText = questionText.replace(". ","");
I want to remove the first integer, and need to keep the rest of the integers.
It's the third line .replace(/\d+/g,"") which is replacing the integers. If you want to keep the integers, then don't replace \d+, because that matches one or more digits.
You could achieve most of that all on one line, by the way - there's no need to have multiple replaces there:
var questionText = questionText.replace(/((<span\b.*?>)|(<\/span>)|(\d+))/ig, "");
That would do the same as the first three lines of your code. (Of course, you'd need to drop the |(\d+) as per the first part of the answer if you didn't want to get rid of the digits.)
[EDIT]
Re your comment that you want to replace the first integer but not the subsequent ones:
The regex string to do this would depend very heavily on what the possible input looks like. The problem is that you've given us a bit of random HTML code; we don't know from that whether you're expecting it to always be in this precise format (ie a couple of spans with contents, followed by a bit at the end to keep). I'll assume that this is the case.
In this case, a much simpler regex for the whole thing would be to replace everything within <span....</span> with blank:
var questionText = questionText.replace(/(<span\b.*?>.*?<\/span>)/ig, "");
This will eliminate the whole of the <span> tags plus their contents, but leave anything outside of them alone.
In the case of your example this would provide the desired effect, but as I say, it's hard to know if this will work for you in all cases without knowing more about your expected input.
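Applied to the snippet from the question, this might look like the sketch below. The variable names are illustrative, and the input is assumed to have exactly the format shown in the question.

```javascript
const html = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';

// Remove each <span ...>...</span> pair together with its contents.
const withoutSpans = html.replace(/<span\b.*?>.*?<\/span>/ig, "");

// JSON.stringify makes the leading space visible in the output.
console.log(JSON.stringify(withoutSpans)); // " hello64 ?*"
```

The leading space and the trailing " ?*" are still present, so the later replace calls from the question (or a trim) would still be needed to isolate hello64.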
In general it's considered difficult to parse arbitrary HTML code with regex. Regex is a contraction of "Regular Expressions", which is a way of saying that they are good at handling strings which have 'regular' syntax. Arbitrary HTML is not a 'regular' syntax due to its unlimited possible levels of nesting. What I'm trying to say here is that if you have anything more complex than the simple HTML snippets you've supplied, then you may be better off using an HTML parser to extract your data.
The following will match the complete string and put the part after the last </span>, up to the next word boundary \b, into capturing group 1. You then just need to replace the match with group 1, i.e. $1.
searched_string = string.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");
The captured word can consist of [A-Za-z0-9]. If you want to have anything else there just add it into that group.
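As a quick check, the capture-group approach can be run against the snippet from the question. This is only a sketch; the variable names are illustrative and the input is assumed to match the format shown.

```javascript
const html = '<span class="qnNum" id="qn">4</span><span>.</span> hello64 ?*';

// The greedy ^.* runs to the last </span>; group 1 captures the word after it,
// and "$1" replaces the whole string with just that word.
const word = html.replace(/^.*<\/span>\s*([A-Za-z0-9]+)\b.*$/, "$1");

console.log(word); // "hello64"
```

If the input does not match the pattern at all, replace() returns the string unchanged, so a caller may want to test for a match first.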
