Find consecutive "//" in regex in JavaScript

I gave it a college try, but I'm stumped. I'm trying to find consecutive slashes within a string. The rest of the regex works great, but the last part I can't quite get.
Here's what I have:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
After finding this thread, I updated my code, to no avail:
RegEx to find two or more consecutive chars
val.match( /^[\/]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What do I need to get this thing going?
So, I need this regex to check for several characters. That explains the first code sample I provided:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What I need it to also do, is look in the string for a double whack. Yes, I'm well aware of indexOf and other string manipulation techniques, but I labeled it regex because it needs to be. Let me know if you need more info...

Uh, why aren't you just doing
/\/{2,}/g
? Your regexes in the OP seem way more complicated...
\/ matches a literal forward slash character (the backslash escapes it)
{2,} matches it two or more times
/g makes the pattern global so you can find all occurrences of the pattern in your strings.
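A short sketch of that pattern in use. The helper names findSlashRuns and hasDoubleSlash are hypothetical. With the g flag, String.prototype.match returns every run of two or more slashes (or null when there is none); for a plain yes/no check, RegExp.prototype.test without the g flag is enough.

```javascript
// Hypothetical helper: collect every run of two or more consecutive slashes.
function findSlashRuns(str) {
  // With the g flag, match() returns all matches, or null if there are none.
  return str.match(/\/{2,}/g) || [];
}

// Hypothetical helper: yes/no check. No g flag, so the regex keeps no lastIndex state.
function hasDoubleSlash(str) {
  return /\/{2,}/.test(str);
}

console.log(findSlashRuns("a//b///c/d")); // → ["//", "///"]
console.log(hasDoubleSlash("text/text"));  // → false
console.log(hasDoubleSlash("text//text")); // → true
```

Leaving g off for test() matters mainly when the same regex object is stored and reused, because a global regex remembers lastIndex between calls.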

[\/]+ matches one or more /s. Note that this includes a single /; to require at least two consecutive slashes, use {2,} instead of +.

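/(.)\1+/
would find any place where a single character occurs two or more times in a row. The (.) matches a single character and captures it as group 1; the backreference \1+ then requires that same character to appear immediately after it, one or more times. (Inside a JavaScript pattern the backreference is written \1; $1 is the syntax for replacement strings.)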
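A minimal sketch of the repeated-character idea, showing both places a capture is referenced: \1 inside the pattern and $1 in a replace() replacement string. findRepeats is a hypothetical helper name.

```javascript
// Hypothetical helper: find every run where one character repeats 2+ times.
// (.) captures a single character; \1+ requires that same character again,
// one or more times, immediately after it.
function findRepeats(str) {
  return str.match(/(.)\1+/g) || [];
}

console.log(findRepeats("aabbbc//d")); // → ["aa", "bbb", "//"]

// In the replacement string, $1 refers to the captured character,
// so this collapses each run to a single character.
console.log("aabbbc//d".replace(/(.)\1+/g, "$1")); // → "abc/d"
```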
For slashes, you can simplify it down to
/\/{2,}/
or
/\/\/+/
but then you're into leaning toothpick territory.

Why not use indexOf? That would be simpler.
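For comparison, the non-regex check that answer suggests could look like this; hasDoubleSlash is a hypothetical name.

```javascript
// Non-regex check for a double slash anywhere in the string.
function hasDoubleSlash(val) {
  return val.indexOf("//") !== -1; // val.includes("//") is equivalent (ES2015+)
}

console.log(hasDoubleSlash("text//text")); // → true
console.log(hasDoubleSlash("text/text"));  // → false
```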

Here's the answer.
val.match( /^[\/|_]|[~"#%&*:<>?\\{|}]|[\/]{2,}|[\/|.]$/ )
Not sure why the other version doesn't work, but maybe someone could shed some light on the matter.
Tests:
_text - Failed leading underscore
/text - Failed leading whack
text~moreText - Failed contains invalid character: ~"#%&*:<>?\{|}
text//text - Failed double whack
text/ - Failed trailing whack
text. - Failed trailing period
I'm not sure why the earlier code wasn't working, but after moving the double-whack test it works now:
val.match( /^[\/|_]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
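A sketch of how that final pattern might be wired into a validator and run against the test cases listed above, plus one valid value. The name isValidName is hypothetical; any match is treated as a failure, as the tests imply.

```javascript
// The pattern from the answer above; any match means the value is invalid.
const INVALID = /^[\/|_]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/;

// Hypothetical helper name: true when none of the rules are violated.
function isValidName(val) {
  return !INVALID.test(val);
}

const cases = ["_text", "/text", "text~moreText", "text//text", "text/", "text.", "text/moreText"];
for (const c of cases) {
  // The first six fail; "text/moreText" passes.
  console.log(c, isValidName(c) ? "passed" : "failed");
}
```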

Related

how to reduce complexity in regex?

I have a regex that finds all kinds of money amounts denoted in dollars, like $290, USD240, $234.45, 234.5$, 234.6usd
(\$)[0-9]+\.?([0-9]*)|usd+[0-9]+\.?([0-9]*)|[0-9]+\.?[0-9]*usd|[0-9]+\.?[0-9]*(\$)
This seems to work, but how can I reduce the complexity of my regex?
It is possible to make the regex a bit shorter by collapsing the currency indicators:
You can say USD OR $ amount instead of USD amount OR $ amount. This results in the following regex:
((\$|usd)[0-9]+\.?([0-9]*))|([0-9]+\.?[0-9]*(\$|usd))
I'm not sure you'll find this less complex, but at least it's easier to read because it's shorter.
The character set [0-9] can also be replaced by \d -- the character class which matches any digit -- making the regex even shorter.
Doing this, the regex will look as follows:
((\$|usd)\d+\.?\d*)|(\d+\.?\d*(\$|usd))
Update:
According to @Toto, this regex would be more performant using non-capturing groups (I also removed the unnecessary capture group, as pointed out by @Simon MᶜKenzie):
(?:\$|usd)\d+\.?\d*|\d+\.?\d*(?:\$|usd)
As @Gangnus pointed out, amounts like $.0 are not matched by the regex. I updated the regex to fix this:
((\$|usd)((\d+\.?\d*)|(\.\d+)))|(((\d+\.?\d*)|(\.\d+))(\$|usd))
Note that I changed \d+\.?\d* into ((\d+\.?\d*)|(\.\d+)): It now either matches one or more digits, optionally followed by a dot, followed by zero or more digits; OR a dot followed by one or more digits.
Without unnecessary capturing groups and using non-capturing groups:
(?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)
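A sketch of the non-capturing version applied to the examples from the question, plus a $.50 case. The g flag collects every amount; the i flag is an addition here (it is not in the answer) so that the upper-case USD240 is matched as well.

```javascript
// Non-capturing pattern from the answer, plus g (all matches)
// and i (case-insensitive, an addition so "USD" matches too).
const MONEY = /(?:\$|usd)(?:\d+\.?\d*|\.\d+)|(?:\d+\.?\d*|\.\d+)(?:\$|usd)/gi;

const text = "$290,USD240,$234.45,234.5$,234.6usd,$.50";
console.log(text.match(MONEY));
// → ["$290", "USD240", "$234.45", "234.5$", "234.6usd", "$.50"]
```

Without the i flag, USD240 would not be matched, since the pattern only contains lower-case usd.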
Try this
^(?:\$|usd)?(?:\d+\.?\d*)(?:\$|usd)?$
By reducing the complexity you are reducing the correctness. The following regex works correctly, although even it doesn't handle letter case (but that could be managed with a flag). All other current answers here simply don't have a correct subpattern for the decimal number.
^\s*(?:(?:(?:-?(?:usd|\$)|(?:usd|\$)-)(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d)))|(?:-?(?:(?:0|[1-9]\d*)?(?:\.\d+)?(?<=\d))(?:usd|\$)))\s*$
Make the regex correct first, and only then try to shorten it.

Exact string negation in JavaScript regular expressions

This is more a question to satisfy my curiosity than a real need for help, but I will appreciate your help equally as it is driving me nuts.
I am trying to negate an exact string using JavaScript regular expressions; the idea is to exclude URLs that contain the string "www". For instance, in this list:
http://www.example.org/
http://status.example.org/index.php?datacenter=1
https://status.example.org/index.php?datacenter=2
https://www.example.org/Insights
http://www.example.org/Careers/Job_Opportunities
http://www.example.org/Insights/Press-Releases
For that I can successfully use the following regex:
/^http(|s):..[^w]/g
This works correctly, but while I can do a positive match I cannot do something like:
/[^www]/g or /[^http]/g
To exclude lines that include the exact string www or http. I have tried the infamous negative lookahead, like this:
/*(?: (?!www).*)/g
But this doesn't work either, or I can't test it properly online; it doesn't work in Notepad++ either.
If I were using Perl, Grep, Awk or Textwrangler I would have simply done:
!www OR !http
And this would have done the job.
So, my question is obviously: What would be the correct way to do such thing in Javascript? Does this depend on the regex parser (as I seem to understand?).
Thanks for any answer ;)
You need to add a negative lookahead at the start.
^(?!.*\bwww\.)https?:\/\/.*
(?!.*\bwww\.) is a negative lookahead: it asserts that the string we are about to match does not contain www. starting at a word boundary. \b is a word boundary, which matches between a word character and a non-word character. Without \b, the www. in the regex would also match the www. inside foowww.
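A small sketch of what the \b changes, using the answer's pattern as a regex literal next to the same pattern without \b. The host foowww.example.org is a made-up example.

```javascript
// The answer's pattern, as a regex literal.
const withBoundary = /^(?!.*\bwww\.)https?:\/\/.*/;
// The same pattern without \b, for comparison.
const withoutBoundary = /^(?!.*www\.)https?:\/\/.*/;

console.log(withBoundary.test("http://www.example.org/"));       // → false (contains www.)
console.log(withBoundary.test("http://status.example.org/"));    // → true
console.log(withBoundary.test("http://foowww.example.org/"));    // → true (no boundary before www)
console.log(withoutBoundary.test("http://foowww.example.org/")); // → false (www. found inside foowww.)
```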
To negate 'www' at every position in the input string:
var a = [
'http://www.example.org/',
'http://status.example.org/index.php?datacenter=1',
'https://status.example.org/index.php?datacenter=2',
'https://www.example.org/Insights',
'http://www.example.org/Careers/Job_Opportunities',
'http://www.example.org/Insights/Press-Releases'
];
a.filter(function(x){ return /^((?!www).)*$/.test(x); });
So at every position, check that 'www' doesn't match there, and only then match any character (.).
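The two answers are not equivalent: the tempered pattern /^((?!www).)*$/ rejects www at any position, including in a path, while the lookahead-at-start pattern only rejects www. beginning at a word boundary. A small comparison, using a made-up URL:

```javascript
const lookaheadAtStart = /^(?!.*\bwww\.)https?:\/\/.*/; // first answer
const temperedDot = /^((?!www).)*$/;                     // second answer

const url = "https://status.example.org/www-docs"; // hypothetical URL
console.log(lookaheadAtStart.test(url)); // → true  ("www-" is not "www.")
console.log(temperedDot.test(url));      // → false ("www" appears in the path)
```

Which behaviour is right depends on whether "www" should be excluded only in the host or anywhere in the line.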

What does this JavaScript Regular Expression /[^\d.-] mean?

We had a developer here who had added following line of code to a web application:
var amount = newValue.replace(/[^\d.-]/g, '');
The particular line deals with amount values that a user may enter into a field.
I know the following about the regular expression:
that it replaces the matches with empty strings (i.e. removes them)
that /g is a flag that means to match all occurrences inside "newValue"
that the brackets [] denote a special group
that ^ means beginning of the line
that d means digits
Unfortunately I do not know enough to determine what kind of strings this should match. I checked with some web-based regex testers if it matches e.g. strings like 98.- and other alternatives with numbers but so far no luck.
My problem is that it seems to make IE very slow so I need to replace it with something else.
Any help on this would be appreciated.
Edit:
Thanks to all who replied. I tried not just Google but sites like myregextester.com, regular-expressions.info, phpliveregex.com, and others. My problem was misunderstanding the meaning of ^ and expecting that this required a numeric string like 44.99.
Inside the brackets, when ^ is the first character, it negates the set. In other words, it says: match any character that is not one of the characters in the brackets.
So this will mean "match anything that is not a digit, a period, or a hyphen".
The ^ character is a negation character.
var newValue = " x44x.-x ";
var amount = newValue.replace(/[^\d.-]/g, '');
console.log(amount);
will print
44.-
I suspect the developer just wanted to strip surrounding whitespace? I would rather parse the string for a number and discard everything else.
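One way to follow that suggestion is to pull out the first number-shaped substring and convert it, rather than stripping characters. A minimal sketch: parseAmount is a hypothetical name, and it assumes an optional leading minus sign and a dot as the decimal separator (thousands separators such as 1,234 are not handled).

```javascript
// Hypothetical helper: extract the first number-shaped substring and convert it.
// Assumes "." as the decimal separator; thousands separators are not handled.
function parseAmount(input) {
  const m = input.match(/-?\d+(?:\.\d+)?/);
  return m ? Number(m[0]) : NaN;
}

// Stripping leaves "44.-", which is not a valid number:
console.log(Number(" x44x.-x ".replace(/[^\d.-]/g, ""))); // → NaN
console.log(parseAmount(" 44.99 "));   // → 44.99
console.log(parseAmount("-12.5 EUR")); // → -12.5
console.log(parseAmount("abc"));       // → NaN
```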

regular expression for ends with some word

I want to build a regular expression for the series
cd1_inputchk,rd_inputchk,optinputchk where inputchk is the common ending
Please guide me on this.
Very simply, it's:
/inputchk$/
On a per-word basis, just test each word: /inputchk$/.test(word) ? 'matches' : 'doesn\'t match';. This works because it matches "inputchk" only when it comes at the end of the string (hence the $).
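A sketch of the per-word check over the words from the question, plus one made-up non-matching word:

```javascript
// Per-word check: does the word end with "inputchk"?
const endsWithInputchk = (word) => /inputchk$/.test(word);

for (const word of ["cd1_inputchk", "rd_inputchk", "optinputchk", "inputchk_old"]) {
  console.log(word, endsWithInputchk(word) ? "matches" : "doesn't match");
}
// cd1_inputchk, rd_inputchk and optinputchk match; inputchk_old doesn't.
```

For a fixed suffix like this, word.endsWith("inputchk") gives the same answer without a regex.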
As for a list of words, it starts becoming more complicated.
Are there spaces in the list?
Are they needed?
I'm going to assume no is the answer to both questions, and also assume that the list is comma-separated.
There are then a couple of ways you could proceed. You could use list.split(',') to get an array of the words and test each one to see if it ends in inputchk, or you could use a modified regular expression:
/[^,]*inputchk(?:,|$)/g
This one's much more complicated.
[^,] matches any character that is not a comma
* matches zero or more of those non-comma characters (greedily)
inputchk matches inputchk
(?:...) is a non-capturing group. It groups the alternatives without creating a numbered capture group; a matched comma is still part of the overall match.
, matches the , character
| says match one side or the other
$ says to match the end of the string
Hopefully all of this together will select the strings that you're looking for, but it's very easy to make a mistake, so I'd suggest doing some rigorous testing to make sure there aren't any edge-conditions that are being missed.
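A sketch comparing the two approaches on a comma-separated list, where abc is a made-up word that should be left out. Note that the regex results keep the trailing comma.

```javascript
const list = "cd1_inputchk,abc,optinputchk";

// Option 1: split on commas and test each word.
const viaSplit = list.split(",").filter((w) => /inputchk$/.test(w));
console.log(viaSplit); // → ["cd1_inputchk", "optinputchk"]

// Option 2: the global regex. The non-capturing group creates no capture group,
// but a matched comma is still part of each overall match.
const viaRegex = list.match(/[^,]*inputchk(?:,|$)/g);
console.log(viaRegex); // → ["cd1_inputchk,", "optinputchk"]
```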
This one should work (dollar sign basically means "end of string"):
/inputchk$/

Struggling with regex to match only two of a character, not three

I need to match all occurrences of // in a string in a Javascript regex
It can't match /// or /
So far I have (.*[^\/])\/{2}([^\/].*)
which is basically "something that isn't /, followed by // followed by something that isn't /"
The approach seems to work apart from when the string I want to match starts with //
This doesn't work:
//example
This does
stuff // example
How do I solve this problem?
Edit: A bit more context - I am trying to replace // with !, so I am then using:
result = result.replace(myRegex, "$1 ! $2");
Replace two slashes that either begin the string or do not follow a slash,
and are followed by anything not a slash or the end of the string.
s=s.replace(/(^|[^/])\/{2}([^/]|$)/g,'$1!$2');
It looks like it wouldn't work for example// either.
The problem is because you're matching // preceded and followed by at least one non-slash character. This can be solved by anchoring the regex, and then you can make the preceding/following text optional:
^(.*[^\/])?\/{2}([^\/].*)?$
Use negative lookahead/lookbehind assertions:
(.*)(?<!/)//(?!/)(.*)
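In a JavaScript regex literal the slashes have to be escaped, and the g flag lets one replace() call handle every occurrence. A sketch of the lookaround idea as a global replacement; it assumes an engine with lookbehind support (ES2018 or later), and replaceDoubleSlash is a hypothetical name. Because the assertions consume nothing around the //, it also handles two pairs separated by a single character, such as a//b//c, which a pattern that consumes the neighbouring characters can skip.

```javascript
// Replace "//" only when it is not preceded or followed by another "/".
// (?<!\/) is a negative lookbehind, (?!\/) a negative lookahead; neither consumes text.
function replaceDoubleSlash(s, replacement) {
  return s.replace(/(?<!\/)\/\/(?!\/)/g, replacement);
}

console.log(replaceDoubleSlash("//example", "!"));        // → "!example"
console.log(replaceDoubleSlash("stuff // example", "!")); // → "stuff ! example"
console.log(replaceDoubleSlash("example//", "!"));        // → "example!"
console.log(replaceDoubleSlash("a//b//c", "!"));          // → "a!b!c"
console.log(replaceDoubleSlash("///exam/ple", "!"));      // → "///exam/ple" (unchanged)
```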
Use this:
/([^/]*)(\/{2})([^/]*)/g
e.g.
alert("///exam//ple".replace(/([^/]*)(\/{2})([^/]*)/g, "$1$3"));
EDIT: Updated the expression as per the comment.
/[/]{2}/
e.g:
alert("//example".replace(/[/]{2}/, ""));
This does not answer the OP's question about using regex, but some of the original comments suggested using .replaceAll, not everyone who reads this question in the future will want to use regex, people might mistakenly assume regex is the only option, and these details don't fit in a comment. So here's a poor man's non-regex approach:
Temporarily replace the three contiguous characters with something that would never naturally occur (really important when dealing with user-entered values).
Replace the remaining two contiguous characters using .replaceAll().
Return the original three contiguous characters.
For instance, let's say you wanted to remove all instances of ".." without affecting occurrences of "...".
var cleansedText = $(this).text().toString()
.replaceAll("...", "☰☸☧")
.replaceAll("..", "")
.replaceAll("☰☸☧", "...")
;
$(this).text(cleansedText);
Perhaps not as fast as regex for longer strings, but works great for short ones.
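The same three steps without jQuery, as a plain function. removeDoubleDots is a hypothetical name, and the placeholder (here "\u0000") is an assumption: it must never appear in the input. String.prototype.replaceAll requires ES2021 or later.

```javascript
// Remove every ".." that is not part of "...", using a temporary placeholder.
// Assumes PLACEHOLDER never occurs in the input text.
const PLACEHOLDER = "\u0000";

function removeDoubleDots(text) {
  return text
    .replaceAll("...", PLACEHOLDER)  // 1. protect the three-dot runs
    .replaceAll("..", "")            // 2. remove the remaining pairs
    .replaceAll(PLACEHOLDER, "..."); // 3. restore the three-dot runs
}

console.log(removeDoubleDots("a..b...c")); // → "ab...c"
```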
