Struggling with regex to match only two of a character, not three - javascript

I need to match all occurrences of // in a string with a JavaScript regex.
It must not match /// or a single /.
So far I have (.*[^\/])\/{2}([^\/].*)
which is basically "something that isn't /, followed by // followed by something that isn't /"
The approach seems to work apart from when the string I want to match starts with //
This doesn't work:
//example
This does
stuff // example
How do I solve this problem?
Edit: A bit more context - I am trying to replace // with !, so I am then using:
result = result.replace(myRegex, "$1 ! $2");

Replace two slashes that either begin the string or do not follow a slash,
and are followed by anything not a slash or the end of the string.
s=s.replace(/(^|[^/])\/{2}([^/]|$)/g,'$1!$2');

It looks like it wouldn't work for example// either.
The problem is that you're matching // preceded and followed by at least one non-slash character. This can be solved by anchoring the regex and then making the preceding/following text optional:
^(.*[^\/])?\/{2}([^\/].*)?$

Use negative lookahead/lookbehind assertions:
(.*)(?<!/)//(?!/)(.*)
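For what it's worth, here is a minimal sketch of the lookaround idea in JavaScript. It assumes an engine that supports lookbehind (ES2018 or later); the names LONE_DOUBLE_SLASH and replaceDoubleSlash are made up for illustration. Because lookarounds consume nothing, neighbouring matches such as a//b//c don't interfere with each other.

```javascript
// Assumption: the runtime supports lookbehind assertions (ES2018+).
// Matches exactly two slashes that are neither preceded nor followed by a slash.
const LONE_DOUBLE_SLASH = /(?<!\/)\/\/(?!\/)/g;

// Hypothetical helper: replaces each lone "//" with the given replacement.
function replaceDoubleSlash(input, replacement = "!") {
  return input.replace(LONE_DOUBLE_SLASH, replacement);
}

console.log(replaceDoubleSlash("//example"));        // "!example"
console.log(replaceDoubleSlash("stuff // example")); // "stuff ! example"
console.log(replaceDoubleSlash("a///b"));            // "a///b" (unchanged)
console.log(replaceDoubleSlash("a//b//c"));          // "a!b!c"
```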

Use this:
/([^/]*)(\/{2})([^/]*)/g
e.g.
alert("///exam//ple".replace(/([^/]*)(\/{2})([^/]*)/g, "$1$3"));
EDIT: Updated the expression as per the comment.
/[/]{2}/
e.g:
alert("//example".replace(/[/]{2}/, ""));

This does not answer the OP's question about using regex. However, some of the original comments suggested .replaceAll, not everyone who reads the question in the future wants to use regex, people might mistakenly assume that regex is the only option, and these details don't fit in a comment. So here's a poor man's non-regex approach:
Temporarily replace the three contiguous characters with something that would never naturally occur — really important when dealing with user-entered values.
Replace the remaining two contiguous characters using .replaceAll().
Return the original three contiguous characters.
For instance, let's say you wanted to remove all instances of ".." without affecting occurrences of "...".
var cleansedText = $(this).text().toString()
    .replaceAll("...", "☰☸☧")
    .replaceAll("..", "")
    .replaceAll("☰☸☧", "...");
$(this).text(cleansedText);
Perhaps not as fast as regex for longer strings, but works great for short ones.
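The same three steps can be sketched without jQuery. This is only an illustration: removePairsKeepTriples and its default placeholder are hypothetical, and split/join is used in place of .replaceAll so it also runs on older engines. The guard reflects the point above about user-entered values: if the placeholder already occurs in the input, the round trip is not safe.

```javascript
// Hypothetical helper: removes ".." but leaves "..." untouched.
// Assumption: the placeholder never occurs naturally in the text.
function removePairsKeepTriples(text, placeholder = "\u0000TRIPLE\u0000") {
  if (text.includes(placeholder)) {
    throw new Error("placeholder occurs in the input; choose another one");
  }
  return text
    .split("...").join(placeholder) // 1. hide the triples
    .split("..").join("")           // 2. remove the remaining pairs
    .split(placeholder).join("..."); // 3. restore the triples
}

console.log(removePairsKeepTriples("wait..what...ok")); // "waitwhat...ok"
```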


Exclude list of string in validation - regex [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
(34 answers)
Closed 2 years ago.
I know that I can negate group of chars as in [^bar] but I need a regular expression where negation applies to the specific word - so in my example how do I negate an actual bar, and not "any chars in bar"?
A great way to do this is to use negative lookahead:
^(?!.*bar).*$
The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead [is any regex pattern].
Unless performance is of utmost concern, it's often easier just to run your results through a second pass, skipping those that match the words you want to negate.
Regular expressions usually mean you're doing scripting or some sort of low-performance task anyway, so find a solution that is easy to read, easy to understand and easy to maintain.
Solution:
^(?!.*STRING1|.*STRING2|.*STRING3).*$
xxxxxx OK
xxxSTRING1xxx KO (which is the desired result)
xxxSTRING2xxx KO (which is the desired result)
xxxSTRING3xxx KO (which is the desired result)
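If the list of forbidden strings is only known at runtime, the same pattern can be assembled from an array. A rough JavaScript sketch follows; buildExcluder and escapeRegExp are illustrative names, not part of any library, and the words are escaped so that characters like . or ( are treated literally.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds ^(?!.*(?:A|B|C)).*$ from a non-empty word list.
function buildExcluder(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  return new RegExp("^(?!.*(?:" + alternatives + ")).*$");
}

const excluder = buildExcluder(["STRING1", "STRING2", "STRING3"]);
console.log(excluder.test("xxxxxx"));        // true  (OK)
console.log(excluder.test("xxxSTRING2xxx")); // false (KO)
```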
You could either use a negative look-ahead or look-behind:
^(?!.*?bar).*
^(.(?<!bar))*?$
Or use just basics:
^(?:[^b]+|b(?:$|[^a]|a(?:$|[^r])))*$
These all match anything that does not contain bar.
The following regex will do what you want (as long as negative lookbehinds and lookaheads are supported), matching things properly; the only problem is that it matches individual characters (i.e. each match is a single character rather than all characters between two consecutive "bar"s), possibly resulting in a potential for high overhead if you're working with very long strings.
b(?!ar)|(?<!b)a|a(?!r)|(?<!ba)r|[^bar]
I came across this forum thread while trying to identify a regex for the following English statement:
Given an input string, match everything unless this input string is exactly 'bar'; for example I want to match 'barrier' and 'disbar' as well as 'foo'.
Here's the regex I came up with
^(bar.+|(?!bar).*)$
My English translation of the regex is "match the string if it starts with 'bar' and it has at least one other character, or if the string does not start with 'bar'".
The accepted answer is nice but is really a work-around for the lack of a simple sub-expression negation operator in regexes. This is why grep --invert-match exists. So in *nixes, you can accomplish the desired result using pipes and a second regex.
grep 'something I want' | grep --invert-match 'but not these ones'
Still a workaround, but maybe easier to remember.
If it's truly a word, bar that you don't want to match, then:
^(?!.*\bbar\b).*$
The above will match any string that does not contain bar that is on a word boundary, that is to say, separated from non-word characters. However, the period/dot (.) used in the above pattern will not match newline characters unless the correct regex flag is used:
^(?s)(?!.*\bbar\b).*$
Alternatively:
^(?!.*\bbar\b)[\s\S]*$
Instead of using any special flag, we are looking for any character that is either white space or non-white space. That should cover every character.
But what if we would like to match words that might contain bar, but just not the specific word bar?
(?!\bbar\b)\b[A-Za-z-]*bar[a-z-]*\b
(?!\bbar\b) Assert that the next input is not bar on a word boundary.
\b[A-Za-z-]*bar[a-z-]*\b Matches any word on a word boundary that contains bar.
Extracted from this comment by bkDJ:
^(?!bar$).*
The nice property of this solution is that it's possible to clearly negate (exclude) multiple words:
^(?!bar$|foo$|banana$).*
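A quick JavaScript check of that behaviour, as a sketch (the constant and function names are only for illustration). The alternatives are grouped so that a single $ applies to all of them, which is equivalent to the pattern above.

```javascript
// Rejects only inputs that are exactly "bar", "foo" or "banana".
const NOT_A_RESERVED_WORD = /^(?!(?:bar|foo|banana)$).*/;

// Hypothetical helper wrapping the test.
function isAllowed(input) {
  return NOT_A_RESERVED_WORD.test(input);
}

console.log(isAllowed("bar"));     // false
console.log(isAllowed("barrier")); // true
console.log(isAllowed("disbar"));  // true
console.log(isAllowed("banana"));  // false
```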
I wish to complement the accepted answer and contribute to the discussion with my late answer.
@ChrisVanOpstal shared this regex tutorial, which is a great resource for learning regex.
However, it was really time consuming to read through.
I made a cheatsheet for mnemonic convenience.
This reference is based on the braces [], (), and {} leading each class, and I find it easy to recall.
Regex = {
'single_character': ['[]', '.', {'negate':'^'}],
'capturing_group' : ['()', '|', '\\', 'backreferences and named group'],
'repetition' : ['{}', '*', '+', '?', 'greedy v.s. lazy'],
'anchor' : ['^', '\b', '$'],
'non_printable' : ['\n', '\t', '\r', '\f', '\v'],
'shorthand' : ['\d', '\w', '\s'],
}
Just thought of something else that could be done. It's very different from my first answer, as it doesn't use regular expressions, so I decided to make a second answer post.
Use your language of choice's split() method equivalent on the string with the word to negate as the argument for what to split on. An example using Python:
>>> text = 'barbarasdbarbar 1234egb ar bar32 sdfbaraadf'
>>> text.split('bar')
['', '', 'asd', '', ' 1234egb ar ', '32 sdf', 'aadf']
The nice thing about doing it this way, in Python at least (I don't remember if the functionality would be the same in, say, Visual Basic or Java), is that it lets you know indirectly when "bar" was repeated in the string due to the fact that the empty strings between "bar"s are included in the list of results (though the empty string at the beginning is due to there being a "bar" at the beginning of the string). If you don't want that, you can simply remove the empty strings from the list.
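For comparison, JavaScript's String.prototype.split with a plain string separator keeps the empty strings as well, so the same trick carries over. A small sketch, where the variable names are just for illustration:

```javascript
const text = "barbarasdbarbar 1234egb ar bar32 sdfbaraadf";

// Splitting on a plain string keeps the empty pieces between adjacent "bar"s.
const pieces = text.split("bar");

// Every separator found adds one piece, so this counts the occurrences of "bar".
const occurrences = pieces.length - 1;

// Drop the empty strings if only the text in between matters.
const nonEmpty = pieces.filter((piece) => piece.length > 0);

console.log(pieces);      // ["", "", "asd", "", " 1234egb ar ", "32 sdf", "aadf"]
console.log(occurrences); // 6
console.log(nonEmpty);    // ["asd", " 1234egb ar ", "32 sdf", "aadf"]
```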
I had a list of file names, and I wanted to exclude certain ones, with this sort of behavior (Ruby):
files = [
'mydir/states.rb', # don't match these
'countries.rb',
'mydir/states_bkp.rb', # match these
'mydir/city_states.rb'
]
excluded = ['states', 'countries']
# set my_rgx here
result = WankyAPI.filter(files, my_rgx) # I didn't write WankyAPI...
assert result == ['mydir/city_states.rb', 'mydir/states_bkp.rb']
Here's my solution:
excluded_rgx = excluded.map{|e| e+'\.'}.join('|')
my_rgx = /(^|\/)((?!#{excluded_rgx})[^\.\/]*)\.rb$/
My assumptions for this application:
The string to be excluded is at the beginning of the input, or immediately following a slash.
The permitted strings end with .rb.
Permitted filenames don't have a . character before the .rb.

Javascript regular expression (unbroken repetitions of a pattern)

Let's say that I have a given string in javascript - e.g., var s = "{{1}}SomeText{{2}}SomeText"; It may be very long (e.g., 25,000+ chars).
NOTE: I'm using "SomeText" here as a placeholder to refer to any number of characters of plain text. In other words, "SomeText" could be any plain text string which doesn't include {{1}} or {{2}}. So the above example could be var s = "{{1}}Hi there. This is a string with one { curly bracket{{2}}Oh, very nice to meet you. I also have one } curly bracket!"; And that would be perfectly valid.
The rules for it are simple:
It does not need to have any instances of {{2}}. However, if it does, then after that instance we cannot encounter another {{2}} unless we find a {{1}} first.
Valid examples:
"{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText{{1}}SomeText"
"{{1}}SomeText{{1}}SomeText{{2}}SomeText{{1}}SomeText{{1}}SomeText{{2}}SomeText"
etc...
Invalid examples:
"{{2}}SomeText{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText{{2}}SomeText"
"{{1}}SomeText{{2}}SomeText{{2}}SomeText{{1}}SomeText"
etc...
This seems like a relatively easy problem to solve - and indeed I could easily solve it without regular expressions, but I'm keen to learn how to do something like this with regular expressions. Unfortunately, I'm not even sure if "conditionals and lookaheads" is a correct description of the issue in this case.
NOTE: If a workable solution is presented that doesn't involve "conditionals and lookaheads" then I will edit the title.
It's probably easier to invert the condition. Try to match any text that contains two consecutive instances of {{2}}, and if it doesn't match that, it's good.
Using this strategy, your pattern can be as simple as:
/{\{2}}([^{]*){\{2}}/
This will match a literal {{2}}, followed by zero or more characters other than {, followed by a literal {{2}}.
Notice that the second { needs to be escaped; otherwise the regex engine will treat the {2} as a quantifier on the previous { (i.e. {{2} matches exactly two { characters).
In case you need to allow characters like { between the two {{2}}, you can use a pattern like this:
/{\{2}}((?!{\{1}}).)*{\{2}}/
This will match a literal {{2}}, followed by zero or more of any character, so long as those characters do not form the sequence {{1}}, followed by a literal {{2}}.
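Putting the inverted check into a function might look like the sketch below. The names REPEATED_TWO and isValidSequence are hypothetical, every brace is escaped for readability, and [\s\S] is used instead of . on the assumption that the text may contain newlines.

```javascript
// Matches {{2}} ... {{2}} with no {{1}} anywhere in between.
const REPEATED_TWO = /\{\{2\}\}(?:(?!\{\{1\}\})[\s\S])*?\{\{2\}\}/;

// Hypothetical helper: a string is valid when the "bad" pattern is absent.
function isValidSequence(s) {
  return !REPEATED_TWO.test(s);
}

console.log(isValidSequence("{{1}}SomeText{{2}}SomeText"));              // true
console.log(isValidSequence("{{1}}SomeText{{2}}SomeText{{1}}SomeText")); // true
console.log(isValidSequence("{{2}}SomeText{{2}}SomeText"));              // false
```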
(({{1}}SomeText)+({{2}}SomeText)?)*
Broken down:
({{1}}SomeText)+ - 1 to many {{1}} instances (greedy match)
({{2}}SomeText)? - followed by an optional {{2}} instance
Then the whole thing is wrapped in ()* such that the sequence can appear 0 to many times in a row.
No conditionals or lookaheads needed.
You said you can have one instance of {2} first, right?
^(.(?!{2}))(.{2})?(?!{2})((.(?!{2})){1}(.(?!{2}))({2})?)$
Note if {2} is one letter replace all dots with [^{2}]

Nice way to do this regex substitution

I'm writing a JavaScript function which takes a regex and some elements, and matches the regex against each element's name attribute.
Let's say i'm passed this regex
/cmw_step_attributes\]\[\d*\]/
and a string that is structured like this
"foo[bar][]chicken[123][cmw_step_attributes][456][name]"
where all the numbers could vary, or be missing. I want to match the regex against the string in order to swap out the 456 for another number (which will vary), eg 789. So, i want to end up with
"foo[bar][]chicken[123][cmw_step_attributes][789][name]"
The regex will match the string, but i can't swap out the whole regex for 789 as that will wipe out the "[cmw_step_attributes][" bit. There must be a clean and simple way to do this but i can't get my head round it. Any ideas?
thanks, max
Capture the first part and put it back into the string.
.replace(/(cmw_step_attributes\]\[)\d*/, '$1789');
// note I removed the closing ] from the end - quantifiers are greedy so all numbers are selected
// alternatively:
.replace(/cmw_step_attributes\]\[\d*\]/, 'cmw_step_attributes][789]')
Either literally rewrite the part that must remain the same in the replacement string, or place it inside capturing parentheses and reference it in the replacement.
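Since the new number varies, a replacement function may be easier to work with than a replacement string, because it avoids having to think about how $1 followed by digits is parsed. A minimal sketch, where swapIndex is a made-up name and the pattern is hard-coded for the example rather than passed in:

```javascript
// Hypothetical helper: swaps the number after [cmw_step_attributes][ for newIndex.
function swapIndex(name, newIndex) {
  return name.replace(
    /(cmw_step_attributes\]\[)\d*(\])/,
    // prefix and suffix are the two captured groups, put back unchanged
    (match, prefix, suffix) => prefix + String(newIndex) + suffix
  );
}

const before = "foo[bar][]chicken[123][cmw_step_attributes][456][name]";
console.log(swapIndex(before, 789));
// "foo[bar][]chicken[123][cmw_step_attributes][789][name]"
```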
See answer on: Regular Expression to match outer brackets.
Regular expressions are the wrong tool for the job because you are dealing with nested structures, i.e. recursion.
Have you tried:
var str = 'foo[bar][]chicken[123][cmw_step_attributes][456][name]';
str.replace(/cmw_step_attributes\]\[\d*?\]/gi, 'cmw_step_attributes][XXX]');

Find consecutive "//" in regex in JavaScript

I gave it a college try, but I'm stumped. I'm trying to find consecutive slashes within a string. The rest of the regex works great, but the last part I can't quite get.
Here's what I have:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
and finding this thread, I decided to update my code to no avail:
RegEx to find two or more consecutive chars
val.match( /^[\/]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What do I need to get this thing going?
So, I need this regex to look for many characters. That would explain the first code sample that I provided:
val.match( /^[\/]|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
What I need it to also do, is look in the string for a double whack. Yes, I'm well aware of indexOf and other string manipulation techniques, but I labeled it regex because it needs to be. Let me know if you need more info...
Uh, why aren't you just doing
/\/{2,}/g
? Your regexes in the OP seem way more complicated...
\/ matches a literal forward slash (the backslash only escapes it inside the regex literal)
{2,} tells it to match twice or more
/g makes the pattern global so you can find all occurrences of the pattern in your strings.
[\/]+ should match one or more /s.
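/(.)\1+/
would find any place where a single character occurs 2 or more times in a row. The (.) matches a single character and captures it into group 1; the backreference \1 then requires that same character to follow immediately, 1 or more times. (Inside a JavaScript pattern the backreference is written \1; $1 is only meaningful in a replacement string.)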
For slashes, you can simplify it down to
/\/{2,}/
/\/\/+/
but then you're into leaning toothpick territory.
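To see both variants in action, here is a small sketch (the constant and function names are only illustrative):

```javascript
// Any character repeated at least twice in a row, via a backreference.
const REPEATED_CHAR = /(.)\1+/;

// Runs of two or more slashes; the g flag makes match() return every run.
const SLASH_RUN = /\/{2,}/g;

// Hypothetical helper: returns all slash runs, or an empty array if none.
function findSlashRuns(val) {
  return val.match(SLASH_RUN) || [];
}

console.log(REPEATED_CHAR.exec("text//text")[0]); // "//"
console.log(findSlashRuns("a//b///c/d"));         // ["//", "///"]
console.log(findSlashRuns("a/b"));                // []
```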
Why not use indexOf? That would be simpler.
Here's the answer.
val.match( /^[\/|_]|[~"#%&*:<>?\\{|}]|[\/]{2,}|[\/|.]$/ )
Not sure why the other version doesn't work, but maybe someone could shed some light onto the matter.
Tests:
_text - Failed leading underscore
/text - Failed leading whack
text~moreText - Failed contains invalid character: ~"#%&*:<>?\{|}
text//text - Failed double whack
text/ - Failed trailing whack
text. - Failed trailing period
Not sure why the code below wasn't working, but moved the double whack test and it works now:
val.match( /^[\/|_]|[\/]{2,}|[~"#%&*:<>?\\{|}]|[\/|.]$/ )
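As an aside, splitting the one big alternation into separate named checks can make it easier to see which rule tripped. This is only a sketch covering the test cases listed above: the checks array and findProblems are hypothetical names, and the stray | inside the first and last character classes is dropped because the invalid-character class already covers it.

```javascript
// Each rule from the combined regex, as its own pattern with a reason.
const checks = [
  { reason: "leading slash or underscore", pattern: /^[\/_]/ },
  { reason: "invalid character", pattern: /[~"#%&*:<>?\\{|}]/ },
  { reason: "consecutive slashes", pattern: /\/{2,}/ },
  { reason: "trailing slash or period", pattern: /[\/.]$/ },
];

// Hypothetical helper: returns the reasons for every rule the value breaks.
function findProblems(val) {
  return checks
    .filter((check) => check.pattern.test(val))
    .map((check) => check.reason);
}

console.log(findProblems("text//text")); // ["consecutive slashes"]
console.log(findProblems("text"));       // []
```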

Creating Slugs from Titles?

I have everything in place to create slugs from titles, but there is one issue. My RegEx replaces spaces with hyphens. But when a user types "Hi     there" (multiple spaces) the slug ends up as "Hi-----there". When really it should be "Hi-there".
Should I create the regular expression so that it only replaces a space when there is a character either side?
Or is there an easier way to do this?
I use this:
yourslug.replace(/\W+/g, '-')
This replaces each run of one or more non-word characters (anything other than letters, digits and underscore) with a single dash.
Just match multiple whitespace characters.
s/\s+/-/g
Daniel's answer is correct.
However if somebody is looking for complete solution I like this function,
http://dense13.com/blog/2009/05/03/converting-string-to-slug-javascript/
Thanks to "dense13"!
It might be the easiest to fold repeated -s into one - as the last step:
replace /-{2,}/ by "-"
Or if you only want this to affect spaces, fold spaces instead (before the other steps, obviously)
I would replace [\s]+ with '-' and then replace [^\w-] with ''
You may want to trim the string first, to avoid leading and trailing hyphens.
function hyphenSpace(s) {
    s = (s.trim) ? s.trim() : s.replace(/^\s+|\s+$/g, '');
    return s.split(/\s+/).join('-');
}
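Combining the suggestions in this thread (trim, turn whitespace runs into a hyphen, strip everything else, fold repeated hyphens) gives something like the sketch below. slugify is a hypothetical name, and the order of the steps is one reasonable choice rather than the only one.

```javascript
// Hypothetical helper combining the steps suggested above.
function slugify(title) {
  return title
    .trim()                  // avoid leading and trailing hyphens
    .replace(/\s+/g, "-")    // each run of whitespace becomes one hyphen
    .replace(/[^\w-]/g, "")  // drop anything that is not a word character or hyphen
    .replace(/-{2,}/g, "-"); // fold repeated hyphens into one
}

console.log(slugify("Hi     there"));        // "Hi-there"
console.log(slugify("  Hello,   world!  ")); // "Hello-world"
```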
