Get word between two sets of characters - javascript

I'm trying to write a regular expression that takes any word between these two sets of characters: 3D and &sa
examples:
3DEvb31p5vFs4_&sa : the output : Evb31p5vFs4_
3D_Ve8_LBztG50_&sa : the output : _Ve8_LBztG50_
I have used the expression: /\w[3D][A-Za-z0-9_-].*sa/g
So the next step is to skip the "3D" and "&sa"
Thanks in advance!

You can use match() with regex /3D(.*)&sa/
var a='3DEvb31p5vFs4_&sa';
var b='3D_Ve8_LBztG50_&sa' ;
document.write(a.match(/3D(.*)&sa/)[1] +'<br>');
document.write(b.match(/3D(.*)&sa/)[1]);
Explanation:
3D(.*)&sa
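One caveat with the greedy (.*) above: if the input contains more than one &sa, the capture runs up to the last one. A minimal sketch, assuming a hypothetical helper name (between3DAndSa) and an invented input with two markers, that uses the lazy (.*?) and guards against a failed match:

```javascript
// Hypothetical helper (not from the original answer): returns the text
// between "3D" and the next "&sa", or null when the pattern is absent.
function between3DAndSa(input) {
  const m = input.match(/3D(.*?)&sa/); // lazy: stop at the first "&sa"
  return m ? m[1] : null;              // match() returns null on failure
}

console.log(between3DAndSa("3DEvb31p5vFs4_&sa"));  // Evb31p5vFs4_
console.log(between3DAndSa("3D_Ve8_LBztG50_&sa")); // _Ve8_LBztG50_
console.log(between3DAndSa("no markers here"));    // null

// Invented input with two "&sa" markers, to compare greedy and lazy.
const twice = "3Dabc&sa=1&sa=2";
console.log(twice.match(/3D(.*)&sa/)[1]); // abc&sa=1 (greedy)
console.log(between3DAndSa(twice));       // abc (lazy)
```

Whether greedy or lazy is the right choice depends on the real input, so treat this as an illustration rather than a replacement for the answer.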

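Try this:
3D(?s)(.*)&sa
Note: JavaScript does not accept the inline (?s) modifier (the pattern throws a SyntaxError). In JavaScript, use the s flag instead (ES2018 and later): /3D(.*)&sa/gs
Explanation: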
3D matches the characters 3D literally (case sensitive)
(?s) Match the remainder of the pattern with the following options:
s modifier: single line. Dot matches newline characters
1st Capturing group (.*)
.* matches any character
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
&sa matches the characters &sa literally (case sensitive)
g modifier: global. All matches (don't return on first match)
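To illustrate what the single-line option changes in JavaScript, here is a small sketch. The multi-line input is an assumption made up for the demonstration, and the s flag requires an ES2018-capable engine:

```javascript
// Assumed input: the wanted text spans a line break.
const text = "3Dfirst\nsecond&sa";

// Without the s flag the dot stops at "\n", so nothing matches.
console.log(/3D(.*)&sa/.test(text)); // false

// With the s (dotAll) flag the dot also matches line terminators.
const m = text.match(/3D(.*)&sa/s);
console.log(JSON.stringify(m[1]));   // "first\nsecond"

// The inline (?s) form is rejected when the pattern is compiled.
let inlineSupported = true;
try {
  new RegExp("3D(?s)(.*)&sa");
} catch (e) {
  inlineSupported = false;           // a SyntaxError was thrown
}
console.log(inlineSupported);        // false
```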

Related

Regex related confusion

I would like to extract mkghj.bmg and pp.kp from the following string using a regex used in javascript
avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp
Everything enclosed within ,,,, needs to be ignored. There could be multiple instances of ,,,, But they will always occur even number of times (non occurrence is also a possibility) in the string.
Also, avb#gh.lk has an # sign, therefore it needs to be ignored
I guess the rule I am looking for is this - if there is a dot (.) look ahead and look behind :-
If the dot is enclosed inside ,,,, then ignore it
If the dot has an # before it with no space between the dot and #, ignore it
In all other cases, capture an unbroken set of characters (until a space is encountered) on either side of the dot
I came up with this regex, but it is not helpful
[^\, ]+([^# \,]+\w+)[^\, ]+
Generally speaking that is (mind the capturing group):
not_this | neither_this_nor_this | (but_this_interesting_stuff)
For your specific example, this could be
,,,,.*?,,,,|\S+#\S+|(\S+)
You need to check for the existence of group 1; see a demo on regex101.com.
In JS this would be:
var myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
var myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
var match = myRegexp.exec(myString);
while (match != null) {
    // group 1 is only set when the last alternative matched
    if (typeof(match[1]) != 'undefined') {
        console.log(match[1]);
    }
    match = myRegexp.exec(myString);
}
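The same "discard | discard | (keep)" idea can also be written with String.prototype.matchAll instead of an exec() loop. A minimal sketch, where the helper name extractTokens is an assumption for illustration:

```javascript
// Hypothetical helper: collects group 1 from the
// "not_this | neither_this | (but_this)" pattern.
function extractTokens(input) {
  const pattern = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
  const kept = [];
  for (const m of input.matchAll(pattern)) {
    // Group 1 is undefined when one of the "discard" branches matched.
    if (m[1] !== undefined) kept.push(m[1]);
  }
  return kept;
}

const sample = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
console.log(extractTokens(sample)); // [ 'mkghj.bmg', 'pp.kp' ]
```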
Regex
^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)
Description
^ asserts position at start of a line
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
the space inside [^ ] matches the space character literally (case sensitive)
the space after [^ ]+ matches the space character literally (case sensitive)
1st Capturing Group ([^ ]+)
Match a single character not present in the list below [^ ]+
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
the space inside [^ ] matches the space character literally (case sensitive)
,,,, matches the characters ,,,, literally (case sensitive)
.* matches any character (except for line terminators)
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
,,,, matches the characters ,,,, literally (case sensitive)
\s+ matches any whitespace character (equal to [\r\n\t\f\v ])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
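A short usage sketch for the anchored pattern described above. It assumes the input always has exactly the layout of the sample string (one ignored token, one wanted token, one ,,,, block, then the rest), which is an assumption that may not hold for other inputs:

```javascript
const line = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
const layout = /^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)/;

const m = line.match(layout);
// Group 1 is the second token, group 2 is everything after the ,,,, block.
console.log(m ? [m[1], m[2]] : null); // [ 'mkghj.bmg', 'pp.kp' ]

// An input with a different layout does not match at all.
console.log("pp.kp".match(layout));   // null
```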
You can replace everything between the ,,,, markers with '', then split the remainder on spaces and filter out the empty entries and those containing a # sign.
let str = `avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp`
let op = str.replace(/,,,,.*?,,,,/g,'').split(' ').filter(e=> !e.includes('#') && e );
console.log(op)

Javascript RegEx : Match repeating character exactly twice (ignore if it matches 3 times)

I am parsing strings for tokens that have 2 types of delimiter (similar to mustache templates).
I need a pure regex solution that matches {{bob}} in this is {{bob}} a double token, but does NOT match in this is {{{bob}}} a triple token.
I am matching the double with
\{\{[^\{]([\s\S]+?)[^\}]\}\}
However, it matches the {{bob}} within the triple {{{bob}}}.
Without negative lookbehind I'm struggling to find a pure regex solution. Any pointers?
You could require a whitespace character on either side of the braces, something like this:
\s(\{\{([^\{\}]+?)\}\})\s
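A quick check of how this whitespace-delimited pattern behaves. The third input is an invented edge case that shows a limitation: a token at the very start (or end) of the string has no surrounding whitespace and is missed.

```javascript
const rx = /\s(\{\{([^\{\}]+?)\}\})\s/;

const double = "this is {{bob}} a double token".match(rx);
console.log(double[1]); // {{bob}}
console.log(double[2]); // bob

// The triple-brace token is not matched.
console.log(rx.test("this is {{{bob}}} a triple token")); // false

// Invented edge case: no whitespace before the token, so no match.
console.log(rx.test("{{bob}} starts the string"));        // false
```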
RegEx:
^(\}?)\{\{[^\{]([\s\S]+?)[^\}]\}\}
Auto Generated Explanation:
^ asserts position at start of the string
1st Capturing Group (\}?)
\}? matches the character } literally (case sensitive)
? Quantifier — Matches between zero and one times, as many times as possible, giving back as needed (greedy)
\{ matches the character { literally (case sensitive)
\{ matches the character { literally (case sensitive)
Match a single character not present in the list below [^\{]
\{ matches the character { literally (case sensitive)
2nd Capturing Group ([\s\S]+?)
Match a single character present in the list below [\s\S]+?
+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
Match a single character not present in the list below [^\}]
\} matches the character } literally (case sensitive)
\} matches the character } literally (case sensitive)
\} matches the character } literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
You may use the following to extract the matches:
var s = "this is {{{ bad bob triple}}} a triple token {{bob double}} {{bob}} a double token {{{bad token}} {{bad token}}}";
var rx = /(?:^|[^{]){{([^{}]*)}}(?!})/g;
var m, res=[];
while (m = rx.exec(s)) {
    res.push(m[1]);
}
console.log(res);
See the regex demo here.
(?:^|[^{]) - either start of string or any char but {
{{ - double {
([^{}]*) - Group 1: any char but { and } zero or more times
}} - double }
(?!}) - not followed immediately with }.
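Where lookbehind is available (ES2018 and later engines), the leading (?:^|[^{]) can be swapped for a negative lookbehind. This is a sketch of that alternative, not the answer's own pattern, and the helper name doubleTokens is hypothetical. Because the lookbehind does not consume the preceding character, two adjacent tokens such as {{a}}{{b}} are both found:

```javascript
// Hypothetical helper: returns the contents of every {{...}} token that is
// neither preceded by "{" nor followed by "}".
function doubleTokens(text) {
  const rx = /(?<!\{)\{\{([^{}]*)\}\}(?!\})/g;
  return Array.from(text.matchAll(rx), (m) => m[1]);
}

const s = "this is {{{ bad bob triple}}} a triple token {{bob double}} {{bob}} a double token {{{bad token}} {{bad token}}}";
console.log(doubleTokens(s));            // [ 'bob double', 'bob' ]
console.log(doubleTokens("{{a}}{{b}}")); // [ 'a', 'b' ]
```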

Regarding JavaScript RegEx - Replace all punctuation including underscore

In javascript, how can I replace all punctuation (including underscore) marks with hyphen? Moreover, it should not contain more than one hyphen sequentially.
I tried "h....e l___l^^0".replace(/[^\w]/g, "-") but it gives me h----e---l___l--0
What should I do so that it returns me h-e-l-l-0 instead?
+ repeats the previous token one or more times.
> "h....e l___l^^0".replace(/[\W_]+/g, "-")
'h-e-l-l-0'
[\W_]+ matches non-word characters or _ one or more times.
All you need to do is add a + quantifier to the regex
"h....e l___l^^0".replace(/[^a-zA-Z0-9]+/g, "-")
NOTE
Use [^a-zA-Z0-9]+ instead of [^\w], because \w includes _, so underscores are not replaced if you use [^\w].
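Both answers can be wrapped in a small helper. In this sketch the name hyphenate and the trimming of leading and trailing hyphens are assumptions added for illustration; the question itself only asks for collapsing runs of punctuation. Note also that, without extra flags, \W treats non-ASCII letters as non-word characters, so they would be replaced too.

```javascript
// Hypothetical helper; the second replace() is an optional extra step.
function hyphenate(text) {
  return text
    .replace(/[\W_]+/g, "-")  // each run of non-alphanumerics becomes one "-"
    .replace(/^-+|-+$/g, ""); // drop hyphens left at either end
}

console.log(hyphenate("h....e l___l^^0"));   // h-e-l-l-0
console.log(hyphenate("__hello, world!__")); // hello-world
```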
Regex101
[!"#$%&'()*+,\-.\/:;<=>?#[\\\]^_`{|}~ ]+
Description
[!"#$%&'()*+,\-.\/:;<=>?#[\\\]^_`{|}~ ]+ match a single character present in the list below
Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
!"#$%&'()*+, a single character in the list !"#$%&'()*+, literally (case sensitive)
\- matches the character - literally
. the literal character .
\/ matches the character / literally
:;<=>?#[ a single character in the list :;<=>?#[ literally (case sensitive)
\\ matches the character \ literally
\] matches the character ] literally
^_`{|}~ a single character in the list ^_`{|}~ literally
g modifier: global. All matches (don't return on first match)
JS
alert("h....e l___l^^0".replace(/[!"#$%&'()*+,\-.\/ :;<=>?#[\\\]^_`{|}~]+/g, "-"));
Result:
h-e-l-l-0

javascript - url regexp not proper

I am trying to validate a URL with JS.
function validateURL(url) {
    var urlregex = new RegExp("^(http:\/\/www.|https:\/\/www.|ftp:\/\/www.|www.|https:\/\/|http:\/\/){1}([0-9A-Za-z]+\.)");
    return urlregex.test(url);
}
But I want google.com to pass as well, and it does not pass this regexp.
What is wrong with the regexp?
I want these URLs to pass the regexp:
http://www.google.com
http://google.com
https://www.google.com
https://google.com
google.com
www.google.com
Try this instead:
function validateURL(url) {
    // a regex literal keeps the \. escapes; in a string literal "\." collapses to "."
    var urlregex = /^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/;
    return urlregex.test(url);
}
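A note on building patterns with new RegExp("..."): inside an ordinary string literal, "\." is reduced to "." before the regex engine sees it, so the dot matches any character. The sketch below uses an invented pattern and input to show the difference between the string form, the double-escaped string form, and a regex literal:

```javascript
const fromString = new RegExp("^www\.example$");   // "\." becomes "." in the string
const fromEscaped = new RegExp("^www\\.example$"); // "\\." reaches the engine as \.
const fromLiteral = /^www\.example$/;              // a literal needs no double escaping

console.log(fromString.source);               // ^www.example$
console.log(fromString.test("wwwXexample"));  // true  (dot matched "X")
console.log(fromEscaped.test("wwwXexample")); // false
console.log(fromLiteral.test("wwwXexample")); // false
console.log(fromLiteral.test("www.example")); // true
```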
DEMO
http://regex101.com/r/rG8wP9
OR:
function validateURL(url) {
    if (/^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/im.test(url)) {
        return true;
    } else {
        return false;
    }
}
EXPLANATION:
^ assert position at start of a line
1st Capturing group ((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)
(?:https?:\/\/)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
http matches the characters http literally (case sensitive)
s? matches the character s literally (case sensitive)
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
: matches the character : literally
\/ matches the character / literally
\/ matches the character / literally
(?:www\.)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
www matches the characters www literally (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
$ assert position at end of a line
g modifier: global. All matches (don't return on first match)
m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
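The pattern explained above can be checked against the six URLs from the question. The extra inputs in the second list are invented to show where this particular pattern stops: it allows exactly one label before the final dot and no hyphens, so subdomains other than www are rejected.

```javascript
const urlPattern = /^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/;

const shouldPass = [
  "http://www.google.com",
  "http://google.com",
  "https://www.google.com",
  "https://google.com",
  "google.com",
  "www.google.com",
];
for (const url of shouldPass) {
  console.log(url, urlPattern.test(url)); // true for each
}

// Invented inputs that this pattern rejects.
const rejected = ["google", "http://google.", "mail.google.com", "my-site.com"];
for (const url of rejected) {
  console.log(url, urlPattern.test(url)); // false for each
}
```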
what is wrong with regexp?
This part of the regexp:
([0-9A-Za-z]+\.)
only requires a single label followed by a dot. Nothing checks what comes after it, so incomplete addresses such as http://google. are accepted. Something like this:
(?:[0-9A-Za-z]+\.)+[A-Za-z]{2,}
is more correct and flexible.
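A sketch of how the suggested domain part could be used. The anchors and the optional scheme prefix are assumptions added here for illustration; the answer itself only gives the domain fragment.

```javascript
// Domain part from the answer; backslashes are doubled because this is a string.
const domainPart = "(?:[0-9A-Za-z]+\\.)+[A-Za-z]{2,}";

// Assumed wrapper: optional scheme in front, anchored at both ends.
const flexibleUrl = new RegExp("^(?:(?:https?|ftp)://)?" + domainPart + "$");

console.log(flexibleUrl.test("google.com"));              // true
console.log(flexibleUrl.test("www.google.com"));          // true
console.log(flexibleUrl.test("https://mail.google.com")); // true
console.log(flexibleUrl.test("google."));                 // false
console.log(flexibleUrl.test("google"));                  // false
```

With the end anchor in place, paths and query strings are rejected as well, which may or may not be wanted.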
Several things are not working in this regexp. For instance, nothing is matched after the period at the end. You also never need {1}, as every expression is matched once by default. The main thing to watch is the |: when the branches are not inside (), the match restarts completely after each |.
This is your regexp cleaned up, with the alternatives pulled into groups (see the | inside ()).
isMatch = (/^(?:(?:https?|ftp):\/\/)?(?:www\.)?[0-9a-z]+\.[a-z]+/im.test(url))

Regex - Not containing a string and not ending with a /

How do I create a regular expression that matches strings which don't contain "umbraco" and don't end with a /?
This is what I have so far, but I'm unable to get it fully working; any help would be appreciated.
(?!umbraco)(?![/]$)
Test strings would be:
http://www.domain.com/umbraco/login.aspx - shouldn't match
http://www.domain.com/pages/1/ - shouldn't match
http://www.domain.com/pages/1 - should match
It should be this regex:
^(?!.*?umbraco).*?[^\/]$
Online Demo: http://regex101.com/r/lM0cS9
Explanation:
^ assert position at start of a line
(?!.*?umbraco) Negative Lookahead - Assert that it is impossible to match the regex below
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
umbraco matches the characters umbraco literally (case sensitive)
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
[^\/] match a single character not present in the list below
\/ matches the character / literally
$ assert position at end of a line
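The pattern explained above can be checked against the three test strings from the question. The second half of the sketch is an alternative with plain string methods; the helper name isAllowed is hypothetical, and it is not strictly equivalent (for example, it accepts an empty string, which the regex does not).

```javascript
const rx = /^(?!.*?umbraco).*?[^\/]$/;

console.log(rx.test("http://www.domain.com/umbraco/login.aspx")); // false
console.log(rx.test("http://www.domain.com/pages/1/"));           // false
console.log(rx.test("http://www.domain.com/pages/1"));            // true

// Hypothetical non-regex alternative using includes() and endsWith().
function isAllowed(url) {
  return !url.includes("umbraco") && !url.endsWith("/");
}

console.log(isAllowed("http://www.domain.com/pages/1"));          // true
```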
This should be the regex
^(?!.*?umbraco).*?[^\/\s*\n*]$
demo http://rubular.com/r/tEhY7JFjXK
