Google Scripts match regex

Google Scripts match regex - javascript

Using google scripts I'm trying to match a string part that always looks like this:
*YYYYMMDD;hhmm*
for example: *20170701;0900*
I defined this regex:
var regExp = ('(?:\*)(?P<date>\d+)(?:\;)(?P<time>\d+)(?:\*)','gi');
and then call it using:
var datepart = textbody.match(regExp);
However I'm not getting any match, although the same text in https://regex101.com/ works well. Any idea what I'm doing wrong?

You created a regex for PCRE engine, while in Google Apps scripts, you should use one for JavaScript.
Remove all named capturing groups (they are not supported in JS, i.e. (?P<date>\d+) => (\d+)), use a regex literal (i.e. RegExp("pattern", "gi") => /pattern/gi, but i is not necessary here, only use it if there are letters in the pattern), remove the global modifier to get a match with capturing groups intact.
var rx = /\*(\d+);(\d+)\*/;
var datepart = textbody.match(rx);
var date, time;
if (datepart) {
date = datepart[1];
time = datepart[2];
}
Note that (?:\*) = \* because a non-capturing group is still a consuming pattern (i.e. what it matches is added to the match value). Since you want to get subparts of the regex, you just need to focus on the capturing groups, those (...) parts.

Related

Regexp group not excluding dots

Let's say I have the following string: div.classOneA.classOneB#idOne
Trying to write a regexp which extracts the classes (classOneA, classOneB) from it. I was able to do this but with Lookbehind assertion only.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to archive this without the lookbehind approach and do not really understand why my solution's not working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
Thought that the grouping will solve my problem but all matching item contains the dot as well.

There isn't a good way in Javascript to both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}

This is because with g modifier you get all matching substrings but not its matching groups (that is as if (...) pairs worked just like (?:...) ones.
You see. Whithout g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches but only the whole match (0 item) per each.
There are many solutions:
Use LookBehind assertions as you pointed out yourself.
Fix each result later adding .map(x=>x.replace(/^\./, ""))
Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
Use .replace() + callback instead of .match() in order to be able to access capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).

If you want to expand you regex. you can simply map on results and replace . with empty string
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)

If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you must use lookbehind.

This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertion is not available in JavaScript recently.

I'm not an expert on using regex - particularly in Javascript - but after some research on MDN I've figured out why your attempt wasn't working, and how to fix.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example "hello :smile: :sleeping:" from this string I need to find the strings which are starting and ending with the : characters. I tried the expression below, but it didn't work:
^:.*\:$

My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
return text.replace(emojiPattern, function (tag, emotion) {
// The emotion will be the captured word between your tags,
// so either "sleep" or "sleeping" in your example
//
// In this function you would take that emotion and return
// whatever you want based on the input parameter and the
// whole tag would be replaced
//
// As an example, let's say you had a bunch of GIF images
// for the different emotions:
return '<img src="/img/emoji/' + emotion + '.gif" />';
});
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".

The ^ and $ are anchors (start and end respectively). These cause your regex to explicitly match an entire string which starts with : has anything between it and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have regex like this /:([a-z0-9]+):/gmi
match a string beginning with : with any alphanumeric character one or more times ending in : with the modifiers g globally, m multi-line and i case insensitive for things like :FacePalm:.
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.

How do I make a regular expression that matches everything on a line after a given character?

If I have a String in JavaScript
key=value
How do I make a RegEx that matches key excluding =?
In other words:
var regex = //Regular Expression goes here
regex.exec("key=value")[0]//Should be "key"
How do I make a RegEx that matches value excluding =?
I am using this code to define a language for the Prism syntax highlighter so I do not control the JavaScript code doing the Regular Expression matching nor can I use split.

Well, you could do this:
/^[^=]*/ // anything not containing = at the start of a line
/[^=]*$/ // anything not containing = at the end of a line
It might be better to look into Prism's lookbehind property, and use something like this:
{
'pattern': /(=).*$/,
'lookbehind': true
}
According to the documentation this would cause the = character not to be part of the token this pattern matches.

use this regex (^.+?)=(.+?$)
group 1 contain key
group 2 contain value
but split is better solution

.*=(.*)
This will match anything after =
(.*)=.*
This will match anything before =
Look into greedy vs ungreedy quantifiers if you expect more than one = character.
Edit: as OP has clarified they're using javascript:
var str = "key=value";
var n=str.match(/(.*)=/i)[1]; // before =
var n=str.match(/=(.*)/i)[1]; // after =

var regex = /^[^=]*/;
regex.exec("key=value");

JavaScript Regex to match a URL in a field of text

How can I setup my regex to test to see if a URL is contained in a block of text in javascript. I cant quite figure out the pattern to use to accomplish this
var urlpattern = new RegExp( "(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?"
var txtfield = $('#msg').val() /*this is a textarea*/
if ( urlpattern.test(txtfield) ){
//do something about it
}
EDIT:
So the Pattern I have now works in regex testers for what I need it to do but chrome throws an error
"Invalid regular expression: /(http|ftp|https)://[w-_]+(.[w-_]+)+([w-.,#?^=%&:/~+#]*[w-#?^=%&/~+#])?/: Range out of order in character class"
for the following code:
var urlexp = new RegExp( '(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?' );

Though escaping the dash characters (which can have a special meaning as character range specifiers when inside a character class) should work, one other method for taking away their special meaning is putting them at the beginning or the end of the class definition.
In addition, \+ and \# in a character class are indeed interpreted as + and # respectively by the JavaScript engine; however, the escapes are not necessary and may confuse someone trying to interpret the regex visually.
I would recommend the following regex for your purposes:
(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
this can be specified in JavaScript either by passing it into the RegExp constructor (like you did in your example):
var urlPattern = new RegExp("(http|ftp|https)://[\w-]+(\.[\w-]+)+([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?")
or by directly specifying a regex literal, using the // quoting method:
var urlPattern = /(http|ftp|https):\/\/[\w-]+(\.[\w-]+)+([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])?/
The RegExp constructor is necessary if you accept a regex as a string (from user input or an AJAX call, for instance), and might be more readable (as it is in this case). I am fairly certain that the // quoting method is more efficient, and is at certain times more readable. Both work.
I tested your original and this modification using Chrome both on <JSFiddle> and on <RegexLib.com>, using the Client-Side regex engine (browser) and specifically selecting JavaScript. While the first one fails with the error you stated, my suggested modification succeeds. If I remove the h from the http in the source, it fails to match, as it should!
Edit
As noted by #noa in the comments, the expression above will not match local network (non-internet) servers or any other servers accessed with a single word (e.g. http://localhost/... or https://sharepoint-test-server/...). If matching this type of url is desired (which it may or may not be), the following might be more appropriate:
(http|ftp|https)://[\w-]+(\.[\w-]+)*([\w.,#?^=%&:/~+#-]*[\w#?^=%&/~+#-])?
#------changed----here-------------^
<End Edit>
Finally, an excellent resource that taught me 90% of what I know about regex is Regular-Expressions.info - I highly recommend it if you want to learn regex (both what it can do and what it can't)!

Complete Multi URL Pattern.
UPDATED: Nov. 2020, April & June 2021 (Thanks commenters)
Matches all URI or URL in a string!
Also extracts the protocol, domain, path, query and hash. ([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)
https://regex101.com/r/jO8bC4/56
Example JS code with output - every URL is turned into a 5-part array of its 'parts' (protocol, host, path, query, and hash)
var re = /([a-z0-9-]+\:\/+)([^\/\s]+)([a-z0-9\-#\^=%&;\/~\+]*)[\?]?([^ \#\r\n]*)#?([^ \#\r\n]*)/mig;
var str = 'Bob: Hey there, have you checked https://www.facebook.com ?\n(ignore) https://github.com/justsml?tab=activity#top (ignore this too)';
var m;
while ((m = re.exec(str)) !== null) {
if (m.index === re.lastIndex) {
re.lastIndex++;
}
console.log(m);
}
Will give you the following:
["https://www.facebook.com",
"https://",
"www.facebook.com",
"",
"",
""
]
["https://github.com/justsml?tab=activity#top",
"https://",
"github.com",
"/justsml",
"tab=activity",
"top"
]

You have to escape the backslash when you are using new RegExp.
Also you can put the dash - at the end of character class to avoid escaping it.
& inside a character class means & or a or m or p or ; , you just need to put & and ; , a, m and p are already match by \w.
So, your regex becomes:
var urlexp = new RegExp( '(http|ftp|https)://[\\w-]+(\\.[\\w-]+)+([\\w-.,#?^=%&:/~+#-]*[\\w#?^=%&;/~+#-])?' );

try (http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,#?^=%&:/~\+#]*[\w\-\#?^=%&/~\+#])?

I've cleaned up your regex:
var urlexp = new RegExp('(http|ftp|https)://[a-z0-9\-_]+(\.[a-z0-9\-_]+)+([a-z0-9\-\.,#\?^=%&;:/~\+#]*[a-z0-9\-#\?^=%&;/~\+#])?', 'i');
Tested and works just fine ;)

Try this general regex for many URL format
/(([A-Za-z]{3,9})://)?([-;:&=\+\$,\w]+#{1})?(([-A-Za-z0-9]+\.)+[A-Za-z]{2,3})(:\d+)?((/[-\+~%/\.\w]+)?/?([&?][-\+=&;%#\.\w]+)?(#[\w]+)?)?/g

The trouble is that the "-" in the character class (the brackets) is being parsed as a range: [a-z] means "any character between a and z." As Vini-T suggested, you need to escape the "-" characters in the character classes, using a backslash.

try this worked for me
/^((ftp|http[s]?):\/\/)?(www\.)([a-z0-9]+)\.[a-z]{2,5}(\.[a-z]{2})?$/
that is so simple and understandable

Split string in JavaScript using a regular expression

I'm trying to write a regex for use in javascript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like split to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when testing it at gskinner.com/RegExr/ but it does not work in my Javascript. This issue can be replicated by testing ir here http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex guru's help?

Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
while (result = rgx.exec(script))
print(result[0]);

The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch here is that, as usual, match[0] is the entire matched text (this is very normal) so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time to match function calls with 2 or 3 parameters?
EDIT
In response to you're request, here is the regex to match 1,2,3,...,n parameters. If you notice, I used a non-capturing group (the (?: ) part) to find many instances of the comma followed by the second parameter.
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/

Maybe this:
'([^']*)'\s*,\s*'([^']*)'

We Keep Coding

JavaScript is the programming language of the Web.

Google Scripts match regex - javascript

Related

Regexp group not excluding dots

How to write regexp for finding :smile: in javascript?

How do I make a regular expression that matches everything on a line after a given character?

JavaScript Regex to match a URL in a field of text

Split string in JavaScript using a regular expression

Categories

Resources