Strange behavior of regexp in JavaScript

I wrote a simple JavaScript function to split a file name into parts: given a file name such as 'image01.png', it splits it into 'image', '01', 'png'.
For this I use the following regular expression:
var reg = /(\D+)(\d+).(\S+)$/;
This works.
However, I would also like to be able to split something like day12Image01.png into 'day12Image', '01', 'png'. In general, I would like to allow any number of additional digits in the body, as long as they do not come right before the extension.
I tried with:
var reg = /(.+)(\d+).(\S+)$/;
or the alternative:
var reg = /(\S+)(\d+).(\S+)$/;
Confusingly (to me), if I apply those regular expressions to 'image01.png' I get the following decomposition: 'image0', '1', 'png'.
Why is the '0' being assigned to the body instead of the numerical index in these cases?
Thanks for any feedback.

Try using a non-greedy (lazy) quantifier: /(\S+?)(\d+)\.(\S+)$/. As far as I know, this should work in JavaScript.

Here is one possible regular expression that should work fine:
/^(.+?)(\d+)\.(\S+)$/
Note that you should escape the dot (.) character; otherwise the regex treats it as 'any character' (the so-called "special dot").
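To illustrate why the escape matters, here is a small sketch comparing the same lazy pattern with and without the backslash on the harder input from the question (the variable names are my own):

```javascript
// Same lazy pattern, once with "." (any character) and once with "\." (a literal dot)
var unescaped = /^(.+?)(\d+).(\S+)$/;
var escaped = /^(.+?)(\d+)\.(\S+)$/;

var fileName = "day12Image01.png";

// The unescaped "." can match the "I" right after "12", so the split happens too early
console.log(unescaped.exec(fileName).slice(1)); // [ 'day', '12', 'mage01.png' ]

// The escaped "\." only matches the real dot before the extension
console.log(escaped.exec(fileName).slice(1)); // [ 'day12Image', '01', 'png' ]
```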

By default, quantifiers such as + are greedy: they match as much as they can. Because + means one OR more, the first group (.+ or \S+) can keep the first digit for itself and leave only the last digit to (\d+). Make the first quantifier non-greedy (lazy) with ?:
var reg = /(.+?)(\d+)\.(\S+)$/;
Or
var reg = /(\S+?)(\d+)\.(\S+)$/;
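Here is a rough sketch of the greedy vs. lazy difference on 'image01.png'. I anchored both patterns with ^ and escaped the dot, which is my own choice; splitFileName and the shape of the object it returns are hypothetical, not something from the question:

```javascript
// Greedy: (.+) grabs as much as possible and gives back only the one digit (\d+) needs
var greedy = /^(.+)(\d+)\.(\S+)$/;
// Lazy: (.+?) grabs as little as possible, so (\d+) receives all trailing digits
var lazy = /^(.+?)(\d+)\.(\S+)$/;

console.log(greedy.exec("image01.png").slice(1)); // [ 'image0', '1', 'png' ]
console.log(lazy.exec("image01.png").slice(1));   // [ 'image', '01', 'png' ]

// Hypothetical helper: returns null when there are no digits right before the extension
function splitFileName(name) {
  var m = lazy.exec(name);
  return m ? { body: m[1], index: m[2], ext: m[3] } : null;
}

console.log(splitFileName("day12Image01.png")); // { body: 'day12Image', index: '01', ext: 'png' }
```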

Related

regex encapsulation

I've got a question concerning regex.
I was wondering how one could replace encapsulated text, something like {key:23}, with something like <span class="highlightable">23</span>, so that the value still remains encapsulated, but by something else.
I will do this in JS, but the regex is what matters. I have been searching for a while, probably with the wrong terms; I should probably learn more about regex in general.
In any case, does anyone know a simple way to perform this operation?
Thanks!
It's important that you find {key:23} in your text first and then replace it with the syntax you want; this way you avoid replacing {key:'sometext'}, which is unwanted.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:\d+\}/gi, function (x) {
  return x.replace(/\{key:/, '<span>').replace(/\}/, '</span>');
});
This way only {key:AnyNumber} is replaced, and {key:AnythingOtherThanNumbers} is left untouched.
It seems you are new to regex. You should learn more about character classes, capturing groups, and backreferences.
The regex is fairly simple in your case, as long as you do not need to support nested encapsulated text.
Let's start:
The beginning is {key: - it matches the substring literally. Note that { can be a special character (it denotes the start of a limiting quantifier such as {2,5}), so it is a good idea to escape it: \{key:.
([^}]+) - This is a bit more interesting: the round brackets form a capturing group that lets us back-reference the matched text later. [^}]+ means one or more characters (due to +) other than } ([^}] is a negated character class, where ^ means "not").
} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
\{key:([^}]+)}
See demo on regex101.com
Code snippet:
var re = /\{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"></div>
If you want different behavior based on the value of key, you'll need to adjust the regex to match either digits only (with \d+) or letters only (say, with [a-zA-Z]+ for English), or use other shorthand classes, ranges (character classes), or combinations of these.
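As an illustration of the last point, instead of maintaining several regexes you can also pass a function as the replacement and branch on the captured value. This is a minimal sketch; highlightKeys and the two class names are made up:

```javascript
// Hypothetical class names: "highlightable-number" for digits, "highlightable-text" otherwise
function highlightKeys(str) {
  // The callback receives the whole match first, then the text captured by ([^}]+)
  return str.replace(/\{key:([^}]+)\}/g, function (whole, value) {
    var cls = /^\d+$/.test(value) ? "highlightable-number" : "highlightable-text";
    return '<span class="' + cls + '">' + value + "</span>";
  });
}

console.log(highlightKeys("a {key:23} b {key:name}"));
// a <span class="highlightable-number">23</span> b <span class="highlightable-text">name</span>
```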
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");

javascript regex to require at least one special character

I've seen plenty of regex examples that will not allow any special characters. I need one that requires at least one special character.
I'm looking at this C# regex:
var regexItem = new Regex("^[a-zA-Z0-9 ]*$");
Can this be converted to use with javascript? Do I need to escape any of the characters?
Based on an example, I have built this so far:
var regex = "^[a-zA-Z0-9 ]*$";
//Must have one special character
if (regex.exec(resetPassword)) {
isValid = false;
$('#vsResetPassword').append('Password must contain at least 1 special character.');
}
Can someone please identify my error, or guide me down a more efficient path? The error I'm currently getting is that regex has no 'exec' method.
Your problem is that "^[a-zA-Z0-9 ]*$" is a string, and you need a regex:
var regex = /^[a-zA-Z0-9 ]*$/; // one way
var regex = new RegExp("^[a-zA-Z0-9 ]*$"); // another way
Other than that, your code looks fine.
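Once the regex is a real RegExp, the check can also be phrased positively: instead of testing whether the whole string is free of special characters, test whether at least one character falls outside the allowed set. A sketch, assuming "special character" means anything other than an ASCII letter, digit, or space (the same set the original pattern allows); hasSpecialChar is a hypothetical name:

```javascript
// True when at least one character is not an ASCII letter, digit, or space
function hasSpecialChar(password) {
  return /[^a-zA-Z0-9 ]/.test(password);
}

console.log(hasSpecialChar("abc 123")); // false
console.log(hasSpecialChar("abc!123")); // true
```

In the question's code, `if (!hasSpecialChar(resetPassword))` would take the place of the exec call.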
In javascript, regexs are formatted like this:
/^[a-zA-Z0-9 ]*$/
Note that there are no quotation marks and instead you use forward slashes at the beginning and end.
In javascript, you can create a regular expression object two ways.
1) You can use the constructor with the RegExp object (note that it is spelled differently from the Regex you were using):
var regexItem = new RegExp("^[a-zA-Z0-9 ]*$");
2) You can use the literal syntax built into the language:
var regexItem = /^[a-zA-Z0-9 ]*$/;
The advantage of the second is that you only have to escape forward slashes; you don't have to worry about quotes or double-escaping backslashes. The advantage of the first is that you can build a string programmatically from various parts and then pass it to the RegExp constructor.
Further, the optional flags for the regular expression are passed like this in the two forms:
var regexItem = new RegExp("^[A-Z0-9 ]*$", "i");
var regexItem = /^[A-Z0-9 ]*$/i;
In JavaScript, the more common convention seems to be the /regex/ literal syntax built into the parser, unless you are dynamically constructing the pattern or the flags.
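A small sketch of the points above (the patterns and the prefix variable are illustrative assumptions): both forms produce the same regex, the string form needs every backslash doubled, and the flags end up identical:

```javascript
// The literal needs no extra escaping; in the string form every regex backslash is written "\\"
var literal = /^\d+\.\d+$/i;
var constructed = new RegExp("^\\d+\\.\\d+$", "i");

console.log(literal.source === constructed.source); // true
console.log(literal.flags, constructed.flags);      // i i

// The constructor is handy when the pattern is assembled at runtime.
// "prefix" is a hypothetical input; if it could contain characters such as . or *,
// they would have to be escaped before being concatenated into the pattern.
var prefix = "img";
var dynamic = new RegExp("^" + prefix + "\\d+$");
console.log(dynamic.test("img42")); // true
console.log(dynamic.test("img4x")); // false
```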

Javascript regex expression to replace multiple strings?

I have a string like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove the initial domain, the underscores, and the percent-encoded parts?
For now I'm just doing several replace calls, like:
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and so on, but it's really ugly. Any help?
Thanks!
EDIT:
The exact output I want is "My happy dog is cool!", so I would like to get rid of the initial address, remove the underscores and percent-encoded characters, and put the spaces in the right places.
The problem is that when I try a regex in Chrome, "something goes wrong". Is it a problem with Chrome or with my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that regular expressions are fairly expensive and often tricky to fine-tune to the point where edge cases stop being troublesome, so I opted for simple string manipulation to reduce the regex work.
Effectively, the above takes a substring of str starting at the index returned by lastIndexOf('/') (which does exactly what you'd expect) plus 1, so the substring starts just after the / rather than at it.
The regex: (_) matches underscores, the | serves as an "or" operator, and (%\d{2,}) matches a % sign followed by two or more digits.
The parentheses around each side of the | create capturing groups. They are not strictly needed here: replace() substitutes the whole match, whichever alternative it came from, with the ' ' (single-space) string passed as its second argument.
References:
lastIndexOf().
replace().
substring().
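A variation on the same idea, sketched under my own assumptions: a percent-encoded sequence is exactly % plus two hex digits (so something like %2C is also caught, and digits following the sequence are kept), and runs of spaces are collapsed, since the one-liner above leaves two spaces between "dog" and "is". cleanSlug is a hypothetical name:

```javascript
function cleanSlug(url) {
  // Keep only what follows the last "/"
  var slug = url.substring(url.lastIndexOf("/") + 1);
  return slug
    .replace(/_|%[0-9A-Fa-f]{2}/g, " ") // underscores and %XX sequences become spaces
    .replace(/\s+/g, " ")               // collapse runs of whitespace into one space
    .trim();                            // drop leading/trailing spaces
}

console.log(cleanSlug("http://something.org/dom/My_happy_dog_%28is%29cool!"));
// My happy dog is cool!
```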
You can use unescape to decode the percent-encoded characters:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!");
str = str.replace("http://something.org/dom/","");
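Note that unescape is deprecated. Here is a sketch of the same idea with decodeURIComponent, the standard function for decoding UTF-8 percent-encoding. One behavioral difference to keep in mind is that it throws a URIError on malformed input instead of leaving it alone:

```javascript
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";

// Strip the known prefix, then decode %28 and %29 into "(" and ")"
var decoded = decodeURIComponent(str.replace("http://something.org/dom/", ""));
console.log(decoded); // My_happy_dog_(is)cool!

// A lone "%" is not a valid escape, so decodeURIComponent throws
var malformedThrows = false;
try {
  decodeURIComponent("100%");
} catch (e) {
  malformedThrows = e instanceof URIError;
}
console.log(malformedThrows); // true
```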
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
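One caveat when chaining: if the first argument of replace() is a string, only the first occurrence is replaced, so repeated characters like the underscores need a regex with the g flag. A small sketch (decoding only %28 and %29 by hand, as an illustration):

```javascript
var s = "My_happy_dog";

// With a string as the pattern, replace() changes only the first occurrence
console.log(s.replace("_", " "));  // My happy_dog
// With a regex and the g flag, every occurrence is replaced
console.log(s.replace(/_/g, " ")); // My happy dog

// Chained cleanup of the question's string
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
var cleaned = str
  .replace("http://something.org/dom/", "")
  .replace(/_/g, " ")
  .replace(/%28/g, "(")
  .replace(/%29/g, ")");
console.log(cleaned); // My happy dog (is)cool!
```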
You haven't defined the problem very precisely. To get rid of all stretches of characters ending in %<digit><digit>, you'd write:
var re = /.*%\d\d/g;
str = str.replace(re, "");
OK, if you want to replace all of that, I think you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
Test:
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");

JavaScript regex match characters inside quotes and not in character set

I have a string I would like to split using #, ., [], or {} characters, as in CSS. The desired functionality is:
- Input:
"div#foo[bar='value'].baz{text}"
- Output:
["div", "#foo", "[bar='value'", ".baz", "{text"]
This is easy enough, with this RegEx:
input.match(/([#.\[{]|^.*?)[^#.\[{\]}]*/g)
However, this doesn't ignore syntax characters inside quotes, as I would like it to (e.g. in "div[bar='value.baz']" the . should be ignored).
How can I make the second part of my regex (the [^#.\[{\]}]* portion) match not only the negated character set, but also any characters within quotes? In other words, how can I incorporate the regex (\"|').+?\1 into my current one?
Edit:
I've figured out a regex that works reasonably well, but it can't handle escaped quotes inside quotes (for example: "stuff here \\" quote "). If someone knows how to do that, it would be extremely helpful:
str.match(/([#.\[{]|^.*?)((['"]).*?\3|[^.#\[\]{\}])*/g);
var str = "div#foo[bar='value.baz'].baz{text}";
str.match(/(^|[\.#[\]{}])(([^'\.#[\]{}]+)('[^']*')?)+/g)
// [ 'div', '#foo', '[bar=\'value.baz\'', '.baz', '{text' ]
var tokens = myCssString.match(/\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g);
Given your string, it produces
div
#
foo
[
bar=
'value.foo'
]
.
baz
{
text
}
The RegExp above is loosely based on the CSS 2.1 lexical grammar
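The quoted-string alternatives are what handle the escaped quotes asked about in the question's edit: a backslash and the character after it are consumed as a pair, so \' does not close the string. A sketch wrapping the pattern in a function (tokenize is a hypothetical name), with the double-quoted alternative written as "(?:[^"\\]|\\[\s\S])*" so it mirrors the single-quoted one:

```javascript
function tokenize(css) {
  // Comments | "double-quoted" | 'single-quoted' | punctuation | whitespace | everything else
  return css.match(/\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S])*"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g);
}

// The JS string below holds a[b='x\'y.z'] : the \' and the . stay inside the quoted token
console.log(tokenize("a[b='x\\'y.z']").join(" | ")); // a | [ | b= | 'x\'y.z' | ]
```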
Firstly, and I can't stress this enough: you shouldn't use regexps to parse CSS; you should use a real parser, for instance http://glazman.org/JSCSSP/ or similar. Many people have built them, so there is no need for you to reinvent the wheel.
That said, to solve your current problem, do this:
var str = "div#foo[bar='value.foo'].baz{text}";
str.match(/([#.\[{]|^.*?)(?:[^#\[{\]}]*|\.*)/g);
//["div", "#foo", "[bar='value.foo'", ".baz", "{text"]

Split string in JavaScript using a regular expression

I'm trying to write a regex for use in javascript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like match to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when I test it at gskinner.com/RegExr/, but it does not work in my JavaScript. The issue can be reproduced by testing it here: http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex gurus help?
Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
var result;
while ((result = rgx.exec(script)) !== null) {
  alert(result[0]);
}
The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch is that, as usual, match[0] is the entire matched text, so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time making it match function calls with 2 or 3 parameters.
EDIT
In response to your request, here is the regex to match 1, 2, 3, ..., n parameters. Note that I used a non-capturing group (the (?: ) part) to find any number of instances of a comma followed by another parameter.
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/
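One caveat with the n-parameter version: a capturing group inside a repeated group has only one slot, so after the match it holds just the last repetition. A sketch with a made-up three-parameter call shows this, along with a global match that collects every quoted parameter instead. It only uses exec, match, and join, which are long-standing ES3 features, though I haven't verified it in IE6 specifically:

```javascript
var call = "loadArea('a', 'b', 'c');";

// The repeated group reuses capture slot 2, so only its last repetition survives
var m = /.*?('[^']*')(?:\s*,\s*('[^']*'))*/.exec(call);
console.log(m[1], m[2]); // 'a' 'c'

// To get every parameter, match the quoted strings globally instead
var params = call.match(/'[^']*'/g);
console.log(params.join(", ")); // 'a', 'b', 'c'
```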
Maybe this:
'([^']*)'\s*,\s*'([^']*)'
