JavaScript regex match characters inside quotes and not in character set - javascript

I have a string I would like to split using #, ., [], or {} characters, as in CSS. The desired functionality is:
- Input:
"div#foo[bar='value'].baz{text}"
- Output:
["div", "#foo", "[bar='value'", ".baz", "{text"]
This is easy enough, with this RegEx:
input.match(/([#.\[{]|^.*?)[^#.\[{\]}]*/g)
However, this doesn't ignore syntax characters inside quotes, as I would like it too. (e.x. "div[bar='value.baz']" should ignore the .)
How can I make the second part of my RegEx (the [^#.\[{\]}]* portion) capture not only the negated character set, but also any character within quotes. In other words, how can I implement the RegEx, (\"|').+?\1 into my current one.
Edit:
I've figured out a regex that works decent, but can't handle escaped-quotes inside quotes (for example: "stuff here \\" quote "). If someone knows how to do that, it would be extremely helpful:
str.match(/([#.\[{]|^.*?)((['"]).*?\3|[^.#\[\]{\}])*/g);

var str = "div#foo[bar='value.baz'].baz{text}";
str.match(/(^|[\.#[\]{}])(([^'\.#[\]{}]+)('[^']*')?)+/g)
// [ 'div', '#foo', '[bar=\'value.baz\'', '.baz', '{text' ]

var tokens = myCssString.match(/\/\*[\s\S]*?\*\/|"(?:[^"\\]|\\[\s\S]*)"|'(?:[^'\\]|\\[\s\S])*'|[\{\}:;\(\)\[\]./#]|\s+|[^\s\{\}:;\(\)\[\]./'"#]+/g);
Given your string, it produces
div
#
foo
[
bar=
'value.foo'
]
.
baz
{
text
}
The RegExp above is loosely based on the CSS 2.1 lexical grammar

Firstly, and i can't stress this enough: you shouldn't use regexps to parse css, you should use a real parser, for instance http://glazman.org/JSCSSP/ or similar - many have built them, no need for you to reinvent the wheel.
that said, to solve your current problem do this:
var str = "div#foo[bar='value.foo'].baz{text}";
str.match(/([#.\[{]|^.*?)(?:[^#\[{\]}]*|\.*)/g);
//["div", "#foo", "[bar='value.foo'", ".baz", "{text"]

Related

regex encapsulation

I've got a question concerning regex.
I was wondering how one could replace an encapsulated text, something like {key:23} to something like <span class="highlightable">23</span, so that the entity will still remain encapsulated, but with something else.
I will do this in JS, but the regex is what is important, I have been searching for a while, probably searching for the wrong terms, I should probably learn more about regex, generally.
In any case, is there someone who knows how to perform this operation with simplicity?
Thanks!
It's important that you find {key:23} in your text first, and then replace it with your wanted syntax, this way you avoid replacing {key:'sometext'} with that syntax which is unwanted.
var str = "some random text {key:23} some random text {key:name}";
var n = str.replace(/\{key:[\d]+\}/gi, function myFunction(x){return x.replace(/\{key:/,'<span>').replace(/\}/, '</span>');});
this way only {key:AnyNumber} gets replaced, and {key:AnyThingOtherThanNumbers} don't get touched.
It seems you are new to regex. You need to learn more about character classes and capturing groups and backreferences.
The regex is somewhat basic in your case if you do not need any nested encapsulated text support.
Let's start:
The beginning is {key: - it will match the substring literally. Note that { can be a special character (denoting start of a limiting quantifier), thus, it is a good idea to escape it: {key:.
([^}]+) - This is a bit more interesting: the round brackets around are a capturing group that let us later back-reference the matched text. The [^}]+ means 1 or more characters (due to +) other than } (as [^}] is a negated character class where ^ means not)
} matches a } literally.
In the replacement string, we'll get the captured text using a backreference $1.
So, the entire regex will look like:
{key:([^}]+)}
See demo on regex101.com
Code snippet:
var re = /{key:([^}]+)}/g;
var str = '{key:23}';
var subst = '<span class="highlightable">$1</span>';
document.getElementById("res").innerHTML = str.replace(re, subst);
.highlightable
{
color: red;
}
<div id="res"/>
If you want to use a different behavior based on the value of key, then you'll need to adjust the regex to either match digits only (with \d+) or letters only (say, with [a-zA-Z] for English), or other shorthand classes, ranges (= character classes), or their combinations.
If your string is in var a, then:
var test = a.replace( /\{key:(\d+)\}/g, "<span class='highlightable'>$1</span>");

match everything between brackets

I need to match the text between two brackets. many post are made about it but non are supported by JavaScript because they all use the lookbehind.
the text is as followed
"{Code} - {Description}"
I need Code and Description to be matched with out the brackets
the closest I have gotten is this
/{([\s\S]*?)(?=})/g
leaving me with "{Code" and "{Description" and I followed it with
doing a substring.
so... is there a way to do a lookbehind type of functionality in Javascript?
You could simply try the below regex,
[^}{]+(?=})
Code:
> "{Code} - {Description}".match(/[^}{}]+(?=})/g)
[ 'Code', 'Description' ]
Use it as:
input = '{Code} - {Description}';
matches = [], re = /{([\s\S]*?)(?=})/g;
while (match = re.exec(input)) matches.push(match[1]);
console.log(matches);
["Code", "Description"]
Actually, in this particular case, the solution is quite easy:
s = "{Code} - {Description}"
result = s.match(/[^{}]+(?=})/g) // ["Code", "Description"]
Have you tried something like this, which doesn't need a lookahead or lookbehind:
{([^}]*)}
You would probably need to add the global flag, but it seems to work in the regex tester.
The real problem is that you need to specify what you want to capture, which you do with capture groups in regular expressions. The part of the matched regular expression inside of parentheses will be the value returned by that capture group. So in order to omit { and } from the results, you just don't include those inside of the parentheses. It is still necessary to match them in your regular expression, however.
You can see how to get the value of capture groups in JavaScript here.

Strange behavior of regexp in JavaScript

I wrote a simple JavaScript function to split a file name into parts: given a file name of the type 'image01.png' it splits it into 'image', '01', 'png'.
For this I use the following regular expression:
var reg = /(\D+)(\d+).(\S+)$/;
This works.
However, I would like to be able to split also something like this: day12Image01.png into 'day12Image', '01', 'png'. Generally, I would like to have any number of additional digits associated to the body as long as they do not fall right before the extension.
I tried with:
var reg = /(.+)(\d+).(\S+)$/;
or the alternative:
var reg = /(\S+)(\d+).(\S+)$/;
Confusingly (to me), if I apply those regular expressions to 'image01.png' I get following decomposition: 'image0', '1', 'png'.
Why is the '0' being assigned to the body instead of the numerical index in these cases?
Thanks for any feedback.
Try to use non-greedy regular expression /(\S+?)(\d+).(\S+)$/. As far as I know this should work for javascript.
Here is one possible regular expression that should work fine:
/^(.+?)(\d+)\.(\S+)$/
Note, you should escape a dot . character, since otherwise the regex will consider it as 'any character' (so called "Special dot").
By default, capture groups are greedy, they will capture as much as they can, and since + means one OR more, it can just match the last digit and leave the first to the . or the \S. Make them un-greedy with ?:
var reg = /(.+?)(\d+).(\S+)$/;
Or
var reg = /(\S+?)(\d+).(\S+)$/;

Javascript regex expression to replace multiple strings?

I've a string done like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscore and the percentage stuff?
For now I'm just doing some multiple replace, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and go on, but it's really ugly.. any help?
Thanks!
EDIT:
the exact input would be "My happy dog is cool!" so I would like to get rid of the initial address and remove the underscores and percentage and put the spaces in the right place!
The problem is that trying to put a regex on Chrome "something goes wrong". Is it a problem of Chrome or my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
JS Fiddle demo.
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | just serves as an or operator and the (%\d{2,}) serves to match digit characters that occur twice in succession and follow a % sign.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
ok, if you want to replace all that stuff I think that you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");

Split string in JavaScript using a regular expression

I'm trying to write a regex for use in javascript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like split to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when testing it at gskinner.com/RegExr/ but it does not work in my Javascript. This issue can be replicated by testing ir here http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex guru's help?
Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
while (result = rgx.exec(script))
print(result[0]);
The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch here is that, as usual, match[0] is the entire matched text (this is very normal) so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time to match function calls with 2 or 3 parameters?
EDIT
In response to you're request, here is the regex to match 1,2,3,...,n parameters. If you notice, I used a non-capturing group (the (?: ) part) to find many instances of the comma followed by the second parameter.
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/
Maybe this:
'([^']*)'\s*,\s*'([^']*)'

Categories