Regex Dialect: JavaScript
I have the following capture group (('|").*?[^\\\2]\2) that matches a quoted string while skipping over escaped quotes.
For example, it matches these:
"Felix's pet"
'Felix\'s pet'
However, I would now like to remove all whitespace from a string except inside anything matching this pattern. Is there perhaps a way to back-reference the capture group \1 and then exclude it from the matches?
I have attempted to do so with my limited RegEx knowledge, but so far I can only select the space immediately preceding or following the pattern.
I have saved my test script on regexr for convenience if you would like to play around with my example.
Intended results:
key : string becomes key:string
dragon : "Felix's pet" becomes dragon:"Felix's pet"
"Hello World" something here "Another String"
becomes
"Hello World"somethinghere"Another String"
etc...
This is extremely hard to do with regular expressions. The following works:
result = subject.replace(/ (?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)/g, "");
I've built this answer from one of my earlier answers to a similar, but not identical question; therefore I'll refer you to it for an explanation.
You can test it live on regex101.com.
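The full pattern is hard to read, so here is a much smaller sketch of the idea behind it. It makes simplifying assumptions: only double quotes, no escaped quotes, and (like the pattern above) only the literal space character. A space lies outside a quoted string exactly when an even number of " characters follows it. This is a simplification for illustration, not the answer's pattern.

```javascript
// Simplified sketch: double quotes only, no escape handling.
// The lookahead succeeds only if the rest of the string contains
// pairs of double quotes followed by quote-free text, i.e. an even count.
const spaceOutsideDoubleQuotes = / (?=(?:[^"]*"[^"]*")*[^"]*$)/g;

function removeUnquotedSpaces(input) {
  return input.replace(spaceOutsideDoubleQuotes, "");
}

console.log(removeUnquotedSpaces('"Hello World" something here "Another String"'));
// "Hello World"somethinghere"Another String"
```

Because every candidate space rescans the rest of the string, this is quadratic in the worst case; that is fine for short inputs like these.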
In JavaScript, you can pass a function as the second argument of String.replace. You define capturing groups in the pattern, and the function decides what each match is replaced with.
You want to match all whitespace:
\s+
and you need to match everything inside quotes:
(('|")(?:[^\\]\\\2|.)*?\2)
so you combine them into one pattern:
var pattern = /\s+|(('|")(?:[^\\]\\\2|.)*?\2)/g
and you write the replace call with an anonymous function as its second argument:
var filteredString = notFilteredString.replace(pattern,
function(match, group1) { return group1 || "" })
The function is called once per match and returns the replacement string. The regexp matches either a run of whitespace or a quoted string. A quoted string is captured as group1, so the function returns it unchanged; for whitespace, group1 is undefined, so the function returns the empty string "" and the whitespace is removed.
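Put together as a runnable sketch (the function name stripUnquotedWhitespace is just for illustration), the answer's pattern and callback reproduce the intended results from the question:

```javascript
// Either a run of whitespace, or a quoted string captured as group 1
// (group 2 holds the opening quote so \2 can match the closing one).
const pattern = /\s+|(('|")(?:[^\\]\\\2|.)*?\2)/g;

// Hypothetical helper: for whitespace matches group 1 is undefined, so ""
// is substituted; quoted strings are returned unchanged.
function stripUnquotedWhitespace(input) {
  return input.replace(pattern, (match, group1) => group1 || "");
}

console.log(stripUnquotedWhitespace("key : string")); // key:string
console.log(stripUnquotedWhitespace('dragon : "Felix\'s pet"')); // dragon:"Felix's pet"
```

Characters that match neither alternative (such as "key" or ":") are never part of a match, so replace leaves them alone.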
Related
I'm trying to separate the words that have "[]" brackets around them, for example in this line:
"I hate [running] and [sweating]". How can I use a regular expression to check whether a word such as "running" or "sweating" (or any other word) is wrapped in square brackets?
I tried this, but it didn't work:
if(data.word === /[ ]/){}
Here is the expression you need:
/\[([^\[\]]+)\]/
Applied to your example, it looks like this:
"I hate [running] and [sweating]".match(/\[([^\[\]]+)\]/g)
This returns the bracketed words, brackets included. The same pattern can be applied to a single word: if it matches, the word is surrounded by square brackets.
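A small sketch of both uses. Note that the question's data.word === /[ ]/ is always false, because === compares a string with a RegExp object; a regex has to be applied with test() or match(). The helper isBracketed is hypothetical, and matchAll assumes an ES2020-capable runtime.

```javascript
const line = "I hate [running] and [sweating]";
const bracketed = /\[([^\[\]]+)\]/g;

// match() with the g flag returns the whole matches, brackets included.
const whole = line.match(bracketed); // ["[running]", "[sweating]"]

// matchAll() (which requires the g flag) exposes capture group 1 for each
// match, i.e. the word without its brackets.
const words = Array.from(line.matchAll(bracketed), (m) => m[1]); // ["running", "sweating"]

// Hypothetical helper: anchored with ^ and $, the pattern checks that the
// whole string is one bracketed word.
function isBracketed(word) {
  return /^\[[^\[\]]+\]$/.test(word);
}
```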
The regular expression you need to test your example string and output all words enclosed by square brackets would be /\[([\w]+)\]/g
The [\w]+ part matches one or more word characters (letters, digits, and underscore). Including \[ and \] explicitly matches the square brackets around those words. The parentheses ( and ) form a capturing group around the word itself, which is the part you usually want back.
I can highly recommend regex101.com as a handy playground for trying out regular expressions; it's the first place I went when writing the regular expression above to check it for accuracy.
An example of how to implement this can be seen here.
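The two answers differ in what they accept between the brackets. A quick comparison on a made-up sample string (the bracketed words other than "running" are invented for illustration):

```javascript
const sample = "[running] [re-run] [two words]";

// [\w]+ allows only letters, digits and underscore between the brackets.
const wordOnly = sample.match(/\[([\w]+)\]/g); // ["[running]"]

// [^\[\]]+ allows anything except another bracket, including "-" and spaces.
const anyContent = sample.match(/\[([^\[\]]+)\]/g); // ["[running]", "[re-run]", "[two words]"]
```

Which one is right depends on whether hyphenated or multi-word entries should count.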
I have a string that will look like one of these:
TEST/4_James
TEST/1003_Matt
TEST/10343_Adam
I want to split this string to get TEST and the name after the "_". What regular expression can I use to split it at "/" + any number + "_"?
Thanks
Use match, and capturing groups:
var james = "TEST/4_James";
var matches = james.match(/(.*)\/.*_(.*)/);
console.log(matches[1]); // TEST
console.log(matches[2]); // James
// In order of appearance
(.*) //matches any run of characters (except newlines) and captures it
\/ //matches a forward slash
.*_ //matches any run of characters (except newlines) followed by an underscore
(.*) //matches whatever is left of the line and captures it
Someone mentioned https://regex101.com/. I use it as well, and it's an awesome resource if you're learning regular expressions: it not only lets you write and test them, it also explains each piece of the regular expression and what it does.
Also it's a good idea to be a little more explicit in your expressions than .* if you can. For instance if you know that it's going to be numbers or characters, or a particular string then use a more explicit pattern. I just used this because I wasn't really sure what 'TEST' might contain in an actual scenario.
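Following that advice, here is a sketch with a more explicit pattern, assuming the prefix never contains "/" and the part between "/" and "_" is always digits (parseEntry is a hypothetical helper). It also shows the split the question originally asked for.

```javascript
// Hypothetical helper: "prefix/digits_name", where the prefix contains no "/".
function parseEntry(s) {
  const m = s.match(/^([^\/]+)\/\d+_(.+)$/);
  return m ? { prefix: m[1], name: m[2] } : null; // null if the shape differs
}

for (const s of ["TEST/4_James", "TEST/1003_Matt", "TEST/10343_Adam"]) {
  console.log(parseEntry(s));
}

// The split from the question: split at "/" + one or more digits + "_".
const parts = "TEST/1003_Matt".split(/\/\d+_/); // ["TEST", "Matt"]
```

One behavioral difference: with .*_ the name is whatever follows the last underscore, while \d+_ anchors on the first underscore after the digits, so a name such as James_Smith stays whole.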
#a:{width:100px;height:100px;background-color:black;}#b:{width:100px;}
I have the above string.
I want to remove the : character only where it follows a CSS selector such as #a or #b.
I thought I had to use a regular expression, so I wrote one:
/[#\.A-Za-z0-9]+([:])[{]/g
see this regular expression working on regex101
It matches the :, but when I try to remove it with the replace method, the whole #a:{ and #b:{ get removed.
Any help would be great!
The regex is almost correct. What you need to do is replace the match with $1$2 instead of an empty string.
Also make a small change to the regex:
/([#.A-Za-z0-9]+):({)/g
Regex Example
Changes made
([#.A-Za-z0-9]+) is enclosed in parentheses. The matched string is captured in $1, so for the first match $1 will contain #a.
Within a character class it's not necessary to escape the ., as it loses its special meaning inside the class.
[{] becomes ({). The surrounding [] makes no difference, so it is dropped. Enclosed in (), the brace is captured in $2; for example, in the first match $2 will contain {.
Replacing with $1$2
gives the output
#a{width:100px;height:100px;background-color:black;}#b{width:100px;}
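A sketch comparing the question's pattern, the answer's capture-group version, and a lookahead variant. The lookahead is an alternative technique, not part of the answer: it matches only the colon, so nothing needs to be written back.

```javascript
const css = "#a:{width:100px;height:100px;background-color:black;}#b:{width:100px;}";

// Question's pattern: the whole "#a:{" is the match, so replacing it with ""
// deletes the selector and the brace along with the colon.
const broken = css.replace(/[#\.A-Za-z0-9]+([:])[{]/g, "");

// Answer's pattern: selector captured in $1, brace in $2, both written back.
const viaGroups = css.replace(/([#.A-Za-z0-9]+):({)/g, "$1$2");

// Alternative (not from the answer): match a colon only when it is
// immediately followed by "{"; the lookahead consumes nothing.
const viaLookahead = css.replace(/:(?=\{)/g, "");
```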
Javascript
var value = "#a:{width:100px;height:100px;background-color:black;}#b:{width:100px;}";
alert(value.replace(/(#.):/g, "$1"));
Example: http://jsfiddle.net/7hs0jgd2/
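The (#.): pattern relies on the selectors being one character long, since . matches exactly one character after "#". A sketch with a hypothetical longer selector, #nav, shows the difference:

```javascript
// "#nav" is a made-up selector added to show the limitation.
const value = "#a:{width:100px;}#nav:{color:red;}";

// (#.) captures "#" plus exactly one character, so "#nav:" is left unchanged.
const oneChar = value.replace(/(#.):/g, "$1"); // "#a{width:100px;}#nav:{color:red;}"

// Widening the group to one or more word characters also covers "#nav".
// (\w does not include "-", so ids containing dashes would need more.)
const wordChars = value.replace(/(#\w+):/g, "$1"); // "#a{width:100px;}#nav{color:red;}"
```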
I'm attempting to get all alphanumeric characters after the : symbol; if a space follows, the space should be the terminating mark.
// the following should all return foo
text = 'a :foo bar';
text = 's:foo';
text = ':foo, test';
I tried this, but it doesn't get anything unless there's a space. I'll probably need a regex, but I'm not sure how it would be constructed.
var t = following.substring(following.lastIndexOf(":")+1,following.lastIndexOf(' '));
How about: /:([a-zA-Z0-9]+)/
'a :foo bar'.match(/:([a-zA-Z0-9]+)/)[1] //returns foo
's:foo'.match(/:([a-zA-Z0-9]+)/)[1] //returns foo
':foo, test'.match(/:([a-zA-Z0-9]+)/)[1] //returns foo
This regex has two parts:
: matches the colon, followed by
([a-zA-Z0-9]+) one or more alphanumeric characters, captured in group 1
Note:
Since this regular expression matches only the types of characters I specified, there is no need in this case to "break on space": it only matches the alphanumeric characters after :, so a space, a comma, or any other character not listed inside the square brackets automatically ends the match. Adding a character inside the square brackets makes that character a possible match.
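A sketch running the answer's pattern over the three inputs from the question. The helper afterColon is hypothetical; it guards against match() returning null, which would otherwise make [1] throw.

```javascript
// Hypothetical helper around the answer's pattern.
function afterColon(text) {
  const m = text.match(/:([a-zA-Z0-9]+)/);
  return m ? m[1] : null; // match() returns null when nothing matches
}

for (const text of ["a :foo bar", "s:foo", ":foo, test"]) {
  console.log(afterColon(text)); // foo
}
```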
I'd suggest using regular expressions, such as (albeit untested):
var result = text.match(/:([a-zA-Z0-9]*)\s/)[1];
References:
javascript regular expressions.
String.match().
You could use regex like this
/:([\w]+)[\W]/g
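The last two suggestions both require a character after the word (whitespace, or any non-word character), so they fail when the word ends the string, as in s:foo; with that case, match() returns null and indexing it with [1] throws. A sketch comparing them on the question's inputs (capture is a hypothetical helper):

```javascript
const samples = ["a :foo bar", "s:foo", ":foo, test"];

// Requires whitespace right after the word.
const needsSpace = /:([a-zA-Z0-9]*)\s/;
// Requires a non-word character right after the word (no g flag here).
const needsNonWord = /:([\w]+)[\W]/;

// Hypothetical helper: group 1, or null when the pattern does not match.
function capture(re, s) {
  const m = s.match(re);
  return m ? m[1] : null;
}

const spaceResults = samples.map((s) => capture(needsSpace, s)); // ["foo", null, null]
const nonWordResults = samples.map((s) => capture(needsNonWord, s)); // ["foo", null, "foo"]

// With the g flag, match() returns whole matches, not capture groups.
const globalResult = "a :foo bar".match(/:([\w]+)[\W]/g); // [":foo "]
```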
Given the following Regular Expression:
\b(MyString|MyString-Dash)\b
And the text:
AString
MyString
MyString-Dash
Running a match against the text never finds a match for the second thing (MyString-Dash) because the '-' (dash) character isn't a word boundary character. The following javascript always outputs "MyString,MyString" to the "matches" div (I would like to find MyString and MyString-Dash as distinct matches). How can I define a pattern that will match both MyString and MyString-Dash ?
<html>
<body>
<h1>Content</h1>
<div id="content">
AString
MyString
MyString-Dash
</div>
<br>
<h1>Matches (expecting MyString,MyString-Dash)</h1>
<div id="matches"></div>
<script>
var content = document.getElementById('content');
var matchesDiv = document.getElementById('matches');
var pattern = '\\b(MyString|MyString-Dash)\\b';
var matches = content.innerHTML.match(pattern);
matchesDiv.innerHTML = matches;
</script>
</body>
</html>
Swap the order of your matching so that the longest possible is first:
content.innerHTML.match(/\b(MyString-Dash|MyString)\b/)
The alternatives in a regular expression are tried from left to right, and the first one that lets the overall pattern match wins. Just tested this in Firebug, it works.
I would also change that pattern var to a regular expression literal, from '\\b(MyString-Dash|MyString)\\b' to /\b(MyString-Dash|MyString)\b/g
You want the /g in there because that will make the regular expression return all matches, rather than just the first one.
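A sketch of why the order matters, using a plain string in place of the div's innerHTML:

```javascript
const text = "AString\nMyString\nMyString-Dash";

// Shorter alternative first: inside "MyString-Dash", "MyString" already
// succeeds because there is a word boundary between "g" and "-".
const shortFirst = text.match(/\b(MyString|MyString-Dash)\b/g); // ["MyString", "MyString"]

// Longer alternative first: it is tried first and wins where it applies.
const longFirst = text.match(/\b(MyString-Dash|MyString)\b/g); // ["MyString", "MyString-Dash"]
```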
Please see this answer for how to deal with words with dashes in them and the issues related to boundaries when you have those kinds of words.
There are a couple of problems with your assumptions.
Running a match against the text never finds a match for the second thing (MyString-Dash) because the '-' (dash) character isn't a word boundary character.
There's no such thing as a word boundary character. A word boundary is the zero-width position between a character that matches \w and one that doesn't (or the start or end of the string next to a \w character). - does not match \w, so in MyString-Dash there is a word boundary on either side of it, but that won't break your match: the - is a literal dash in your regex and the \b's are far outside of it.
Second, regexen will always try to match the first thing they can in the string that matches your regex. As long as that first candidate matches, that is what you get back every time. When you call match without /g, you're asking for the first match. That's the design. If you didn't want it to match MyString, don't ask for it.
Third, most regex engines prioritize 'completing a match' over length of a match. Thus, 'MyString', if it matches, will always be the first thing it returns. You'll have to wait until Perl 6 grammars for a regex engine that prioritizes length. :)
The only way for you to really do this is with two checks, one for the longer one, first, and then one for the shorter one. It will always match the first thing it finds that works. If you have a priority other than that, it's up to you to code it in as separate checks.
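A sketch of the word-boundary behavior described above, plus the two-check approach (findPreferred is a hypothetical helper that tests the longer candidate first):

```javascript
const sample = "MyString-Dash";

// \b is a zero-width position between a \w and a non-\w character (or a
// string edge). "-" is not \w, so there is a boundary on each side of it.
const dashBoundary = /\bDash\b/.test(sample); // true
const beforeDash = /MyString\b/.test(sample); // true
const insideWord = /\bString\b/.test(sample); // false: "y" and "S" are both \w

// Hypothetical helper: two separate checks, the longer candidate first.
function findPreferred(line) {
  if (/\bMyString-Dash\b/.test(line)) return "MyString-Dash";
  if (/\bMyString\b/.test(line)) return "MyString";
  return null;
}
```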