regexp to quote only string matches (not numbers) - javascript

I'm struggling with string:
"some text [2string] some another[test] and another [4]";
I'm trying to quote every value within [] except numbers, so that it can be converted into
"some text ['2string'] some another['test'] and another [4]"
Thanks.

You need a regex that
matches content between [], i.e. a [, any number of characters except ], then a ]
asserts that there is at least one other character besides digits here.
You can solve this using character classes and negative lookahead assertions:
result = subject.replace(/\[(?!\d+\])([^\]]*)\]/g, "['$1']");
Explanation:
\[ # Match [
(?! # Assert that it's impossible to match...
\d+ # one or more digits
\] # followed by ]
) # End of lookahead assertion
( # Match and capture in group number 1:
[^\]]* # any number of characters except ]
) # End of capturing group
\] # Match ]
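A quick way to check the pattern explained above is to wrap it in a small function and run it on a few inputs. This is only a sketch: the function name quoteNonNumeric and the second test string are made up for illustration and are not part of the original answer.

```javascript
// Hypothetical helper wrapping the replace call from the answer above.
function quoteNonNumeric(subject) {
  // Quote the bracket content unless it consists of digits only.
  return subject.replace(/\[(?!\d+\])([^\]]*)\]/g, "['$1']");
}

const sample = "some text [2string] some another[test] and another [4]";
console.log(quoteNonNumeric(sample));
// some text ['2string'] some another['test'] and another [4]

// Digits-only content is skipped; anything else (even with a space) is quoted.
console.log(quoteNonNumeric("[42] [4 2] [x]"));
// [42] ['4 2'] ['x']
```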

A longer, but IMO cleaner approach, if performance is not a big concern:
var string = "some text [2string] some another[test] and another [4]";
var output = string.replace(/(\[)(.*?)(\])/g, function(match, a, b, c) {
if(/^\d+$/.test(b)) {
return match;
} else {
return a + "'" + b + "'" + c;
}
});
console.log(output);
You basically match every expression inside square brackets, then test whether it's a number. If it is, return the match as is; otherwise insert quotes around the content.
Output:
some text ['2string'] some another['test'] and another [4]

I'd try something like \[(\d*?[a-z]\w*?)]. This should match any [...] as long as there's at least one letter inside. If underscores (_) aren't valid, replace the \w at the end with [a-z].
\[ is just a simple match for [, it has to be escaped due to the special meaning of [.
\d*? will match any amount of digits (or none), but as few as possible to fulfill the match.
[a-z] will match any character within the given range.
\w*? will match any "word" (alphanumeric) characters (letters, digits, and underscores), again as few as possible to fulfill the match.
] is another simple match, this one doesn't have to be escaped, as it's not misleading (no open [ at this level). It can be escaped, but this is usually a style preference (depends on the actual regex engine).
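The answer above gives only the pattern, so here is a minimal sketch of how it might be plugged into replace. The g flag, the replacement string, and the variable names are assumptions added for illustration.

```javascript
const input = "some text [2string] some another[test] and another [4]";

// As written, only lowercase letters satisfy [a-z]; add the i flag for uppercase.
const letterPattern = /\[(\d*?[a-z]\w*?)]/g;
const quoted = input.replace(letterPattern, "['$1']");
console.log(quoted);
// some text ['2string'] some another['test'] and another [4]

// \w cannot match a space, so bracket content with spaces is left alone.
const untouched = "[two words]".replace(letterPattern, "['$1']");
console.log(untouched);
// [two words]
```

Note that this is stricter than "anything but a plain number": content such as [a-b] or [two words] is not quoted.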

You can replace it with this regex
input.replace(/(?!\d+\])(\w+)(?=\])/g, "'$1'");

Another solution that adds a simple regex to your attempt (note the g flag, so that every numeric value is unquoted, not just the first one):
str.split('[').join("['").split(']').join("']").replace(/\['(\d+)'\]/g, "[$1]");

Related

Trying to Invalidate #Mentions using regex [duplicate]

I have a text like this;
[Some Text][1][Some Text][2][Some Text][3][Some Text][4]
I want to match [Some Text][2] with this regex;
/\[.*?\]\[2\]/
But it returns [Some Text][1][Some Text][2]
How can I match only [Some Text][2]?
Note: There can be any character in Some Text, including [ and ]. The numbers in square brackets can be any number, not only 1 and 2. The Some Text that I want to match can be at the beginning of the line, and there can be multiple Some Texts.
The \[.*?\]\[2\] pattern works like this:
\[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
.*? - matches any 0+ chars other than line break chars, as few as possible, but as many as needed for a successful match, as there are subsequent patterns, see below
\]\[2\] - ][2] substring.
So, the .*? gets expanded upon each failure until it finds the leftmost ][2]. Note the lazy quantifiers do not guarantee the "shortest" matches.
Solution
Instead of a .*? (or .*) use negated character classes that match any char but the boundary char.
\[[^\]\[]*\]\[2\]
Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
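The difference between the two patterns can be seen side by side in a short sketch. The test string is a shortened version of the one in the question; the variable names are made up for illustration.

```javascript
const text = "[Some Text][1][Some Text][2][Some Text][3]";

// Lazy dot: the match starts at the leftmost "[" and .*? keeps expanding
// until "][2]" can be matched, so it swallows the first pair as well.
const lazy = text.match(/\[.*?\]\[2\]/)[0];
console.log(lazy); // [Some Text][1][Some Text][2]

// Negated class: [^\]\[]* cannot cross a "[" or "]", so the match can only
// begin at the bracket pair directly in front of "[2]".
const negated = text.match(/\[[^\]\[]*\]\[2\]/)[0];
console.log(negated); // [Some Text][2]
```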
Other examples:
Strings between angle brackets: <[^<>]*> matches <...> with no < and > inside
Strings between parentheses: \([^()]*\) matches (...) with no ( and ) inside
Strings between double quotation marks: "[^"]*" matches "..." with no " inside
Strings between curly braces: \{[^{}]*} matches {...} with no { and } inside
In other situations, when the starting pattern is a multichar string or complex pattern, use a tempered greedy token, (?:(?!start).)*?. To match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.
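A small sketch of the tempered greedy token on the example string from the paragraph above, compared with a plain lazy dot (variable names are illustrative only):

```javascript
const s = "abc 0 abc 1 def";

// Plain lazy dot: starts at the first "abc" and runs on until "def".
const plain = s.match(/abc.*?def/)[0];
console.log(plain); // abc 0 abc 1 def

// Tempered token: each consumed character must not begin another "abc",
// so the attempt from the first "abc" fails and the second one succeeds.
const tempered = s.match(/abc(?:(?!abc).)*?def/)[0];
console.log(tempered); // abc 1 def
```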
You could try the below regex,
(?!^)(\[[A-Z].*?\]\[\d+\])

What is the regex to match alphanumeric 6 character words, separated by space or comma

I am a newbie in RegEx and am trying to design a RegEx that matches strings like the ones below:
pattern 1 separated by comma and a space: KEPM39, JEMGH5, HEPM21 ... (repeat)
pattern 2 separated only by a space: KEPM39 JEMGH5 HEPM21 ... (repeat)
pattern 3 separated only by a comma: KEPM39,JEMGH5,HEPM21 ... (repeat)
this is my concept: "^[a-zA-Z0-9]{6,}[,\s]+$" but it seems wrong.
#I want to validate the whole string, and I use javascript & html to validate user input. (textarea)
#duplicate change to repeat to be more suitable.
function validate(){
var term = "JEPM34, KEPM11 ";
var re = new RegExp("^[a-zA-Z0-9]{6,}[,\s]+$");
if (re.test(term)) {
return true
} else {
return false
}
}
Thank you in advance!
A very loose way to validate could be:
^[A-Z\d]{6}(?:[ ,]+[A-Z\d]{6})*$
By loose, I mean that [ ,]+ does not check that every delimiter in your string is the same. Therefore even "KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ" would be valid.
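To make "loose" concrete, here is a minimal sketch that runs the pattern above on a few strings. The inputs beyond those in the question are invented for illustration.

```javascript
const loose = /^[A-Z\d]{6}(?:[ ,]+[A-Z\d]{6})*$/;

console.log(loose.test("KEPM39, JEMGH5, HEPM21"));               // true
console.log(loose.test("KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ")); // true (mixed delimiters)
console.log(loose.test("KEPM39,,  JEMGH5"));                     // true (repeated delimiters)
console.log(loose.test("JEPM34, KEPM11 "));                      // false (trailing space)
console.log(loose.test("KEPM3"));                                // false (only 5 characters)
```

Because the regex has no g flag, calling test repeatedly on the same object is safe here.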
If you want consistent delimiters, and there can be trailing spaces (as there is in your example data) you can use a capture group with a backreference \1 to keep consistent delimiters and match optional spaces at the end.
Note that you can also use \s but that could also match a newline.
Using test will return a boolean, so you don't need an explicit return true or return false; you can return the result of test directly.
^[A-Z\d]{6}(?:(, ?| )(?:[A-Z\d]{6}\1)*[A-Z\d]{6} *)?$
The pattern matches:
^ Start of string
[A-Z\d]{6} Match 6 occurrences of a char A-Z or a digit
(?: Non capture group to match as a whole
(, ?| ) Capture group 1, match either a comma and optional space, or a space to be used as a backreference
(?:[A-Z\d]{6}\1)* Optionally repeat any of the listed followed by a backreference \1 to group 1 which will match the same delimiter
[A-Z\d]{6} * Match any of the listed and optional spaces at the end
)? Close the group and make it optional to also match an instance without delimiters
$ End of string
const regex = /^[A-Z\d]{6}(?:(, ?| )(?:[A-Z\d]{6}\1)*[A-Z\d]{6} *)?$/;
const validate = term => regex.test(term);
[
"KEPM39, JEMGH5, HEPM21",
"KEPM39 JEMGH5 HEPM21",
"KEPM39,JEMGH5,HEPM21",
"JEPM34, KEPM11 ",
"JEPM34, KEPM11",
"JEPM34",
"KEPM39, JEMGH5 HEPM21, HEGD44 ZZZZZZ",
"KEPM39, JEMGH5 HEPM21"
].forEach(s =>
console.log(`${s} ==> ${validate(s)}`)
);

Regex to match # followed by square brackets containing a number

I want to parse a pattern similar to this using javascript:
#[10] or #[15]
With all my efforts, I came up with this:
#\\[(.*?)\\]
This pattern works fine, but the problem is that it matches anything between those square brackets. I want it to match only numbers. I tried these too:
#\\[(0-9)+\\]
and
#\\[([(0-9)+])\\]
But these match nothing.
Also, I want to match only patterns that are complete words and not part of a word in the string, i.e. the pattern should have spaces on both sides unless it starts or ends the string. That means it should not match a phrase like this:
abxdcs#[13]fsfs
Thanks in advance.
Use the regex:
/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
It will match only if the pattern (#[number]) is not part of a word: it must have whitespace on both sides unless it starts or ends the string.
It uses a capturing group, so if you need the digits, use group 1.
Testing code:
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[10]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[15]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs#[13]fsfs")); // false
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs #[13] fsfs")); // true
var r1 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match = r1.exec("#[10]");
console.log(match[1]); // 10
var r2 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match2 = r2.exec("abxdcs #[13] fsfs");
console.log(match2[1]); // 13
var r3 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match3;
while (match3 = r3.exec("#[111] #[222]")) {
console.log(match3[1]);
}
// while's output:
// 111
// 222
You were close, but you need to use square brackets:
#\[[0-9]+\]
Or, a shorter version:
#\[\d+\]
The reason you need those backslashes is to "escape" the square brackets. Usually they are used for denoting a "character class".
[0-9] creates a character class which matches exactly one digit in the range of 0 to 9. Adding the + changes the meaning to "one or more". \d is just shorthand for [0-9].
Of course, the backslash character is also used to escape characters inside of a javascript string, which is why you must escape them. So:
javascript
"#\\[\\d+\\]"
turns into:
regex
#\[\d+\]
which is used to match:
# a literal "#" symbol
\[ a literal "[" symbol
\d+ one or more digits (nearly identical to [0-9]+)
\] a literal "]" symbol
I say that \d is nearly identical to [0-9] because, in some regex flavors (including .NET), \d will actually match numeric digits from other cultures in addition to 0-9.
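The double escaping described above can be checked directly. The sketch below also runs the question's first attempt to show why it matched nothing useful; variable names are illustrative.

```javascript
// Built from a string: every regex backslash must be doubled.
const fromString = new RegExp("#\\[\\d+\\]");
console.log(fromString.source);        // #\[\d+\]
console.log(fromString.test("#[10]")); // true

// The equivalent regex literal needs no doubling.
const literal = /#\[\d+\]/;
console.log(literal.source === fromString.source); // true

// The question's attempt: (0-9) is a group matching the literal text "0-9",
// not a character class, so real numbers do not match.
const attempt = new RegExp("#\\[(0-9)+\\]");
console.log(attempt.test("#[10]"));  // false
console.log(attempt.test("#[0-9]")); // true
```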
You don't need so many characters inside the character class. More importantly, you put the + in the wrong place. Try this: #\\[([0-9]+)\\].

Regular expression negative match

I can't seem to figure out how to compose a regular expression (used in Javascript) that does the following:
Match all strings where the characters after the 4th character do not contain "GP".
Some example strings:
EDAR - match!
EDARGP - no match
EDARDTGPRI - no match
ECMRNL - match
I'd love some help here...
Use zero-width assertions:
if (subject.match(/^.{4}(?!.*GP)/)) {
// Successful match
}
Explanation:
"
^ # Assert position at the beginning of the string
. # Match any single character that is not a line break character
{4} # Exactly 4 times
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GP # Match the characters “GP” literally
)
"
You can use what's called a negative lookahead assertion here. It looks into the string ahead of the location and matches only if the pattern contained is /not/ found. Here is an example regular expression:
/^.{4}(?!.*GP)/
This matches only if, after the first four characters, the string GP is not found.
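Here is a short sketch running this pattern on the strings from the question, plus two extra inputs (invented for illustration) that show its edge cases:

```javascript
const noGpAfterFour = /^.{4}(?!.*GP)/;

const samples = ["EDAR", "EDARGP", "EDARDTGPRI", "ECMRNL", "GPAB", "EDA"];
const results = samples.map((s) => noGpAfterFour.test(s));

samples.forEach((s, i) => console.log(s, results[i]));
// EDAR true
// EDARGP false
// EDARDTGPRI false
// ECMRNL true
// GPAB true   (GP inside the first four characters is not checked)
// EDA false   (fewer than four characters, so ^.{4} cannot match)
```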
You could do something like this:
var str = "EDARDTGPRI";
var test = !(/GP/.test(str.slice(4)));
test will be true for matches and false for non-matches.

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as are necessary for the RegExp to match. Example: against 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. With the g flag, match returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
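Both points (greedy versus lazy quantifiers, and match with or without the g flag) can be illustrated with a small sketch. The sample string "x=1, y=2" is made up for this example.

```javascript
// Lazy vs. greedy on the example from the text.
const lazyDigits = "123".match(/\d+?/)[0];
const greedyDigits = "123".match(/\d+/)[0];
console.log(lazyDigits);   // 1
console.log(greedyDigits); // 123

const pairs = "x=1, y=2";

// With g: an array of all whole matches, without group details.
const all = pairs.match(/(\w)=(\d)/g);
console.log(all.join(" | ")); // x=1 | y=2

// Without g: only the first match, followed by its captured groups.
const first = pairs.match(/(\w)=(\d)/);
console.log(first[0], first[1], first[2]); // x=1 x 1
```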
The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
It is notoriously difficult to use a Regex to do balanced parenthesis matching, especially in Javascript.
You would be much better off writing your own parser. Here's a way to do this that still uses the strengths of regexes:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
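The steps above can be sketched roughly as follows. This is a simplified illustration, not the answerer's actual code: the function name splitTopLevel is hypothetical, it walks the tokens with an exec loop instead of a string.replace callback, and it does not track quotes, so it assumes no brackets appear inside quoted strings (the sample input is a simplified version of the question's string for that reason).

```javascript
// Hypothetical helper: split on "." only where brackets are balanced.
// Assumption: quotes are NOT tracked, so brackets inside quotes break it.
function splitTopLevel(input) {
  // Groups: 1 = escaped char, 2 = opening bracket, 3 = closing bracket, 4 = dot
  const token = /(\\.)|([(\[{])|([)\]}])|(\.)/g;
  const parts = [];
  let buffer = "";
  let depth = 0; // number of currently open brackets
  let last = 0;  // index just past the previous token
  let m;
  while ((m = token.exec(input)) !== null) {
    buffer += input.slice(last, m.index); // plain text between tokens
    last = token.lastIndex;
    if (m[4] !== undefined && depth === 0) {
      parts.push(buffer); // unescaped dot at top level: split here
      buffer = "";
      continue;
    }
    if (m[2] !== undefined) depth++;
    else if (m[3] !== undefined && depth > 0) depth--;
    buffer += m[0]; // escapes, brackets, and nested dots stay in the buffer
  }
  parts.push(buffer + input.slice(last));
  return parts;
}

const demo = "*window.some1.some\\.2.(a.b ? cc\\.c : d.n [a.b, cc\\.c]).some5";
const pieces = splitTopLevel(demo);
pieces.forEach((p) => console.log(p));
// *window
// some1
// some\.2
// (a.b ? cc\.c : d.n [a.b, cc\.c])
// some5
```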
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parentheses
Only splits if there are balanced quotes
Allows escaping of parentheses, quotes, and splits using \
This code will work perfectly for your example.
You don't need a complicated regex for this.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
So, was working with this, and now I see that #FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this may not work in your case, since I'm using negative lookbehind, which JavaScript regex engines did not support before ES2018. It should work as intended in other engines, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
