JS RegEx for C# "lambda syntax" - javascript

/^[a-z][ ][=][>][ ][a-z?][.?][a-z0-9]+[ ][=][ ]['?][a-z0-9]+['?]]/i
I'm trying to figure out how to write a regex pattern that recognizes a string of lambda syntax (as used in C#).
In the case of strings
"p => p = 'some random string'" //Must allow for single quotes
In the case of number or boolean values
"p => p = true" /*or*/ "p => p = 25" //Must allow for a string without single quotes with no whitespace at all in the event there are no single quotes
Also it must allow for a single '.' in the letter chosen to the left of the '=' sign
"p => p.firstName = 'Jack'"
How can I modify my regex to fulfill the following requirements?
start off with any letter
followed with a mandatory space
followed by a mandatory string '=>' (without single quotes)
followed by a mandatory space
followed by the same letter as in step 1 (or at least a single character)
followed by a period character (optional)
followed by any set of alphanumeric characters (required if there is a period from step 6)
followed by a space
followed by an equals sign
followed by a space
followed by any alphanumeric set of characters along with single quotes (but only if the single quotes encompass the set of alphanumeric characters)

First off, a general point: you don't need [] around everything, only around character classes (e.g. [a-zA-Z] or [_\$0-9]).
So let's go through your steps in order:
Match any letter - you don't specify case, let's do both:
Lowercase only: ([a-z])
Uppercase only: ([A-Z])
Both: ([a-zA-Z]).
We wrap it in () so we can use it in a backreference later.
The mandatory string " => " (merging steps 2-4, so including the space on each side) is just that, written literally. As none of these are special characters, there is no need for escaping.
To get the same letter as step 1, we insert a backref to the first group (set of ()): \1
For steps 6 and 7, we make the period together with one alphanumeric character optional: (\.\w)? and then allow zero or more further alphanumeric characters: \w* (note that \w also matches the underscore)
Now we have the literal string " = " (again with a space on each side). None of these characters need to be escaped, so we include it directly.
For the last step we have several options:
Some numeric characters without whitespace: \d+
true or false: true|false
Or, single quote, any characters but the single quote and then single quote again: '[^']*' (we use negative character classes to get everything but ')
Now we join these together as alternatives using |
Putting all this together, we get the final regex:
/([a-zA-Z]) => \1(\.\w)?\w* = (\d+|true|false|'[^']*')/i
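The regex above has no anchors, so it also matches when a lambda appears inside a longer string. Below is a minimal sketch of how it could be checked against the examples from the question, assuming the whole input must be a single lambda. The ^ and $ anchors and the helper name isLambda are additions for illustration, not part of the answer.

```javascript
// Anchored variant of the regex above. The ^ and $ anchors are an assumption:
// they require the whole input to be one lambda expression.
// With the i flag, [a-z] already covers both cases.
const lambdaRe = /^([a-z]) => \1(\.\w)?\w* = (\d+|true|false|'[^']*')$/i;

// Hypothetical helper name, used only for illustration.
function isLambda(input) {
  return lambdaRe.test(input);
}

console.log(isLambda("p => p = 'some random string'"));
console.log(isLambda("p => p.firstName = 'Jack'"));
console.log(isLambda("p => p = 25"));
console.log(isLambda("p => q = 25")); // letter after => differs from the first one
```

The backreference \1 is what rejects the last input: the letter after => must repeat the letter captured by the first group.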

Related

Regex for any character except quote after comma

I want to match every word separated by comma, but it must not include a quote like ' or ".
I was using this regex:
^[a-zA-Z0-9][\!\[\#\\\:\;a-zA-Z0-9`_\s,]+[a-zA-Z0-9]$
However, it only matches a character and number and not a symbol.
The output should be:
example,example //true
exaplle,examp#3 //true, with symbol or number
example, //false, because there is no word after comma
,example //false, because there is no word before comma
##example&$123,&example& //true, with all character and symbol except quote
You can match one or more of the characters in the character class, then repeat one or more times a non-capturing group (?:...) that consists of a comma followed by one or more characters from the same class.
^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$
Note that you don't have to escape \!, \#, \: and \; in the character class, and that \s might also possibly match a newline.
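As a quick check, the pattern above can be run against the sample inputs from the question. This is only a sketch; the variable names are made up for illustration.

```javascript
// Regex from the answer above: one "word", then one or more ",word" groups.
const listRe = /^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$/;

// Sample inputs taken from the question.
const samples = [
  "example,example",
  "exaplle,examp#3",
  "example,",
  ",example",
  "##example&$123,&example&",
];

for (const s of samples) {
  console.log(JSON.stringify(s), listRe.test(s));
}
```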
I'm assuming you want the whole string to satisfy your conditions, and the test to return true only in that case.
These are the conditions:
Words are separated by commas, and every comma must have a valid word on each side.
Words can contain anything except the 2 kinds of quotes (' and ") and whitespace characters (spaces and newlines).
The regex you would use is this- ^(?:[^,'"\s]+,[^,'"\s]+)+$, with the global flag (g) on.
Edit: As per request of being able to match only a single word.
This is the regex you would use for that- ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This will match words separated by a , as well as match just a single word.
The conditions for what qualifies as a word remain the same as above.
Quick explanation:-
^[^,'"\s]+,[^,'"\s]+$
This part matches 2 words separated by a comma, [^,'"\s]+ denotes a word
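Wrapping that whole thing in ^(?:[^,'"\s]+,[^,'"\s]+)+$ makes the pair repeat, so it can match more than 2 words. Because each repetition consumes a full word,word pair, a longer list only matches when it can be split into such pairs: example,example,example matches, but a,b,c does not.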
Then adding another alternative using | and wrapping the whole thing in a group (non-capturing), we get ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This adds the alternative [^,'"\s]+, which matches a single word.
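The repeated-pair pattern has one edge case worth checking: a list whose middle word is a single character cannot be split into word,word pairs. The sketch below compares it with an alternative shape (one word, then one or more ",word" groups). The alternative and the variable names are suggestions added here, not part of the answer.

```javascript
// Regex from the answer: one or more "word,word" pairs.
const pairRe = /^(?:[^,'"\s]+,[^,'"\s]+)+$/;

// Alternative sketch: a word followed by one or more ",word" groups.
const wordListRe = /^[^,'"\s]+(?:,[^,'"\s]+)+$/;

const cases = ["example,example", "example,example,example", "a,b,c", "example,", ",example"];
for (const s of cases) {
  console.log(JSON.stringify(s), pairRe.test(s), wordListRe.test(s));
}
```

To also accept a single word, the last quantifier of the alternative can be changed from + to *.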

Do not allow '.'(dot) anywhere in a string (regular expression)

I have a regular expression for allowing unicode chars in names(Spanish, Japanese etc), but I don't want to allow '.'(dot) anywhere in the string.
I have tried this regex but it fails when string length is less than 3. I am using xRegExp.
^[^.][\\pL ,.'-‘’][^.]+$
For Example:
NOËL // true
Sanket ketkar // true
.sank // false
san. ket // false
NOËL.some // false
Basically it should return false when name has '.' in it.
Your pattern ^[^.][\\pL ,.'-‘’][^.]+$ matches at least 3 characters because it uses 3 character classes: the first 2 each match exactly 1 character and the last one matches 1 or more times.
You could remove the dot from your character class and repeat only that class 1 or more times, so that strings shorter than 3 characters also match.
^[\p{L} ,'‘’-]+$
Or you could use a negated character class:
^[^.\r\n]+$
^ Start of string
[^.\r\n]+ Negated character class, match any char except a dot or newline
$ End of string
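For reference, both patterns can be tried without XRegExp in engines that support Unicode property escapes, where the u flag enables \p{L} in a native regex. This is a sketch using the sample names from the question plus one short name ("Al") added to cover the length issue.

```javascript
// Native equivalent of the allow-list pattern above; the u flag enables \p{L}.
const nameRe = /^[\p{L} ,'‘’-]+$/u;

// Negated-class variant: anything except a dot or a line break.
const noDotRe = /^[^.\r\n]+$/;

// "Al" is an added sample to show that short names now pass.
const names = ["NOËL", "Sanket ketkar", ".sank", "san. ket", "NOËL.some", "Al"];
for (const n of names) {
  console.log(JSON.stringify(n), nameRe.test(n), noDotRe.test(n));
}
```

The two patterns differ in strictness: the first accepts only the listed characters, while the second accepts anything that is not a dot or a line break, including digits and other symbols.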
You could try:
^[\p{L},\-\s‘’]+(?!\.)$
As seen here: https://regex101.com/r/ireqbW/5
Explanation -
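The first part of the regex, [\p{L},\-\s‘’]+, matches any Unicode letter, comma, hyphen, whitespace character (given by \s) or curly single quote
(?!\.) is a negative lookahead, which asserts that the next character is not a dot. Placed directly before $ it can never fail, because the character class already excludes the dot, so the pattern behaves the same without it.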
^[^.]+$
It will match any non-empty string that does not contain a dot between the start and the end of the string.
If there is a dot somewhere between start to end (i.e. anywhere) it will fail.

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters between alphanumeric characters are kept. The ê in the last word is also kept.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not a single regex; it relies on additional JavaScript code.
Rough explanation: We first split the string into an array of substrings at the spaces. We then transform each substring by stripping its leading and trailing special characters. We match the special characters with [^a-zA-Z0-9ê]*: because of the leading ^ inside the brackets, the class matches every character except those listed, that is, all special characters.
Between these two classes we capture the relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? after * makes the quantifier lazy, so the group stops as soon as the rest of the pattern, which matches the trailing special characters, can reach the end of the string. We also start the regex with ^ and end it with $ so that it covers the entire substring (they anchor the match to the start and the end of the string, respectively).
With .exec(s)[1] we execute the regex on the substring and return the first capturing group from the transform function. Because every part of the pattern is optional, exec always finds a match here; for a substring without any letters or digits the captured group is simply an empty string. At the end we join the substrings with spaces.
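The answers above hard-code either ASCII punctuation ranges or a fixed list of letters. In engines that support Unicode property escapes (the u flag), the same idea can be sketched with \p{L} and \p{N}, so that letters and digits from any script are treated as non-special. This is an alternative suggested here, not one of the original answers, and the variable names are made up.

```javascript
const input = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';

// "Special" here means: not a Unicode letter, not a Unicode digit, not whitespace.
// First alternative: a run of special characters at the start of a word
// (right after the start of the string or after whitespace, kept in group 1).
// Second alternative: a run of special characters at the end of a word.
const edgePunctRe = /(^|\s)[^\p{L}\p{N}\s]+|[^\p{L}\p{N}\s]+(?=\s|$)/gu;

// $1 restores the whitespace before a leading run; it is empty for trailing runs.
const cleaned = input.replace(edgePunctRe, '$1');
console.log(cleaned);
```

A word made up entirely of special characters is removed completely, which leaves two adjacent spaces behind; whether that is acceptable depends on the use case.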

RegEx start and finish with letter, allow commas and dashes

I've got this regex:
/^[\a-zøåæäöüß][\a-z0-9øåæäöüß]*(?:\-?[\a-z0-9øåæäöüß,]+)*$/i
It works fine for a crazy input like "K61-283øÅ,æk-ken,a-sd", but it fails on the cases "word," (so, when there's just one comma).
Also - how can I restrict it that it should start with a letter after every comma or dash (so basically - every word)?
The rule is: start with a letter and end with alphanumeric; allow alphanumeric, dashes and commas; after each dash or comma there should be a letter
You may use
/^[a-zøåæäöüß][a-z0-9øåæäöüß]*(?:[-,][a-zøåæäöüß][a-z0-9øåæäöüß]*)*$/i
Details:
^ - start of string
[a-zøåæäöüß] - a letter from the defined set
[a-z0-9øåæäöüß]* - 0+ digits or letters from the defined set
(?:[-,][a-zøåæäöüß][a-z0-9øåæäöüß]*)* - zero or more sequences of:
[-,] - a - or ,
[a-zøåæäöüß] - a letter from the defined set
[a-z0-9øåæäöüß]* - 0+ digits or letters from the defined set
$ - end of string.
Update 2:
There are two ways to look at your requirements.
The top-down view
We treat the input as a list of one or more words, separated by comma or dash:
INPUT = WORD (?: [,\-] WORD )*
Each word consists of a letter, followed by zero or more letters or digits:
WORD = LETTER [ LETTER DIGIT ]*
Translated into JavaScript regex syntax this gives us:
WORD = [a-zøåæäöüß][a-zøåæäöüß\d]*
And for the whole input (with anchors):
/^[a-zøåæäöüß][a-zøåæäöüß\d]*(?:[,\-][a-zøåæäöüß][a-zøåæäöüß\d]*)*$/i
(This is Wiktor Stribiżew's answer.)
The bottom-up view
We start by looking at the allowed characters. We know that the first character has to be a letter. After that, there can be zero or more input elements:
INPUT = LETTER ELEMENT*
Each element is either
a letter or digit, or
a comma or dash, followed by a letter:
ELEMENT = [ LETTER DIGIT ] | [ COMMA DASH ] LETTER
Translating this into JavaScript gives us:
/^[a-zøåæäöüß](?:[a-zøåæäöüß\d]|[,\-][a-zøåæäöüß])*$/i
These two regexes are equivalent. The bottom-up regex is shorter and contains less repetitive code. On the other hand, the top-down regex may run faster on some regex engines if the input strings are mostly alphanumeric, with relatively few dashes/commas. On the gripping hand, if your inputs are short, you probably don't care about minuscule performance differences.
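To make the correspondence with the grammar explicit, both regexes can be assembled from named pieces and compared on a few inputs. This is a sketch; the constant names and the sample inputs are chosen here for illustration.

```javascript
// Letter set taken from the answers above.
const LETTER = 'a-zøåæäöüß';

// Top-down: INPUT = WORD (?: [,\-] WORD )*
const WORD = `[${LETTER}][${LETTER}\\d]*`;
const topDown = new RegExp(`^${WORD}(?:[,\\-]${WORD})*$`, 'i');

// Bottom-up: INPUT = LETTER (?: [LETTER DIGIT] | [,\-] LETTER )*
const bottomUp = new RegExp(`^[${LETTER}](?:[${LETTER}\\d]|[,\\-][${LETTER}])*$`, 'i');

const inputs = ['word', 'word,', 'a-sd,æk-ken', 'a-1', 'K61-283', '1abc', 'a'];
for (const s of inputs) {
  console.log(s, topDown.test(s), bottomUp.test(s));
}
```

Building the pattern from strings keeps the letter set in one place, at the cost of having to double the backslashes.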
Here's a direct encoding of your (revised) requirements:
/^[a-zøåæäöüß](?:(?:[a-zøåæäöüß\d]|[,\-][a-zøåæäöüß])*[,\-]?[a-zøåæäöüß])?$/i
The idea is to match a letter, followed by either
the end of the string (this handles input strings of length 1), or
a list of 0 or more intermediates, optionally followed by a comma or dash, followed by another letter
Each intermediate is either
a letter, or
a digit, or
a comma or a dash followed by a letter
Try this out: (allows letters and digits after comma or dash)
/^[a-zøåæäöüß]([a-z0-9øåæäöüß]|(,|-)[a-z0-9øåæäöüß])*[a-zøåæäöüß]$/i
or this: (allows letters after comma or dash)
/^[a-zøåæäöüß]([a-z0-9øåæäöüß]|(,|-)[a-zøåæäöüß])*[a-zøåæäöüß]$/i

RegExp extract specific string followed by any number with leading / trailing whitespace

I want to extract a string from another using JavaScript / RegExp.
Here is what I got:
var string = "wp-button wp-image-45 wp-label";
string.match(/(?:(?:.*)?\s+)?(wp-image-([0-9]+))(:?\s(?:.*)?)?/);
// returns: ["wp-button ", "wp-image-45", "45", undefined]
I just want to have "wp-image-45", so:
(Optional) any character
(Optional) followed by whitespace
(Required) followed by "wp-image-"
(Required) followed by any number
(Optional) followed by whitespace
(Optional) followed by any character
What is missing here? Is it just some kind of bracketing or more?
I also tried
string.match(/(?:(?:.*)?\s+)?(?=(wp-image-([0-9]+)))(?=(:?\s(?:.*)?)?)/)
Edit: In the end I just want to have the number, but I'd also like to get this intermediate result.
Regexps are not required to start matching at the beginning of the string, so your attempts to match whitespace and any character aren't necessary. Also, "any character" includes whitespace (except newlines in certain modes).
This should be all you need:
string.match(/\bwp-image-(\d+)\b/)
This will capture, for example, "wp-image-123" into matching group 0, and "123" into matching group 1.
\b means "word boundary", which ensures that you won't match "abcwp-image-123def". A word boundary is any position where a non-word character is followed by a word character, or vice versa; the start and end of the string also count when the adjacent character is a word character. A word character is a letter, a digit, or an underscore.
Also, I used \d instead of [0-9] simply out of convenience. In JavaScript the two are equivalent; in some other regex engines \d also matches digits from other scripts, but that won't make a difference in your case.
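Since the goal is the number itself, the match can be wrapped in a small function that also handles the no-match case, where match returns null. This is a minimal sketch; the function name extractImageId is hypothetical.

```javascript
// Hypothetical helper: returns the numeric id from the first "wp-image-N"
// entry in a class string, or null when there is none.
function extractImageId(classNames) {
  const match = classNames.match(/\bwp-image-(\d+)\b/);
  // match[0] is the full "wp-image-N" text, match[1] is the captured digits.
  return match ? Number(match[1]) : null;
}

console.log(extractImageId("wp-button wp-image-45 wp-label"));
console.log(extractImageId("wp-button wp-label"));
console.log(extractImageId("abcwp-image-123def"));
```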
If all of that surrounding stuff is optional and all you want is the number then there's no point to matching for any of that stuff except for that "wp-image-" prefix, just do:
var string = "wp-button wp-image-45 wp-label";
string.match(/wp-image-([0-9]+)/);
