javascript regular expression for DN - javascript

I want a regex to validate all possible types of DNs.
I created one, but it's not very good.
I tried /([A-z0-9=]{1}[A-z0-9]{1})*[,??]/ and several variations of it, but in vain.
Possible DNs can be:
CN=abcd,CN=abcd,O=abcd,C=us
CN=abcd0520,CN=users,O=abcd,C=us
C=us
etc

I recently had a need for this, so I created one that perfectly follows the LDAPv3 distinguished name syntax at RFC-2253.
Attribute Type
An attributeType can be expressed 2 ways. An alphanumeric string that starts with an alpha, validated using:
[A-Za-z][\w-]*
Or it can be an OID, validated using:
\d+(?:\.\d+)*
So attributeType validates using:
[A-Za-z][\w-]*|\d+(?:\.\d+)*
Attribute Value
An attributeValue can be expressed 3 ways. A hex string, which is a sequence of hex-pairs with a leading #. A hex string validates using:
#(?:[\dA-Fa-f]{2})+
Or an escaped string; each non-special character is expressed "as-is" (validates using [^,=\+<>#;\\"]). Special characters can be expressed with a leading \ (validates using \\[,=\+<>#;\\"]). Finally any character can be expressed as a hex-pair with a leading \ (validates using \\[\dA-Fa-f]{2}). An escaped string validates using:
(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*
Or a quoted-string; the value starts and ends with ", and can contain any character un-escaped except \ and ". Additionally, any of the methods from the escaped string above can be used. A quoted-string validates using:
"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
All combined, an attributeValue validates using:
#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"
Name component
A name-component in BNF is:
name-component = attributeTypeAndValue *("+" attributeTypeAndValue)
attributeTypeAndValue = attributeType "=" attributeValue
As a regex, that is:
(?#attributeType)=(?#attributeValue)(?:\+(?#attributeType)=(?#attributeValue))*
Replacing the (?#attributeType) and (?#attributeValue) placeholders with the values above gives us:
(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*
Which validates a single name-component.
Distinguished name
Finally, the BNF for a distinguished name is:
name-component *("," name-component)
As a regex, that is:
(?#name-component)(?:,(?#name-component))*
Replacing the (?#name-component) placeholder with the value above gives us:
^(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*(?:,(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*")(?:\+(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)=(?:#(?:[\dA-Fa-f]{2})+|(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*|"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"))*)*$
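Rather than pasting the full expression, the pattern can also be assembled from the pieces described above. This is a minimal JavaScript sketch, assuming the sub-patterns exactly as given in this answer; the names attributeType, nameComponent, dnPattern and isValidDn are only illustrative:

```javascript
// Sub-patterns copied from the sections above (String.raw keeps backslashes as typed).
const attributeType = String.raw`(?:[A-Za-z][\w-]*|\d+(?:\.\d+)*)`;
const hexString = String.raw`#(?:[\dA-Fa-f]{2})+`;
const escapedString = String.raw`(?:[^,=\+<>#;\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*`;
const quotedString = String.raw`"(?:[^\\"]|\\[,=\+<>#;\\"]|\\[\dA-Fa-f]{2})*"`;

// attributeValue = hex string | escaped string | quoted string
const attributeValue = `(?:${hexString}|${escapedString}|${quotedString})`;

// name-component = attributeTypeAndValue *("+" attributeTypeAndValue)
const typeAndValue = `${attributeType}=${attributeValue}`;
const nameComponent = `${typeAndValue}(?:\\+${typeAndValue})*`;

// distinguished name = name-component *("," name-component), anchored at both ends
const dnPattern = new RegExp(`^${nameComponent}(?:,${nameComponent})*$`);

function isValidDn(dn) {
  return dnPattern.test(dn);
}

console.log(isValidDn("CN=abcd0520,CN=users,O=abcd,C=us"));
console.log(isValidDn("=abcd"));
```

Building the pattern from named parts keeps each rule readable and makes it easier to adjust one of them without re-editing the whole expression.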

This is not only not possible, it will never work, and should not even be attempted. LDAP data (a distinguished name in this case) are not strings. A distinguished name has distinguishedName syntax, which is not a string, and comparisons must be made using the matching rules defined in the directory server schema. For this reason, regular expressions and native-language comparison, ordering, and equality operations such as Perl's ~~, eq and == and Java's == cannot be used with LDAP data. If a programmer attempts this, unexpected results can occur, and the code is brittle and unpredictable and does not give repeatable results. Language LDAP APIs that do not support matching rules cannot be used with LDAP where comparisons, equality checks, or ordering are required.
By way of example, the distinguished names "dc=example,dc=com" and "DC=example, DC=COM" are equivalent in every way from an LDAP perspective, but native language equality operators would return false.

This worked for me:
Expression:
^(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+))(?:\s*\,\s*(?<RDN>(?<Key>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)\=(?<Value>(?:\\[0-9A-Fa-f]{2}|\\\[^=\,\\]|[^=\,\\]+)+)))*$
Test:
CN=Test User Delete\0ADEL:c1104f63-0389-4d25-8e03-822a5c3616bc,CN=Deleted Objects,DC=test,DC=domain,DC=local
The expression is already regex-escaped, so to avoid having to double all the backslashes in C#, make sure you prefix the string with the verbatim-literal @ sign, i.e.
var dnExpression = @"...";
This will yield four groups, first a copy of the whole string, second a copy of the last RDN, third and fourth the key/value pairs. You can index into each key/value using the Captures collection of each group.
You can also use this to validate a single RDN by cutting the expression down to the "(?<RDN>...)" group, surrounded by the usual "^...$" to require a whole value (start to end of string).
I've allowed a hex character escape ("\" followed by two hex digits), a simple character escape ("\" followed by one character), or anything other than ",=\" inside the key/value DN text. I'd guess this expression could be perfected by taking extra time to go through the MSDN AD standard and restricting the allowed characters to match exactly what is and is not allowed. But I believe this is a good start.
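JavaScript's RegExp has no Captures collection, so repeated groups only keep their last match. A rough JavaScript equivalent, assuming the same token rules as the expression above (hex escape, simple escape, or anything except , = \), walks the string one RDN at a time with a sticky regex. The name parseDn is hypothetical:

```javascript
// One key or value: a hex escape, a simple escape, or any character except , = \
const TOKEN = String.raw`((?:\\[0-9A-Fa-f]{2}|\\.|[^=,\\])+)`;

function parseDn(dn) {
  // Sticky flag ("y"): every exec() must match exactly at lastIndex,
  // so nothing between two RDNs can be skipped silently.
  const rdn = new RegExp(String.raw`\s*${TOKEN}=${TOKEN}\s*(,|$)`, "y");
  const pairs = [];
  for (;;) {
    const m = rdn.exec(dn);
    if (m === null) return null;          // not a well-formed DN under these rules
    pairs.push({ key: m[1], value: m[2] });
    if (m[3] === "") return pairs;        // matched "$": end of input reached
  }
}

const dn = "CN=Test User Delete\\0ADEL:c1104f63-0389-4d25-8e03-822a5c3616bc," +
  "CN=Deleted Objects,DC=test,DC=domain,DC=local";
console.log(parseDn(dn));
```

Returning null for malformed input makes the same function usable for validation and for extracting the key/value pairs.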

I created one. It is working great.
^(\w+=\w+)(,\w+=\w+)*$

Related

Javascript - how to use regex process the following complicated string

I have the following string that will occur repeatedly in a larger string:
[SM_g]word[SM_h].[SM_l] "
Notice in this string after the phrase "[SM_g]word[SM_h]" there are three components:
A period (.) This could also be a comma (,)
[SM_l]
"
Zero to all three of these components will always appear after "[SM_g]word[SM_h]". However, they can also appear in any order after "[SM_g]word[SM_h]". For example, the string could also be:
[SM_g]word[SM_h][SM_l]"
or
[SM_g]word[SM_h]"[SM_l].
or
[SM_g]word[SM_h]".
or
[SM_g]word[SM_h][SM_l].
or
[SM_g]word[SM_h].
or simply just
[SM_g]word[SM_h]
These are just some of the examples. The point is that there are three different components (more if you consider that the period can also be a comma) that can appear after "[SM_g]word[SM_h]", where these three components can be in any order and sometimes one, two, or all three of the components will be missing.
Not only that, sometimes there will be up to one space between " and the previous component or [SM_g]word[SM_h].
For example:
[SM_g]word[SM_h] ".
or
[SM_g]word[SM_h][SM_l] ".
etc. etc.
I am trying to process this string by moving each of the three components inside the core string (preserving the space, in case there is a space between " and the previous component or [SM_g]word[SM_h]).
For example, [SM_g]word[SM_h].[SM_l]" would turn into
[SM_g]word.[SM_l]"[SM_h]
or
[SM_g]word[SM_h]"[SM_l]. would turn into
[SM_g]word"[SM_l].[SM_h]
or, to simulate having a space before "
[SM_g]word[SM_h] ".
would turn into
[SM_g]word ".[SM_h]
and so on.
I've tried several combinations of regex expressions, and none of them have worked.
Does anyone have advice?
You need to put each component into an alternation inside a group, with a quantifier allowing at most three repetitions:
(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})
You may replace word with .*? if it is not a constant or a specific keyword.
Then use this as the replacement string:
$1$3$2
var re = /(\[SM_g]word)(\[SM_h])((?:\.|\[SM_l]| ?"){0,3})/g;
var str = `[SM_g]word[SM_h][SM_l] ".`;
console.log(str.replace(re, `$1$3$2`));
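To check the idea against the examples from the question, here is a small sketch. It makes two assumptions beyond the answer: [^\[]* stands in for the literal word, and [.,] is used so that a comma is accepted as well as a period. The name moveTrailingComponents is only illustrative:

```javascript
// Group 1: [SM_g] plus the word, group 2: [SM_h], group 3: up to three trailing components.
const pattern = /(\[SM_g\][^\[]*)(\[SM_h\])((?:[.,]|\[SM_l\]| ?"){0,3})/g;

function moveTrailingComponents(text) {
  // Swap groups 2 and 3 so the trailing components end up before [SM_h].
  return text.replace(pattern, "$1$3$2");
}

const samples = [
  '[SM_g]word[SM_h].[SM_l]"',
  '[SM_g]word[SM_h]"[SM_l].',
  '[SM_g]word[SM_h] ".',
  '[SM_g]word[SM_h]',
];
for (const sample of samples) {
  console.log(sample, "->", moveTrailingComponents(sample));
}
```

Note that {0,3} only limits the number of components; it does not stop the same component from appearing more than once.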
This seems applicable to your task, in other words, changing the position of a substring.
(\[SM_g])([^[]*)(\[SM_h])((?=([,\.])|(\[SM_l])|( ?&\\?quot;)).*)?
Demo, in which all substrings are captured into their respective capture groups for your post-processing.
[SM_g] is captured in group 1, word in group 2, [SM_h] in group 3, the whole trailing part in group 4, [,\.] in group 5, [SM_l] in group 6, and " ?&\\?quot;" in group 7.
Thus groups 1 to 3 are the core part, group 4 is the trailing part (for checking whether a trailing part exists), and groups 5 to 7 are sub-parts of group 4 for your post-processing.
You can therefore easily produce an output string with the matched parts rearranged in whatever order you want by replacing with the captured groups, as follows.
\1\2\7\3 or $1$2$7$3, etc.
For replacing in Javascript, please refer to this post. JS Regex, how to replace the captured groups only?
But the above regex is not sufficiently precise, because it allows any number of repetitions of the sub-parts of the trailing string, for example \1\2\3\5\5\5\5 or \1\2\3\6\7\7\7\7\5\5\5, etc.
To avoid this, the regex needs a condition that accepts only the possible combinations of the sub-parts of the trailing string. Please refer to this example, https://regex101.com/r/6aM4Pv/1/, for the possible combinations in order.
But if the regex only allows the possible combinations, it becomes more complicated, so I leave the simplified regex above to help you understand the idea. Thank you :-)

Regex javascript to only return a value and not full match

How do we do a look-behind in JavaScript, like we can in Java or PHP?
The regex works in a PHP parser using look-behind.
Here is the working regex using the PHP parser:
(?<=MakeName=)(.*?)([^\s]+)
This produces the value.
(MakeName=)(.*?)([^\s]+)
This produces the match plus the value.
The XML response to extract the value from:
<ModelName="Tacoma" MakeName="Tundra" Year="2015">
I just need the value.
JavaScript had no look-behind until ES2018, and older engines still do not support it.
If you are sure the attribute MakeName is present in the input, then you could use this regular expression:
/[^"]*(?!.*\sMakeName\s*=)(?="([^"]*"[^"]*")*[^"]*$)/
It grabs the first series of characters that do not contain a double quote and have a double quote immediately following it, with an even number of double quotes following after that until the end of the input (to make sure we are matching inside a quoted string), but MakeName= should not occur anywhere after the match.
This is of course still not bulletproof, as it will fail for some boundary cases, such as single-quoted values:
<ModelName="Tacoma" MakeName='Tundra' Year="2015">
You could resolve that, if needed, by repeating the same pattern, but then based on single quotes, and combining the two with an OR (|).
Demo:
var s = '<ModelName="Tacoma" MakeName="Tundra" Year="2015">';
var result = s.match(/[^"]*(?!.*\sMakeName\s*=)(?="([^"]*"[^"]*")*[^"]*$)/);
console.log(result[0]);
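Two simpler routes to "just the value" are worth sketching. A capture group works in every engine, since the value is simply read from index 1 of the match instead of index 0. Engines that implement ES2018 also accept a look-behind. The function names here are illustrative, and both versions assume a double-quoted value:

```javascript
const input = '<ModelName="Tacoma" MakeName="Tundra" Year="2015">';

// Option 1: capture group; the wanted text is match[1], not the full match.
function makeNameByGroup(text) {
  const match = text.match(/\sMakeName="([^"]*)"/);
  return match ? match[1] : null;
}

// Option 2: look-behind, so the full match is already just the value
// (requires an ES2018-capable engine).
function makeNameByLookbehind(text) {
  const match = text.match(/(?<=\sMakeName=")[^"]*/);
  return match ? match[0] : null;
}

console.log(makeNameByGroup(input));      // Tundra
console.log(makeNameByLookbehind(input)); // Tundra
```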

Javascript regex - Matching 2 substrings

I'm not the best at regular expressions and need some help.
I have this kind of string: data-some-thing="5 10 red". The word 'data-some' is constant and 'thing' changes. 'thing' may also contain dashes. The values in double quotes contain only alphanumeric symbols or spaces.
Is it possible to get 'thing' and values in double quotes using only regex? If yes then what expression should I use? I tried using lookarounds but didn't have much success.
You could use:
var result = data.match(/data-some-(.*?)="(.*?)"/);
The result array will have three elements:
0: the complete match (not of your interest)
1: the variable part before the equal sign
2: the value between quotes.
Demo:
var data = 'data-some-thing="5 10 red"';
var result = data.match(/data-some-(.*?)="(.*?)"/);
document.write(result[1] + '<br>' + result[2]);
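If the text may contain several such attributes, the same idea extends to all of them with a global pattern and matchAll (ES2020). This is a sketch with a stricter pattern than the one above, assuming names made of word characters and dashes and values limited to alphanumerics and spaces, as the question describes. The function name readDataSomeAttributes is hypothetical:

```javascript
function readDataSomeAttributes(text) {
  // Group 1: the variable part of the name, group 2: the quoted value.
  const pattern = /data-some-([\w-]+)="([A-Za-z0-9 ]*)"/g;
  return Array.from(text.matchAll(pattern), (m) => ({ name: m[1], value: m[2] }));
}

const sample = 'data-some-thing="5 10 red" data-some-other-thing="abc 42"';
console.log(readDataSomeAttributes(sample));
```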
Disclaimer:
Please note that if you are doing this in the context of larger HTML parsing (it is not mentioned in the question), you should not use regular expressions. Instead you should load the HTML string into a DOM, and use DOM methods to find the attribute name and value pairs you are interested in.
For node.js you can use the npm modules jsdom and htmlparser to do this.

Using RegEx for javascript string parameter encoding

Summary
Can you use a regular expression to match multiple characters, but replace individual characters with specific replacements?
For instance, replace \ with \\ and replace " with \x22 and replace ' with \x27.
It is my understanding that this is simply not possible, as you can use the captured sub-matches within the expression, but not with any level of logic that would allow you to conditionally output text if a sub-match took place.
The following VB.NET code is obviously totally incorrect, but gives you an idea of my thinking... (i.e. if there was a replacement command that allowed you to say "if sub-match 1 happened, then output \\ instead")
RegEx.Replace(text, "(\)?("")?(')?", "{if($1,'\\')}{if($2,'\x22')}{if($3,'\x27')}")
(This would be for use with .NET RegEx class, but would be useful for use with javascript RegExp class)
Background
More for interest than actual need, but I've been playing with encoding text for use within javascript parameters. (Well, the need is certainly there, but the interest is efficiency.)
I've been using the standard String.Replace, and doing some tests for performance with the following two functions...
Public Function GetJSSafeString(ByVal text As String) As String
    Return text.Replace("\", "\\").Replace("""", "\x22").Replace("'", "\x27")
End Function
Public Function GetJSSafeString2(ByVal text As String) As String
    If text.Contains("\") Then
        text = text.Replace("\", "\\")
    End If
    If text.Contains("""") Then
        text = text.Replace("""", "\x22")
    End If
    If text.Contains("'") Then
        text = text.Replace("'", "\x27")
    End If
    Return text
End Function
Using two strings, both around 200 characters in length - the first does not contain any characters to be converted - the second contains one of each character to be converted (\"'). I ran each of the two strings through the two functions 100000 times each.
The four results are coming out (in total-milliseconds) roughly as...
GetJSSafeString, no converted characters: 182.0364
GetJSSafeString, converted characters: 316.0632
GetJSSafeString2, no converted characters: 60.012
GetJSSafeString2, converted characters: 354.0708
So obviously GetJSSafeString2 is best if there are no replacements, and worst if there are characters to convert (but not much worse, so it looks like the better choice).
But it got me thinking... could this be done with a single regular expression?
And if so, would it be faster than either of the two above functions?
The solution in JavaScript:
var text="this is a test \\ with \"things\" to ' replace";
var h={'\\':'\\\\', '"':"\\x22", "'":"\\x27"}; //we define here the replacements
text=text.replace(/("|\\|')/g,function(match){return h[match]});
alert(text); //prints: this is a test \\ with \x22things\x22 to \x27 replace
Note: this document on replace is worth reading
Big thanks to @psxls for his answer, which will be useful for a future JavaScript implementation.
His answer made me look at the overloads of the .NET Regex.Replace function (which, to be honest, I should have done in the first place, my bad)... and there is a MatchEvaluator delegate.
So I have implemented the following code as a test (to complement the code already in my question)...
Public Function GetJSSafeString3(ByVal text As String) As String
    Return Regex.Replace(text, "(\\|""|')", New MatchEvaluator(AddressOf GetJSSafeString3Eval))
End Function
Public Function GetJSSafeString3Eval(ByVal textMatch As Match) As String
    Select Case textMatch.Value
        Case "\"
            Return "\\"
        Case """"
            Return "\x22"
        Case "'"
            Return "\x27"
    End Select
    Return ""
End Function
And the results are as I expected: this is far, far less efficient than either of the functions in my original question. (The following are in milliseconds.)
GetJSSafeString, no converted characters: 182
GetJSSafeString, converted characters: 316
GetJSSafeString2, no converted characters: 60
GetJSSafeString2, converted characters: 354
GetJSSafeString3, no converted characters: 477
GetJSSafeString3, converted characters: 856
As the majority of the strings that I will be converting will not contain any of the characters mentioned, I am implementing the GetJSSafeString2 function, as that is by far the most efficient for the majority of situations.

regex to validate intl phone number

Can anyone help me write a regex that satisfies these conditions to validate an international phone number:
it must start with +, 00 or 011.
the only allowed characters are [0-9],-,.,space,(,)
length is not important
so these tests should pass:
+1 703 335 65123
001 (703) 332-6261
+1703.338.6512
This is my attempt ^\+?(\d|\s|\(|\)|\.|\-)+$ but it's not working properly.
To clean up the regexp, use square brackets (a character class) to define "OR" situations between single characters, instead of |.
Below is a rewritten version of your regular-expression, matching the provided description.
/^(?:\+|00|011)[0-9 ().-]+$/
What is the use of ?:?
Putting ?: directly inside a parenthesis tells the regular-expression engine that you want to group something, but not store the matched text for later use.
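A quick way to try the rewritten expression against the sample numbers from the question is sketched below. The function name isIntlPhone and the two negative samples are assumptions added for illustration:

```javascript
// Prefix (+, 00 or 011) followed by one or more digits, spaces, parentheses, dots or dashes.
const intlPhone = /^(?:\+|00|011)[0-9 ().-]+$/;

function isIntlPhone(input) {
  return intlPhone.test(input);
}

const samples = [
  "+1 703 335 65123",
  "001 (703) 332-6261",
  "+1703.338.6512",
  "22989018293",   // no +, 00 or 011 prefix
  "+1 703 abc",    // letters are not allowed
];
for (const sample of samples) {
  console.log(sample, "->", isIntlPhone(sample));
}
```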
With only one space allowed between groups of characters, so that successive spaces are rejected (note the " ?" at the end of the second group):
^(\+|00|011)([\d-.()]+ ?)+$
Faster (I guess), after adding the non-capturing modifier (?:) at the beginning of each group:
^(?:\+|00|011)(?:[\d-.()]+ ?)+$
You can use a regex cheat sheet like this one, and LINQPad, to tune this regex to your needs more quickly.
In case you are not familiar with LINQPad, just copy and paste the next block into it, change the language to C# Statements, and press F5.
string pattern = @"^(?:\+|00|011)(?:[\d-.()]+ ?)+$";
Regex.IsMatch("+1 703 335 65123", pattern).Dump();
Regex.IsMatch("001 (703) 332-6261",pattern).Dump();
Regex.IsMatch("+1703.338.6512",pattern).Dump();
^(?:\+|00|011)[\d. ()-]*$
To specify a length (in case you do care about length later on), use the following:
^(?:\+|00|011)(?:[. ()-]*\d){11,12}[. ()-]*$
And you could obviously change the 11,12 to whatever you want. And just for fun, this also does the same exact thing as the one above:
^(?:\+|00|011)[. ()-]*(?:\d[. ()-]*){11,12}$
I'd go for a completely different route (in fact I had the same problem as you at one point, except I did it in Java).
The plan here is to take the input, make replacements on it and check that the input is empty:
first substitute \s* with nothing, globally;
then substitute \(\d+\) by nothing, globally;
then substitute ^(\+|00|011)\d+([-.]\d+)*$ by nothing.
after these, if the result string is empty, you have a match, otherwise you don't.
I originally did this in Java; since then I have found Google's libphonenumber and dropped this approach. But it still works:
fge@erwin ~ $ perl -ne '
> s,\s*,,g;
> s,\(\d+\),,g;
> s,^(\+|00|011)\d+([-.]\d+)*$,,;
> printf("%smatch\n", $_ ? "no " : "");
> '
+1 703 335 65123
match
001 (703) 332-6261
match
+1703.338.6512
match
+33209283892
match
22989018293
no match
Note that a further test is required to see if the input string is at least of length 1.
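The same steps can be sketched in JavaScript. One deliberate difference from the Perl version: instead of substituting the final pattern away and checking for an empty string, this sketch tests the final pattern directly, which also rejects empty input. The function name looksLikeIntlNumber is hypothetical:

```javascript
function looksLikeIntlNumber(input) {
  const stripped = input
    .replace(/\s+/g, "")       // step 1: drop all whitespace
    .replace(/\(\d+\)/g, "");  // step 2: drop parenthesised digit groups
  // Step 3: what remains must be the prefix plus digits, optionally split by - or .
  return /^(?:\+|00|011)\d+(?:[-.]\d+)*$/.test(stripped);
}

const samples = [
  "+1 703 335 65123",
  "001 (703) 332-6261",
  "+1703.338.6512",
  "+33209283892",
  "22989018293",
];
for (const sample of samples) {
  console.log(sample, "->", looksLikeIntlNumber(sample) ? "match" : "no match");
}
```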
Try this:
^(\([+]?\d{1,3}\)|([+0]?\d{1,3}))?( |-)?(\(\d{1,3}\)|\d{1,3})( |-)?\d{3}( |-)?\d{4}$
It is compatible with the E.164 standard, along with some combinations of brackets, spaces and hyphens.
