Regex find comma delimiters not in quotes AND not in parenthesis - javascript

My ultimate goal is to develop a function to convert Access IIF() statements to T-SQL (2008) CASE WHEN statements. I already have a VBA routine that finds the IIF and the matching closing parenthesis even if it is outside of quotes. It is recursive and finds nested IIF() statements without using RegEx. When I narrow down to the text inside the IIF parentheses I need to identify the two comma delimiters that separate the three parameters. I am having trouble when a parenthesis is inside of quotes. How can I set up the RegEx to ignore anything in quotes before processing the rest of the expression?
I'm trying to create an expression group that will find anything inside single quotes and anything inside parentheses, then exclude anything that matches that group, and find the commas. (Please forgive me if I'm not saying this correctly since 'capturing group' and 'non-capturing group' sometimes give me the opposite of what I expect).
Note that this solution has to work with the VBScript Regular Expression support which is basically the same as the JavaScript flavor.
condition, true, false <-- this is the string my IIF parsing function returns before trying to split into 3 parts.
This is the expression that I've pieced together so far:
,(?=([^']*'[^']*')*(?![^']*'))(?![^(]*[)])
which works on this:
a=1, nz(b,0), Left('xy,z',2)
But these lines are more challenging. I can't find an expression that works on all of them.
a=1, '1st)', '(2nd)'
left(right(a,5),1)='b', '1st)', '(2nd)'
a=1, Left('a,bc',1) , 'xy,z'
Here's a Regex101 that I've been working on:
https://regex101.com/r/qH0wD8/2

The answer is that what I was trying to do was not appropriate for RegEx. Instead I solved the problem by writing a parser that handles nested parentheses, matching quotes, and miscellaneous commas.
I hope you find this answer because you need a function to convert IIF() statements to CASE WHEN and not my complex RegEx inquiry. The main use case I have for this function is converting Access SQL to T-SQL for SQL Server 2008 and earlier. SQL Server 2012 and later support the IIF() function. You can use this VBA function in any VBA editor. You can use Excel if Access isn't handy.
Here is a VBA function to convert Access IIF() statements to T-SQL CASE WHEN statements:
Public Function ReplaceIIFwithCASE(ByVal strInput As String) As String
' Parse the passed string and find the first "IIF(" and parse it into
' a standard T-SQL CASE WHEN statement. If a nested IIF is found,
' recurse and call this function again.
'
' Ben Sacherich - May 2016-Feb 2017
'
' This supports:
' IIF() embedded anywhere in the input string.
' Nested IIF() statements.
' The input string containing multiple IIF() statements on the same level.
' Strings between open and close IIF parenthesis that contains literal commas or commas for other functions.
' Example: IIF(a=1, nz(b,0) , Left('xy,z',2))
'
' Be aware:
' This does not optimize the CASE statement in places where a nested IIF could
' be written as a single CASE statement.
' It will fail if text inside IIF() contains the pipe character |. If you
' need to process statements with this character, modify this routine
' to use another temporary character for |.
'
' Try these in the Immediate window:
' ? ReplaceIIFwithCASE("IIF(a=1, nz(b,0) , Left('xy,z',2))")
' ? ReplaceIIFwithCASE("IIf(x='one',IIf(Abs(Z)=1,2,3),'three')")
' ? ReplaceIIFwithCASE("IIF(a=1,'1st)', '2nd)')")
' ? ReplaceIIFwithCASE("SELECT Name, IIF(Gender='M', 'Boy', 'Girl') FROM Students")
'
' How this works:
' Find "IIF(" in the passed string. Return original string if not found.
' Search for the matching closing parenthesis.
' When the match is found, recurse and make sure an "IIF(" is not nested.
' After recursing, replace the IIF with a CASE statement.
' - Once I find the inner part of an IIF this will use the Split function
' to delimit by commas "," into an array.
' - Then it looks at each array element. If it contains an odd number of
' single or double quote characters or different number of opening and
' closing parenthesis, it combines the array element part with the next
' part and tests again.
' - When there are matched single/double quotes and equivalent number of
' parenthesis it holds that part and appends the "|" character. This
' means that it has identified one of the 3 parameters that is passed
' to the IIF function.
' - Then it splits the string by the "|" character into three pieces
' and builds the CASE statement.
' Continue searching the passed string for another occurrence of "IIF(" (not nested).
Dim lngFuncStart As Long
Dim lngPosition As Long
Dim intStack As Integer
Dim strFunction As String
Dim strChar As String
Dim strQuoteChar As String
Dim bolInQuotes As Boolean
Dim strSplit() As String
Dim ReturnValue As String
On Error GoTo ErrorHandler
strFunction = "IIF("
strQuoteChar = "'" ' Define the style of quotes to look for and exclude.
bolInQuotes = False ' We are currently not inside quotes.
lngFuncStart = InStr(1, strInput, strFunction, vbTextCompare)
If lngFuncStart > 0 Then
lngFuncStart = lngFuncStart + Len(strFunction)
intStack = 1
lngPosition = lngFuncStart
Do While lngPosition <= Len(strInput)
' Use a WHILE loop instead of a FOR loop because the current and end positions will change inside the loop.
strChar = Mid(strInput, lngPosition, 1)
If strChar = strQuoteChar Then
bolInQuotes = Not bolInQuotes
' Move on to the next character
ElseIf bolInQuotes = False Then
' We are currently not inside quotes.
Select Case strChar
Case ")"
' Closing a group
intStack = intStack - 1
Case "("
' Starting new group
intStack = intStack + 1
End Select
If intStack = 0 Then ' Found closing parenthesis.
' See if there is a nested match. ### Recursive ###
ReturnValue = ReplaceIIFwithCASE(Mid(strInput, lngFuncStart, lngPosition - lngFuncStart))
' Begin parsing commas.
strSplit() = Split(ReturnValue, ",")
Dim strPart As String
Dim strRebuilt As String
Dim i As Integer
strRebuilt = ""
If UBound(strSplit()) > 2 Then ' There are more than 2 commas. Piece together the parts.
strPart = strSplit(0)
For i = 1 To UBound(strSplit)
If UBound(Split(strPart, "'")) Mod 2 = 0 _
And UBound(Split(strPart, """")) Mod 2 = 0 _
And UBound(Split(strPart, "(")) = UBound(Split(strPart, ")")) Then
' Number of single quotes is Even or Zero (matched)
' Number of double quotes is Even or Zero (matched)
' Number of parenthesis is matched
' Add the "|" symbol where the IIF should have commas.
strRebuilt = strRebuilt & "|" & strPart
strPart = strSplit(i)
Else
strPart = strPart & "," & strSplit(i)
End If
Next
ReturnValue = Mid(strRebuilt & "|" & strPart, 2)
strSplit() = Split(ReturnValue, "|")
End If
If UBound(strSplit) = 2 Then
' IIF has 3 parameters and is the normal case.
'--- Replace the IIF statement with CASE WHEN ---
' CASE statement help: https://msdn.microsoft.com/en-us/library/ms181765.aspx
ReturnValue = "(CASE WHEN " & Trim(strSplit(0)) & " THEN " & Trim(strSplit(1)) & " ELSE " & Trim(strSplit(2)) & " END)"
'ReturnValue = "(CASE WHEN...)"
If Right(Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1), 2) = vbCrLf Then
' Don't bother to add a CrLf
Else
' Add a CrLf before the CASE statement to make identification easier.
' Comment this out if you don't want it added.
ReturnValue = vbCrLf & ReturnValue
End If
strInput = Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1) & ReturnValue & Mid(strInput, lngPosition + 1)
Else
' Something is wrong. Return the original IIF statement.
' Known issues:
' Text inside IIF() contained pipe character |
' Text contained unmatched parenthesis, maybe inside of a literal string like '1st)'
ReturnValue = "IIF(" & ReturnValue & ") /*### Unable to parse IIF() ###*/ "
strInput = Mid(strInput, 1, lngFuncStart - Len(strFunction) - 1) & ReturnValue & Mid(strInput, lngPosition + 1)
End If
'--- Check to see if there is another function call following the one just addressed. ---
lngFuncStart = InStr(lngFuncStart + Len(ReturnValue) - Len(strFunction), strInput, strFunction, vbTextCompare)
If lngFuncStart > 0 Then
' Another IIF function call is at the same level as the one just processed.
lngFuncStart = lngFuncStart + Len(strFunction)
intStack = 1
lngPosition = lngFuncStart
Else
ReturnValue = strInput
Exit Do
End If
End If
End If
lngPosition = lngPosition + 1
Loop
Else
' Function not found in passed string.
ReturnValue = strInput
End If
ReplaceIIFwithCASE = ReturnValue
Exit Function
ErrorHandler:
MsgBox "Error #" & Err.Number & " - " & Err.Description & vbCrLf & "in procedure ReplaceIIFwithCASE()" _
& vbCrLf & "Input: " & strInput, vbExclamation, "Error"
End Function
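
For readers who want the comma-splitting step in JavaScript (the flavor mentioned in the question), here is a minimal sketch of the same idea done in a single pass: track whether we are inside single quotes and how deep we are in parentheses, and split only on commas that sit at depth zero outside quotes. The name splitTopLevel is made up for this example and is not part of the VBA routine above; it assumes single-quoted strings only.

```javascript
// Hypothetical helper: split "condition, true, false" on top-level commas only.
// Commas inside '...' or inside (...) are kept as part of the argument.
function splitTopLevel(text) {
  const parts = [];
  let depth = 0;        // current parenthesis nesting level
  let inQuotes = false; // are we inside a single-quoted literal?
  let current = "";
  for (const ch of text) {
    if (ch === "'") {
      inQuotes = !inQuotes;
    } else if (!inQuotes) {
      if (ch === "(") {
        depth += 1;
      } else if (ch === ")") {
        depth -= 1;
      } else if (ch === "," && depth === 0) {
        // A real delimiter: close the current argument and start a new one.
        parts.push(current.trim());
        current = "";
        continue;
      }
    }
    current += ch;
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitTopLevel("a=1, nz(b,0), Left('xy,z',2)"));
console.log(splitTopLevel("left(right(a,5),1)='b', '1st)', '(2nd)'"));
```

Because parentheses are only counted outside quotes, literals such as '1st)' do not upset the depth counter, which is the case the lookahead-based regex could not handle.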

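You can use recursive matching, provided your regex engine supports recursion and atomic groups (PCRE does; the JavaScript and VBScript flavors do not):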
(?>(?>\([^()]*(?R)?[^()]*\))|(?>'[^']*')|(?>[^()' ,]+))+
The outer match is allowed to be repeated (?>...)+. Inside, there are three options. The first one matches balanced ():
(?>\([^()]*(?R)?[^()]*\))
The second matches anything between single quotes '...':
(?>'[^']*')
The third matches anything except (), or ', or comma, or space:
(?>[^()' ,]+)

Related

replace words that starts with "(uuid: )"

I would like to replace text using javascript/regex
"TV "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched off."
with
TV 'my-samsung' is already switched off.
by removing the (UUID: ) text and replacing " with '
Looks like regex can be used
\([\s\S]*?\)
https://regex101.com/r/xXDncn/1
or have also tried using replace method in JS
str = str.replace("(UUID", "");
You can use
const str = '" "Tv "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched-off""';
console.log(
str.replace(/\s*\(UUID:[^()]*\)/g, '').replace(/^[\s"]+|[\s"]+$/g, '').replaceAll('"', "'")
)
The first regex matches:
\s* - zero or more whitespaces
\(UUID: - (UUID: string
[^()]* - zero or more chars other than ( and )
\) - a ) char.
The g flag makes it replace all occurrences.
The second regex removes trailing and leading whitespace and double quotation marks:
^[\s"]+ - one or more whitespaces and double quotes at the start of string
| - or
[\s"]+$ - one or more whitespaces and double quotes at the end of string.
The .replaceAll('"', "'") is necessary to replace all " with ' chars.
It is not a good idea to merge these two operations into one as the replacements are different. Here is how it could be done, just for learning purposes:
const str = '" "Tv "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched-off""';
console.log(
str.replace(/^[\s"]+|[\s"]+$|\s*\(UUID:[^()]*\)|(")/g, (x,y) => y ? "'" : "")
)
That is, " is captured into Group 1, the replacement is now a callable, where x is the whole match and y is the Group 1 contents. If Group 1 matched, the replacement is ', else, the replacement is an empty string (to remove the match found).
you can try this
str.replace(/\(.*?\)/, "")
str.replace(/\(.*?\)/, "with")
--- update ---
const str = `"TV "my-samsung" (UUID: a1c3bbc1d27c5be8:8baabe2fa7f5d9ca) is already switched off."`;
const result = str.replace(/"(.*?)\s*\(.*\)(.*)"/, (match, before, after) => {
  // \s* consumes the space before "(" so the output has no double space
  return before.replace(/"/g, "'") + after;
});
console.log(result); //TV 'my-samsung' is already switched off.

Check if first and last character contains given special char

I have input string
..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''
I want to check if the first and last char place contains - or ' or ..
If yes then trim until we get name.
Expected output : VAibhavs.sharma
I am using like this.
while (
myString.charAt(0) == "." ||
myString.charAt(0) == "'" ||
myString.charAt(0) == "-" ||
myString.charAt(myString.length - 1) == "." ||
myString.charAt(myString.length - 1) == "'" ||
myString.charAt(myString.length - 1) == "-"
)
I know this is not correct way. How can I use regex?
I tried /^\'$. But this only checks the first char for a single special char.
You can use regular expression:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.replace(/^[-'\.]+/,"").replace(/[-'\.]+$/,"")
console.log(output)
[-'\.] ... -, ' or . character
+ ... one or more times
^ ... beginning of the string
$ ... end of the string
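
The two replace calls can also be folded into one pattern by combining both anchored character classes with | and adding the g flag. A small sketch; the helper name trimNameChars is only for illustration:

```javascript
// Hypothetical helper: strips leading and trailing -, ' and . characters.
function trimNameChars(text) {
  // ^[-'.]+  a run of the characters at the start of the string
  // [-'.]+$  a run of the characters at the end of the string
  // g        lets both alternatives be replaced in a single call
  return text.replace(/^[-'.]+|[-'.]+$/g, "");
}

console.log(trimNameChars("..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"));
```

Inside a character class the dot does not need escaping, and putting - first keeps it from being read as a range.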
EDIT:
using match:
input = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''"
output = input.match(/^[-'\.]+(.*?)[-'\.]+$/)[1]
console.log(output)
(...) ... (1st) group
.*? ... any character, zero or more times, ? means non-greedy
.match(...)[1] ... 1 means 1st group
There is already one accepted answer but still, this is how I would do.
var pattern = /\b[A-Za-z.]+\b/gm;
var str = "..-----''''''.......VAibhavs.sharma'..'-.'-.''-....''";
console.log(str.match(pattern));
// Output
// ["VAibhavs.sharma"]
\b is a zero-width word boundary. It matches positions where one side is a word character (usually a letter, digit or underscore) and the other side is not a word character (for instance, it may be the beginning of the string or a space character).

regular expression to check only one decimal point

I have following regular expression to check only one decimal point for type number tag in html
^-?[0-9]*\\.?[0-9]*$
but this regular expression fails when I put a decimal point at the end, e.g. 12.12.
What do I have to add to check this?
I think your regex can be easily fixed using a + instead of last * quantifier:
^-?[0-9]*\.?[0-9]+$
Tests:
// No g flag: with g, test() keeps lastIndex between calls and a reused regex gives wrong results.
const regex = /^-?[0-9]*\.?[0-9]+$/;
console.log('regex.test?')
console.log('12 = ' + regex.test('12'));
console.log('12. = ' + regex.test('12.'));
console.log('12.1 = ' + regex.test('12.1'));
console.log('12.12. = ' + regex.test('12.12.'));
console.log('-1 = ' + regex.test('-1'));
console.log('-1. = ' + regex.test('-1.'));
console.log('-1.2 = ' + regex.test('-1.2'));
console.log('-.12 = ' + regex.test('-.12'));
console.log('-. = ' + regex.test('-.'));
console.log('- = ' + regex.test('-'));
console.log('. = ' + regex.test('.'));
Can you try the below : [1-9]\d*(\.\d+)?$
The simplest way to allow a possible . at the end is to have \.? just before the $. Also, the double \ looks wrong (unless you need it for escaping a \ in the context in which you are using it):
^-?[0-9]*\.?[0-9]*\.?$
But please recognize that your regex does not require any actual digits, so will match some non-numbers, like ., -. and (with my edit) -.. The above regex will also match an empty string!
You will want to either change your regex to require digits, or take into account somewhere else that they might not be there.
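
One way to require digits, as a sketch: accept either digits with an optional fractional part (which still allows a trailing dot, as in 12.), or a leading dot followed by digits. Whether 12. and .5 should count as valid is an assumption here, so adjust it to your needs; isDecimal is a made-up name.

```javascript
// Hypothetical validator: at most one decimal point and at least one digit.
// No g flag, because test() on a global regex keeps state in lastIndex.
const decimalPattern = /^-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)$/;

function isDecimal(text) {
  return decimalPattern.test(text);
}

// Print the verdict for a few inputs, including the problem cases.
for (const sample of ["12", "12.", "12.1", "-.12", "12.12.", ".", "-", ""]) {
  console.log(JSON.stringify(sample) + " -> " + isDecimal(sample));
}
```

The first alternative needs at least one digit before the optional dot and the second needs at least one digit after it, so a lone dot, a lone minus sign and the empty string are all rejected.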

What does "(\'' + element + '\')" mean?

function expand(element) {
    var target = document.getElementById(element);
    var h = target.offsetHeight;
    var sh = target.scrollHeight;
    var loopTimer = setTimeout('expand(\'' + element + '\')', 10);
    if (h < sh) {
        h += 1;
    } else {
        clearTimeout(loopTimer);
        alert("伸縮完成"); // "Expansion complete"
    }
    target.style.height = h + "px"
}
\' is an escape character for ', so what this is doing is building a string that can be consumed as a function, which contains a parameter, which is wrapped in single quotes...
'expand(\''
The above portion "opens" the string, applies expand( as a literal, then an escaped ', followed by one more ' to close that portion of the string. So, the return on that is:
expand('
Next, they concatenate the value of element variable:
'expand(\'' + element
The string now contains:
expand('elementVariableValue
Next up is to open another literal string, add in another single quote (escaped), followed by the closing parenthesis:
'\')'
this is evaluated to:
')
put it all together and you get:
expand('elementVariableValue')
(which is finally interpreted as a function for the timeout).
Now, with JavaScript, you can use both " and ' for string delimiters, so an easier way to write it would have been:
setTimeout("expand('" + element + "')", 10);
The code in your example is a recursive call driven by a timer, and the callback is expand(element). Once you see that, it is easy to understand that var loopTimer = setTimeout('expand(\'' + element + '\')', 10); schedules another call to expand(element). However, the function expand needs a string parameter, and the call itself is written inside a string, so the argument is built as \'' + element + '\'. For example, if element equals scaleid, we get expand('scaleid'), which is another call to expand(). Because the quotes sit inside a single-quoted string, \' is needed to escape them.
In JavaScript you can pass a string as the first parameter of setTimeout; the string is evaluated as if you had used eval(). That code behaves as if you called expand("something") every 10 milliseconds.
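
To see the string building in isolation, and for comparison the function form that setTimeout also accepts, here is a small sketch. buildCallString and scheduleCall are names invented for this example; the string form is shown only to illustrate the escaping.

```javascript
// Hypothetical helper: builds the same string the question passes to setTimeout.
function buildCallString(element) {
  return 'expand(\'' + element + '\')';
}

console.log(buildCallString("scaleid")); // expand('scaleid')

// Alternative without any escaping: pass a function instead of a string.
// The closure remembers `element`, so no quoting is needed.
function scheduleCall(callback, element, delayMs) {
  return setTimeout(function () {
    callback(element);
  }, delayMs);
}

// Logs "called with scaleid" after roughly 10 milliseconds.
scheduleCall(function (id) {
  console.log("called with " + id);
}, "scaleid", 10);
```

Passing a function avoids both the escaping and the eval-style evaluation of a string.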

Check for existence of a keyword within a code string

I’m loading the contents of a JS file using FileReader and dumping the results into a textarea container. I then want to run some checks on the actual JS file.
I know there are probably tools out there for this already (or better ways), but this is for a closed-environment project.
After the textarea contains the content of the JS file as one large string, I need to loop the string and find all instances of parseInt() to check if they have been supplied with a radix.
I would provide code, but I have nothing working at this point. Any ideas?
The following snippet will search the string value of your <textarea> element for parseInt() and output the occurrences, with radix where applicable:
var textareaValue = 'var func = function(){' +
    'var i = parseInt(1,1);' +
    'var j = parseInt(10, 10);' +
    'var k = parseInt(3) + j;' +
    '};';
// match() returns null when nothing is found, so fall back to an empty array
var occurences = textareaValue.match(/parseInt\(.+?(, ?\d+)?\)/g) || [];
occurences.forEach(function(occurence){
    var hasRadix = /, ?\d+\)$/.test(occurence);
    document.body.innerHTML += '<p>"' + occurence + '" has ' +
        (hasRadix ? 'a' : 'no') + ' radix' +
        (hasRadix ? ' (' + occurence.match(/, ?(\d+)\)$/)[1] + ')' : '') +
        '.</p>';
});
Note that this is no actual syntax interpretation, it’s merely text analysis. You will have to go from the result, which comprises all the occurrences of parseInt() as strings. Also, JavaScript allows whitespace, comments, expressions and other witchcraft at the text passage in question. You might have to check for anything.
The actual regex /parseInt\(.+?(, ?\d+)?\)/g will demand…
parseInt( at the beginning of the match
any characters (might need to be expanded to include brackets, etc. by :punct:)
as optional group, determining whether a radix is supplied or not:
a comma, an optional space (might need to respond to any number of whitespace using *)
at least one digit (might need to limit to {1,2}, because only 2 to 36 are valid)
a trailing closing bracket.
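
A variation on the same text-analysis idea, as a sketch: capture everything between the parentheses and split it on commas, instead of encoding the radix in the regex. This copes with any amount of whitespace around the comma, but it assumes the arguments contain no nested parentheses and no commas inside strings. findParseIntCalls is a made-up name.

```javascript
// Hypothetical helper: list parseInt(...) calls and the radix text, if any.
function findParseIntCalls(source) {
  // [^()]* keeps the match inside one pair of parentheses (no nesting).
  const pattern = /\bparseInt\(([^()]*)\)/g;
  const calls = [];
  for (const match of source.matchAll(pattern)) {
    const args = match[1].split(",").map((arg) => arg.trim());
    // A second, non-empty argument is treated as the radix.
    const radix = args.length > 1 && args[1] !== "" ? args[1] : null;
    calls.push({ text: match[0], index: match.index, radix: radix });
  }
  return calls;
}

const sample =
  "var i = parseInt(1,1); var j = parseInt(10, 10); var k = parseInt(3) + j;";
for (const call of findParseIntCalls(sample)) {
  console.log(call.text + " -> " + (call.radix === null ? "no radix" : "radix " + call.radix));
}
```

The radix is returned as text, so a variable name used as radix is reported as well; checking that it lies between 2 and 36 is left to the caller.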
The following function should be able to tell the difference between usages of parseInt with radix versus its usages without radix by simplistic regex matching:
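function have_radix(str){
    // Returns the number of parseInt() calls that appear to lack a radix.
    var parseIntRegex = /parseInt\(.+?\)/g;
    var parseIntRegexWithRadix = /parseInt\(.+?(,.+?\))/g;
    var indices = [];
    var result;
    // Collect every parseInt(...) call.
    while ( (result = parseIntRegex.exec(str)) ) {
        indices.push(result.index);
    }
    var count = indices.length;
    indices = [];
    // Collect only the calls that contain a comma, i.e. a second argument.
    while ( (result = parseIntRegexWithRadix.exec(str)) ) {
        indices.push(result.index);
    }
    return count - indices.length;
}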
