Javascript regexp that matches '|' not preceded by '\' (lookbehind alternative) - javascript

I'm trying to split the string name\|dial_num|032\|0095\\|\\0099|\9925 by the delimiter |, but the split should skip the escaped \|.
I found a solution in this question: Javascript regexp that matches '.' not preceded by '\' (lookbehind alternative), but it skips \\| too.
The right result must be: [name\|dial_num,032\|0095\\,\\0099,\9925].
The rule is in case \\\| or \\\\\| or etc, | is still a valid delimiter but in case \\\\| or even more, it isn't.
Any help will be appreciated.

The usual workaround is to use match instead of split:
> s = "name\\|dial_num|032\\|0095\\\\|\\\\0099|\\9925"
"name\|dial_num|032\|0095\\|\\0099|\9925"
> s.match(/(\\.|[^|])+/g)
["name\|dial_num", "032\|0095\\", "\\0099", "\9925"]
As a side note, even in engines where JS supports lookbehind (ES2018 and later), it wouldn't be a solution, because (?<!\\)\| would also incorrectly skip \\|.
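To make the side note concrete, here is a small sketch comparing a lookbehind-based split with the match-based tokenizer on a field that ends in an escaped backslash. It assumes an engine with lookbehind support (ES2018 or later); the sample string and variable names are only illustrative and not from the original answer.

```javascript
// The actual text is: a\\|b  (an escaped backslash, then a real delimiter).
const sample = 'a\\\\|b';

// Lookbehind only inspects the single character before the pipe, so the
// pipe after the escaped backslash is wrongly treated as escaped.
const viaLookbehind = sample.split(/(?<!\\)\|/);

// The tokenizer consumes each backslash together with the character it
// escapes, so backslash parity is handled correctly.
const viaMatch = sample.match(/(?:\\.|[^|])+/g) || [];

console.log(viaLookbehind.length); // 1 (no split happened)
console.log(viaMatch.length);      // 2 (fields: a\\ and b)
```

One caveat with match: empty fields (as in a||b) are dropped, because + requires at least one character per field.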

I challenged myself to use the String replace method.
I got the right result using regex101.com (a popular online tester for PCRE, JavaScript and Python regular expression engines):
// input : name\|dial_num|032\|0095\\|\\0099|\9925
// regex : ([^\\|](?:\\\\)*)\| with global flag
// replacement : $1,
// output: name\|dial_num,032\|0095\\,\\0099,\9925 <= seems OK, right?
Test:
var str = 'name\\|dial_num|032\\|0095\\\\|\\\\0099|\\9925';
str = str.replace(/([^\\|](?:\\\\)*)\|/g,'$1,');
console.log(str);
// > name\|dial_num,032\|0095\\,\\0099,\9925
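If the goal is an array rather than a rewritten string, and empty fields must be preserved, a plain character scan is another option. The following is a minimal sketch, not taken from the answers above; the function name splitOnUnescapedPipe is hypothetical. It consumes each backslash together with the character it escapes, which is the same parity rule the regexes rely on.

```javascript
// Hypothetical helper: splits on | unless the | is escaped by a backslash.
function splitOnUnescapedPipe(input) {
  const fields = [];
  let current = '';
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '\\' && i + 1 < input.length) {
      // Keep the backslash and the escaped character together.
      current += ch + input[i + 1];
      i++;
    } else if (ch === '|') {
      // An unescaped pipe ends the current field.
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

const fields = splitOnUnescapedPipe('name\\|dial_num|032\\|0095\\\\|\\\\0099|\\9925');
console.log(fields.length); // 4
```

Unlike the match-based approach, this keeps empty fields: splitOnUnescapedPipe('a||b') yields three entries.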

Related

Find character characters except when surrounded by specific characters

I have a string: "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}", and I'm trying to use regex to find all the spaces, except for spaces that are part of an interpolation string (${...}).
I'm willing to admit that regex might not be the right tool for this job, but I'm curious what I'm missing.
Essentially what I'm trying to do is replace the spaces with a newline character.
You can split the string into alternating non-interpolation and interpolation sequences and then modify only the non-interpolation ones, which land at the even indices (the resulting array always starts with a non-interpolation string, possibly an empty one, so don't worry about that). This has to be done because regular expressions are limited in the state they can remember (for more on that, study CS). A solution would be:
var string = "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}";
var result = string
// split into non-interpolation and interpolation sequences
.split(/(\${[^}]*})/g)
// modify the sequences with even indices (non-interpolation): spaces become a newline
.map((part, i) => (i % 2 ? part : part.replace(/ +/g, '\n')))
// concatenate the strings
.join('');
console.log(result);
But also mind the comment by ggorlen on your question:
Looks like you're trying to use regex to parse arbitrary JS template strings. That isn't an easy task in the general case and regex is probably the wrong tool for the job--it's likely an xy problem. Can you provide more context (why do you need to parse JS template strings in the first place?) and show an attempt? Thanks.
Assuming you have only ${...} patterns separated by spaces, as in your example, you can apply this regex:
var str = "${styles.button} ${styles[color]} ${styles[size]} ${styles[_state]} ${iconOnly ? styles.iconOnly : ''}"
var re = /(\}) +(\$\{)/g;
var result = str.replace(re, "$1\n$2");
console.log('result: ' + result);
Result:
result: ${styles.button}
${styles[color]}
${styles[size]}
${styles[_state]}
${iconOnly ? styles.iconOnly : ''}
I tested with a simple find ' \$' (without quotes) and replace with '\n$' (without quotes) in Sublime Text's regex search; it works well.

Match pattern except under one condition Regex

I'm trying to match a pattern with regex except when the pattern is escaped.
Test text:
This is AT\&T&reg; is really cool Regex
You can see that with my \& I'm manually escaping, and therefore I do not want the regex to match there.
Regex:
const str = 'This is AT\\&T&reg; is really cool Regex'
str.replace(/\&(.*?)\;/g, '<sup>&$1;</sup>');
Expected output
This is AT&T<sup>&reg;</sup> is really cool Regex
It's hard to explain, I guess, but the regex looks for text that starts with & and ends with ;. However, if the & is preceded by a \, as in \&, it should not match there and should instead look for the next \&(.*?)\;
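You can use a negative lookbehind.
This regex works fine with the example:
/(?<!\\)\&(.*?)\;/g
Edit 1
To work around this in JS, you can use [^\\], which matches any character except a backslash. The overall regex is /[^\\]\&(.*?)\;/g. It works for your example, but note that it also consumes the character before the & and cannot match at the very start of the string.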
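One way to keep the [^\\] idea without losing the preceding character is to capture it and write it back. This is a minimal sketch under that assumption; the helper name wrapUnescapedEntities is hypothetical, and the &.*?; part is kept from the question rather than being a full entity grammar.

```javascript
// Hypothetical helper: wraps &...; sequences in <sup> unless the & is
// directly preceded by a backslash.
function wrapUnescapedEntities(text) {
  // Group 1 is either the start of the string or the one non-backslash
  // character before the &; it is written back so nothing is lost.
  return text.replace(/(^|[^\\])(&.*?;)/g, '$1<sup>$2</sup>');
}

const wrapped = wrapUnescapedEntities('This is AT\\&T&reg; is really cool Regex');
console.log(wrapped);
```

This leaves the escaping backslash in place; removing it would take a second replace. Because the preceding character is consumed, two entities with nothing between them (such as &a;&b;) will have the second one skipped, which is a limitation the callback-based approach below does not share.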
Since JavaScript engines before ES2018 have no support for lookbehind assertions, it is possible to add some custom substitution logic to achieve the desired result. I've updated the test string with examples of different kinds of HTML entities for test purposes:
const str = '&T;his is AT\\&T&reg; is & really &12345; &xAB05; \\&cool; Regex'
console.log(str.replace(/&([a-z]+|[0-9]{1,5}|x[0-9a-f]{1,4});/ig, function (m0, m1, index, str) {
// charAt(index - 1) returns '' at index 0, whereas substr(-1, 1) would return the last character
return (str.charAt(index - 1) !== '\\') ? '<sup>' + m0 + '</sup>' : m0;
}));

what's wrong with this regular expression? getting the hash part of an url

I'm trying to get the first part of a hash from a URL (the part between the # and a /, a ?, or the end of the string).
So far I have come up with this:
r = /#(.*)[\?|\/|$]/
// OK
r.exec('http://localhost/item.html#hash/sub')
["#hash/", "hash"]
// OK
r.exec('http://localhost/item.html#hash?sub')
["#hash?", "hash"]
// WAT?
r.exec('http://localhost/item.html#hash')
null
I was expecting to receive "hash".
I tracked down the problem to
r2 = /#(.*)[$]/
r2.exec('http://localhost/item.html#hash')
null
Any idea what could be wrong?
r = /#(.*)[\?|\/|$]/
When $ appears in [] (a character class), it's the literal "$" character, not the end of input/line. In fact, your [\?|\/|$] part is equivalent to just [?/$|], which matches the 4 specific characters (including the pipe).
Use this instead:
r = /#(.+?)(\?|\/|$)/
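The difference between the two patterns can be checked directly. This is a small sketch using the URLs from the question; the helper name firstHashPart is hypothetical.

```javascript
// Original pattern: inside [] the $ and | are literal characters.
const broken = /#(.*)[\?|\/|$]/;
// Corrected pattern: alternation in a group, where $ means end of input.
const fixed = /#(.+?)(\?|\/|$)/;

// Hypothetical helper: returns the first part of the hash, or null.
function firstHashPart(url) {
  const m = fixed.exec(url);
  return m ? m[1] : null;
}

console.log(broken.exec('http://localhost/item.html#hash'));   // null
console.log(firstHashPart('http://localhost/item.html#hash')); // hash
// The broken pattern only matches when a literal ?, /, | or $ follows:
console.log(broken.exec('http://localhost/item.html#a$')[1]);  // a
```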
You aren't supposed to write [$] (within a character class) unless you want to match the $ literally and not the end of line.
/#(.*)$/
Code:
var regex = /\#(.*)$/;
regex.exec('http://localhost/item.html#hash');
Output:
["#hash", "hash"]
Your regex: /#(.*)[\?|\/|$]/
// problems: the greedy .* and the character class [\?|\/|$]
| operator won't work within [], but within ()
$ will be treated literally within []
.* will match as much as possible. .*? will be non-greedy
On making the above changes,
you end up with /#(.*?)(\?|\/|$)/
I use http://regexpal.com/ to test my regular expressions.
Your problem here is that your regular expression requires a /. So it doesn't work with http://localhost/item.html#hash but it works with http://localhost/item.html#hash/
Try this one :
r = /#([^\?|\/|$]*)/
You can't use the $ end-of-string marker in a character class. You're probably better off just matching characters that aren't / or ?, like this:
/#([^\?\/]*)/
Why Regex? Do it like this (nearly no regex):
var a = document.createElement('a');
a.href = 'http://localhost/item.html#hash/foo?bar';
console.log(a.hash.split(/[\/\?]/)[0]); // #hash
Just for completeness, if you are working with Node.js:
var hash = require('url').parse('http://localhost/item.html#hash').hash;
I found this regular expression that seems to work
r = /#([^\/\?]*)/
r.exec('http://localhost/item.html#hash/sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash?sub')
["#hash", "hash"]
r.exec('http://localhost/item.html#hash')
["#hash", "hash"]
Anyway, I still don't get why the original one isn't working

replace a number (not preceding with character) with _(underscore)

I want to replace the numbers in the string with _number. Only numbers that do not directly follow a letter should be picked up and prefixed with an underscore.
Requirement : I have a string, so while processing I want to replace constants with _Constant.
example string :"(a/(b1/8))*100"
output expected :"(a/(b1/_8))*_100"
Please suggest how to do this in asp.net code behind.
Thanks in advance.
You'll need a regular expression and the replace function:
var str = '(a/(b1/8))*100';
alert( str.replace(/([^a-zA-Z0-9])([0-9])/g, '$1_$2') );
So, what's going on?
The /'s mark the beginning and end of the regular expression (the tool best suited to this task).
[^a-zA-Z0-9] means "any character which is not a letter or a digit".
[0-9] means "a digit".
Together, they mean, "something which is not a letter or a number followed by a digit".
The g at the end means "find all of them".
The two () pairs split the regular expression into two captured parts, $1 and $2.
The '$1_$2' is the output format.
So, the expression translates to:
Find all cases where a digit follows a non-alphanumeric. Place a '_' between the digit and the non-alphanumeric. Return the result.
Edit
As an aside, when I read the question, I had thought that the JS function was the desired answer. If that is not the case, please read rkw's answer as that provides the C# version.
Edit 2
Bart brought up a good point that the above will fail in cases where the string starts with a number. Other languages can solve this with a negative lookbehind, but JavaScript engines older than ES2018 cannot (they do not support lookbehinds). So an extra check must be used (testing whether the result starts with a digit seems the easiest approach; a NaN test on substr( 0, 1 ) would misfire on an empty string or a leading space, both of which isNaN treats as numeric):
var str = '(a/(b1/8))*100';
var fin = str.replace(/([^a-zA-Z0-9])([0-9])/g, '$1_$2');
if( /^[0-9]/.test( fin ) ) fin = "_" + fin;
alert( fin );
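The leading-number case can also be folded into the pattern itself by allowing the start of the string as an alternative to the non-alphanumeric character. The sketch below shows that variant next to a lookbehind version for ES2018+ engines; both function names are hypothetical.

```javascript
// Hypothetical helper: works without lookbehind. Group 1 is the start of
// the string or the non-alphanumeric character before the digit.
function prefixConstants(expr) {
  return expr.replace(/(^|[^a-zA-Z0-9])([0-9])/g, '$1_$2');
}

// Hypothetical helper: same rule using a negative lookbehind (ES2018+).
// $& in the replacement stands for the matched digit.
function prefixConstantsLookbehind(expr) {
  return expr.replace(/(?<![a-zA-Z0-9])[0-9]/g, '_$&');
}

console.log(prefixConstants('(a/(b1/8))*100'));           // (a/(b1/_8))*_100
console.log(prefixConstantsLookbehind('8*b2'));           // _8*b2
```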
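Same as cwallenpoole's, just in C# code-behind:
string str = "(a/(b1/8))*100";
str = Regex.Replace(str, "([^a-zA-Z])([0-9])", "$1_$2");
Updated (the character class now also excludes digits, so 100 is not split up, and ^ handles a leading number):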
string str = "(a/(b1/8))*100";
str = Regex.Replace(str, "([^a-zA-Z0-9]|^)([0-9])", "$1_$2");
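Why not try regular expressions?
That is:
search for the regex "[0-9]+" and replace each match with "_" followed by the match. Be aware that this also prefixes digits that follow a letter, such as the 1 in b1.
i.e.
String RegExPattern = @"[0-9]+";
String str = "(a/(b1/8))*100";
str = Regex.Replace(str, RegExPattern, "_$0");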
Source: http://msdn.microsoft.com/en-us/library/ms972966.aspx
Hope that helps ya some!

Split string in JavaScript using a regular expression

I'm trying to write a regex for use in javascript.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var match = new RegExp("'[^']*(\\.[^']*)*'").exec(script);
I would like match to contain two elements:
match[0] == "'areaog_og_group_og_consumedservice'";
match[1] == "'\x26roleOrd\x3d1'";
This regex matches correctly when testing it at gskinner.com/RegExr/ but it does not work in my JavaScript. The issue can be replicated by testing it here: http://www.regextester.com/.
I need the solution to work with Internet Explorer 6 and above.
Can any regex gurus help?
Judging by your regex, it looks like you're trying to match a single-quoted string that may contain escaped quotes. The correct form of that regex is:
'[^'\\]*(?:\\.[^'\\]*)*'
(If you don't need to allow for escaped quotes, /'[^']*'/ is all you need.) You also have to set the g flag if you want to get both strings. Here's the regex in its regex-literal form:
/'[^'\\]*(?:\\.[^'\\]*)*'/g
If you use the RegExp constructor instead of a regex literal, you have to double-escape the backslashes: once for the string literal and once for the regex. You also have to pass the flags (g, i, m) as a separate parameter:
var rgx = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");
while (result = rgx.exec(script))
print(result[0]);
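As a quick check of the double-escaping rule, the following sketch runs the literal form and the constructor form against the same input and compares the results. The input string is an illustrative one (not the script from the question) chosen so that the second argument contains an escaped quote.

```javascript
// Illustrative input. As plain text it reads: loadArea('area', 'it\'s here');
const source = "loadArea('area', 'it\\'s here');";

// Regex literal: each backslash is written once for the regex.
const literal = /'[^'\\]*(?:\\.[^'\\]*)*'/g;
// RegExp constructor: each backslash is doubled again for the string literal.
const constructed = new RegExp("'[^'\\\\]*(?:\\\\.[^'\\\\]*)*'", "g");

// With the g flag, match returns every full match as an array.
const fromLiteral = source.match(literal) || [];
const fromConstructor = source.match(constructed) || [];

console.log(fromLiteral.length);     // 2
console.log(fromConstructor.length); // 2
```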
The regex you're looking for is .*?('[^']*')\s*,\s*('[^']*'). The catch here is that, as usual, match[0] is the entire matched text (this is very normal) so it's not particularly useful to you. match[1] and match[2] are the two matches you're looking for.
var script = "function onclick() {loadArea('areaog_og_group_og_consumedservice', '\x26roleOrd\x3d1');}";
var parameters = /.*?('[^']*')\s*,\s*('[^']*')/.exec(script);
alert("you've done: loadArea("+parameters[1]+", "+parameters[2]+");");
The only issue I have with this is that it's somewhat inflexible. You might want to spend a little time to match function calls with 2 or 3 parameters?
EDIT
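In response to your request, here is the regex to match 1,2,3,...,n parameters. If you notice, I used a non-capturing group (the (?: ) part) to find many instances of the comma followed by the next parameter. Keep in mind that a capturing group inside a repeated group only retains its last iteration, so match[2] holds the final parameter rather than every one.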
/.*?('[^']*')(?:\s*,\s*('[^']*'))*/
Maybe this:
'([^']*)'\s*,\s*'([^']*)'
