How do I convert a PHP regex with a lookbehind to Javascript?

Javascript doesn't support lookbehinds in regexes. How do I convert the following PHP regex to Javascript?
regPattern="(?<!\\)\\x"
Here is the test case (in Node.js):
var str = '{"key":"abc \\x123 \xe2\x80\x93 xyz"}'
var newStr = str.replace(/regPattern/g, '\\u')
console.log(newStr); // output: '{"key":"abc \\x123 \ue2\u80\u93 xyz"}'
\\x123 doesn't match because it contains \\x, but \x matches.

Try this:
var newStr = str.replace(/([^\\]|^)\\x/g, '$1\\u');
In other words, match either the start of the string (^) or any character other than \, followed by \x, and capture that leading character (if any) in capture group 1.
Then replace the whole match (three characters, or two at the start of the string) with capture group 1 followed by \u.
For example, in abc?\x, the string ?\x will be matched, and capture group 1 will be ?. So we replace the match (?\x) with $1\u, which evaluates to ?\u. So abc?\x -> abc?\u.
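As a quick check, here is a minimal sketch that runs the capture-group workaround on the text from the question. It assumes the string really contains literal backslashes, so String.raw is used to build it. The second replacement uses a native lookbehind, which assumes an ES2018+ engine. The variable names are illustrative.

```javascript
// String.raw keeps every backslash literally, matching the text as it appears in the question.
const input = String.raw`{"key":"abc \\x123 \xe2\x80\x93 xyz"}`;

// Capture-group workaround: $1 puts back the character that preceded \x (empty at string start).
const viaGroup = input.replace(/([^\\]|^)\\x/g, '$1\\u');

// Native negative lookbehind, assuming an ES2018+ engine.
const viaLookbehind = input.replace(/(?<!\\)\\x/g, '\\u');

console.log(viaGroup);
console.log(viaGroup === viaLookbehind);
```

One design difference: the capture-group version consumes the preceding character, so two directly adjacent \x sequences would not both be replaced, whereas the lookbehind version consumes nothing before \x.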

Related

Only Tokens in Regex in Javascript

I'm using JavaScript to parse a string that looks as follows:
var myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
var regex = /\w*:(part.\w*)./gi
How can I put only the highlighted part within the parenthesis in an array?
var myArray = myString.match(regex); gives me the whole line.
In your pattern \w*:(part.\w*). the leading \w* is optional. If you want to require at least one word character before the colon, a single \w is enough.
After the capture group there is a dot . which matches any character. Because that character is mandatory, the last token at the end of the string loses its final character.
Also escape the dot as \. if you want to match it literally.
The pattern can look like:
\w:(part\.\w+)
Then get the capture group 1 values into an array using matchAll:
const myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
const regex = /\w:(part\.\w+)/gi;
const result = Array.from(myString.matchAll(regex), m => m[1])
console.log(result)
Or, without a capture group, use a lookbehind (if your engine supports it) together with match:
const myString = "unimportant:part.one:unimportant:part.two:unimportant:part.three";
const regex = /(?<=\w:)part\.\w+/gi;
console.log(myString.match(regex))
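If neither matchAll (added in ES2020) nor lookbehind is available, the same capture-group values can be collected with a RegExp exec loop. This is only a sketch; collectGroup is a hypothetical helper name, not part of the answer above.

```javascript
// Hypothetical helper: collect capture group 1 from every match of a global regex.
function collectGroup(re, text) {
  const out = [];
  let m;
  // With the g flag, each exec call resumes from re.lastIndex.
  while ((m = re.exec(text)) !== null) {
    out.push(m[1]);
  }
  return out;
}

const tokens = collectGroup(
  /\w:(part\.\w+)/gi,
  "unimportant:part.one:unimportant:part.two:unimportant:part.three"
);
console.log(tokens);
```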

Javascript Regex: negative lookbehind

I am trying to replace, in a formula, all floating-point numbers that are missing the leading zero. E.g.:
"4+.5" should become: "4+0.5"
I have read that lookbehinds are not supported in JavaScript, so how can I achieve that? The following code also replaces when a digit precedes the dot:
var regex = /(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '0$1')); //4+10.5
console.log(formula2.replace(regex, '0$1')); //4+0.5
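Try this regex: (\D)(\.\d*). It requires a non-digit character before the dot, so it does not match a dot at the very start of the string.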
var regex = /(\D)(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '$10$2')); //4+1.5
console.log(formula2.replace(regex, '$10$2')); //4+0.5
You may use
s = s.replace(/\B\.\d/g, '0$&')
Details
\B\. - matches a . that is either at the start of the string or is not preceded with a word char (letter, digit or _)
\d - a digit.
The 0$& replacement string is adding a 0 right in front of the whole match ($&).
JS demo:
var s = "4+1.5\n4+.5";
console.log(s.replace(/\B\.\d/g, '0$&'));
Another idea is by using an alternation group that matches either the start of the string or a non-digit char, capturing it and then using a backreference:
var s = ".4+1.5\n4+.5";
console.log(s.replace(/(^|\D)(\.\d)/g, '$10$2'));
The pattern will match
(^|\D) - Group 1 (referred to with $1 from the replacement pattern): start of string (^) or any non-digit char
(\.\d) - Group 2 (referred to with $2 from the replacement pattern): a . and then a digit
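On engines that do support lookbehind (assumed here: ES2018 or later), the original idea can be written directly. A minimal sketch, reusing the test string from the answer above:

```javascript
const formulas = ".4+1.5\n4+.5";

// (?<!\d) asserts that the dot is not preceded by a digit, without consuming anything.
const fixed = formulas.replace(/(?<!\d)\.\d/g, "0$&");

console.log(fixed);
```

Unlike the \B variant, this also inserts a zero when the dot follows a letter or underscore, which may or may not be what you want in a formula.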

javascript regex to check if first and last character are similar?

Is there any simple way to check if first and last character of a string are the same or not, only with regex?
I know you can check with charAt
var firstChar = str.charAt(0);
var lastChar = str.charAt(str.length - 1);
console.log(firstChar === lastChar);
I'm not asking for this : Regular Expression to match first and last character
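You can use a regex with a capturing group and a backreference to it: capture the first character, then assert that the string ends with the same character. To test for a match, use the RegExp#test method. Note that this pattern needs at least two characters, so a single-character string does not match.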
var regex = /^(.).*\1$/;
console.log(
regex.test('abcdsa')
)
console.log(
regex.test('abcdsaasaw')
)
Regex explanation here :
^ asserts position at start of the string
1st Capturing Group (.)
.* matches any character (except newline) - between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\1 matches the same text as most recently matched by the 1st capturing group
$ asserts position at the end of the string
The . doesn't match newline characters. To include newlines, update the regex:
var regex = /^([\s\S])[\s\S]*\1$/;
console.log(
regex.test(`abcd
sa`)
)
console.log(
regex.test(`ab
c
dsaasaw`)
)
Refer : How to use JavaScript regex over multiple lines?
Regex explanation here :
[.....] - Match a single character present
\s - matches any whitespace character (roughly [\r\n\t\f\v ])
\S - matches any non-whitespace character (roughly [^\r\n\t\f\v ])
Together, [\s\S] matches any character, including newlines.
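An alternative to [\s\S], assuming an ES2018+ engine, is the s (dotAll) flag, which lets . match newlines as well. The sketch below also makes the middle part optional so that a single-character string counts as a match; that is an addition for illustration, not part of the answer above.

```javascript
// s flag: . also matches line terminators. (?:.*\1)? lets a one-character string pass.
const sameEnds = /^(.)(?:.*\1)?$/s;

const results = [
  sameEnds.test("abcdsa"),     // starts and ends with a
  sameEnds.test("abcdsaasaw"), // starts with a, ends with w
  sameEnds.test("abcd\nsa"),   // newline in the middle
  sameEnds.test("a"),          // single character
];
console.log(results); // [true, false, true, true]
```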
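You can try the pattern below. Note that because the group is ([\w\W]+), it checks whether the string starts and ends with the same substring, not just the same single character: aebcdae passes because it starts and ends with ae.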
const rg = /^([\w\W]+)[\w\W]*\1$/;
console.log(
rg.test(`abcda`)
)
console.log(
rg.test(`aebcdae`)
)
console.log(
rg.test(`aebcdac`)
)
var rg = /^([ab])[ab]*\1$|^[ab]$/;
console.log(rg.test('aabbaa'))
console.log(rg.test('a'))
console.log(rg.test('b'))
console.log(rg.test('bab'))
console.log(rg.test('baba'))
This ensures that the string contains only the characters a and b and that it starts and ends with the same character.
It also matches single-character strings, because they trivially start and end with the same character.

translating RegEx syntax working in php and python to JS

I have this RegEx syntax: "(?<=[a-z])-(?=[a-z])"
It matches a dash between two lowercase letters. In the example below, the second dash is matched:
Krynica-Zdrój, ul. Uzdro-jowa
Unfortunately I can't use the lookbehind (?<= in JS.
My ultimate goal is to remove the hyphen with RegEx replace.
It seems to me you need to remove the hyphen in between lowercase letters.
Use
var s = "Krynica-Zdrój, ul. Uzdro-jowa";
var res = s.replace(/([a-z])-(?=[a-z])/g, "$1");
console.log(res);
Note that the lookbehind is turned into a simple capturing group, while the lookahead can stay as it is. Because the lookahead does not consume the following letter, that letter is still available as the start of the next match, so overlapping cases (such as chains of hyphenated single lowercase letters) are handled correctly.
Details:
([a-z]) - Group 1 capturing a lowercase ASCII letter
- - a hyphen
(?=[a-z]) - that is followed with a lowercase ASCII letter that is not added to the result
/g - a global modifier, search for all occurrences of the pattern
"$1" - the replacement pattern containing just the backreference to the value stored in Group 1 buffer.
VBA sample code:
Sub RemoveExtraHyphens()
Dim s As String
Dim reg As New regexp
reg.pattern = "([a-z])-(?=[a-z])"
reg.Global = True
s = "Krynica-Zdroj, ul. Uzdro-jowa"
Debug.Print reg.Replace(s, "$1")
End Sub
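
The point about overlapping matches can be checked with a short JavaScript sketch. The a-b-c input and the variable names are illustrative assumptions, not part of the original answer.

```javascript
const chain = "a-b-c";

// The lookahead leaves the next letter unconsumed, so it can start the next match.
const withLookahead = chain.replace(/([a-z])-(?=[a-z])/g, "$1");

// Two capturing groups consume the next letter, so the second hyphen is skipped.
const withTwoGroups = chain.replace(/([a-z])-([a-z])/g, "$1$2");

console.log(withLookahead); // abc
console.log(withTwoGroups); // ab-c

// The example from the question: the first hyphen stays because Z is uppercase.
const cleaned = "Krynica-Zdrój, ul. Uzdro-jowa".replace(/([a-z])-(?=[a-z])/g, "$1");
console.log(cleaned); // Krynica-Zdrój, ul. Uzdrojowa
```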

Javascript reg exp not right

Given the string str = '.js("aaa").js("bbb").js("ccc")', I want to write a regular expression that returns an array like this:
[aaa, bbb, ccc];
My regular expression is:
var jsReg = /.js\(['"](.*)['"]\)/g;
var jsAssets = [];
var js;
while ((js = jsReg.exec(str)) !== null) {
jsAssets.push(js[1]);
}
But the jsAssets result is
[""aaa").js("bbb").js("ccc""]
What's wrong with this regular expression?
Use the lazy version of .*:
/\.js\(['"](.*?)['"]\)/g
              ^
The added ? makes .* lazy, so it matches the smallest number of characters up to the next quote.
It is also better to escape the first dot, as done above.
If you want to allow escaped quotes, use something like this:
/\.js\(['"]((?:\\['"]|[^"])+)['"]\)/g
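To see the difference between the greedy and the lazy quantifier on the input from the question, here is a minimal sketch. It uses matchAll, which assumes an ES2020+ engine, and captures is a hypothetical helper name.

```javascript
const str = '.js("aaa").js("bbb").js("ccc")';

// Hypothetical helper: capture group 1 of every match.
const captures = (re) => Array.from(str.matchAll(re), (m) => m[1]);

// Greedy .* runs to the last quote, so there is a single oversized match.
const greedy = captures(/\.js\(['"](.*)['"]\)/g);

// Lazy .*? stops at the first quote that is followed by a closing parenthesis.
const lazy = captures(/\.js\(['"](.*?)['"]\)/g);

console.log(greedy); // [ 'aaa").js("bbb").js("ccc' ]
console.log(lazy);   // [ 'aaa', 'bbb', 'ccc' ]
```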
I believe it can be done as a one-liner with replace and match method calls:
var str = '.js("aaa").js("bbb").js("ccc")';
str.replace(/[^(]*\("([^"]*)"\)[^(]*/g, '$1,').match(/[^,]+/g);
//=> ["aaa", "bbb", "ccc"]
The problem is that you are using .*. That will match any character. You'll have to be a bit more specific with what you are trying to capture.
If it will only ever be word characters you could use \w which matches any word character. This includes [a-zA-Z0-9_]: uppercase, lowercase, numbers and an underscore.
So your regex would look something like this :
var jsReg = /js\(['"](\w*)['"]\)/g;
In
/.js\(['"](.*)['"]\)/g
the greedy .* matches as much as possible, so with your example input the quoted part matches all of
"aaa").js("bbb").js("ccc"
instead of stopping at the first closing quote.
Try
/\.js\(('(?:[^\\']|\\.)*'|"(?:[^\\"]|\\.)*")\)/
To break this down,
\. matches a literal dot
\.js\( matches the literal string ".js("
( starts to capture the string.
[^\\']|\\. matches either a character other than a single quote or backslash, or a backslash followed by any non-line-terminator character (an escape sequence).
(?:[^\\']|\\.)* matches the body of a string
'(?:[^\\']|\\.)*' matches a single quoted string
(...|...) captures a single quoted or double quoted string
)\) closes the capturing group and matches a literal close parenthesis
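As for your loop: it is fine as written.
Calling exec repeatedly on a regex with the g modifier is the standard way to walk through all matches, because each call resumes from the regex's lastIndex.
Do not remove the g modifier here: without it, exec would return the first match on every call and the loop would never terminate.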
Try this one - http://jsfiddle.net/UDYAq/
var str = '.js("aaa").js("bbb").js("ccc")';
// First pass: find every .js("...") call.
var regex = /\.js\("(.*?)"\)/gi;
var result = str.match(regex) || [];
// Second pass: keep only the text between the quotes.
for (var i = 0; i < result.length; i++) {
    result[i] = result[i].match(/"(.*?)"/)[1];
}
console.log(result);
To be sure that matched characters are surrounded by the same quotes:
/\.js\((['"])(.*?)\1\)/g
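A minimal sketch of how the backreference version behaves, assuming an ES2020+ engine for matchAll. The mixed-quote input is made up for illustration. Because group 1 now holds the quote, the wanted text is in group 2.

```javascript
// Made-up input: the third call opens with " but closes with ', so it should be rejected.
const source = `.js("aaa").js('bbb').js("ccc')`;

// Group 1 captures the opening quote; \1 requires the same quote to close the string.
const sameQuote = /\.js\((['"])(.*?)\1\)/g;

const assets = Array.from(source.matchAll(sameQuote), (m) => m[2]);
console.log(assets); // [ 'aaa', 'bbb' ]
```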
