translating RegEx syntax working in php and python to JS - javascript

I have this RegEx syntax: "(?<=[a-z])-(?=[a-z])"
It matches a dash between two lowercase letters. In the example below, the second dash is matched:
Krynica-Zdrój, ul. Uzdro-jowa
Unfortunately, I can't use the lookbehind (?<=...) in JS.
My ultimate goal is to remove the hyphen with RegEx replace.

It seems to me you need to remove the hyphen in between lowercase letters.
Use
var s = "Krynica-Zdrój, ul. Uzdro-jowa";
var res = s.replace(/([a-z])-(?=[a-z])/g, "$1");
console.log(res);
Note that the lookbehind is turned into a simple capturing group, whose value is put back with $1. The lookahead can stay as it is: it does not consume the following letter, so that letter can start the next match. This lets the pattern handle overlapping matches, such as chains of hyphenated single lowercase letters (a-b-c).
Details:
([a-z]) - Group 1 capturing a lowercase ASCII letter
- - a hyphen
(?=[a-z]) - that is followed with a lowercase ASCII letter that is not added to the result
/g - a global modifier, search for all occurrences of the pattern
"$1" - the replacement pattern containing just the backreference to the value stored in Group 1 buffer.
VBA sample code:
Sub RemoveExtraHyphens()
Dim s As String
Dim reg As New regexp
reg.pattern = "([a-z])-(?=[a-z])"
reg.Global = True
s = "Krynica-Zdroj, ul. Uzdro-jowa"
Debug.Print reg.Replace(s, "$1")
End Sub
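The point about overlapping matches in the JavaScript answer above can be checked with a small sketch. The function names are illustrative and not part of the original answer. The last variant assumes an engine with ES2018 lookbehind support (current Node.js and browsers), where the pattern from the question can be used unchanged.

```javascript
// Lookahead leaves the right-hand letter unconsumed, so it can start the next match.
const withLookahead = (s) => s.replace(/([a-z])-(?=[a-z])/g, "$1");

// Consuming both letters skips every other hyphen in a chain such as "a-b-c".
const withTwoGroups = (s) => s.replace(/([a-z])-([a-z])/g, "$1$2");

// Assumes ES2018 lookbehind support: nothing is consumed except the hyphen itself.
const withLookbehind = (s) => s.replace(/(?<=[a-z])-(?=[a-z])/g, "");

console.log(withLookahead("a-b-c"));   // abc
console.log(withTwoGroups("a-b-c"));   // ab-c
console.log(withLookbehind("Krynica-Zdrój, ul. Uzdro-jowa")); // Krynica-Zdrój, ul. Uzdrojowa
```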

Related

Write a regex that matches with string separated by dash but should be all upper case or lower case

I am writing a regex that checks strings like
ju-NIP-er-us skop-u-LO-rum and ui-LA-ui LO-iu-yu
that is, sets of characters separated by -.
This is what I have got:
let str = "ju-NIP-er-us skop-u-LO-rum";
let str2 = "jU-NiP-Er-us skop-u-LO-rum"
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str)) // this matches
console.log(/^\p{L}+(?:[- ']\p{L}+)*$/u.test(str2)) // this also matches but this shouldn't match
The problem is that each set of characters separated by - should be either all uppercase or all lowercase, but right now the regex also matches mixed-case sets (see the snippet). How can I make this regex match only all-lowercase or all-uppercase sets between dashes?
You may use this regex to make sure to match same case substrings between - or space or ' delimiters:
^(?:\p{Lu}+|\p{Ll}+)(?:[- '](?:\p{Lu}+|\p{Ll}+))*$
Non-capturing group (?:\p{Lu}+|\p{Ll}+) matches either 1+ of uppercase unicode letters or 1+ of lowercase unicode letters but not a mix of both cases.
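As a quick check, the pattern can be run against the two strings from the question. This is a minimal sketch; the variable name sameCase is illustrative, and the u flag is required for \p{...} to work in JavaScript.

```javascript
// Each segment must be entirely uppercase or entirely lowercase.
const sameCase = /^(?:\p{Lu}+|\p{Ll}+)(?:[- '](?:\p{Lu}+|\p{Ll}+))*$/u;

const str = "ju-NIP-er-us skop-u-LO-rum";
const str2 = "jU-NiP-Er-us skop-u-LO-rum";

console.log(sameCase.test(str));  // true
console.log(sameCase.test(str2)); // false: "jU" mixes cases
```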

Matching Arabic and English letters only javascript regex

I'm trying to write a regex that matches Arabic and English letters only. Numbers and special characters are not allowed; spaces are allowed.
This regex worked fine but allows numbers in the middle of the string:
/[\u0620-\u064A\040a-zA-Z]+$/
For example, it matches (سم111111ر), which is not supposed to match.
The question is: is there a way not to match numbers in the middle of the letters?
Note that in JavaScript you need an ECMAScript 2018+ environment, which supports Unicode property escapes (\p{...}) when the u flag is set:
const texts = ['أسبوع أسبوع','week week','hunāka','سم111111ر'];
const re = /^(?:(?=[\p{Script=Arabic}A-Za-z])\p{L}|\s)+$/u;
for (const text of texts) {
console.log(text, '=>', re.test(text))
}
The ^(?:(?=[\p{Script=Arabic}A-Za-z])\p{L}|\s)+$ means
^ - start of string
(?: - start of a non-capturing group container:
(?=[\p{Script=Arabic}A-Za-z]) - a positive lookahead that requires a char from the Arabic script or an ASCII letter to occur immediately to the right of the current location
\p{L} - any Unicode letter (note \p{Alphabetic} includes a bit more "letter" chars, you may want to try it out)
| - or
\s - whitespace
)+ - repeat one or more times
$ - end of string.
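The regex in the question lets digits through because it has no ^ anchor: it only needs a run of allowed characters at the end of the string. A minimal sketch of that effect follows, using the question's character class with a literal space in place of \040. The variable names are illustrative.

```javascript
// "\u0633\u0645111111\u0631" is the sample from the question: two Arabic letters, six digits, one Arabic letter.
const mixed = "\u0633\u0645111111\u0631";

const unanchored = /[\u0620-\u064A a-zA-Z]+$/;  // only the tail has to match
const anchored = /^[\u0620-\u064A a-zA-Z]+$/;   // the whole string has to match

console.log(unanchored.test(mixed));      // true: the trailing letter alone satisfies it
console.log(anchored.test(mixed));        // false
console.log(anchored.test("week week"));  // true
```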

How to match regular expression In Javascript

I have the string [FBWS-1] comes first than [FBWS-2]
In this string, I want to find all occurrences of [FBWS-NUMBER].
I tried this:
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/^([[A-Z]-[0-9]])$/.test(term));
I want to get all the NUMBERS where the [FBWS-NUMBER] string is matched.
But I had no success. I'm new to regular expressions.
Can anyone help me, please?
Note that ^([[A-Z]-[0-9]])$ matches the start of a string (^), a [ or an uppercase ASCII letter (with [[A-Z]), a -, an ASCII digit and a ] char at the end of the string. So, basically, it only matches strings like [-2] or Z-3].
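The behaviour described above can be confirmed with a short sketch; the variable name original is illustrative.

```javascript
// The pattern from the question: [[A-Z] is ONE character class containing "[" and A-Z.
const original = /^([[A-Z]-[0-9]])$/;
const term = "[FBWS-1] comes first than [FBWS-2]";

console.log(original.test("[-2]")); // true
console.log(original.test("Z-3]")); // true
console.log(original.test(term));   // false
```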
You may use
/\[[A-Z]+-[0-9]+]/g
See the regex demo.
NOTE If you need to "hardcode" FBWS (to only match values like FBWS-123 and not ABC-3456), use it instead of [A-Z]+ in the pattern, /\[FBWS-[0-9]+]/g.
Details
\[ - a [ char
[A-Z]+ - one or more (due to + quantifier) uppercase ASCII letters
- - a hyphen
[0-9]+ - one or more (due to + quantifier) ASCII digits
] - a ] char.
The /g modifier used with String#match() returns all found matches.
JS demo:
var term = "[FBWS-1] comes first than [FBWS-2]";
console.log(term.match(/\[[A-Z]+-[0-9]+]/g));
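Since the question asks for the NUMBERS rather than the whole bracketed tokens, one option is to add a capturing group around the digits and read it from each match. This is a sketch that assumes String.prototype.matchAll is available (ES2020); the variable name numbers is illustrative.

```javascript
const term = "[FBWS-1] comes first than [FBWS-2]";

// Group 1 holds the digits; matchAll requires the g flag.
const numbers = Array.from(
  term.matchAll(/\[[A-Z]+-([0-9]+)\]/g),
  (m) => m[1]
);

console.log(numbers); // [ '1', '2' ]
```

The values are strings; apply Number() to each one if numeric values are needed.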
You can use:
\[\w+-\d+\]
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/\[\w+-\d+\]/.test(term));
There are several reasons why your existing regex doesn't work.
You're trying to match the beginning and end of the string when you actually want everything in between, so don't use ^ and $.
You're only matching one alpha character with [A-Z]; add the + quantifier to match one or more.
You can shorten [0-9] to the shorthand \d. \w is also available, but it is broader than [A-Z] (it matches letters, digits and _). Shorthands do not need to be wrapped in a character class, but the literal [ and ] in the input must be escaped as \[ and \].
Note that your code only returns a true/false value (you're using test); at the moment it's unclear whether this is what you want. You may want to use match with a global modifier (//g) instead of test to get a collection.
Here is an example using string.match(reg) to get all matching strings:
var term = "[FBWS-1] comes first than [FBWS-2]";
var reg1 = /\[[A-Z]+-[0-9]+\]/g;
var reg2 = /\[FBWS-[0-9]+\]/g;
var arr1 = term.match(reg1);
var arr2 = term.match(reg2);
console.log(arr1);
console.log(arr2);
Your regular expression /^([[A-Z]-[0-9]])$/ is wrong.
Give this regex a try: /\[FBWS-\d+\]/g
Remove the g if you only want the first match, as g finds all matches.
Edit: Someone mentioned that you want ["any combination"-"number"]; if that's what you're looking for, then this should work: /\[[A-Z]+-\d+\]/

Javascript Regex: negative lookbehind

I am trying to replace, in a formula, all floating-point numbers that are missing the leading zero. E.g.:
"4+.5" should become: "4+0.5"
I read that lookbehinds are not supported in JavaScript, so how could I achieve that? The following code also replaces when a digit precedes the dot:
var regex = /(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '0$1')); //4+10.5
console.log(formula2.replace(regex, '0$1')); //4+0.5
Try this regex (\D)(\.\d*)
var regex = /(\D)(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '$10$2'));
console.log(formula2.replace(regex, '$10$2'));
You may use
s = s.replace(/\B\.\d/g, '0$&')
See the regex demo.
Details
\B\. - matches a . that is either at the start of the string or is not preceded with a word char (letter, digit or _)
\d - a digit.
The 0$& replacement string is adding a 0 right in front of the whole match ($&).
JS demo:
var s = "4+1.5\n4+.5";
console.log(s.replace(/\B\.\d/g, '0$&'));
Another idea is by using an alternation group that matches either the start of the string or a non-digit char, capturing it and then using a backreference:
var s = ".4+1.5\n4+.5";
console.log(s.replace(/(^|\D)(\.\d)/g, '$10$2'));
The pattern will match
(^|\D) - Group 1 (referred to with $1 from the replacement pattern): start of string (^) or any non-digit char
(\.\d) - Group 2 (referred to with $2 from the replacement pattern): a . and then a digit
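In engines that support ES2018 lookbehind (current Node.js and browsers), the negative lookbehind from the question title can also be used directly. This is a sketch under that assumption; the helper name addLeadingZero is illustrative.

```javascript
// Insert "0" before a dot that is followed by a digit and NOT preceded by a digit.
const addLeadingZero = (formula) =>
  formula.replace(/(?<!\d)\.(?=\d)/g, "0.");

console.log(addLeadingZero("4+.5"));    // 4+0.5
console.log(addLeadingZero("4+1.5"));   // 4+1.5
console.log(addLeadingZero(".4+1.5"));  // 0.4+1.5
```

Unlike the \B version, this one also rewrites a dot that follows a letter or _, because only a preceding digit blocks the match.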

Regex to match char after string

I'm attempting to match the first 3 letters that could be a-z followed by a specific character.
For testing I'm using a regex online tester.
I thought this should work (without success):
^[a-z]{0,3}$[z]
My test string is abcz.
Hope you can tell me what I'm doing wrong.
If you need to match a whole string abcz, use
/^[a-z]{0,3}z$/
            ^^
or - if the 3 letters are compulsory:
/^[a-z]{3}z$/
See the regex demo.
The $[z] in your pattern attempts to match a z after the end-of-string anchor, which can never succeed, so the regex always fails.
Details:
^ - string start
[a-z]{0,3} - 0 to 3 lowercase ASCII letters (to require 3 letters, remove 0,)
z - a z
$ - end of string anchor.
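The three patterns discussed above can be compared side by side; the variable names are illustrative.

```javascript
const broken = /^[a-z]{0,3}$[z]/;   // from the question: z is required after the end anchor
const optional = /^[a-z]{0,3}z$/;   // 0 to 3 letters, then z
const required = /^[a-z]{3}z$/;     // exactly 3 letters, then z

console.log(broken.test("abcz"));   // false
console.log(optional.test("abcz")); // true
console.log(optional.test("z"));    // true: zero leading letters are allowed
console.log(required.test("z"));    // false
```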
You've got the end-of-line anchor ($) too early:
/^[a-z]{0,3}[z]$/m
You can do away with the [] around z. Square brackets are used to define a range or list of characters to match - as you're matching only one they're not needed here.
/^[a-z]{0,3}z$/m
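The m flag in this answer changes what ^ and $ mean: they match at line boundaries, not only at the start and end of the whole string. A small sketch of the difference follows; the multi-line input is a made-up example.

```javascript
const perLine = /^[a-z]{0,3}z$/m;    // ^ and $ match at every line boundary
const wholeString = /^[a-z]{0,3}z$/; // ^ and $ match only at the string boundaries

const input = "123\nabcz"; // hypothetical two-line input

console.log(perLine.test(input));     // true: the second line matches
console.log(wholeString.test(input)); // false
```

For a single-line test string such as abcz, both versions behave the same.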
