Regex to avoid specific content


I have a string like 23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2 + blablabla.
What I would like to do is to split up the string by +, yet in SPK section there is a L+R that would interrupt the process. Is there any REGEX that could achieve what I want?
The result should be:
23DGERA#SPK_20W L+R FA-2#1
342HSHC#CPU_8PIN INTEL_TEST!#2
2356GHMX#SSD_256G MICRON_CONTENT#2
But this is what I always get:
23DGERA#SPK_20W L
R FA-2#1
342HSHC#CPU_8PIN INTEL_TEST!#2
2356GHMX#SSD_256G MICRON_CONTENT#2
I'm currently using JavaScript's .split('+').
Any help will be appreciated.

You can use a matching regex solution:
text.match(/(?:L\+R|[^+])+/g)
See the regex demo. Details:
(?: - start of a non-capturing group:
L\+R - L+R string
| - or
[^+] - any char other than +
)+ - end of the group, one or more occurrences.
See the JavaScript demo:
var text = '23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2';
console.log(text.match(/(?:L\+R|[^+])+/g));
ECMAScript 2018+ compliant solution
In case you want to migrate to a more modern ECMAScript flavor, you can use
text.split(/\+(?<!L\+(?=R))/)
This will match a + that is not part of an L+R string.
const text = '23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2';
console.log(text.split(/\+(?<!L\+(?=R))/));
See the regex demo.
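If more than one token can contain the separator, the matching approach above generalizes. The following is only a sketch, not part of the original answer: the helper names escapeRegExp and splitOutside are made up for illustration, and it assumes a single-character separator.

```javascript
// Escape regex metacharacters so a token is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split `text` on the single-character `sep`, but keep every string in
// `protectedTokens` (for example 'L+R') intact. Hypothetical helper.
function splitOutside(text, sep, protectedTokens) {
  // Inside a character class only ], \, ^ and - need escaping.
  const sepInClass = sep.replace(/[\]\\^-]/g, '\\$&');
  const alternatives = protectedTokens.map(escapeRegExp);
  alternatives.push('[^' + sepInClass + ']');
  // For sep '+' and token 'L+R' this builds /(?:L\+R|[^+])+/g.
  const re = new RegExp('(?:' + alternatives.join('|') + ')+', 'g');
  return text.match(re) || [];
}

const protectedParts = splitOutside(
  '23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2',
  '+',
  ['L+R']
);
console.log(protectedParts);
```

Protected tokens are listed before the catch-all character class, so the regex engine tries them first at every position.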

Instead of splitting on a + you could match the format in the example data.
First match a part containing a single #, and then match till the first occurrence of # followed by a digit.
Note that the second match will be 342HSHC#CPU_8PIN INTEL_TEST!#1 instead of 342HSHC#CPU_8PIN INTEL_TEST!#2
\w+#\w+ [^#]*#\d\b
The pattern matches:
\w+#\w+ Match 1+ word characters, a #, and 1+ word characters
 [^#]*# Match a space, then optional chars other than #, then match #
\d\b Match a digit and a word boundary to prevent a partial match
Regex demo
const s = "23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2 + blablabla";
const regex = /\w+#\w+ [^#]*#\d\b/g;
console.log(s.match(regex));

The string looks like a list of parts each with a quantity e.g. #1. You can use that to identify the correct + characters to split on.
Using a look-behind containing #\d+ -> (?<=#\d+) followed by the character you want to match (escaped because + has a special meaning) gives:
(?<=#\d+)\+
When using this with split, the g modifier is optional, because split already splits at every match rather than just the first one.
const str = '23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2'
const items = str.split(/(?<=#\d+)\+/g);
console.log(items);
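Once the string is split this way, each item still consists of three #-separated fields. As a rough sketch, the items could be turned into objects; the field names id, name and qty are assumptions about what the fields mean, not something stated in the question.

```javascript
const partsList = '23DGERA#SPK_20W L+R FA-2#1+342HSHC#CPU_8PIN INTEL_TEST!#1+2356GHMX#SSD_256G MICRON_CONTENT#2';

// Split only on a + that directly follows "#<digits>" (lookbehind, ES2018+),
// then break each item into its three #-separated fields.
const parsedItems = partsList.split(/(?<=#\d+)\+/).map((item) => {
  const [id, name, qty] = item.split('#'); // hypothetical field names
  return { id, name, qty: Number(qty) };
});
console.log(parsedItems);
```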

Related

How to format a JavaScript string with replaceAll using regex

I am trying to format a kind of a board game notation which consists of tabs and spaces.
The original string is looking like this:
1. \td11-d9 \te7-e10 \n2. \ta8-c8 \tg7-g10xf10 \n3. \th11-h9 \tf7-i7
I used this replace method to clean up all of the tabs and new lines
string.replace(/\s\s+/g, ' ').replaceAll('. ', '.');
So, after that the string is looking like this:
1.d11-d9 e7-e10 2.a8-c8 g7-g10xf10 3.h11-h9 f7-i7
However, I want to add more space before the number with the dot. So, the string must look like this with 3 spaces before the number of the move (the number with the dot):
1.d11-d9 e7-e10   2.a8-c8 g7-g10xf10   3.h11-h9 f7-i7
Can I also make all of these operations with a one line code or just one JavaScript method?

Here is how you can do this in a single .replace call:
const s = "1. \td11-d9 \te7-e10 \n2. \ta8-c8 \tg7-g10xf10 \n3. \th11-h9 \tf7-i7 ";
var r = s.replace(/([.\s])\s*\t|\s+$|\n(?=\d\.)/g, '$1');
console.log(r);
//=> "1.d11-d9 e7-e10 2.a8-c8 g7-g10xf10 3.h11-h9 f7-i7"
RegEx Breakup:
([.\s])\s*\t: Match dot or a whitespace and capture in group #1 followed by 0+ whitespaces followed by a tab. We will put back this replacement using $1
|: OR
\s+$: Match 1+ whitespaces before end
|: OR
\n(?=\d\.): Match \n if it is followed by a digit and a dot

You can use lookaheads, (?=[1-9]) and (?=[a-z]), to check what follows the whitespace: before a number, add two spaces; before a letter, add just one.
const string = `1. \td11-d9 \te7-e10 \n2. \ta8-c8 \tg7-g10xf10 \n3. \th11-h9 \tf7-i7`
const result = string.replace(/\s+(?=[a-z])/gi, ' ').replace(/\s+(?=[1-9])/gi, ' ').replaceAll('. ', '.');
console.log(result)
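If a single pattern is not a hard requirement, the same clean-up can be written as a few small steps. This is only a sketch: formatMoves is a made-up name, and it assumes that every numbered move starts on its own line, as in the sample string.

```javascript
// Hypothetical helper: normalize the notation line by line.
function formatMoves(notation) {
  return notation
    .trim()
    .split(/\s*\n\s*/)                 // one entry per numbered move
    .map((line) =>
      line
        .replace(/\s+/g, ' ')          // collapse tabs and repeated spaces
        .replace(/^(\d+\.)\s+/, '$1')  // glue the move number to the first move
    )
    .join(' '.repeat(3));              // three spaces between numbered moves
}

const moves = '1. \td11-d9 \te7-e10 \n2. \ta8-c8 \tg7-g10xf10 \n3. \th11-h9 \tf7-i7';
const formattedMoves = formatMoves(moves);
console.log(JSON.stringify(formattedMoves));
```

Joining with ' '.repeat(3) keeps the spacing explicit, so it is easy to change later.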

Regular expression capture with optional trailing underscore and number

I'm trying to find a regular expression that will match the base string without the optional trailing number (_123). e.g.:
lorem_ipsum_test1_123 -> capture lorem_ipsum_test1
lorem_ipsum_test2 -> capture lorem_ipsum_test2
I tried using the following expression, but it would only work when there is a trailing _number.
/(.+)(?>_[0-9]+)/
/(.+)(?>_[0-9]+)?/
Similarly, adding the ? (zero or one) quantifier only worked when there is no trailing _number; otherwise, the trailing _number would just be part of the first capture.
Any suggestions?

You may use the following expression:
^(?:[^_]+_)+(?!\d+$)[^_]+
^ Anchor beginning of string.
(?:[^_]+_)+ Repeated non capturing group. Negated character set for anything other than a _, followed by a _.
(?!\d+$) Negative lookahead for digits at the end of the string.
[^_]+ Negated character set for anything other than a _.
Regex demo here.
Please note that the \n in the character sets in the Regex demo are only for demonstration purposes, and should by all means be removed when using as a pattern in Javascript.
Javascript demo:
var myString = "lorem_ipsum_test1_123";
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);
var myString = "lorem_ipsum_test2"
var myRegexp = /^(?:[^_]+_)+(?!\d+$)[^_]+/g;
var match = myRegexp.exec(myString);
console.log(match[0]);

You might match any character and use a negative lookahead that asserts that what follows is not an underscore, one or more digits and the end of the string:
^(?:(?!_\d+$).)*
Explanation
^ Assert start of the string
(?: Non capturing group
(?! Negative lookahead to assert what is on the right side is not
_\d+$ Match an underscore, one or more digits and assert end of the string
) Close the negative lookahead
. Match any character except a line break
)* Close non capturing group and repeat zero or more times
Regex demo
const strings = [
"lorem_ipsum_test1_123",
"lorem_ipsum_test2"
];
let pattern = /^(?:(?!_\d+$).)*/;
strings.forEach((s) => {
console.log(s + " ==> " + s.match(pattern)[0]);
});

You are asking for
/^(.*?)(?:_\d+)?$/
See the regex demo. The point here is that the first dot pattern must be non-greedy and the _\d+ should be wrapped with an optional non-capturing group and the whole pattern (especially the end) must be enclosed with anchors.
Details
^ - start of string
(.*?) - Capturing group 1: any zero or more chars other than line break chars, as few as possible due to the non-greedy ("lazy") quantifier *?
(?:_\d+)? - an optional non-capturing group matching 1 or 0 occurrences of _ and then 1+ digits
$ - end of string.
However, it seems easier to use a mere replacing approach,
s = s.replace(/_\d+$/, '')
If the string ends with _ and 1+ digits, the substring will get removed, else, the string will not change.
See this regex demo.
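Both variants can be wrapped in small helpers and compared side by side. This is a sketch: the function names are invented here, and the match-based version assumes single-line input, since . does not match line breaks.

```javascript
// Return the base name without an optional trailing "_<digits>" suffix.
function baseName(s) {
  return s.replace(/_\d+$/, '');
}

// Same result via capture group 1, if a match is preferred over a replace.
function baseNameByMatch(s) {
  return s.match(/^(.*?)(?:_\d+)?$/)[1];
}

for (const s of ['lorem_ipsum_test1_123', 'lorem_ipsum_test2', 'v_2_final']) {
  console.log(s, '->', baseName(s), '|', baseNameByMatch(s));
}
```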

Try to check if the string contains the trailing number. If it does you get only the other part. Otherwise you get the whole string.
var str = "lorem_ipsum_test1_123"
if(/_[0-9]+$/.test(str)) {
console.log(str.match(/(.+)(?=_[0-9]+)/g))
} else {
console.log(str)
}
Or, a lot more concise:
str = str.replace(/_[0-9]+$/g, "")

How to match regular expression In Javascript

I have the string [FBWS-1] comes first than [FBWS-2]
In this string, I want to find all occurrences of [FBWS-NUMBER].
I tried this:
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/^([[A-Z]-[0-9]])$/.test(term));
I want to get all the NUMBERS where [FBWS-NUMBER] string is matched.
But no success. I'm new to regular expressions.
Can anyone help me, please?

Note that ^([[A-Z]-[0-9]])$ matches the start of the string (^), a [ or an uppercase ASCII letter (with [[A-Z]), a -, an ASCII digit, and a ] char at the end of the string. So, basically, it only matches whole strings like [-2] or Z-3].
You may use
/\[[A-Z]+-[0-9]+]/g
See the regex demo.
NOTE If you need to "hardcode" FBWS (to only match values like FBWS-123 and not ABC-3456), use it instead of [A-Z]+ in the pattern, /\[FBWS-[0-9]+]/g.
Details
\[ - a [ char
[A-Z]+ - one or more (due to + quantifier) uppercase ASCII letters
- - a hyphen
[0-9]+ - one or more (due to + quantifier) ASCII digits
] - a ] char.
The /g modifier used with String#match() returns all found matches.
JS demo:
var term = "[FBWS-1] comes first than [FBWS-2]";
console.log(term.match(/\[[A-Z]+-[0-9]+]/g));
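Since the question asks for the numbers themselves, a capture group around the digits can be combined with String#matchAll (ES2020). This is a sketch that assumes the FBWS prefix is fixed; the variable names are illustrative.

```javascript
const sentence = '[FBWS-1] comes first than [FBWS-22]';

// matchAll requires the g flag and yields one match object per hit;
// index 1 of each match object is the first capture group (the digits).
const numbers = Array.from(
  sentence.matchAll(/\[FBWS-(\d+)\]/g),
  (m) => Number(m[1])
);
console.log(numbers); // [ 1, 22 ]
```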

You can use:
\[\w+-\d+\]
var term = "[FBWS-1] comes first than [FBWS-2]";
alert(/\[\w+-\d+\]/.test(term));
There are several reasons why your existing regex doesn't work.
You're trying to match the beginning and end of your string when you actually want everything in between, so don't use ^ and $.
You're only trying to match one alpha character with [A-Z]; you need to repeat it one or more times using the + quantifier.
The literal [ and ] around the tag must be escaped as \[ and \]; unescaped, they start and end a character class.
You can replace [0-9] with the shorthand \d, and [A-Z] with \w if you don't mind also matching lowercase letters, digits and underscores.
Note that your code only returns a true/false value (you're using test); at the moment it's unclear if this is what you want. You may want to use match with a global modifier (//g) instead of test to get a collection.

Here is an example using string.match(reg) to get all matching strings ([0-9]+ allows numbers with more than one digit):
var term = "[FBWS-1] comes first than [FBWS-2]";
var reg1 = /\[[A-Z]+-[0-9]+\]/g;
var reg2 = /\[FBWS-[0-9]+\]/g;
var arr1 = term.match(reg1);
var arr2 = term.match(reg2);
console.log(arr1);
console.log(arr2);

Your regular expression /^([[A-Z]-[0-9]])$/ is wrong.
Give this regex a try: /\[FBWS-\d+\]/g
Remove the g if you only want to find one match, as g will find all similar matches.
Edit: Someone mentioned that you want ["any combination"-"number"]. If that's what you're looking for, then this should work: /\[[A-Z]+-\d+\]/

Javascript Regex: negative lookbehind

I am trying to replace in a formula all floating numbers that miss the preceding zero. Eg:
"4+.5" should become: "4+0.5"
Now I read look behinds are not supported in JavaScript, so how could I achieve that? The following code also replaces, when a digit is preceding:
var regex = /(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '0$1')); //4+10.5
console.log(formula2.replace(regex, '0$1')); //4+0.5

Try this regex (\D)(\.\d*)
var regex = /(\D)(\.\d*)/,
formula1 = '4+1.5',
formula2 = '4+.5';
console.log(formula1.replace(regex, '$10$2'));
console.log(formula2.replace(regex, '$10$2'));

You may use
s = s.replace(/\B\.\d/g, '0$&')
See the regex demo.
Details
\B\. - matches a . that is either at the start of the string or is not preceded with a word char (letter, digit or _)
\d - a digit.
The 0$& replacement string is adding a 0 right in front of the whole match ($&).
JS demo:
var s = "4+1.5\n4+.5";
console.log(s.replace(/\B\.\d/g, '0$&'));
Another idea is by using an alternation group that matches either the start of the string or a non-digit char, capturing it and then using a backreference:
var s = ".4+1.5\n4+.5";
console.log(s.replace(/(^|\D)(\.\d)/g, '$10$2'));
The pattern will match
(^|\D) - Group 1 (referred to with $1 from the replacement pattern): start of string (^) or any non-digit char
(\.\d) - Group 2 (referred to with $2 from the replacement pattern): a . and then a digit
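In engines that support ES2018 lookbehind, the negative lookbehind from the question title can be used directly. A minimal sketch, assuming such an engine; addLeadingZeros is a name chosen for this example.

```javascript
// (?<!\d) : the dot must not be preceded by a digit (negative lookbehind)
// (?=\d)  : the dot must be followed by a digit (lookahead)
function addLeadingZeros(formula) {
  return formula.replace(/(?<!\d)\.(?=\d)/g, '0.');
}

console.log(addLeadingZeros('4+.5'));   // 4+0.5
console.log(addLeadingZeros('4+1.5'));  // 4+1.5
console.log(addLeadingZeros('.4+1.5')); // 0.4+1.5
```

Unlike the \B variant, this also inserts a zero when the dot follows a letter or underscore, which may or may not be wanted in a formula.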

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?

var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
\)+ One or more `)` characters
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as are necessary for the RegExp to match. Example: on 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. With the g flag, match returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
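The greedy/lazy difference and the effect of the g flag can be checked with a short snippet. The sample string and variable names are illustrative only.

```javascript
const digits = '123 456';

const greedy = digits.match(/\d+/)[0];  // takes as many digits as possible
const lazy = digits.match(/\d+?/)[0];   // takes as few digits as possible
console.log(greedy, lazy);              // 123 1

// With g: a plain array of every whole match; capture groups are dropped.
const allMatches = digits.match(/(\d)\d+/g);
console.log(allMatches);                // [ '123', '456' ]

// Without g: only the first match, followed by its capture groups.
const firstMatch = digits.match(/(\d)\d+/);
console.log(firstMatch[0], firstMatch[1]); // 123 1
```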

The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

It is notoriously difficult to use a regex to do balanced parenthesis matching, especially in JavaScript.
You would be way better off creating your own parser. Here's a clever way to do this that utilizes the strengths of regexes:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parenthesis
Only splits if there are balanced quotes
Allows escaping of parenthesis, quotes, and splits using \
This code will work perfectly for your example.
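The linked code is not reproduced here. As a rough illustration of the same idea, the sketch below uses a plain character scanner instead of string.replace with a callback: it tracks bracket depth and quote state and only splits on a separator at depth zero. The name splitTopLevel is made up, and it assumes brackets and quotes are well formed.

```javascript
// Hypothetical helper: split on `separator` only outside brackets and quotes.
function splitTopLevel(input, separator = '.') {
  const parts = [];
  let buffer = '';
  let depth = 0;    // number of currently open (, [ or {
  let quote = null; // the open quote character, or null when outside quotes

  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '\\' && i + 1 < input.length) {
      buffer += ch + input[i + 1]; // keep an escaped pair such as \. verbatim
      i++;
    } else if (quote) {
      buffer += ch;
      if (ch === quote) quote = null; // closing quote
    } else if (ch === '"' || ch === "'") {
      quote = ch;
      buffer += ch;
    } else if ('([{'.includes(ch)) {
      depth++;
      buffer += ch;
    } else if (')]}'.includes(ch)) {
      depth = Math.max(0, depth - 1);
      buffer += ch;
    } else if (ch === separator && depth === 0) {
      parts.push(buffer); // top-level separator: close the current part
      buffer = '';
    } else {
      buffer += ch;
    }
  }
  parts.push(buffer);
  return parts;
}

const expression = String.raw`*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5`;
const topLevelParts = splitTopLevel(expression);
console.log(topLevelParts);
```

String.raw keeps the backslashes in the sample, so the escaped dots reach the scanner unchanged.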

You don't need a complicated regex for this.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]

So, was working with this, and now I see that #FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.
