Regex keeps finding character I want matched along with previous character


I have the following regex in JavaScript for a split operation, since I can't use a negative lookbehind to find any delimiter , in a string that is not preceded by one or more escape characters \.
[^\\],
The regex correctly finds the commas that are not preceded by \, but it also includes the character that precedes the comma in the match, and thus splits the string incorrectly.
For example if I had the string
hello\,there,are
The result would be that e, matches my regex and not just the comma, making the split string array read
[hello\,ther] [are]
Why does the regex I am using keep matching the comma and the preceding character instead of only the comma?

You cannot use split here because you'd need a lookbehind, which JS regex did not support before ES2018. Use match with an appropriate regex instead, like the one below:
/(?:[^\\,]|\\.)+/g
The pattern matches one or more (+) occurrences of either any character other than , and \ ([^\\,]) or (|) any escaped character, that is, a backslash followed by any character except a line break (\\.).
JS demo:
var regex = /(?:[^\\,]|\\.)+/g;
var str = "hello\\,there,are";
var res = str.match(regex);
console.log(res);
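To see the difference side by side, here is a minimal sketch comparing the question's split with the match-based approach. The helper name splitUnescaped is hypothetical, and the lookbehind variant assumes an engine with ES2018 lookbehind support; that variant is a simplification and does not handle an escaped backslash directly before a comma.

```javascript
const input = "hello\\,there,are"; // the characters hello\,there,are

// The question's pattern consumes the character before the comma,
// so that character is removed together with the delimiter.
const naive = input.split(/[^\\],/);

// Hypothetical helper: collect runs of ordinary or escaped characters
// instead of splitting on the delimiter.
function splitUnescaped(text) {
  return text.match(/(?:[^\\,]|\\.)+/g) || [];
}

// Assumes ES2018 lookbehind support: split on commas not preceded by a backslash.
const viaLookbehind = input.split(/(?<!\\),/);

console.log(naive);                 // two items: hello\,ther and are
console.log(splitUnescaped(input)); // two items: hello\,there and are
console.log(viaLookbehind);         // two items: hello\,there and are
```

The match-based helper returns an empty array for an empty string, because match returns null when nothing matches.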

Related

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything that is not alphanumeric or a letter from another alphabet) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters between the alphanumeric characters are left in place. The ê in the last word is also kept.

This regex should do what you want. It looks for:
the start of the line, or some spaces, (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);

a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);

Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex, but uses JavaScript helper code.
Rough explanation: we first split the string into an array of substrings, divided by spaces. We then transform each substring by stripping its leading and trailing special characters.
We match the special characters with [^a-zA-Z0-9ê]*; because of the leading ^ inside the brackets, the class matches every character except those listed, so all special characters. Between these two runs we capture the relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? after * makes the quantifier lazy, so the group stops as early as the trailing run of special characters allows. The regex starts with ^ and ends with $ so that it covers the entire substring (they anchor the match to the start and the end of the string).
With .exec(s)[1] we run the regex on the substring and return the first capturing group from the transform function. Note that this is an empty string if a substring contains no such characters. At the end we join the substrings with spaces.
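The split, strip, and join steps described above can also be sketched with Unicode property escapes, so that letters from other alphabets need not be listed by hand. This is an alternative, not the answer's own code: the name trimSpecial is hypothetical, and the u flag with \p{L} and \p{N} assumes an ES2018-capable engine.

```javascript
// Hypothetical helper: strip leading and trailing characters that are
// neither letters (\p{L}) nor digits (\p{N}) from every space-separated word.
function trimSpecial(text) {
  // Either a run of special characters at the start of the word,
  // or a run of special characters at the end of the word.
  const edges = /^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu;
  return text
    .split(' ')
    .map(word => word.replace(edges, ''))
    .join(' ');
}

const sample = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
console.log(trimSpecial(sample));
```

A word made up only of special characters becomes an empty string here, so the joined result keeps the surrounding spaces.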

Regex catch from the hash sign "#" to the next white space

I have a line like this:
#type1 this is the text of the note
I've tried this, but it didn't work out for me:
^\#([^\s]+)
I want to catch type; in other words, I want to get what's between the hash sign "#" and the next white space, excluding the hash sign itself. The string I want to catch is alphanumeric.

With the regex functionality provided by Javascript:
const exec_result = /#(\w*)/.exec('#whatever string comes here');
I believe exec_result[1] should be the string you want.
The return value of the exec() method is documented here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

You're really close:
/^\#(\w+)\s/
The \w matches any letters or numbers (and underscores too). And the space should be outside the matching group since I guess you don't want to capture it.

To get an alphanumeric match (which will get you type1), instead of the negated character class [^\s] which matches not a whitespace character, you could use a character class and specify what you want to match like [A-Za-z0-9].
Then use a negative lookahead to assert what is on the right is not a non-whitespace char:
^#([A-Za-z0-9]+)(?!\S)
Your match is in the first capturing group. Note that you don't have to escape the #.
For example using the case insensitive flag /i
const regex = /^#([A-Za-z0-9]+)(?!\S)/i;
const str = `#type1 this is the text of the note`;
console.log(str.match(regex)[1]);
If you only want to match type, you might use:
^#([a-z]+)[a-z0-9]*(?!\S)
const regex = /^#([a-z]+)[a-z0-9]*(?!\S)/i;
const str = `#type1 this is the text of the note`;
console.log(str.match(regex)[1]);
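Calling [1] directly on the result of match throws when the line has no leading tag, because match returns null in that case. A minimal sketch of a guarded version follows; the function name extractTag is hypothetical.

```javascript
// Hypothetical helper: return the alphanumeric tag after a leading "#",
// or null when the line does not start with such a tag.
function extractTag(line) {
  // (?!\S) requires that the tag is followed by whitespace or the end of the string.
  const match = /^#([A-Za-z0-9]+)(?!\S)/.exec(line);
  return match ? match[1] : null;
}

console.log(extractTag('#type1 this is the text of the note')); // type1
console.log(extractTag('no tag on this line'));                 // null
console.log(extractTag('#type-1 hyphenated'));                  // null
```

The last call returns null because the hyphen is neither alphanumeric nor whitespace, so the lookahead rejects every possible match.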

I've figured it out.
/^\#([^\s]+)+(.*)$/

Regex: How do I remove the character BEFORE the matched string?

I am intercepting messages which contain the following characters:
*_-
However, whenever any one of these characters comes through, it will always be preceded by a \. The \ is just for formatting though and I want to remove it before sending it off to my server. I know how to easily create a regex which would remove this backslash from a single letter:
'omg\\_bbq\\_everywhere'.replace(/\\_/g, '_')
And I recognize I could just do this operation 3 times: once for each character I want to remove the preceding backslash for. But how can I create a single regex which would detect all three characters and remove the preceding backslash in all 3 cases?

You can use a character class like [*_-].
To remove only the backslash before these characters:
document.body.innerHTML =
"omg\\-bbq\\*everywhere\\-".replace(/\\([*_-])/g, '$1');
When you place a subpattern into a capturing group ((...)), you capture that subtext into a numbered buffer, and then you can reference it with a $1 backreference (1 because there is only one (...) in the pattern.)
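The capture-group replacement above can be wrapped in a small function for use outside the browser. This is only a sketch; the name stripFormattingEscapes is hypothetical.

```javascript
// Hypothetical helper: drop a backslash only when it directly precedes *, _ or -.
function stripFormattingEscapes(message) {
  // $1 puts the captured character back, so only the backslash disappears.
  return message.replace(/\\([*_-])/g, '$1');
}

console.log(stripFormattingEscapes('omg\\_bbq\\*everywhere\\-')); // omg_bbq*everywhere-
console.log(stripFormattingEscapes('path\\name'));                // path\name (unchanged)
```

Backslashes in front of any other character are left alone, as the second call shows.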

This is a good time to use a lookahead. Specifically, you want to match the backslash and then use a positive lookahead for any of those characters.
Ignoring the code, the raw regex you want is:
\\(?=[*_-])
A literal backslash, followed by one of these characters: *_-
So now you are matching only the backslash. The lookahead is a zero-length assertion, so it doesn't consume anything, but it sets a requirement that "for this to be a valid match, it needs to be followed by [*_-]".
Atomic groups (a separate feature that JavaScript regex does not support): http://www.regular-expressions.info/atomic.html
Lookaround statements: http://www.regular-expressions.info/lookaround.html
Positive and negative lookahead are available in JavaScript; lookbehind is available from ES2018 on.
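A minimal sketch of the \\(?=[*_-]) pattern in use follows. Because the match contains only the backslash, the replacement can be an empty string and no capture group is needed. The names escapeBeforeMarker and dropEscapes are hypothetical.

```javascript
// Matches a backslash only when *, _ or - comes right after it;
// the following character is checked but not consumed.
const escapeBeforeMarker = /\\(?=[*_-])/g;

// Hypothetical helper built on the pattern above.
function dropEscapes(message) {
  return message.replace(escapeBeforeMarker, '');
}

console.log(dropEscapes('omg\\_bbq\\*everywhere\\-')); // omg_bbq*everywhere-
```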

How to split regex space and punctuation matches, but keep the punctuation in the resulting array?

I want to split a string with regex matching spaces, commas, question marks, and exclamation points. But I'd like to include the matched punctuation in the resulting array (Spaces should be discarded.) For example:
Regex irritates me, I can't take it!
the string above should split() to:
["Regex", "irritates", "me", ",", "I", "can't", "take", "it", "!"]
I'm starting easy with just spaces and commas for now; I have the following code:
inputStr.split(/\s|(,)/);
Unfortunately, it gives me undefined items - I'm doing it wrong. I spent a couple hours researching (as usual) and coming up empty. I read about "lookahead" but can't figure it out either. Can any regex gurus give me a hand?

Try using String.prototype.match() with RegExp /(\w+'\w+)|\w+|,|\!/g
(\w+'\w+) matches one or more word characters, then an apostrophe, then one or more word characters (for example can't), and remembers the match; the parentheses form a capturing group.
\w matches any alphanumeric character from the basic Latin alphabet, including the underscore; it is equivalent to [A-Za-z0-9_].
+ matches the preceding item one or more times; it is equivalent to {1,}.
\w+ matches a run of one or more such characters.
, matches a comma.
\! matches an exclamation mark.
See RegExp
var str = "Regex irritates me, I can't take it!";
var res = str.match(/(\w+'\w+)|\w+|,|\!/g);
console.log(res)
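The pattern above does not cover the question marks the question also asks for. As a hedged sketch, the two functions below add them: one uses match, the other keeps split and filters out the undefined and empty entries that split produces when the capture group does not take part in a match. The names tokenize and splitKeepingPunctuation are hypothetical.

```javascript
// Hypothetical helper: words (optionally with inner apostrophes) or a single
// comma, question mark, or exclamation mark.
function tokenize(sentence) {
  return sentence.match(/\w+(?:'\w+)*|[,?!]/g) || [];
}

// Hypothetical helper: split on whitespace (discarded) or on punctuation
// (kept via the capture group), then drop undefined and empty entries.
function splitKeepingPunctuation(sentence) {
  return sentence.split(/\s+|([,?!])/).filter(Boolean);
}

const sentence = "Regex irritates me, I can't take it!";
console.log(tokenize(sentence));
console.log(splitKeepingPunctuation(sentence));
```

Both calls produce the nine-item array from the question for this sentence.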

This splits on whitespace only, so the punctuation stays attached to the neighbouring word:
const pat = "Regex irritates me, I can't take it!";
pat.split(/\s/);

The regex string is ([\w'!,]*)\S
Explanation:
the ( ) form a capturing group.
the [\w'!,]* matches any run of word characters, apostrophes, exclamation marks, or commas.
the \S matches one non-whitespace character, so a space is never part of a match.
Try it in regexpal.com

regex and javascript

using http://www.regular-expressions.info/javascriptexample.html I tested the following regex
^\\{1}([0-9])+
this is designed to match a backslash and then a number.
It works there
If I then try this directly in code
var reg = /^\\{1}([0-9])+/;
reg.exec("/123")
I get no matches!
What am I doing wrong?

Update: regarding the update to your question, the regex has to be:
var reg = /^\/(\d+)/;
You have to escape the slash inside the regex with \/.
The backslash needs to be escaped in the string too:
reg.exec("\\123")
Otherwise the \ starts an escape sequence in the string literal instead of being a literal backslash.
Btw, the regular expression can be simplified:
var reg = /^\\(\d+)/;
Note that I moved the quantifier + inside the capture group, otherwise it will only capture a single digit (namely 3) and not the whole number 123.

You need to escape the backslash in your string:
"\\123"
Also, if the regex is reused with the g flag, you may want to set reg.lastIndex = 0; before each exec call, because exec resumes from lastIndex.
In addition, {1} is completely redundant, you can simplify your regex to /^\\(\d)+/.
One last note: (\d)+ will only capture the last digit, you may want (\d+).
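The points made in these answers can be checked with a short sketch. The variable names are only for illustration.

```javascript
// "\\123" in source code is a string of four characters: one backslash, then 123.
const input = "\\123";

// Quantifier outside the group: the group is overwritten on each repetition,
// so only the last digit is kept.
const lastDigitOnly = /^\\(\d)+/.exec(input);

// Quantifier inside the group: the whole number is captured.
const wholeNumber = /^\\(\d+)/.exec(input);

// For input that starts with a forward slash, the slash is escaped as \/ instead.
const forwardSlash = /^\/(\d+)/.exec("/123");

console.log(lastDigitOnly[1]);          // 3
console.log(wholeNumber[1]);            // 123
console.log(forwardSlash[1]);           // 123
console.log(/^\\(\d+)/.exec("/123"));   // null: a forward slash is not a backslash
```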
