How To Match the begining and the end of string - javascript

How To Match the begining and the end of a string but ignore everything in between. I made it like this, but the result is always zero for everything.
document.write( (str.match(/^["“”'’]$/g) || []).length );
// expected results
str = "please \"don't\" talk"; // 2
str = "please don't talk"; // 0
str = "Thomas' car"; // 1
to macth all the qoutes like in my code str.match(/["“”'’]/g)
but giving ^ and $ is always giving zero result

Try
document.write( (str.match(/["“”'’]\s|\s["“”'’]/g) || []).length );
and you will get a count for every quote that has an adjacent space. It won't work for a quote at the end of the string. If you need that also, ask.
The | separates different searches in this case. So we are searching for a quote followed by a space or a space followed by a quote. If you explicitly want the \" to be counted then the expression would be a little more complex. I am assuming the \ is just escaping " in the string.

You need
\B["“”'’]|["“”'’]\B
See demo
The non-word boundary \B make sure the match only occurs at a position that is not word boundary position:
\B matches at any position between two word characters as well as at any position between two non-word characters.
So, \B["“”'’] matches all substrings that are not preceded with a word ([a-zA-Z0-9_]) character, and ["“”'’]\B matches the apostrophes that are not followed by a word character.

Related

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-#\[-``\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-#\[-``\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*([^ ]*?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*(?![!-\/:-#\[-`\{-~])([^ ]+?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex, but uses JS help code.
Rough explanation: We first split the string into an array of strings, divided by spaces. We then transform each of the substrings by stripping
the leading and trailing special characters. We do this by capturing all special characters with [^a-zA-Z0-9ê]*, because of the leading ^ character it matches all characters except those listed, so all special characters. Between these two groups we capture all relevant characters with ([\w\W]*?). \w catches words, \W catches non-words, so \w\W catches all possible characters. By appending the ? after the *, we make the quantifier * lazy, so that the group stops catching as soon as the next group, which catches trailing special characters, catches something. We also start the regex with a ^ symbol and end it with an $ symbol to capture the entire string (they respectively set anchors to the start end the end of the string). With .exec(s)[1] we then execute the regex on the substring and return the first capturing group result in our transform function. Note that this might be null if a substring does not include proper characters. At the end we join the substrings with spaces.

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of spaces in this ! '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows lots of spaces in this!
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the words This and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first RegEx /\s+(\W)/g, '$1'replaces 1 or more spaces with a single space.
What your regex says is "match one or more spaces, followed by one or more non-alphanumeric character, and replace that whole result with that one or more non-alphanumeric character". The key is that the \s+ is greedy, meaning that it will try and match as many characters as possible. So in any given string of spaces it will try and match all of the spaces it can. However, your regex also requires one or more non-word characters (\W+). Because in your case the next character after each final space is a word character (i.e. a letter), this last part of the regex must match the last space.
Therefore, given the string a b, and using parens to mark the \s+ and \W+ matches, a( )( )b is the only way for the regex to be valid (\s+ matches the first two spaces and \W+ matches the last space). Now it's just a simple substitution. Since you wrapped the \W+ in parentheses that makes it the first and only capturing group, so replacing the match with $1 will replace it with that final space.
As another example, running this replace against a !b will result in the match looking like a( )(!)b (since ! is now the last non-word character), so the final replaced result will be a!b.
Lets take this string 'aaa &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces ' &' with '&'
That same principal applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'

Regex to match # followed by square brackets containing a number

I want to parse a pattern similar to this using javascript:
#[10] or #[15]
With all my efforts, I came up with this:
#\\[(.*?)\\]
This pattern works fine but the problem is it matches anything b/w those square brackets. I want it to match only numbers. I tried these too:
#\\[(0-9)+\\]
and
#\\[([(0-9)+])\\]
But these match nothing.
Also, I want to match only pattern which are complete words and not part of a word in the string. i.e. should contain spaces both side if its not starting or ending the script. That means it should not match phrase like this:
abxdcs#[13]fsfs
Thanks in advance.
Use the regex:
/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
It will match if the pattern (#[number]) is not a part of a word. Should contain spaces both sides if its not starting or ending the string.
It uses groups, so if need the digits, use the group 1.
Testing code (click here for demo):
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[10]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[15]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs#[13]fsfs")); // false
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs #[13] fsfs")); // true
var r1 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match = r1.exec("#[10]");
console.log(match[1]); // 10
var r2 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match2 = r2.exec("abxdcs #[13] fsfs");
console.log(match2[1]); // 13
var r3 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match3;
while (match3 = r3.exec("#[111] #[222]")) {
console.log(match3[1]);
}
// while's output:
// 111
// 222
You were close, but you need to use square brackets:
#\[[0-9]+\]
Or, a shorter version:
#\[\d+\]
The reason you need those slashes is to "escape" the square bracket. Usually they are used for denoting a "character class".
[0-9] creates a character class which matches exactly one digit in the range of 0 to 9. Adding the + changes the meaning to "one or more". \d is just shorthand for [0-9].
Of course, the backslash character is also used to escape characters inside of a javascript string, which is why you must escape them. So:
javascript
"#\\[\\d+\\]"
turns into:
regex
#\[\d+\]
which is used to match:
# a literal "#" symbol
\[ a literal "[" symbol
\d+ one or more digits (nearly identical to [0-9]+)
\] a literal "]" symbol
I say that \d is nearly identical to [0-9] because, in some regex flavors (including .NET), \d will actually match numeric digits from other cultures in addition to 0-9.
You don't need so many characters inside the character class. More importantly, you put the + in the wrong place. Try this: #\\[([0-9]+)\\].

regex string replace

I am trying to do a basic string replace using a regex expression, but the answers I have found do not seem to help - they are directly answering each persons unique requirement with little or no explanation.
I am using str = str.replace(/[^a-z0-9+]/g, ''); at the moment. But what I would like to do is allow all alphanumeric characters (a-z and 0-9) and also the '-' character.
Could you please answer this and explain how you concatenate expressions.
This should work :
str = str.replace(/[^a-z0-9-]/g, '');
Everything between the indicates what your are looking for
/ is here to delimit your pattern so you have one to start and one to end
[] indicates the pattern your are looking for on one specific character
^ indicates that you want every character NOT corresponding to what follows
a-z matches any character between 'a' and 'z' included
0-9 matches any digit between '0' and '9' included (meaning any digit)
- the '-' character
g at the end is a special parameter saying that you do not want you regex to stop on the first character matching your pattern but to continue on the whole string
Then your expression is delimited by / before and after.
So here you say "every character not being a letter, a digit or a '-' will be removed from the string".
Just change + to -:
str = str.replace(/[^a-z0-9-]/g, "");
You can read it as:
[^ ]: match NOT from the set
[^a-z0-9-]: match if not a-z, 0-9 or -
/ /g: do global match
More information:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions
Your character class (the part in the square brackets) is saying that you want to match anything except 0-9 and a-z and +. You aren't explicit about how many a-z or 0-9 you want to match, but I assume the + means you want to replace strings of at least one alphanumeric character. It should read instead:
str = str.replace(/[^-a-z0-9]+/g, "");
Also, if you need to match upper-case letters along with lower case, you should use:
str = str.replace(/[^-a-zA-Z0-9]+/g, "");
str = str.replace(/\W/g, "");
This will be a shorter form
We can use /[a-zA-Z]/g to select small letter and caps letter sting in the word or sentence and replace.
var str = 'MM-DD-yyyy'
var modifiedStr = str.replace(/[a-zA-Z]/g, '_')
console.log(modifiedStr)

Need help with a regular expression in Javascript

The box should allow:
Uppercase and lowercase letters (case insensitive)
The digits 0 through 9
The characters, ! # $ % & ' * + - / = ? ^ _ ` { | } ~
The character "." provided that it is not the first or last character
Try
^(?!\.)(?!.*\.$)[\w.!#$%&'*+\/=?^`{|}~-]*$
Explanation:
^ # Anchor the match at the start of the string
(?!\.) # Assert that the first characters isn't a dot
(?!.*\.$) # Assert that the last characters isn't a dot
[\w.!#$%&'*+\/=?^`{|}~-]* # Match any number of allowed characters
$ # Anchor the match at the end of the string
Try something like this:
// the '.' is not included in this:
var temp = "\\w,!#$%&'*+/=?^`{|}~-";
var regex = new RegExp("^["+ temp + "]([." + temp + "]*[" + temp + "])?$");
// ^
// |
// +---- the '.' included here
Looking at your comments it's clear you don't know exactly what a character class does. You don't need to separate the characters with comma's. The character class:
[0-9,a-z]
matches a single (ascii) -digit or lower case letter OR a comma. Note that \w is a "short hand class" that equals [a-zA-Z0-9_]
More information on character classes can be found here:
http://www.regular-expressions.info/charclass.html
You can do something like:
^[a-zA-Z0-9,!#$%&'*+-/=?^_`{|}~][a-zA-Z0-9,!#$%&'*+-/=?^_`{|}~.]*[a-zA-Z0-9,!#$%&'*+-/=?^_`{|}~]$
Here's how I would do it:
/^[\w!#$%&'*+\/=?^`{|}~-]+(?:\.[\w!#$%&'*+\/=?^`{|}~-]+)*$/
The first part is required to match at least one non-dot character, but everything else is optional, allowing it to match a string with only one (non-dot) character. Whenever a dot is encountered, at least one non-dot character must follow, so it won't match a string that begins or ends with a dot.
It also won't match a string with two or more consecutive dots in it. You didn't specify that, but it's usually one of the requirements when people ask for patterns like this. If you want to permit consecutive dots, just change the \. to \.+.

Categories