Remove trailing and leading white spaces around delimiter using regex - javascript

I am trying to remove white spaces from a string. However, I only want to remove the spaces around the delimiter and at the beginning and end of the string.
Before:
" one two, three , four ,five six,seven "
After:
"one two,three,four,five six,seven"
I've tried this pattern without success:
/,\s+|\s$/g,","

You could use /\s*,\s*/g, and then .trim() the string.
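A minimal sketch of that suggestion, using the sample string from the question. The function name tightenCommas is only illustrative:

```javascript
// Collapse any whitespace around each comma, then trim both ends.
function tightenCommas(input) {
  return input.replace(/\s*,\s*/g, ',').trim();
}

const tightened = tightenCommas(' one two, three , four ,five six,seven ');
console.log(tightened); // "one two,three,four,five six,seven"
```

Spaces between words that are not next to a comma (such as "one two") are left alone, because the pattern only matches where a comma is present.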

Use the regex ^\s+|(,)\s+|\s+(?=,)|\s+$ and replace each match with the first capturing group, $1:
var string = " one two, three , four ,five six,seven ";
console.log(string.replace(/^\s+|(,)\s+|\s+(?=,)|\s+$/g, '$1'));
The capturing group contains a comma when the regex engine matches whitespace after a comma with (,)\s+, and is empty for every other alternative. A lookbehind, (?<=,)\s+, would be cleaner here, but it is only available in JavaScript engines that implement ES2018 or later.
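For engines that do implement ES2018 lookbehind, the same idea might be sketched without a capturing group. This is an assumption about the runtime, not part of the original answer:

```javascript
// Requires lookbehind support (ES2018 or later).
// Every alternative matches only whitespace, so the replacement is empty.
const raw = ' one two, three , four ,five six,seven ';
const cleaned = raw.replace(/^\s+|(?<=,)\s+|\s+(?=,)|\s+$/g, '');
console.log(cleaned); // "one two,three,four,five six,seven"
```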

Related

Regex for any character except quote after comma

I want to match every word separated by comma, but it must not include a quote like ' or ".
I was using this regex:
^[a-zA-Z0-9][\!\[\#\\\:\;a-zA-Z0-9`_\s,]+[a-zA-Z0-9]$
However, it only matches a character and number and not a symbol.
The output should be:
example,example //true
exaplle,examp#3 //true, with symbol or number
example, //false, because there is no word after comma
,example //false, because there is no word before comma
##example&$123,&example& //true, with all character and symbol except quote
You can match 1+ times what is present in the character class, then repeat 1+ times a non-capturing group (?:...) that holds a comma followed by the same character class.
^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$
Note that you don't have to escape \!, \#, \: and \; in the character class, and that \s might also possibly match a newline.
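As a quick check, the pattern above can be run against the sample strings from the question. The variable names are only illustrative:

```javascript
// Pattern copied from the answer above; the sample strings come from the question.
const symbolList = /^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$/;

const samples = [
  'example,example',
  'exaplle,examp#3',
  'example,',
  ',example',
  '##example&$123,&example&',
];

const results = samples.map((s) => symbolList.test(s));
console.log(results); // [ true, true, false, false, true ]
```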
I'm assuming you want the whole string to match your conditions, and to get true only in that case.
These are the conditions:
Words should be separated by commas, and every comma should have a valid word on each side.
Words can contain anything except the 2 kinds of quotes (' and "), commas, and whitespace characters (spaces and newlines).
The regex you would use is ^[^,'"\s]+(?:,[^,'"\s]+)+$. Because it is anchored at both ends, the global flag (g) is not needed.
Edit: As requested, here is a version that also matches a single word:
^(?:[^,'"\s]+(?:,[^,'"\s]+)+|[^,'"\s]+)$
This will match words separated by a comma, as well as just a single word.
The conditions for what qualifies as a word remain the same as above.
Quick explanation:
^[^,'"\s]+,[^,'"\s]+$
This part matches 2 words separated by a comma; [^,'"\s]+ denotes a word.
Wrapping the comma and the second word in a repeated group, ^[^,'"\s]+(?:,[^,'"\s]+)+$, makes that part repeat, so it matches any number of words (two or more) separated by commas, not just 2. Repeating the whole word,word pair instead would fail on a list such as a,b,c, because each repetition would need two words of its own.
Then, adding another alternative using | and wrapping the whole thing in a non-capturing group, we get ^(?:[^,'"\s]+(?:,[^,'"\s]+)+|[^,'"\s]+)$
This simply adds the alternative [^,'"\s]+, which matches a single word.
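A sketch of a whole-string validator under the same definition of a word, written as one word followed by repeated ",word" parts. WORD, twoOrMoreWords and oneOrMoreWords are hypothetical names; changing the final + to * is one way to also accept a single word:

```javascript
// WORD is a hypothetical helper: one or more characters that are
// not a comma, a quote, or whitespace.
const WORD = `[^,'"\\s]+`;
const twoOrMoreWords = new RegExp(`^${WORD}(?:,${WORD})+$`);
const oneOrMoreWords = new RegExp(`^${WORD}(?:,${WORD})*$`);

console.log(twoOrMoreWords.test('example,example')); // true
console.log(twoOrMoreWords.test('a,b,c'));           // true
console.log(twoOrMoreWords.test('example,'));        // false
console.log(twoOrMoreWords.test('single'));          // false
console.log(oneOrMoreWords.test('single'));          // true
```

Lists with one-character middle words, such as a,b,c, are worth including in any test set, since they expose patterns that repeat a whole word,word pair.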

(/\s+(\W)/g, '$1') - how are the spaces being removed?

let a = ' lots of     spaces     in this  !  '
console.log(a.replace(/\s+(\W)/g, '$1'))
log shows lots of spaces in this!
The above regex does exactly what I want, but I am trying to understand why?
I understand the following:
\s+ is looking for 1 or more spaces
(\W) is capturing the non-alphanumeric characters
/g - global, search/replace all
$1 returns the prior alphanumeric character
The capture/$1 is what removes the space between the words this and !
I get it, but what I don't get is HOW are all the other spaces being removed?? I don't believe I have asked for them to (although I am happy they are).
I get this one console.log(a.replace(/\s+/g, ' ')); because the replace is replacing 1 or more spaces between alphanumeric characters with a single space ' '.
I'm scratching my head to understand HOW the first RegEx /\s+(\W)/g, '$1' replaces 1 or more spaces with a single space.
What your regex says is "match one or more whitespace characters, followed by exactly one non-word character, and replace that whole match with that one non-word character". The key is that \s+ is greedy, meaning that it will try to match as many characters as possible. So in any given run of spaces it will try to match all of the spaces it can. However, your regex also requires one non-word character (\W) after them. Because in your case the next character after each run of spaces is a word character (i.e. a letter), \s+ has to give back its last space so that \W can match it.
Therefore, given the string a   b, and using parens to mark the \s+ and \W matches, a(  )( )b is the only way for the regex to match (\s+ matches the first two spaces and \W matches the last space). Now it's just a simple substitution. Since you wrapped the \W in parentheses, it is the first and only capturing group, so replacing the match with $1 replaces it with that final space.
As another example, running this replace against a   !b will result in the match looking like a(   )(!)b (since ! is now the non-word character that follows the spaces), so the final result will be a!b.
Let's take the string 'aaa   &bbb' and run it through.
We get 'aaa&bbb'
\s+ grabs the 3 spaces before the ampersand
(\W) grabs the ampersand
$1 is the ampersand and replaces '   &' with '&'
That same principle applies to the spaces. You are forcing one of the spaces to satisfy the (\W) capture group for the replacement. It's also why your exclamation point isn't nuked.
List of matches would be the following. I replaced space with ☹ so it is easier to see
"☹☹☹☹(☹)",
"☹☹☹☹(☹)",
"☹☹(!)",
"☹(☹)"
And the code is saying to replace the match with what is in the capture group.
' lots of☹☹☹☹(☹)spaces☹☹☹☹(☹)in this☹☹(!)☹(☹)'
so when you replace it you get
' lots of☹spaces☹in this!☹'
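The match list above can be reproduced with matchAll. This sketch assumes the original string had runs of five, five, two and two spaces, as the list of matches suggests:

```javascript
// Assumed input: the spacing is reconstructed from the match list above.
const a = ' lots of     spaces     in this  !  ';

// For each match, keep the whole matched text and what (\W) captured.
const found = [...a.matchAll(/\s+(\W)/g)].map((m) => ({
  whole: m[0],
  captured: m[1],
}));

console.log(found.length);                    // 4
console.log(found.map((f) => f.captured));    // [ ' ', ' ', '!', ' ' ]
console.log(a.replace(/\s+(\W)/g, '$1'));     // ' lots of spaces in this! '
```

The single spaces after "lots" and "in" never match, because each is followed directly by a letter and \s+ has nothing to give back to \W.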

Matching items in a comma-delimited list which aren't surrounded by single or double quotes

I'm wanting to match any instance of text in a comma-delimited list. For this, the following regular expression works great:
/[^,]+/g
The problem is that I'm wanting to ignore any commas which are contained within either single or double quotes and I'm unsure how to extend the above selector to allow me to do that.
Here's an example string:
abcd, efgh, ij"k,l", mnop, 'q,rs't
I'm wanting to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):
abcd
efgh
ij"k,l"
mnop
'q,rs't
Or:
abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
How can I do this?
Three relevant questions exist, but none of them cater for both ' and " in JavaScript:
Regex for splitting a string using space when not surrounded by single or double quotes - Java solution, doesn't appear to work in JavaScript.
A regex to match a comma that isn't surrounded by quotes - Only matches on "
Alternative to regex: match all instances not inside quotes - Only matches on "
Okay, so your matching groups can contain:
Just letters
A matching pair of "
A matching pair of '
So this should work:
/((?:[^,"']+|"[^"]*"|'[^']*')+)/g
As a nice bonus, you can drop extra single-quotes inside the double-quotes, and vice versa. However, you'll probably need a state machine for adding escaped double-quotes inside double quoted strings (eg. "aa\"aa").
Unfortunately it matches the leading space as well, so you'll have to trim the matches.
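A short sketch of that approach on the example string from the question, with the trim applied to each match. The outer capturing group is dropped here since match() with the g flag returns whole matches anyway:

```javascript
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// Each field is a run of plain text, "..." sections, or '...' sections.
const fields = input
  .match(/(?:[^,"']+|"[^"]*"|'[^']*')+/g)
  .map((field) => field.trim());

console.log(fields);
// [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```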
Using a double lookahead to assert that the matched comma is outside quotes:
/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
(?=(([^"]*"){2})*[^"]*$) asserts that there is an even number of double quotes ahead of the matching comma.
(?=(([^']*'){2})*[^']*$) does the same assertion for single quotes.
PS: This doesn't handle case of unbalanced, nested or escaped quotes.
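To use this pattern with split(), one detail matters: split() inserts the contents of capturing groups into the result array. The sketch below therefore rewrites the groups as non-capturing, which is a change from the pattern as posted:

```javascript
// Same lookaheads as above, but with (?:...) so split() returns only the fields.
const commaOutsideQuotes =
  /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

const csvLine = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;
const parts = csvLine.split(commaOutsideQuotes);

console.log(parts);
// [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```

Since \s*,\s* consumes the whitespace around each separator, no trimming is needed for this input.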
Try this in JavaScript
(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+
Add named groups to make it more readable (named groups need ES2018 or later in JavaScript; remove ?<name> for older engines)
(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)
Explanation:
(?<double_quotes>"[^"\n]*") matches a double-quoted run: ", then anything except " or a newline, then " = (1) (in double quotes)
(?<single_quotes>'[^'\n]*') matches a single-quoted run: ', then anything except ' or a newline, then ' = (2) (in single quotes)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*')) matches (1) or (2) = (3)
[^,"'\n]* matches any text that contains no comma, quote, or newline = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*) matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+ matches repeat (3)(w) = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+) matches (w)(3w+) = (4) (has quotes)
[^,\n]+ matches other case (5) (simple)
So in the end we have (4)|(5) (has quotes or simple)
Input
abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""
Output:
MATCH 1
simple [0-4] `abcd`
MATCH 2
simple [5-9] `efgh`
MATCH 3
simple [10-15] ` ijkl`
MATCH 4
simple [16-20] `abcd`
MATCH 5
simple [21-26] ` efgh`
MATCH 6
has_quotes [27-35] ` ij"k,l"`
double_quotes [30-35] `"k,l"`
MATCH 7
simple [36-41] ` mnop`
MATCH 8
has_quotes [42-50] ` 'q,rs't`
single_quotes [43-49] `'q,rs'`
MATCH 9
has_quotes [51-59] `'q, rs't`
single_quotes [51-58] `'q, rs'`
MATCH 10
has_quotes [60-74] `"'q,rs't, ij"k`
double_quotes [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes [75-79] ` l""`
double_quotes [77-79] `""`
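In a JavaScript engine with named capture groups (ES2018 or later), the named version might be used roughly like this. The variable names and the shape of the result objects are only illustrative:

```javascript
const fieldPattern =
  /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

const line = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// Report each field (trimmed) and which top-level alternative matched it.
const parsed = [...line.matchAll(fieldPattern)].map((m) => ({
  text: m[0].trim(),
  kind: m.groups.has_quotes !== undefined ? 'has_quotes' : 'simple',
}));

console.log(parsed.map((p) => `${p.kind}: ${p.text}`));
```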

Replace multiline text between two strings

I need to replace old value between foo{ and }bar using Javascript regex.
foo{old}bar
This works if old is a single line:
replace(
/(foo{).*(}bar)/,
'$1' + 'new' + '$2'
)
I need to make it work with:
foo{old value
which takes more
than one line}bar
How should I change my regex?
Change your regex to,
/(foo{)[^{}]*(}bar)/
OR
/(foo{)[\s\S]*?(}bar)/
so that it also matches newline characters. [^{}]* matches any character except { or }, zero or more times. [\s\S]*? matches any character, whitespace or not, zero or more times, non-greedily.
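A small sketch comparing the three patterns on the multi-line input from the question. The braces are escaped here as a precaution; the variable names are only illustrative:

```javascript
const block = 'foo{old value\nwhich takes more\nthan one line}bar';

// "." does not match line terminators, so the original pattern finds nothing.
const dotOnly = block.replace(/(foo\{).*(\}bar)/, '$1new$2');
// Both suggested patterns can cross line breaks.
const negatedClass = block.replace(/(foo\{)[^{}]*(\}bar)/, '$1new$2');
const lazyAny = block.replace(/(foo\{)[\s\S]*?(\}bar)/, '$1new$2');

console.log(dotOnly === block); // true (unchanged)
console.log(negatedClass);      // "foo{new}bar"
console.log(lazyAny);           // "foo{new}bar"
```

The negated class stops at any brace, so it will not match if the old value itself contains { or }, while the lazy version stops at the first }bar it finds.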

Regular expression to remove space in the beginning of each line?

I want to remove the spaces at the beginning of each line.
I have data in each line with a set of spaces in the beginning so data appears in the middle, I want to remove spaces in the beginning of each line.
tmp = tmp.replace(/(<([^>]+)>)/g,"")
How can I add the ^\s condition into that replace()?
To remove all leading spaces:
str = str.replace(/^ +/gm, '');
The regex is quite simple - one or more spaces at the start. The more interesting bits are the flags - /g (global) to replace all matches and not just the first, and /m (multiline) so that the caret matches the beginning of each line, and not just the beginning of the string.
Working example: http://jsbin.com/oyeci4
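The effect of the m flag can be seen by running the same pattern with and without it. The sample text is made up for illustration:

```javascript
const indented = '   first\n  second\nthird';

// With m, ^ matches at the start of every line.
const everyLine = indented.replace(/^ +/gm, '');
// Without m, ^ matches only at the start of the string.
const firstLineOnly = indented.replace(/^ +/g, '');

console.log(JSON.stringify(everyLine));     // "first\nsecond\nthird"
console.log(JSON.stringify(firstLineOnly)); // "first\n  second\nthird"
```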
var text = " this is a string \n" +
" \t with a bunch of new lines \n";
// replace() returns a new string, so assign the result
text = text.replace(/^\s*/gm, '');
This supports whitespace of different types, including tabs.
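One thing to be aware of: \s also matches line breaks, so with the m flag a pattern like ^\s* can swallow blank lines along with the indentation. A sketch of the difference, using [ \t] as an assumed alternative when blank lines should be kept:

```javascript
const sample = '  a\n\n\t b';

// \s* at the start of the blank line also consumes the newline that follows it.
const withAnyWhitespace = sample.replace(/^\s*/gm, '');
// [ \t]+ is limited to spaces and tabs, so line breaks are untouched.
const spacesAndTabsOnly = sample.replace(/^[ \t]+/gm, '');

console.log(JSON.stringify(withAnyWhitespace)); // "a\nb"
console.log(JSON.stringify(spacesAndTabsOnly)); // "a\n\nb"
```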
If all you need is to remove one space, then this regex is all you need:
^\s
So in JavaScript:
yourString = yourString.replace(/^\s/gm, "");
