Regex: make "enter" and extra spaces equal to single space?

I'm trying to normalize a user's input to single spaces because I will use the input to compare strings in the results. The problem is that the compare function only works with single spaces, so when a user accidentally adds extra spaces or presses Enter, it does not work.
Here's my regex and code, plus a fiddle: http://jsfiddle.net/purmou/eSE9Y/
html
<textarea id="input"></textarea>
<button>Submit</button>
javascript
$("button").click(function () {
  $('#input').val($('#input').val().replace(/[\t ]+/g, ' '));
});
I have already solved the extra spaces part; my only problem is the "enter" part.

Use \s+ instead to match all whitespace characters, including spaces, tabs, and newlines.
$('#input').val($('#input').val().replace(/\s+/g, ' '));
Demo: http://jsfiddle.net/eSE9Y/1/
\s stands for "whitespace character". (...) which characters this actually includes, depends on the regex flavor. (...) \s matches a space, a tab, a line break, or a form feed. Most flavors also include the vertical tab (...). In flavors that support Unicode, \s normally includes all characters from the Unicode "separator" category. Java and PCRE are exceptions once again. But JavaScript does match all Unicode whitespace with \s.
Reference: http://www.regular-expressions.info/shorthand.html
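As a rough sketch of the same idea without jQuery, the replacement can be wrapped in a small function. The name normalizeWhitespace and the extra trim() call are assumptions added for illustration; the answer above only performs the replace.

```javascript
// Hypothetical helper: collapse every run of whitespace (spaces, tabs,
// line breaks) into one space. trim() is an addition not in the answer;
// it drops the single space that may remain at either end.
function normalizeWhitespace(text) {
  return text.replace(/\s+/g, ' ').trim();
}

const raw = 'hello   world\n\nfoo\tbar \r\n baz';
const normalized = normalizeWhitespace(raw);
console.log(normalized); // hello world foo bar baz
```

Two inputs normalized this way can then be compared with ===, which is what the question's compare step needs.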

A newline character can be represented with \n in a regex, so just change your regex like this:
/[\t\n ]+/g
Alternatively, you can replace every run of whitespace characters with a single space, using this regular expression:
/\s+/g
Quoting from MDN RegularExpressions page, about \s
Matches a single white space character, including space, tab, form feed, line feed. Equivalent to [ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000].
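One practical difference between the two patterns, sketched here with a made-up input string: text that uses \r\n line endings keeps its carriage return under the narrower class [\t\n ], while \s+ collapses the whole run.

```javascript
// Made-up input with a Windows-style line ending and a double space.
const input = 'one\r\ntwo  three';

// The narrow class replaces the \n but leaves the \r in place.
const narrow = input.replace(/[\t\n ]+/g, ' ');

// \s also covers \r, so the whole "\r\n" run becomes one space.
const broad = input.replace(/\s+/g, ' ');

console.log(JSON.stringify(narrow)); // "one\r two three"
console.log(JSON.stringify(broad));  // "one two three"
```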

Related

Odd RegEx request for Javascript

I'm having trouble with a certain RegEx replacement string for later use in Javascript.
We have quite a bit of text that was stored in a rather odd format that we aren't allowed to fix.
But we do need to find all the "network path" strings inside it, following these rules:
A. The matches always start with 2 backslashes.
B. The matching characters should stop as soon as it hits a first occurrence of any 1 of these:
A < character
A space
A line feed
A carriage return
A & character
A literal "\r" or "\n" string (but only if occurring at end of line)
We "almost" have it working with /\\\\[^ &<\s]*/gi as shown in this RegEx Tester page:
https://regex101.com/r/T4cDOL/5
Even if we get it working, the RegEx has to be further "escape escaped" before putting it in our JavaScript code, but that's also not working as expected.

From your example, it seems you literally have a backslash followed by an n and a backslash followed by an r (as opposed to a newline or carriage return), which means you can't only use a negated character class (since you need to handle a sequence of two characters). I'd use a positive lookahead to know where to stop, so I can use an alternation for that part.
You haven't said what parts of those strings should match, so I've had to guess a bit, but here's my best guess (with useful input from Niet the Dark Absol):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;
That says:
Match starting with \\
Take everything prior to the lookahead (non-greedy)
Lookahead: An alternation of:
A space, &, <, carriage return (\r, character 13), or a newline (\n, character 10); or
A backslash followed by r or n if that's either at the end of a line or followed by a space (so we get the \nancy but not the \n after it).
Updated regex101
You might want to have more characters than just a space after the \r/\n. If so, make it a character class (and/or use \s for "whitespace" if that applies):
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$|[ others]))/gmi;
// −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−^^^^^^^^^
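To see the pattern at work, here is a small sketch run against a made-up sample; the server and share names are hypothetical and not taken from the question's data. String.raw keeps the backslashes literal, so \n inside the sample is a backslash followed by n, while the line breaks in the template are real newlines.

```javascript
const rex = /\\\\.*?(?=[ &<\r\n]|\\[rn](?:$| ))/gmi;

// Hypothetical sample text; each path ends at a different terminator.
const sample = String.raw`see \\srv01\share\docs<br>
copy \\srv02\nancy\reports\n
then \\srv03\data&more`;

const paths = sample.match(rex);
for (const p of paths) {
  console.log(p);
}
// \\srv01\share\docs      (stopped at "<")
// \\srv02\nancy\reports   (kept "\nancy", stopped at the literal "\n" at end of line)
// \\srv03\data            (stopped at "&")
```

Note that the lookahead needs one of the terminators to be present, so a path that runs to the very end of the text with nothing after it would not be matched by this pattern.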

Ignore newlines in a regex that doesn't care about order

I have a regex here at scriptular.com
/(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i
How do I make the regex successfully match the string?
Without the newline characters in the string, the regex would succeed. I could remove them... but I'd rather not.

1.) The dot matches any character except a newline, so .* will not reach words that appear on lines after the first one. Many regex flavors offer a dotall (single-line) s flag that makes the dot match newlines too, but JavaScript did not have one when this was written (the s flag was added in ES2018).
A workaround is to replace the dot with a character class that matches any character: [\s\S] (any whitespace character \s together with any non-whitespace character \S), [\w\W] (any word character together with any non-word character), or even [^] (not nothing).
2.) Anchor the lookaheads to ^ (start of string), since there is no need to retry the lookaheads at every position in the string. This will drastically improve performance.
3.) Use lazy matching so that each lookahead is satisfied by the first occurrence of its word.
/^(?=[\s\S]*?net)(?=[\s\S]*?income)(?=[\s\S]*?total)(?=[\s\S]*?depreciation)/i
See demo at regex101 (dunno why this doesn't work in your demo tool)
Additionally, you can put \b word boundaries around the words so that, for example, net is not matched inside brunet or network. The regex then becomes ^(?=[\s\S]*?\bnet\b)...
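A minimal sketch of the difference, using a made-up two-line string in place of the asker's data:

```javascript
// The original pattern: the dot cannot cross the line break.
const original = /(?=.*net)(?=.*income)(?=.*total)(?=.*depreciation)/i;

// The reworked pattern: anchored, lazy, and [\s\S] spans line breaks.
const fixed = /^(?=[\s\S]*?net)(?=[\s\S]*?income)(?=[\s\S]*?total)(?=[\s\S]*?depreciation)/i;

// Hypothetical input with the words spread over two lines.
const report = 'Total depreciation\nNet income';

const originalResult = original.test(report);
const fixedResult = fixed.test(report);
console.log(originalResult); // false
console.log(fixedResult);    // true

// Word boundaries keep "net" from matching inside a longer word.
const looseNet = /^(?=[\s\S]*?net)/i.test('brunet');
const strictNet = /^(?=[\s\S]*?\bnet\b)/i.test('brunet');
console.log(looseNet);  // true
console.log(strictNet); // false
```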

JavaScript regular expression for word boundaries, tolerating in-word hyphens and apostrophes

I'm looking for a Regular Expression for JavaScript that will identify word boundaries in English, while accepting hyphens and apostrophes that appear inside words, but excluding those that appear alone or at the beginning or end of a word.
For example, for the sentence ...
  She said - 'That'll be all, Two-Fry.'
... I want the characters shown in grey below to be detected:
  Shesaid- 'That'llbeall,Two-Fry.'
If I use the regex /[^A-Za-z'-]/g, then "loose" hyphens and apostrophes are not detected.
  Shesaid-'That'llbeall,Two-Fry.'
How can I alter my regex so that it detects apostrophes and hyphens that don't have a word character on both sides?
You can test my regex here: https://regex101.com/r/bR8sV1/2
Note: the text I will be working on may contain other writing scripts, like руский and ไทอ so it will not be feasible to simply include all the characters that are not part of any English word.

You can organize your word-boundary characters into two groups.
Characters that cannot be alone.
Characters that can be alone.
A regex that works with your example would be:
[\s.,'-]{2,}|[\s.]
Regex101 Demo
Now all that's left is to keep adding all non-word characters into those two groups until it fits all of your needs. So you might start adding symbols and more punctuation to those character classes.
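As a quick check of that regex against the example sentence, here is a sketch that splits on the detected boundaries. Using split and filter(Boolean) is just one assumed way to inspect the result; the answer does not prescribe it.

```javascript
const boundary = /[\s.,'-]{2,}|[\s.]/g;
const sentence = "She said - 'That'll be all, Two-Fry.'";

// Splitting on the boundaries leaves the words; filter(Boolean) drops the
// empty string produced by the match at the very end of the sentence.
const words = sentence.split(boundary).filter(Boolean);
console.log(words);
// [ 'She', 'said', "That'll", 'be', 'all', 'Two-Fry' ]
```

The apostrophe in That'll and the hyphen in Two-Fry survive because each stands alone between letters, so neither alternative of the regex matches them.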

You could write something like this:
(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*\s(\s|[!-/]|[:-@]|[\[-`]|[\{-~])*
Or the compact version:
(\s|[!-/:-@\[-`\{-~])*\s(\s|[!-/:-@\[-`\{-~])*
The RegExp requires one \s (whitespace character) and selects all spaces and non-alphanumeric characters before and after it.
https://regex101.com/r/bR8sV1/4
\s matches all spaces
!-/ every char from ! to /
:-@ every char from : to @
\[-` every char from [ to `
\{-~ every char from { to ~

RegExp to match hashtag at the beginning of the string or after a space

I have looked through previous questions and answers, however they do not solve the following:
https://stackoverflow.com/questions/ask#notHashTag
The closest I got to is this: (^#|(?:\s)#)(\w+), which finds the hashtag in half the necessary cases and also includes the leading space in the returned text. Here are all the cases that need to be matched:
#hashtag
a #hashtag
a #hashtag world
cool.#hashtag
##hashtag, but only until the comma and starting at second hash
#hashtag#hashtag two separate matches
And these should be skipped:
https://stackoverflow.com/questions/ask#notHashTag
Word#notHashTag
#ab is too short to be a hashtag, 3 characters minimum

This should work for everything except #hashtag#hashtag. JS didn't support lookbehind when this was written (it was added in ES2018), so it's probably not possible to match that case by itself.
\B#\w{3,}
\B is designed to match only between two word characters or two non-word characters. Since # is a non-word character, this forces the match to be preceded by a space or punctuation, or the beginning of the string.
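The cases from the question can be run through this pattern with a short sketch; the helper name findHashtags is made up for illustration.

```javascript
const hashtag = /\B#\w{3,}/g;

// Hypothetical helper: return every hashtag found, or an empty array.
function findHashtags(text) {
  return text.match(hashtag) || [];
}

// Cases that should match.
console.log(findHashtags('#hashtag'));          // [ '#hashtag' ]
console.log(findHashtags('a #hashtag world'));  // [ '#hashtag' ]
console.log(findHashtags('cool.#hashtag'));     // [ '#hashtag' ]
console.log(findHashtags('##hashtag, but'));    // [ '#hashtag' ]

// The known gap: only the first of two adjacent hashtags is found.
console.log(findHashtags('#hashtag#hashtag'));  // [ '#hashtag' ]

// Cases that should be skipped.
console.log(findHashtags('https://stackoverflow.com/questions/ask#notHashTag')); // []
console.log(findHashtags('Word#notHashTag'));   // []
console.log(findHashtags('#ab is too short'));  // []
```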

Try this regex:
(?:^|[\s.])(#+\w{3,})(#+\w{3,})?
Online Demo: http://regex101.com/r/kG1nD5

Javascript Regular Expression "Single Space Character"

I am learning JavaScript and analyzing existing code.
My JS reference book says to use "\s" to search for a single space.
But I have come across this code:
obj.match(/Kobe Bryant/);
Instead of using \s, it uses an actual space.
Why doesn't this generate an error?

The character class \s does not just contain the space character but also other Unicode white space characters. \s is equivalent to this character class:
[\t\n\v\f\r \u00a0\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u200b\u2028\u2029\u3000]

No. It is perfectly legal to include a literal space in a regex.
However, it's not equivalent - \s will include any whitespace character, including tabs, non-breaking spaces, half-width spaces and other characters, whereas a literal space will only match the regular space character.

\s matches any whitespace character, including tabs etc. Sure you can use a literal space also without problems. Just like you can use [0-9] instead of \d to denote any digit. However, keep in mind that [0-9] is equivalent to \d whereas the literal space is a subset of \s.

In addition to normal spaces, \s matches other kinds of white space characters, including tabs and, in JavaScript, newline characters. That said, matching with a normal space is certainly valid, especially in your case, where it seems you want to match a name, which is normally separated by a normal space.
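The difference the answers describe can be seen in a short sketch; the test strings are made up for illustration.

```javascript
const literal = /Kobe Bryant/;        // matches only a regular space (U+0020)
const anyWhitespace = /Kobe\sBryant/; // matches any single whitespace character

const results = {
  literalSpace: literal.test('Kobe Bryant'),           // true
  literalTab: literal.test('Kobe\tBryant'),            // false
  literalNbsp: literal.test('Kobe\u00a0Bryant'),       // false
  classSpace: anyWhitespace.test('Kobe Bryant'),       // true
  classTab: anyWhitespace.test('Kobe\tBryant'),        // true
  classNbsp: anyWhitespace.test('Kobe\u00a0Bryant'),   // true
};
console.log(results);
```

So the literal space is the stricter choice: it is valid syntax, but it accepts only the ordinary space character.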
