Remove all special characters except for # symbol from string in JavaScript - javascript

I have a string:
"Gazelles were mentioned by #JohnSmith while he had $100 in his pocket and screamed W#$#%#$!!!!"
I need:
"Gazelles were mentioned by #JohnSmith while he had 100 in his pocket and screamed"
How to remove all special characters from string EXCEPT the # symbol. I tried:
str.replace(/[^\w\s]/gi, '')

If you want to keep a # only when it is followed by a word character, keeping the trailing W is acceptable, and newlines should be removed as well, you could change the \s to a class that matches only spaces or tabs: [ \t]
Add the # to the negated character class, then use an alternation with a negative lookahead so that a # is matched (and removed) only when it is not directly followed by a word character.
[^\w \t#]+|#(?!\w)
[^\w \t#]+ Match 1+ times any char except a word char, space, tab or #
| Or
#(?!\w) Match an # not directly followed by a word char
In the replacement use an empty string.
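As a minimal sketch of the replacement described above (the helper name stripSpecialChars is hypothetical, and the global flag is an assumption so that every match is removed):

```javascript
// Hypothetical helper; the pattern is the one given in the answer above.
function stripSpecialChars(text) {
  // First alternative: runs of chars that are not a word char, space, tab or #.
  // Second alternative: a # that is not directly followed by a word char.
  return text.replace(/[^\w \t#]+|#(?!\w)/g, '');
}

const input = 'Gazelles were mentioned by #JohnSmith while he had $100 in his pocket and screamed W#$#%#$!!!!';
const cleaned = stripSpecialChars(input);
console.log(cleaned);
```

With this pattern the trailing W is kept, because it is a word character, while #JohnSmith survives because its # is followed by a word character.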

Related

Regex splitting on newline outside of quotes

I want to split a stream of data on new lines that are NOT within double quotes. The stream contains rows of data, where each row is separated by a newline. However, the rows of data can potentially contain newlines within double quotes. These newlines do not signify that the next row of data has started, so I want to ignore them.
So the data might look something like this:
Row 1: bla bla, 12345, ...
Row 2: "bla
bla", 12345, ...
Row 3: bla bla, 12345, ...
I tried using regex from a similar post about splitting on commas not found with double quotes (Splitting on comma outside quotes) by replacing the comma with the newline character:
\n(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
This regex doesn't match where I'd expect it to though. Am I missing something?
Here are two ways of doing that.
#1
You can match the regular expression
[^"\r\n]+(?:"[^"]*"[^"\r\n]+)*
The expression can be broken down as follows.
[^"\r\n]+ # match one or more characters other than those in the
# character class
(?: # begin non-capture group
"[^"]*" # match double-quote followed by zero or more characters
# other than a double-quote, followed by a double-quote
[^"\r\n]+ # match one or more characters other than those in the
# character class
)* # end non-capture group and execute it zero or more times
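A small sketch of approach #1, assuming the sample rows from the question are joined with \n and that the global flag is added so every row is returned (the variable names are illustrative):

```javascript
// Sample data modelled on the question: row 2 contains a newline inside quotes.
const data = 'Row 1: bla bla, 12345\nRow 2: "bla\nbla", 12345\nRow 3: bla bla, 12345';

// Each match is one row; quoted sections may span line terminators.
const rows = data.match(/[^"\r\n]+(?:"[^"]*"[^"\r\n]+)*/g) || [];

console.log(rows.length);
console.log(JSON.stringify(rows[1]));
```

Note that the pattern requires at least one character after each closing quote, so a row that ends with a quoted field would not be matched in full.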
#2
Matching line terminators that are not between double-quotes is equivalent to matching line terminators that are preceded, from the beginning of the string, by an even number of double quotes. You can match such line terminators with the following regular expression (with the multi-line flag not set, so that ^ matches the beginning of the string, not the beginning of a line).
/(?<=^[^"]*(?:"[^"]*"[^"]*)*)\r?\n/
JavaScript's regex engine (which, impressively, supports variable-length lookbehinds) performs the following operations.
(?<= : begin positive lookbehind
^ : match beginning of string (not line)
[^"]* : match 0+ chars other than '"'
(?: : begin non-capture group
"[^"]*" : match '"', 0+ chars other than '"', '"'
[^"]* : match 0+ chars other than '"'
)* : end non-capture group and execute 0+ times
) : end positive lookbehind
\r?\n : match line terminator
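Approach #2 could be used with split along these lines. This is only a sketch: it assumes a runtime with lookbehind support (ES2018 or later), and the sample string is modelled on the question.

```javascript
// Sample data modelled on the question: row 2 contains a newline inside quotes.
const stream = 'Row 1: bla bla, 12345\nRow 2: "bla\nbla", 12345\nRow 3: bla bla, 12345';

// Split only on line terminators preceded by an even number of double quotes,
// counted from the beginning of the string (no multi-line flag).
const splitRows = stream.split(/(?<=^[^"]*(?:"[^"]*"[^"]*)*)\r?\n/);

console.log(splitRows.length);
console.log(JSON.stringify(splitRows[1]));
```

Because the lookbehind rescans from the start of the string for every candidate newline, this may be slow on very large inputs.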

Regex to match words with hyphens and/or apostrophes

I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optionals, though perhaps there's another more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
Use this pattern:
(?=\S*['-])([a-zA-Z'-]+)
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
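To illustrate the lookahead pattern, here is a short sketch with a made-up sentence; the global flag is an assumption so that all matches are collected:

```javascript
// Only words that contain at least one apostrophe or hyphen pass the lookahead.
const sentence = "Mary's friend went to High-school";
const wordsWithMarks = sentence.match(/(?=\S*['-])([a-zA-Z'-]+)/g) || [];

console.log(wordsWithMarks);
```

Plain words such as "friend" are skipped because the lookahead finds no apostrophe or hyphen before the next whitespace.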
[\w'-]+ would match pretty much any occurrence of words with (or without) hyphens and apostrophes, but also in cases where those characters are adjacent.
(?:\w|['-]\w)+ should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent maybe try \w*(?:['-](?!['-])\w*)+. But that would also match ' and - alone.
debuggex.com is a great resource for visualizing these sorts of things
\b\w*[-']\w*\b should do the trick
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word boundary
[a-z]+ # one or more
(?: # start non-capturing group
['-]? # zero or one
[a-z]+ # one or more
)* # end of non-capturing group, zero or more
\b # word boundary
which will match any word that begins and ends with an alpha and can contain zero or more groups of either an apostrophe or a hyphen followed by one or more alpha.
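A quick sketch of how this pattern behaves on the examples from the question. The g and i flags are added here as an assumption, since the pattern as written only matches lowercase letters and a single occurrence:

```javascript
const text = "Mary's High-school 'tis Chambers' Qu'est-ce";

// Words must start and end with a letter, so leading and trailing
// apostrophes are left out of the match.
const words = text.match(/\b[a-z]+(?:['-]?[a-z]+)*\b/gi) || [];

console.log(words);
```

Under these assumptions 'tis and Chambers' are returned without their outer apostrophes, which is consistent with the stated presumption that words may not begin or end with one.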
How about: \'?\w+([-']\w+)*\'?
I suppose these words shouldn't be matched:
something- or -something: start or end with -
some--thing or some'-thing: - or ' not followed by a word character
some'': two apostrophes
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' in any place in the word but single ' is ignored).
If needed \w could be replaced with [a-zA-Z] etc.

Checking an Array Element for Two Blank Space Characters

How would you be able to tell if an array element is made up of three words (i.e. if it has two blank space characters in it)? It might look something like "abc def ghi". I am trying to search through an array for elements of this form and will remove this while others of the format "jkl xyz" or '"jkl"' would remain in the array.
You can use search function with following regex :
str.search(/\b(\w+ \w+ \w+)\b/g);
You can use a regex like:
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def def") // true
/^[^\s]+\s[^\s]+\s[^\s]+$/.test("abc def ") // false
It means:
^ Start of string
[^\s]+ 1 or more non-whitespace characters
\s a whitespace character
[^\s]+ 1 or more non-whitespace characters
\s a whitespace character
[^\s]+ 1 or more non-whitespace characters
$ End of string
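To remove such elements from an array, the anchored pattern could be combined with Array.prototype.filter. This is a sketch with illustrative names and sample data:

```javascript
// Anchored pattern: exactly three runs of non-whitespace separated by
// single whitespace characters.
const threeWords = /^[^\s]+\s[^\s]+\s[^\s]+$/;

const items = ['abc def ghi', 'jkl xyz', '"jkl"', 'zxc vbn jkl'];

// Keep only the elements that are NOT made up of three words.
const remaining = items.filter((item) => !threeWords.test(item));

console.log(remaining);
```

filter returns a new array and leaves the original untouched, which avoids the index bookkeeping needed when removing elements in place.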
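var myArray = ["abc def ghi","jkl xyz","gty slp","zxc vbn jkl"];
// Iterate backwards so that removing an element does not shift
// the indices of the elements still to be visited.
for (var i = myArray.length - 1; i >= 0; --i) {
  if (/\w+ \w+ \w+/.test(myArray[i])) {
    myArray.splice(i, 1); // remove the three-word element in place
  }
}
console.log(myArray);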
Outputs:
["jkl xyz", "gty slp"]
Regex explanation:
\w+ \w+ \w+
-----------
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “ ” literally « »
Match a single character that is a “word character” (ASCII letter, digit, or underscore only) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

Regex match string until whitespace Javascript

I want to be able to match the following examples:
www.example.com
http://example.com
https://example.com
I have the following regex which does NOT match www. but will match http:// https://. I need to match any prefix in the examples above and up until the next white space thus the entire URL.
var regx = ((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?=\s{1});
Let's say I have a string that looks like the following:
I have found a lot of help off www.stackoverflow.com and the people on there!
I want to run the matching on that string and get
www.stackoverflow.com
Thanks!
You can try
(?:www|https?)[^\s]+
sample code:
var str="I have found a lot of help off www.stackoverflow.com and the people on there!";
var found=str.match(/(?:www|https?)[^\s]+/gi);
alert(found);
Pattern explanation:
(?: group, but do not capture:
www 'www'
| OR
http 'http'
s? 's' (optional)
) end of grouping
[^\s]+ any character except: whitespace
(\n, \r, \t, \f, and " ") (1 or more times)
You have an error in your regex.
Use this:
((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?!\s{1})
^--- Change to negative lookahead
Btw, I think you can use:
(?:(http?:\/\/)|(https?:\/\/)|(www\.))(?!\s{1})
MATCH 1
3. [0-4] `www.`
MATCH 2
1. [16-23] `http://`
MATCH 3
2. [35-43] `https://`
Not quite sure what you're trying to do, but this should match any group of non-space characters not immediately preceded with "www." case insensitive.
/(https?:\/\/)?(?<!(www\.))[^\s]*/i
... [edit] but you did want to match www.
/(https?:\/\/)?([^\s\.]{2,}\.?)+/i
First things first, to match any non-whitespace char, use the \S construct (in POSIX, you would use [^[:space:]], but JavaScript regex is not POSIX compliant). Here are some common patterns with \S:
\S* - zero or more non-whitespace chars
\S+ - one or more non-whitespace chars
Matching any text until first whitespace can mean match any zero or more chars other than whitespace, so, the answer to the current OP problem is
(?:www|https?)\S*
// ^^^
See the regex demo. This pattern will match up to the first whitespace or end of string. If there must be a whitespace char on the right use
(?:www|https?)\S*(?=\s)
The (?=\s) positive lookahead requires a whitespace immediately to the right of the current location.
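The difference between the two patterns shows up when the URL sits at the very end of the input. A small sketch, with illustrative variable names and a second made-up sentence:

```javascript
const upToWhitespace = /(?:www|https?)\S*/;
const requireWhitespace = /(?:www|https?)\S*(?=\s)/;

const midSentence = 'I have found a lot of help off www.stackoverflow.com and the people on there!';
const atEnd = 'Find help at www.stackoverflow.com';

// Both patterns find the URL when whitespace follows it.
const midPlain = midSentence.match(upToWhitespace);
const midStrict = midSentence.match(requireWhitespace);

// Only the pattern without the lookahead matches at end of string.
const endPlain = atEnd.match(upToWhitespace);
const endStrict = atEnd.match(requireWhitespace);

console.log(midPlain[0], midStrict[0], endPlain[0], endStrict);
```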
Whenever there is a need to match up to the last whitespace, you could match any zero or more chars that are followed by a whitespace pattern, \s:
/(?:www|https?)[\w\W]*(?=\s)/
/(?:www|https?)[^]*(?=\s)/
// Or even (for ECMAScript 2018+):
/(?:www|https?).*(?=\s)/s
The [\w\W], [^] and . with s flag match any char including line break chars.
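As a rough illustration of matching up to the last whitespace, the sketch below uses a made-up two-line string and compares [^] with a plain dot that lacks the s flag:

```javascript
const sample = 'see www.example.com and\nsome more text';

// [^] matches any char, including line breaks, so the match runs to the
// last position that is followed by whitespace.
const untilLast = sample.match(/(?:www|https?)[^]*(?=\s)/);

// Without the s flag, . stops at the line break.
const untilLineEnd = sample.match(/(?:www|https?).*(?=\s)/);

console.log(JSON.stringify(untilLast[0]));
console.log(JSON.stringify(untilLineEnd[0]));
```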

Javascript regex for matching twitter-like hashtags

I'd like some help on figuring out the JS regex to use to identify "hashtags", where they should match all of the following:
1. The usual twitter style hashtags: #foobar
2. Hashtags with text preceding: abc123#xyz456
3. Hashtags with space in them, which are denoted as: #[foo bar] (that is, the [] serves as delimiter for the hashtag)
For 1 and 2, I was using something of the following form:
var all_re =/\S*#\S+/gi;
I can't seem to figure out how to extend it to 3. I'm not good at regexps, some help please?
Thanks!
So it has to match either all non-space characters or any characters between (and including) [ and ]:
\S*#(?:\[[^\]]+\]|\S+)
Explanation:
\S* # any number of non-white space characters
# # matches #
(?: # start non-capturing group
\[ # matches [
[^\]]+ # any character but ], one or more
\] # matches ]
| # OR
\S+ # one or more non-white space characters
) # end non-capturing group
Reference: alternation, negated character classes.
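A brief sketch of the combined pattern applied to all three hashtag styles from the question (the sample sentence and variable names are made up, and the g flag is assumed so every hashtag is returned):

```javascript
const post = 'tags: #foobar abc123#xyz456 #[foo bar] end';

// The bracketed alternative is tried first, so #[foo bar] is taken as a
// whole instead of stopping at the space.
const hashtags = post.match(/\S*#(?:\[[^\]]+\]|\S+)/g) || [];

console.log(hashtags);
```

The order of the alternatives matters here: with \S+ first, the third hashtag would be cut off at "#[foo".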
How about this?
var all_re =/(\S*#\[[^\]]+\])|(\S*#\S+)/gi;
I had a similar problem, but only want to match when a string starts and ends with the hashtag. So similar problem, hopefully someone else can have use of my solution.
This one matches "#myhashtag" but not "gfd#myhashtag" or "#myhashtag ".
/^#\S+$/
^ #Start of string
\S #Any char that is not a white space
+ #One or more of said char
$ #End of string
Simple as that.
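The three cases mentioned above can be checked with a short sketch (the constant name is illustrative):

```javascript
const wholeHashtag = /^#\S+$/;

// The string must start with # and contain no whitespace up to its end.
const exact = wholeHashtag.test('#myhashtag');
const prefixed = wholeHashtag.test('gfd#myhashtag');
const trailingSpace = wholeHashtag.test('#myhashtag ');

console.log(exact, prefixed, trailingSpace);
```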
