Regex match string until whitespace Javascript - javascript

I want to be able to match the following examples:
www.example.com
http://example.com
https://example.com
I have the following regex, which does NOT match www. but does match http:// and https://. I need to match any of the prefixes in the examples above and everything up to the next whitespace, i.e. the entire URL.
var regx = ((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?=\s{1});
Let's say I have a string that looks like the following:
I have found a lot of help off www.stackoverflow.com and the people on there!
I want to run the matching on that string and get
www.stackoverflow.com
Thanks!

You can try
(?:www|https?)[^\s]+
sample code:
var str="I have found a lot of help off www.stackoverflow.com and the people on there!";
var found=str.match(/(?:www|https?)[^\s]+/gi);
alert(found);
Pattern explanation:
(?: group, but do not capture:
www 'www'
| OR
http 'http'
s? 's' (optional)
) end of grouping
[^\s]+ any character except: whitespace
(\n, \r, \t, \f, and " ") (1 or more times)

You have an error in your regex.
Use this:
((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?!\s{1})
^--- Change to a negative lookahead
Btw, I think you can use:
(?:(http?:\/\/)|(https?:\/\/)|(www\.))(?!\s{1})
MATCH 1
3. [0-4] `www.`
MATCH 2
1. [16-23] `http://`
MATCH 3
2. [35-43] `https://`
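To see why the original lookahead blocks the www. case, here is a minimal sketch that runs both patterns against a shortened version of the sample sentence. The variable names are illustrative only. Note that the negated pattern still matches just the prefix, not the whole URL.

```javascript
// Asker's pattern: "www." must be followed by whitespace,
// which never happens in the middle of a URL.
const original = /((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?=\s{1})/;

// Same pattern with the lookahead negated:
// "www." must NOT be followed by whitespace.
const negated = /((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?!\s{1})/;

const sentence = "help off www.stackoverflow.com and the people";

const originalHit = original.exec(sentence);
const negatedHit = negated.exec(sentence);

console.log(originalHit);                 // no match at all
console.log(negatedHit && negatedHit[0]); // only the "www." prefix
```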

Not quite sure what you're trying to do, but this should match any group of non-space characters not immediately preceded with "www." case insensitive.
/(https?:\/\/)?(?<!(www\.))[^\s]*/i
... [edit] but you did want to match www.
/(https?:\/\/)?([^\s\.]{2,}\.?)+/i

First things first, to match any non-whitespace char, use the \S construct (in POSIX, you would use [^[:space:]], but JavaScript regex is not POSIX compliant). Here are some common patterns with \S:
\S* - zero or more non-whitespace chars
\S+ - one or more non-whitespace chars
Matching any text until first whitespace can mean match any zero or more chars other than whitespace, so, the answer to the current OP problem is
(?:www|https?)\S*
// ^^^
See the regex demo. This pattern will match up to the first whitespace or end of string. If there must be a whitespace char on the right use
(?:www|https?)\S*(?=\s)
The (?=\s) positive lookahead requires a whitespace immediately to the right of the current location.
Whenever there is a need to match until the last whitespace, you could match any zero or more chars that are followed by a whitespace (\s) pattern:
/(?:www|https?)[\w\W]*(?=\s)/
/(?:www|https?)[^]*(?=\s)/
// Or even (for ECMAScript 2018+):
/(?:www|https?).*(?=\s)/s
The [\w\W], [^] and . with s flag match any char including line break chars.
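The difference between "until the first whitespace" and "until the last whitespace" can be sketched on the sample sentence from the question. The variable names and the second test string are assumptions made for illustration.

```javascript
const text = "I have found a lot of help off www.stackoverflow.com and the people on there!";

// \S* stops at the first whitespace (or at the end of the string).
const untilFirst = text.match(/(?:www|https?)\S*/);

// Greedy "any char" plus a lookahead backtracks to the LAST whitespace.
const untilLast = text.match(/(?:www|https?)[\w\W]*(?=\s)/);

// With the (?=\s) lookahead, a URL at the very end of the string is not matched.
const atEnd = "visit www.example.com".match(/(?:www|https?)\S*(?=\s)/);

console.log(untilFirst[0]);
console.log(untilLast[0]);
console.log(atEnd);
```

If URLs may appear at the end of the input, the variant without the lookahead is the safer choice.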

Related

Issue with javascript regex not matching less than 3 characters

I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept less than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help ?
You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string, not followed by a space (this excludes a leading space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty (the content before the last character, see below).
[a-z0-9]$ - The last letter or digit and the end of the string (this excludes a trailing space).
You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
See the regex demo.
NOTE: If only one whitespace is allowed between the alphanumeric chars use
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
^^
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
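A quick way to compare the original pattern with the rewritten one is to run both over a few inputs. This is a small sketch; the sample strings are assumptions chosen to cover short inputs, leading and trailing spaces, and punctuation.

```javascript
const original = /^[^\s][a-z0-9 ]+[^\s]$/i;
const rewritten = /^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i;

const samples = ["a", "aa", "aaa", "a b", " ab", "ab ", "!a!"];

// Collect the verdict of each pattern for every sample string.
const results = samples.map(s => ({
  input: s,
  original: original.test(s),
  rewritten: rewritten.test(s),
}));

for (const r of results) {
  console.log(JSON.stringify(r.input), "original:", r.original, "rewritten:", r.rewritten);
}
```

Besides rejecting one- and two-character strings, the original pattern accepts "!a!" because [^\s] matches any non-whitespace char, not only letters and digits.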

Javascript in regexp not matching something

I want to match everything except the one with the string '1AB' in it. How do I do that? When I tried it, it said nothing is matched.
var text = "match1ABmatch match2ABmatch match3ABmatch";
var matches = text.match(/match(?!1AB)match/g);
console.log(matches[0]+"..."+matches[1]);
Lookarounds do not consume the text, i.e. the regex index does not move when their patterns are matched. See Lookarounds Stand their Ground for more details. You still must match the text with a consuming pattern, here, the digits.
Add \w+ word matching pattern after the lookahead. NOTE: You may also use \S+ if there can be any one or more non-whitespace chars. If there can be any chars, use .+ (to match 1 or more chars other than line break chars) or [^]+ (matches even line breaks).
var text = "match100match match200match match300match";
var matches = text.match(/match(?!100(?!\d))\w+match/g);
console.log(matches);
Pattern details
match - a literal substring
(?!100(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there is 100 substring not followed with a digit (if you want to fail the matches where the number starts with 100, remove the (?!\d) lookahead)
\w+ - 1 or more word chars (letters, digits or _)
match - a literal substring
See the regex demo online.
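The same idea applied to the sample string from the question might look like the sketch below. Using \w+ as the consuming pattern for the 1AB data is an assumption carried over from the answer above.

```javascript
const text = "match1ABmatch match2ABmatch match3ABmatch";

// Lookahead only: after "match" the regex immediately expects another
// "match", so nothing can ever be found.
const withoutConsumer = text.match(/match(?!1AB)match/g);

// Lookahead plus a consuming \w+ for the part between the two "match" words.
const withConsumer = text.match(/match(?!1AB)\w+match/g);

console.log(withoutConsumer);
console.log(withConsumer);
```

Because String.prototype.match returns null when a global regex finds nothing, indexing the result directly (as in matches[0]) throws in the first case.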

Capture between pattern of digits

I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the (?:(?!\d+:\d)[\s\S])* part is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
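As a rough check that the tempered greedy token and its unrolled form behave the same on the sample input, both can be run side by side. The trimming step is an addition for readability: the raw matches keep the whitespace that precedes the next digit:digit marker.

```javascript
const input =
  "1:1 wefeff qwefejä qwefjk\n" +
  "dfjdf 10:2 jdskjdksdjö\n" +
  "12:1 qwe qwe: qwertyå";

const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
const unrolled = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Trim each match so trailing spaces and line breaks do not clutter the output.
const temperedMatches = input.match(tempered).map(m => m.trim());
const unrolledMatches = input.match(unrolled).map(m => m.trim());

console.log(temperedMatches);
console.log(unrolledMatches);
```

The first match spans two lines, and "qwe:" in the last line does not start a new match because the colon is not surrounded by digits.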
You can include ñ/Ñ or leave it out, but this should work:
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include line breaks, you can add \n or \r to the set:
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try! Also tested on https://regex101.com/
for more chars:
^[a-zA-Z0-9!##\$%\^\&*)(+=._-]+$

java-script Regex filtering on words

I have the following Regex:
The regex is in a bit of code in our app; I can see it splits words. It obviously removes characters such as $#* and so on. I need it to do exactly the same thing but allow a hash sign, since the words can now be #hashtags.
"Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) // returns ["test", "words"]
The current regex removes the hash; I want it to remain, so that I get:
["test", "#words"]
Your "Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) does the following:
The whole string is turned to lower case
The string is split at every word boundary (leading and trailing, meaning test #words is split into ["test", " #", "words"])
The parts that match the ^\w+$ regex (1+ word chars from the start till end of string) are kept in the array.
You may use an identical matching approach to also include # with /(?:\B#)?\w+/g:
console.log("Test #words".toLowerCase().match(/(?:\B#)?\w+/g))
The pattern matches:
(?:\B#)? - an optional # preceded with a non-word boundary
\w+ - 1 or more word chars (from [a-zA-Z0-9_] ranges)
If context is not so important, use a simpler /#?\w+/g regex that will match an optional # anywhere in the string, followed with 1+ word chars.
Just add optional # at the beginning of the regexp to support #hashtags.
"Test #words".toLowerCase().match(/#?\w+/g);
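The three approaches can be compared on a slightly longer input. This is a sketch: the extra token foo#bar is an assumption added to show where the context-aware and the simple pattern differ.

```javascript
const input = "Test #words foo#bar".toLowerCase();

// Original approach: split at word boundaries, keep pure word chunks.
const splitFilter = input.split(/\b/).filter(w => /^\w+$/.test(w));

// "#" is kept only when it is not glued to a preceding word char.
const contextual = input.match(/(?:\B#)?\w+/g);

// "#" is kept wherever it directly precedes word chars.
const simple = input.match(/#?\w+/g);

console.log(splitFilter);
console.log(contextual);
console.log(simple);
```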

Regex to match words with hyphens and/or apostrophes

I was looking for a regex to match words with hyphens and/or apostrophes. So far, I have:
(\w+([-'])(\w+)?[']?(\w+))
and that works most of the time, though if there's an apostrophe and then a hyphen, like "qu'est-ce", it doesn't match. I could append more optionals, though perhaps there's another more efficient way?
Some examples of what I'm trying to match: Mary's, High-school, 'tis, Chambers', Qu'est-ce.
use this pattern
(?=\S*['-])([a-zA-Z'-]+)
(?= # Look-Ahead
\S # <not a whitespace character>
* # (zero or more)(greedy)
['-] # Character in ['-] Character Class
) # End of Look-Ahead
( # Capturing Group (1)
[a-zA-Z'-] # Character in [a-zA-Z'-] Character Class
+ # (one or more)(greedy)
) # End of Capturing Group (1)
[\w'-]+ would match pretty much any occurrence of words with (or without) hyphens and apostrophes, but also in cases where those characters are adjacent.
(?:\w|['-]\w)+ should match cases where the characters can't be adjacent.
If you need to be sure that the word contains hyphens and/or apostrophes and that those characters aren't adjacent maybe try \w*(?:['-](?!['-])\w*)+. But that would also match ' and - alone.
debuggex.com is a great resource for visualizing these sorts of things
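To make the differences between these three patterns concrete, here is a small sketch that runs them over one test string. The test string and variable names are assumptions, not part of the answer.

```javascript
const sample = "qu'est-ce some--thing plain";

// Any run of word chars, apostrophes and hyphens (adjacent marks allowed).
const loose = sample.match(/[\w'-]+/g);

// A mark must be followed directly by a word char, so "--" splits the word.
const noAdjacent = sample.match(/(?:\w|['-]\w)+/g);

// Requires at least one mark and forbids adjacent marks; "plain" is skipped.
const mustContainMark = sample.match(/\w*(?:['-](?!['-])\w*)+/g);

console.log(loose);
console.log(noAdjacent);
console.log(mustContainMark);
```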
\b\w*[-']\w*\b should do the trick
The problem you're running into is that you actually have three possible sub-patterns: one or more chars, an apostrophe followed by one or more chars, and a hyphen followed by one or more chars.
This presumes you don't wish to accept words that begin or end with apostrophes or hyphens or have hyphens next to apostrophes (or vice versa).
I believe the best way to represent this in a RegExp would be:
/\b[a-z]+(?:['-]?[a-z]+)*\b/
which is described as:
\b # word-break
[a-z]+ # one or more
(?: # start non-capturing group
['-]? # zero or one
[a-z]+ # one or more
)* # end of non-capturing group, zero or more
\b # word-break
which will match any word that begins and ends with a letter and can contain zero or more groups of an optional apostrophe or hyphen followed by one or more letters.
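Run against the example words from the question, the pattern behaves as sketched below. The g and i flags are assumptions added here so that all words, including capitalized ones, are found; the pattern as written above has neither.

```javascript
// Same pattern as above, with "g" (find all) and "i" (ignore case) added.
const wordPattern = /\b[a-z]+(?:['-]?[a-z]+)*\b/gi;

const sample = "Mary's, High-school, 'tis, Chambers', Qu'est-ce.";
const words = sample.match(wordPattern);

console.log(words);
```

Leading and trailing apostrophes (as in 'tis and Chambers') are left out of the match, which follows from the stated presumption that words neither begin nor end with an apostrophe or hyphen.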
How about: \'?\w+([-']\w+)*\'?
I suppose these words shouldn't be matched:
something- or -something: start or end with -
some--thing or some'-thing: - not followed by a character
some'': two consecutive apostrophes
This worked for me:
([a-zA-Z]+'?-?[a-zA-Z]+(-?[a-zA-Z])?)|[a-zA-Z]
Use
([\w]+[']*[\w]*)|([']*[\w]+)
It will properly parse
"You've and we i've it' '98"
(supports ' in any place in the word but single ' is ignored).
If needed \w could be replaced with [a-zA-Z] etc.
