Hi, I need to write a regular expression for a VS Code extension that matches the fields of a class in its string representation:
const str = `
title = models.CharField(
blank=True
)
text = models.TextField()
author=ForeignKey(
User,
on_delete=models.CASCADE
)
test_num_10 = models.TextFIeld()
`
From the multiline string above I need to capture the strings title, text, author, test_num_10.
Each field follows this pattern:
white space
capturing group, any character
optional white space
= sign
optional white space
any character
( sign
optional any character
) sign .
So far my regexp looks like this /\s+(.+)(\s+)?\=(\s+)?.+\((.+)?\).+/. But it doesn't match what I expect, please help me figure it out.
For your example data, you could match your values using:
\s*(.+?)\s*=\s*.+\(([\s\S]*?)\)
That will match:
\s* Match zero or more times a whitespace character
(.+?) Capture in a group any character one or more times, non-greedy
\s* Match zero or more times a whitespace character
= Match literally
\s*.+ Match zero or more times a whitespace character followed by any character one or more times
\(([\s\S]*?)\) Between parenthesis capture in a group any character non greedy
Your regex suffers from greediness (each greedy .+ consumes everything up to the end of the line and then forces backtracking). It is better to use a more restrictive pattern wherever you can:
(\S+)\s*=\s*[^(]*\(([^)]*)\)
\S Matches a non-whitespace character
\s Matches a whitespace character (the opposite of \S)
[^(]* Matches anything but an opening parenthesis (optional)
[^)]* Matches anything but a closing parenthesis (optional)
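As a minimal sketch of how the restrictive pattern could be applied in JavaScript, `matchAll` with the `g` flag collects the first capturing group of every match. The names `source`, `fieldPattern` and `fieldNames` are illustrative and not taken from the question.

```javascript
// Sample input copied from the question.
const source = `
title = models.CharField(
blank=True
)
text = models.TextField()
author=ForeignKey(
User,
on_delete=models.CASCADE
)
test_num_10 = models.TextFIeld()
`;

// Group 1: the field name; group 2: everything between the parentheses.
const fieldPattern = /(\S+)\s*=\s*[^(]*\(([^)]*)\)/g;

// matchAll requires the g flag and yields one match object per field.
const fieldNames = Array.from(source.matchAll(fieldPattern), (m) => m[1]);
console.log(fieldNames); // [ 'title', 'text', 'author', 'test_num_10' ]
```

Because `[^)]*` stops at the first closing parenthesis, this sketch assumes the field arguments contain no nested calls.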
I'm attempting to match the last character in a WORD.
A WORD is a sequence of non-whitespace characters
'[^\n\r\t\f ]', or an empty line matching ^$.
The expression I made to do this is:
"[^ \n\t\r\f]\(?:[ \$\n\t\r\f]\)"
The regex matches a non-whitespace character that is followed by a whitespace character or the end of the line.
But I don't know how to stop it from including the following whitespace character in the result, or why it doesn't seem to capture a character preceding the end of the line.
Using the string "Hi World!", I would expect: the "i" and "!" to be captured.
Instead I get: "i ".
What steps can I take to solve this problem?
"Word" that is a sequence of non-whitespace characters scenario
Note that the non-capturing group (?:...) in [^ \n\t\r\f](?:[ \$\n\t\r\f]) still matches (consumes) the whitespace char, so it becomes part of the match. It also does not match at the end of the string, because inside a character class $ is not an end-of-string anchor; it is parsed as a literal $ symbol.
You may use
\S(?!\S)
See the regex demo
The \S matches a non-whitespace char that is not followed with a non-whitespace char (due to the (?!\S) negative lookahead).
General "word" case
If a word consists of just letters, digits and underscores, that is, if it is matched with \w+, you may simply use
\w\b
Here, \w matches a "word" char, and the word boundary asserts there is no word char right after.
See another regex demo.
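A short sketch comparing the two patterns on the string from the question; the variable names are only illustrative:

```javascript
const sample = "Hi World!";

// Last char of each run of non-whitespace characters.
const lastOfNonSpaceRuns = sample.match(/\S(?!\S)/g);
console.log(lastOfNonSpaceRuns); // [ 'i', '!' ]

// Last char of each run of word characters (letters, digits, underscore).
const lastOfWordRuns = sample.match(/\w\b/g);
console.log(lastOfWordRuns); // [ 'i', 'd' ]
```

The second result differs because `!` is not a word character, so the last word character of "World!" is `d`.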
In Word text, if I want to highlight the last a in para, I first search for [space]para[space] so that only the whole word I want is found and selected.
Next, I search within that selection for [a ] (the a followed by the trailing space), which gives me only the last a, and I highlight it or color it differently.
I have a string:
“Gazelles were mentioned by #JohnSmith while he had $100 in his pocket and screamed W#$#%#$!!!!"
I need:
“Gazelles were mentioned by #JohnSmith while he had 100 in his pocket and screamed"
How to remove all special characters from string EXCEPT the # symbol. I tried:
str.replace(/[^\w\s]/gi, '')
If you want to keep the # only when it is followed by a word char (keeping the W is then also fine) and also want to remove the newlines, you could, for example, change the \s to [ \t] so that only spaces and tabs are kept.
Add the # to the negated character class and use an alternation specifying to only match the # when it is not followed by a word character using a negative lookahead.
[^\w \t#]+|#(?!\w)
[^\w \t#]+ Match 1+ times any char except a word char, space or tab
| Or
#(?!\w) Match an # not directly followed by a word char
In the replacement use an empty string.
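As a rough sketch, the replacement could look like this in JavaScript (the names `input` and `cleaned` are illustrative):

```javascript
const input = '“Gazelles were mentioned by #JohnSmith while he had $100 in his pocket and screamed W#$#%#$!!!!"';

// Remove runs of chars that are not word chars, spaces, tabs or #,
// and also remove any # that is not directly followed by a word char.
const cleaned = input.replace(/[^\w \t#]+|#(?!\w)/g, "");

console.log(cleaned);
// Gazelles were mentioned by #JohnSmith while he had 100 in his pocket and screamed W
```

The trailing W survives because it is a word character; only the symbols after it are removed.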
Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-@\[-`\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-@\[-`\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*([^ ]*?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-@\[-`\{-~]*(?![!-\/:-@\[-`\{-~])([^ ]+?)[!-\/:-@\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex, but uses JS help code.
Rough explanation: We first split the string into an array of substrings, divided by spaces. We then transform each substring by stripping the leading and trailing special characters. We match those special characters with [^a-zA-Z0-9ê]*; because of the leading ^, the character class matches every character except those listed, that is, all special characters. Between these two parts we capture all relevant characters with ([\w\W]*?). \w matches word characters and \W matches non-word characters, so [\w\W] matches any character. Appending ? after the * makes the quantifier lazy, so the group stops as early as the rest of the pattern, which matches the trailing special characters, allows. We also start the regex with ^ and end it with $ to match the entire substring (they anchor the match to the start and the end of the string, respectively). With .exec(s)[1] we then execute the regex on the substring and return the first capturing group in our transform function. Note that this is an empty string if a substring contains no proper characters. At the end we join the substrings with spaces.
I'm stuck trying to capture a structure like this:
1:1 wefeff qwefejä qwefjk
dfjdf 10:2 jdskjdksdjö
12:1 qwe qwe: qwertyå
I would want to match everything between the digits, followed by a colon, followed by another set of digits. So the expected output would be:
match 1 = 1:1 wefeff qwefejä qwefjk dfjdf
match 2 = 10:2 jdskjdksdjö
match 3 = 12:1 qwe qwe: qwertyå
Here's what I have tried:
\d+\:\d+.+
But that fails if there are word characters spanning two lines.
I'm using a javascript based regex engine.
You may use a regex based on a tempered greedy token:
/\d+:\d+(?:(?!\d+:\d)[\s\S])*/g
The \d+:\d+ part will match one or more digits, a colon, one or more digits and (?:(?!\d+:\d)[\s\S])* will match any char, zero or more occurrences, that do not start a sequence of one or more digits followed with a colon and a digit. See this regex demo.
As the tempered greedy token is a resource consuming construct, you can unroll it into a more efficient pattern like
/\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g
See another regex demo.
Now, the (?:(?!\d+:\d)[\s\S])* part is turned into a pattern that matches strings linearly:
\D* - 0+ non-digit symbols
(?: - start of a non-capturing group matching zero or more sequences of:
\d - a digit that is...
(?!\d*:\d) - not followed with 0+ digits, : and a digit
\D* - 0+ non-digit symbols
)* - end of the non-capturing group.
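A small sketch that runs both patterns against the sample input from the question and checks that they agree. The helper `tidy` and the variable names are illustrative; `tidy` only collapses whitespace so that each match prints on one line.

```javascript
const text = [
  "1:1 wefeff qwefejä qwefjk",
  "dfjdf 10:2 jdskjdksdjö",
  "12:1 qwe qwe: qwertyå",
].join("\n");

// Tempered greedy token version.
const tempered = /\d+:\d+(?:(?!\d+:\d)[\s\S])*/g;
// Unrolled version of the same idea.
const unrolled = /\d+:\d+\D*(?:\d(?!\d*:\d)\D*)*/g;

// Trim each match and collapse inner whitespace, including line breaks.
const tidy = (s) => s.trim().replace(/\s+/g, " ");

const fromTempered = text.match(tempered).map(tidy);
const fromUnrolled = text.match(unrolled).map(tidy);

console.log(fromTempered);
// [ '1:1 wefeff qwefejä qwefjk dfjdf', '10:2 jdskjdksdjö', '12:1 qwe qwe: qwertyå' ]
console.log(JSON.stringify(fromTempered) === JSON.stringify(fromUnrolled)); // true
```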
Including ñ and Ñ is optional, but this should work:
\d+?:\d+? [a-zñA-ZÑ ]*
Edited:
If you want to include line breaks, you can add \n or \r to the set:
\d+?:\d+? [a-zñA-ZÑ\n ]*
\d+?:\d+? [a-zñA-ZÑ\r ]*
Give it a try! Also tested on https://regex101.com/
For more characters:
^[a-zA-Z0-9!@#\$%\^\&*)(+=._-]+$
I want to be able to match the following examples:
www.example.com
http://example.com
https://example.com
I have the following regex which does NOT match www. but will match http:// https://. I need to match any prefix in the examples above and up until the next white space thus the entire URL.
var regx = ((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?=\s{1});
Let's say I have a string that looks like the following:
I have found a lot of help off www.stackoverflow.com and the people on there!
I want to run the matching on that string and get
www.stackoverflow.com
Thanks!
You can try
(?:www|https?)[^\s]+
sample code:
var str="I have found a lot of help off www.stackoverflow.com and the people on there!";
var found=str.match(/(?:www|https?)[^\s]+/gi);
alert(found);
Pattern explanation:
(?: group, but do not capture:
www 'www'
| OR
http 'http'
s? 's' (optional)
) end of grouping
[^\s]+ any character except: whitespace
(\n, \r, \t, \f, and " ") (1 or more times)
You have an error in your regex.
Use this:
((\s))(http?:\/\/)|(https?:\/\/)|(www\.)(?!\s{1})
^--- Change to negative lookaround
Btw, I think you can use:
(?:(http?:\/\/)|(https?:\/\/)|(www\.))(?!\s{1})
MATCH 1
3. [0-4] `www.`
MATCH 2
1. [16-23] `http://`
MATCH 3
2. [35-43] `https://`
Not quite sure what you're trying to do, but this should match any group of non-space characters not immediately preceded with "www." case insensitive.
/(https?:\/\/)?(?<!(www\.))[^\s]*/i
... [edit] but you did want to match www.
/(https?:\/\/)?([^\s\.]{2,}\.?)+/i
First things first: to match any non-whitespace char, use the \S construct (in POSIX, you would use [^[:space:]], but JavaScript regex is not POSIX compliant). Here are some common patterns with \S:
\S* - zero or more non-whitespace chars
\S+ - one or more non-whitespace chars
Matching any text until first whitespace can mean match any zero or more chars other than whitespace, so, the answer to the current OP problem is
(?:www|https?)\S*
// ^^^
See the regex demo. This pattern will match up to the first whitespace or end of string. If there must be a whitespace char on the right use
(?:www|https?)\S*(?=\s)
The (?=\s) positive lookahead requires a whitespace immediately to the right of the current location.
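To illustrate the difference between the two patterns, here is a minimal sketch using the three example URLs from the question joined by spaces (the variable names are illustrative):

```javascript
const urls = "www.example.com http://example.com https://example.com";

// Matches up to the next whitespace or the end of the string.
const upToWhitespaceOrEnd = urls.match(/(?:www|https?)\S*/g);
console.log(upToWhitespaceOrEnd);
// [ 'www.example.com', 'http://example.com', 'https://example.com' ]

// Requires a whitespace char right after the match, so the last URL,
// which ends the string, is not returned.
const mustBeFollowedBySpace = urls.match(/(?:www|https?)\S*(?=\s)/g);
console.log(mustBeFollowedBySpace);
// [ 'www.example.com', 'http://example.com' ]
```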
Whenever you need to match up to the last whitespace, you can match any zero or more chars followed by a whitespace (\s) pattern:
/(?:www|https?)[\w\W]*(?=\s)/
/(?:www|https?)[^]*(?=\s)/
// Or even (for ECMAScript 2018+):
/(?:www|https?).*(?=\s)/s
The [\w\W], [^] and . with s flag match any char including line break chars.
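A brief sketch of the "until the last whitespace" behaviour, using a made-up sentence (`sentence` and the other names are assumptions for illustration). All three spellings should return the same match, and the `s` flag needs an ECMAScript 2018+ engine.

```javascript
const sentence = "see www.a.com and more text";

// Greedy "any char" patterns backtrack to the LAST whitespace in the input.
const viaWordNonWord = sentence.match(/(?:www|https?)[\w\W]*(?=\s)/)[0];
const viaEmptyNegatedClass = sentence.match(/(?:www|https?)[^]*(?=\s)/)[0];
const viaDotAll = sentence.match(/(?:www|https?).*(?=\s)/s)[0];

console.log(viaWordNonWord); // www.a.com and more

// For comparison, \S* stops at the FIRST whitespace.
const untilFirstWhitespace = sentence.match(/(?:www|https?)\S*/)[0];
console.log(untilFirstWhitespace); // www.a.com
```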