I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it matches the sections I need, but it also matches the abc part of 4_abc, which is not intended. I tried to exclude that with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not really working like I hoped they would. If I use them on the example
a_d Dd_da 4_d d_4
they block the match on the second token, Dd_da, because the space before it was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or following character is a letter, digit or underscore, but not both. It also matches with the start/end of the string if the first/last character is a letter, digit or underscore.
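As a quick check of the \b approach, here is a minimal sketch that runs the pattern against the sample input from the question. The variable names are only illustrative.

```javascript
// Sample input taken from the question.
const input = "a_d Dd_da 4_d d_4";

// \b is zero-width, so the spaces between tokens are not consumed
// and neighbouring tokens can each match on their own.
const pattern = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;

const matches = input.match(pattern);
console.log(matches); // [ 'a_d', 'Dd_da', 'd_4' ]
```

4_d is skipped because there is no word boundary between 4, _ and d: all three are word characters.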
I am trying to capture variants of a word using the Microsoft Word find and replace function. Here is a searchable snippet:
There are going to be 3 instances of the word successful for the purpose of Regex matching. Here is the second sucesfull and here is another succesfull , both spelt incorrectly.
This is the expression I used in Find and Replace with "Use Wildcards" selected (I have also tried replacing the braces with brackets, with no joy):
<([Ss]uc[1,]es[1,]ful[1,])>
[Ss]uc{1,}es{1,}ful{1,}
Replace the [ ] with { } and it should work fine. The curly braces specify how many times you want a character to repeat. Square brackets are used to specify the acceptable characters.
So the current regular expression will match the following.
succcccesssfulll
sucesful
successful
Successsssfull
and so on.
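The quantifier logic can be checked outside Word. The sketch below uses JavaScript's regex engine, where {1,} also means "one or more"; it does not exercise Word's wildcard dialect, and the added ^ and $ anchors are my own assumption so that each sample must match in full.

```javascript
// Same pattern as above, anchored so the whole word has to match.
const pattern = /^[Ss]uc{1,}es{1,}ful{1,}$/;

// The four sample spellings listed above.
const samples = ["succcccesssfulll", "sucesful", "successful", "Successsssfull"];

const results = samples.map((word) => pattern.test(word));
console.log(results); // [ true, true, true, true ]

// A word that lacks the "ful" part is rejected.
console.log(pattern.test("sucessor")); // false
```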
I think this is cleaner and easier to type.
[Ss]uc+es+ful+
"+" matches one or more occurrences of the preceding character.
The search string you want would be:
<[sS]uc@es@ful@>
This searches for a whole word (the < and > symbols) starting with either s or S and containing one or more (the @ symbol) of each of c, s, and l.
I want to select all lines except those that contain the # symbol.
This is the regex for it: /^[^#]*$/gm
Now, how do I select only the words in those lines, as with \S\S*?
In the end, I want to combine these two regexes, /^[^#]*$/gm and \S\S*.
Sample here
You probably need to make it a two-step process: First filter all lines by /^[^#]*$/, afterwards get all matches for /\S+/ from that line. You can't have an arbitrary number of matches from a single regex (e.g. all »words« individually). Unless you want all words separated by whitespace in a single match, such as /\S+(\s+\S+)*/, but even then you'd essentially just get the whole line in a single match, so there's little point to it.
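The two-step process could look roughly like this. The input text is made up for illustration, since the asker's sample is not available here.

```javascript
// Hypothetical sample input (not the asker's original data).
const text = [
  "first line of words",
  "# a comment line",
  "second line",
  "value # trailing comment",
].join("\n");

// Step 1: keep only the lines that contain no '#'.
const keptLines = text.split("\n").filter((line) => /^[^#]*$/.test(line));

// Step 2: collect every whitespace-delimited word from the kept lines.
// match() returns null for a line without words, hence the fallback.
const words = keptLines.flatMap((line) => line.match(/\S+/g) || []);

console.log(words); // [ 'first', 'line', 'of', 'words', 'second', 'line' ]
```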
Is there a regular expression pattern in JavaScript that searches for strings which start with a given letter (e.g. the letter B) and consist of a fixed number of characters (e.g. 8)?
I have tried a lot of variations with ^B followed by [A-Za-z]{7}, but nothing worked out.
UPDATE:
As a final solution, an alternative version of @stribizhev's answer worked for me. As I was filtering object attributes in a relational DB style, I had to match the exact string without returning records that contain multiple whitespace-separated words starting with the matching string.
The RegEx \bB\S{7}$\b worked, as a record can contain special characters, and the whitespace character acts as the word separator, as in any human-friendly table.
\bB\w{7}\b is a pattern for any word starting with B and that has 8 characters. Have a look at https://regex101.com/r/tF3aA5/1.
The word boundary \b enables the whole word matching.
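A small sketch of how this behaves in JavaScript. The test words are invented, and the anchored variant at the end is my own assumption about the "whole string" case rather than part of the answer.

```javascript
// Invented sample text with words of different lengths.
const text = "Bachelor Baseball Bob Blueberry aBcdefgh Building";

// B followed by exactly seven word characters, bounded on both sides.
const wordPattern = /\bB\w{7}\b/g;
const found = text.match(wordPattern);
console.log(found); // [ 'Bachelor', 'Baseball', 'Building' ]

// Assumed variant: anchor with ^ and $ when the entire string
// must be one 8-character word starting with B.
const wholePattern = /^B\w{7}$/;
console.log(wholePattern.test("Bachelor")); // true
console.log(wholePattern.test("Bachelor degree")); // false
```

Blueberry is rejected because the closing \b cannot match after only eight of its nine letters, and aBcdefgh is rejected because there is no boundary between a and B.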
I was trying to do a regex for someone else when I ran into this problem. The requirement was that the regex should return results from a set of strings that has, let's say, "apple" in it. For example, consider the following strings:
"I have an apple"
"You have two Apples"
"I give you one more orange"
The result set should have the first two strings.
The regex(es) I tried are:
/[aA]pple/ and /[^a-zA-Z0-9][aA]pple/
The problem with the first one is that words like "aapple", "bapple", etc (ok, so they are meaningless, but still...) test positive with it, and the problem with the second one is that when a string actually starts with the word "apple", "Apples and oranges", for example, it tests negative. Can someone explain why the second regex behaves this way and what the correct regex would be?
/(^.*?\bapples?\b.*$)/i
Edit: The above will match the entire string containing the word "apples", which I thought is what you were asking for. If you are just trying to see if the string contains the word, the following will work.
/\bapples?\b/i
The regex(es) I tried are:
/[aA]pple/ and /[^a-zA-Z0-9][aA]pple/
The first one just checks for the characters a-p-p-l-e in that order, regardless of the context they appear in. The \b, or word-boundary assertion, matches any position where a word character and a non-word character meet, à la \W\w or \w\W.
The second one requires one more character before the occurrence of a-p-p-l-e. It is essentially the same as the first, except that a non-alphanumeric character must come directly in front of it.
The one I answered with works as follows. From the beginning of the string, it matches any characters (if they exist) non-greedily until it encounters a word boundary. If the string starts with apple, the beginning of the string is itself a word boundary, so it still matches. It then matches the letters a-p-p-l-e, and an s if it exists, followed by another word boundary. It then matches all characters to the end of the string. The /i at the end means it's case-insensitive, so 'Apple', 'APPLE', and 'apple' are all valid.
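To see the walkthrough in action, here is a short sketch. The first three strings come from the question; "pineapple pie" is an extra string I made up to show the leading \b at work.

```javascript
// Full-line pattern from the answer: captures the whole string
// when it contains the word "apple" or "apples".
const linePattern = /(^.*?\bapples?\b.*$)/i;

const wholeLine = "You have two Apples".match(linePattern);
console.log(wholeLine[1]); // You have two Apples

// The start of the string counts as a word boundary.
console.log(linePattern.test("Apples and oranges")); // true

console.log(linePattern.test("I give you one more orange")); // false

// Made-up example: "apple" is glued to "pine", so no boundary precedes it.
console.log(linePattern.test("pineapple pie")); // false
```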
If you have the time, I would highly recommend walking through the tutorial at http://regular-expressions.info. It goes in-depth and explains how regular expression engines match different expressions. It helped me a ton.
To build on @tj111, the reason your second regex fails is that [^a-zA-Z0-9] requires that a character matches; that is, there must be some character in that position, and its value must not be in the set [a-zA-Z0-9]. Markers like \b are called "zero-width assertions". \b, in particular, matches at boundaries between characters or at the beginning or end of a string. Because it does not match any character, its "width" is zero.
In sum, [^a-zA-Z0-9] requires that a character outside the set be present, while \b requires only that a boundary be present.
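The difference between consuming a character and asserting a boundary shows up directly in what each pattern matches. This is a minimal sketch using strings from the question; the variable names are my own.

```javascript
// The asker's second attempt: consumes one non-alphanumeric character.
const consuming = /[^a-zA-Z0-9][aA]pple/;
// Zero-width alternative: only asserts a word boundary.
const boundary = /\b[aA]pple/;

// No character exists before "Apples", so the consuming pattern fails.
console.log(consuming.test("Apples and oranges")); // false
console.log(boundary.test("Apples and oranges")); // true

// Mid-string, the consuming pattern includes the preceding space in its match.
console.log(JSON.stringify("I have an apple".match(consuming)[0])); // " apple"
console.log(JSON.stringify("I have an apple".match(boundary)[0])); // "apple"

// Both reject "bapple": b is alphanumeric and no boundary precedes the a.
console.log(consuming.test("bapple"), boundary.test("bapple")); // false false
```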
Edit: @tj111 has added most of this to his response. I'm in too late, again :)
This works for apple and apples and its case-insensitive spellings:
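var strings = ["I have an apple", "You have two Apples", "I give you one more orange"];
var result = [];
// Whole word "apple" or "apples", case-insensitive.
var pattern = /\bapples?\b/i;
// Keep only the strings that contain the word.
for (var i = 0; i < strings.length; i++) {
    if (pattern.test(strings[i])) {
        result.push(strings[i]);
    }
}
// result now holds the first two strings.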
Your second regex requires a non-alphanumeric character before the first a in apple. A string that starts with "apple" doesn't satisfy this, because nothing precedes the a. As others note, "\b" matches not a character, but a word boundary position.
/\bapple/i
\b is a word boundary.
To explain why your attempts do not work, the first one does not check that it is the beginning of the word, so it can have something before it. The second regex you gave says that something must be before the word "apple", but it can't be alphanumeric.