I need to hide the surnames of persons. For persons with three words in their name, just hide the last word, e.g.:
Laura Torres Bermudez
should be
Laura Torres ********
and for
Maria Fernanda Gonzales Lopez
should be
Maria Fernanda ******** *****
I think two regexes are needed, because which one applies depends on the number of words.
I know \w+ replaces each whole word with a single asterisk, and with (?!\s). I can replace every character except spaces. I hope you can help me. Thanks.
This is my example:
https://regex101.com/r/yW4aZ3/942
Try this:
(?<=\w+\s+\w+\s+.*)[^\s]
Explanation:
(?<=...) is a positive lookbehind - match only occurrences preceded by the specified pattern
[^\s] - match any character except whitespace (what you used, (?!\s)., is an unusual use of a lookahead: "look at the next character; if it is not whitespace, then match any character")
Summary: replace any non-whitespace character preceded by at least two sequences of word characters (\w), each followed by whitespace (\s).
Just note that it won't hide anything for persons with only two words in their name (which is common in many countries).
Also, the regex has to be slightly modified for that testing tool to match one name per line - see https://regex101.com/r/yW4aZ3/943 (^ was added to match from start of each line and a "multi line" flag was set).
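A minimal sketch of the same lookbehind idea wrapped in a JavaScript function, assuming an engine with ES2018 lookbehind support and one name per string (no line breaks). The helper name maskSurnames is made up for illustration.

```javascript
// Hypothetical helper: masks every non-whitespace character that comes
// after the first two words of a single-line name.
function maskSurnames(name) {
  // The lookbehind requires two words, each followed by whitespace,
  // counted from the start of the string.
  return name.replace(/(?<=^\w+\s+\w+\s+.*)\S/g, '*');
}

console.log(maskSurnames('Laura Torres Bermudez'));         // Laura Torres ********
console.log(maskSurnames('Maria Fernanda Gonzales Lopez')); // Maria Fernanda ******** *****
console.log(maskSurnames('John Lock'));                     // John Lock (unchanged)
```

Two-word names come back unchanged, which is the limitation mentioned above.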
A JavaScript solution that does not rely on the ECMAScript 2018 extended regex features is
s = s.replace(/^(\S+\s+\S+)([\s\S]*)/, function($0, $1, $2) {return $1 + $2.replace(/\S/g, '*');})
Details:
^ - start of string
(\S+\s+\S+) - Group 1: one or more non-whitespaces, 1 or more whitespaces and then 1 or more non-whitespaces
([\s\S]*) - Group 2: any 0 or more chars.
The replacement is Group 1 contents and the contents of Group 2 with each non-whitespace char replaced with an asterisk.
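For reference, here is the callback approach as a small function with a few sample inputs. The function name maskAfterSecondWord is only an illustrative choice. Because it uses \S rather than \w, it should also cope with accented letters, which \w does not match in JavaScript without extra flags.

```javascript
// Hypothetical wrapper around the callback-based replacement.
function maskAfterSecondWord(s) {
  return s.replace(/^(\S+\s+\S+)([\s\S]*)/, function (match, head, tail) {
    // head: the first two words; tail: everything after them
    return head + tail.replace(/\S/g, '*');
  });
}

console.log(maskAfterSecondWord('Laura Torres Bermudez'));            // Laura Torres ********
console.log(maskAfterSecondWord('Foo'));                              // Foo (no match, unchanged)
console.log(maskAfterSecondWord('Jos\u00e9 P\u00e9rez N\u00fa\u00f1ez')); // first two words kept, third masked
```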
Java solution:
s = s.replaceAll("(\\G(?!^)\\s*|^\\S+\\s+\\S+\\s+)\\S", "$1*");
See the regex demo
Details
(\G(?!^)\s*|^\S+\s+\S+\s+) - Group 1: either the end of the previous match (\G(?!^)) and 0 or more whitespaces or (|) 1+ non-whitespaces, 1+ whitespaces and again 1+ non-whitespaces, 1+ whitespaces at the start of the string
\S - a non-whitespace char.
Interested if this can be done in JavaScript without a callback, I came up with
str = str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*');
See this demo at regex101
The idea might look a bit confusing, but it seems to work fine. It matches nothing for one or two words and starts matching as soon as a word character appears after the first two words. It is important to use the sticky flag y, which is similar to the \G anchor (continue at the end of the last match) but always anchors the match at that position.
To avoid adding an extra asterisk, the ...\b)\w?... part after the first two words is essential. The word boundary forces a third word to start, but the first capturing group closes after \b, so the first character of the third word is consumed without being captured and the asterisk count comes out right.
The second capturing group on the right side of the alternation captures any optional non-word characters appearing between the words from the third one onward.
var strs = ['Foo', 'Foo Bar B', 'Laura Torres Bermudez', 'Maria Fernanda Gonzales Lopez'];
strs = strs.map(str => str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*'));
console.log(strs);
I have the following examples of American addresses.
6301 Stonewood Dr Apt-728, Plano TX-75024
13323 Maham Road, Apt # 1621, Dallas, TX 75240
17040 Carlson Drive, #1027 Parker, CO 80134
3465 25th St., San Francisco, CA 94110
I want to extract the city from each address using a regex:
Plano, Dallas, Parker, San Francisco
I am using the following regex, which works for the first example:
(?<=[,.|•]).*\s+(?=[\s,.]?CA?[\s,.-]?[\d]{4,})
Can you help me make it work for the other examples as well?
You can match the comma, then all except A-Z and capture from the first occurrence of A-Z.
,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$
Explanation
,[^A-Z,]*?\b Match a comma, then any char except A-Z or a comma till a word boundary
([A-Z][^,]*?) Capture group 1: match A-Z and then any char except a comma, as few as possible
,?\s*[A-Z]{2} Match an optional comma, optional whitespace chars and 2 uppercase chars A-Z
[-\s]\d{4,}\s* Match either - or a whitespace char and then 4 or more digits followed by optional whitespace chars
$ end of string
Regex demo
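A quick way to try this pattern from JavaScript against the four sample addresses. The helper name extractCity and the variable names are illustrative only; the pattern is the one explained above.

```javascript
const sampleAddresses = [
  '6301 Stonewood Dr Apt-728, Plano TX-75024',
  '13323 Maham Road, Apt # 1621, Dallas, TX 75240',
  '17040 Carlson Drive, #1027 Parker, CO 80134',
  '3465 25th St., San Francisco, CA 94110'
];

const cityPattern = /,[^A-Z,]*?\b([A-Z][^,]*?),?\s*[A-Z]{2}[-\s]\d{4,}\s*$/;

// Returns capture group 1 (the city) or null when the address does not match.
function extractCity(address) {
  const m = cityPattern.exec(address);
  return m ? m[1] : null;
}

console.log(sampleAddresses.map(extractCity));
// [ 'Plano', 'Dallas', 'Parker', 'San Francisco' ]
```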
You can use
,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)
See the regex demo. The necessary value is in Group 1.
Details:
, - a comma
(?:\s*#\d+)? - an optional sequence of zero or more whitespaces, # and then one or more digits
\s* - zero or more whitespaces
([^\s,][^,]*) - Group 1: a char other than whitespace and comma and then zero or more non-comma chars
(?=\W+[A-Z]{2}\W*\d{4,}\s*$) - a positive lookahead that requires (immediately on the right)
\W+ - one or more non-word chars
[A-Z]{2} - two uppercase ASCII letters
\W* - zero or more non-word chars
\d{4,} - four or more digits
\s* - zero or more whitespaces
$ - end of string.
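As a rough check, the lookahead-based pattern can be exercised like this. The name cityFromAddress is hypothetical; the city is read from Group 1, as described above.

```javascript
const cityBeforeState = /,(?:\s*#\d+)?\s*([^\s,][^,]*)(?=\W+[A-Z]{2}\W*\d{4,}\s*$)/;

// Returns Group 1 (the city) or null if nothing matches.
function cityFromAddress(address) {
  const m = address.match(cityBeforeState);
  return m ? m[1] : null;
}

console.log(cityFromAddress('6301 Stonewood Dr Apt-728, Plano TX-75024'));      // Plano
console.log(cityFromAddress('13323 Maham Road, Apt # 1621, Dallas, TX 75240')); // Dallas
console.log(cityFromAddress('17040 Carlson Drive, #1027 Parker, CO 80134'));    // Parker
console.log(cityFromAddress('3465 25th St., San Francisco, CA 94110'));         // San Francisco
```

In the third address, the optional (?:\s*#\d+)? part is what skips over "#1027" before the city is captured.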
Another approach (assuming the structure of ending is more or less fixed)
.+\s(\w+?),?.{4}\d{4,}
The best I could come up with starts from the end of the string and looks for a run of non-whitespace characters (the portion you are looking for) followed by a space, a run of capital letters, then an optional space or dash, and finally a run of digits.
([^\s]+?)\,?\s[A-Z]+[\s\-]?\d+$
The first group is the target you are aiming for.
This is a live example with your use case embedded:
https://regexr.com/6nkq5
(As a side note, the demo on regexr may tell you the expression took more than 250ms and can't render; just edit the test case slightly to make it update and show you the actual result.)
As long as your match always comes right before the (exactly) two state letters, you can use that simple condition to match your city.
(?<= )[A-Za-z ]+(?=,? [A-Z]{2})
Your match [A-Za-z ]+ will be found between
(?<= ): a space and
(?=,? [A-Z]{2}): an optional comma + a space + two uppercase letters
Check the demo here.
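A small sketch for trying this pattern in JavaScript (lookbehind needs an ES2018-capable engine). The helper name firstCityMatch and the last test address are made up for illustration. That last address shows a caveat: any run of letters followed by a space and two capitals matches, so an uppercase street suffix can be picked up instead of the city.

```javascript
const cityNearState = /(?<= )[A-Za-z ]+(?=,? [A-Z]{2})/;

// Returns the first match or null.
function firstCityMatch(address) {
  const m = address.match(cityNearState);
  return m ? m[0] : null;
}

console.log(firstCityMatch('6301 Stonewood Dr Apt-728, Plano TX-75024'));      // Plano
console.log(firstCityMatch('13323 Maham Road, Apt # 1621, Dallas, TX 75240')); // Dallas
console.log(firstCityMatch('17040 Carlson Drive, #1027 Parker, CO 80134'));    // Parker
console.log(firstCityMatch('3465 25th St., San Francisco, CA 94110'));         // San Francisco

// Invented address: "NW" looks like a state code, so the street is matched.
console.log(firstCityMatch('100 Main St NW, Austin TX 78701'));                // Main St
```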
I want a JS regex that only matches names with capital letters at the beginning of each word and lowercase letters thereafter. (I don't care about technical accuracy as much as visual consistency — avoiding people using, say, all caps or all lower cases, for example.)
I have the following Regex from this answer as my starting point.
/^[a-z ,.'-]+$/gmi
Here is a link to the following Regex on regex101.com.
As you can see, it matches strings like jane doe which I want to prevent. And only want it to match Jane Doe instead.
How can I accomplish that?
Match [A-Z] initially, then use your original character set afterwards (sans space), and make sure not to use the case-insensitive flag:
/^[A-Z][a-z,.'-]+(?: [A-Z][a-z,.'-]+)*$/g
https://regex101.com/r/y172cv/1
You might want the non-word characters to only be permitted at word boundaries, to ensure there are alphabetical characters on each side of, eg, ,, ., ', and -:
^[A-Z](?:[a-z]|\b[,.'-]\b)+(?: [A-Z](?:[a-z]|\b[,.'-]\b)+)*$
https://regex101.com/r/nP8epM/2
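To see how the two patterns differ, here is a small comparison. The variable names and sample names are illustrative. Note that the g flag is left off on purpose: with g, test() keeps state in lastIndex between calls, so reusing the same regex object can give a false negative on the next call.

```javascript
const loosePattern = /^[A-Z][a-z,.'-]+(?: [A-Z][a-z,.'-]+)*$/;
const strictPattern = /^[A-Z](?:[a-z]|\b[,.'-]\b)+(?: [A-Z](?:[a-z]|\b[,.'-]\b)+)*$/;

for (const name of ['Jane Doe', 'jane doe', 'JANE DOE', 'Mary-ann Smith', 'Jane-']) {
  console.log(name, loosePattern.test(name), strictPattern.test(name));
}
// Jane Doe        true  true
// jane doe        false false
// JANE DOE        false false
// Mary-ann Smith  true  true
// Jane-           true  false  (trailing hyphen has no letter after it)

// The same loose pattern with the g flag, reused across two calls:
const withGlobalFlag = /^[A-Z][a-z,.'-]+(?: [A-Z][a-z,.'-]+)*$/g;
const firstCall = withGlobalFlag.test('Jane Doe');  // true, lastIndex is now 8
const secondCall = withGlobalFlag.test('Jane Doe'); // false, search started at index 8
console.log(firstCall, secondCall);
```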
If you want a capital letter at the beginning and lowercase letters following where the name can possibly end on one of ,.'- you might use:
^[A-Z][a-z]+[,.'-]?(?: [A-Z][a-z]+[,.'-]?)*$
^ Start of string
[A-Z][a-z]+ Match an uppercase char, then 1+ lowercase chars a-z
[,.'-]? Optionally match one of ,.'-
(?: Non capturing group
[ ][A-Z][a-z]+[,.'-]? Match a space (shown here as [ ]), then repeat the same pattern as before
)* Close group and repeat 0+ times to also match a single name
$ End of string
Regex demo
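A few sample inputs for this variant, as a sketch. The name isCapitalizedName is made up. Compared with the patterns above, punctuation is only accepted at the end of a word, and each word needs at least two letters.

```javascript
const capitalizedWords = /^[A-Z][a-z]+[,.'-]?(?: [A-Z][a-z]+[,.'-]?)*$/;

function isCapitalizedName(name) {
  return capitalizedWords.test(name);
}

console.log(isCapitalizedName('Jane Doe'));       // true
console.log(isCapitalizedName('Jane Doe.'));      // true  (one trailing punctuation char allowed)
console.log(isCapitalizedName('jane doe'));       // false
console.log(isCapitalizedName('Mary-ann Smith')); // false (hyphen must be followed by a space or the end)
console.log(isCapitalizedName('J Doe'));          // false (a word needs 2+ letters)
```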
Here's my solution to this problem
const str = "jane dane"
console.log(str.replace(/(^\w{1})|(\s\w{1})/g, (v) => v.toUpperCase()));
First, find the first letter of the first word with (^\w{1}). Then the pipe operator |, which serves as an OR in regex, adds a second alternative, (\s\w{1}), which matches the first letter of each following part of the name (for example the last name) together with the whitespace before it. The g flag makes the replacement run through the whole string, so every occurrence of either alternative is replaced.
Finally, the callback uppercases each match. This works for any name containing first, middle and last names.
I have the following javascript regex:
/^[^\s][a-z0-9 ]+[^\s]$/i
I need to allow any alphanumeric character as well as spaces inside the string but not at the beginning nor at the end.
Oddly enough, the above regex will not accept less than 3 characters, e.g. aa will not match but aaa will.
I am not sure why. Can anyone please help ?
You have: [^\s] (requires matching at least one non-whitespace character), [a-z0-9 ]+ (requires matching at least one alphanumeric or space character), and [^\s] again (requires matching at least one non-whitespace character). So, in total, you need at least 3 characters in the string.
Use word boundaries at the beginning and end instead:
/^\b[a-z0-9 ]+\b$/i
https://regex101.com/r/2GhH3N/1
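The three-character minimum is easy to confirm from the console. The variable names below are illustrative. The sketch also shows that [^\s] accepts any non-whitespace character at the ends, not only alphanumerics.

```javascript
const originalPattern = /^[^\s][a-z0-9 ]+[^\s]$/i;
const boundedPattern = /^\b[a-z0-9 ]+\b$/i;

console.log(originalPattern.test('aa'));  // false: three separate parts each need a character
console.log(originalPattern.test('aaa')); // true
console.log(originalPattern.test('!a!')); // true: [^\s] also allows punctuation

console.log(boundedPattern.test('a'));    // true
console.log(boundedPattern.test('aa'));   // true
console.log(boundedPattern.test('a b'));  // true
console.log(boundedPattern.test(' aa'));  // false: leading space
console.log(boundedPattern.test('aa '));  // false: trailing space
console.log(boundedPattern.test('!a!'));  // false
```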
Try the following regex:
^(?! )[a-z0-9 ]*[a-z0-9]$
Details:
^(?! ) - Start of the string and no space after it (so here we exclude the initial space).
[a-z0-9 ]* - A sequence of letters, digits and spaces, possibly empty (the content before the last letter, see below).
[a-z0-9]$ - The last letter and the end of string (so here we exclude the terminal space).
You should re-write the expression as
/^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i
See the regex demo.
NOTE: If only one whitespace is allowed between the alphanumeric chars, use \s instead of \s+:
/^[a-z0-9]+(?:\s[a-z0-9]+)*$/i
Details
^ - start of string
[a-z0-9]+ - 1+ letters/digits
(?:\s+[a-z0-9]+)* - 0 or more repetitions of 1+ whitespaces (\s+) and 1+ digit/letters
$ - end of string.
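A short comparison of the two variants, with illustrative variable names. Keep in mind that \s matches tabs and line breaks as well as spaces; if only literal spaces should be accepted, a plain space in place of \s would be the stricter choice.

```javascript
const manySpaces = /^[a-z0-9]+(?:\s+[a-z0-9]+)*$/i;
const oneSpace = /^[a-z0-9]+(?:\s[a-z0-9]+)*$/i;

console.log(manySpaces.test('a'));         // true
console.log(manySpaces.test('foo bar'));   // true
console.log(manySpaces.test('foo  bar'));  // true  (two spaces)
console.log(oneSpace.test('foo  bar'));    // false (two spaces)
console.log(manySpaces.test(' foo'));      // false
console.log(manySpaces.test('foo '));      // false
console.log(oneSpace.test('foo\tbar'));    // true  (\s also matches a tab)
```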
It's given that /(\S)\1(\1)+/g matches all occurrences of three equal non-whitespace characters following each other.
I don't understand why there are () around (\S) and the 2nd (\1), but not around the 1st \1. Can anyone help explain how the above regex works?
src: http://www.javascriptkit.com/javatutors/redev2.shtml
Thanks in advance.
The \S needs parentheses to capture its value, so you can refer back to the captured value with \1. \1 means "match the same text which capturing group #1 matched".
I believe there is a problem with this regex. You said you want to match "three equal non-whitespace characters". But the + will make this match 3 or more equal, consecutive non-whitespace characters.
The g on the end means "apply this regex over the entire input string, or globally".
The second set of parentheses is not necessary. It needlessly captures the repeated character a second time, while matching the same strings as this regex:
/(\S)\1\1+/g
Also, as @AlexD pointed out, the description should say that it matches at least three characters. If you replaced that regex with BONK in the string fooxxxxxxbar:
'fooxxxxxxbar'.replace(/(\S)\1\1+/g, 'BONK')
..you might expect the result to be fooBONKBONKbar from their description, because there are two sets of three 'x's. But in fact the result would be fooBONKbar; the first \1 matches the second 'x', and the \1+ matches the third 'x' and any 'x's that follow it. If they wanted to match just three characters, they should have left the + off.
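The three variants discussed here can be compared side by side. The variable names are only for illustration.

```javascript
const input = 'fooxxxxxxbar';

// Three or more equal characters: the + swallows all six x's in one match.
const threeOrMore = input.replace(/(\S)\1\1+/g, 'BONK');
// Exactly three equal characters per match: two matches in six x's.
const exactlyThree = input.replace(/(\S)\1\1/g, 'BONK');
// The tutorial's version with the extra group behaves like the first one.
const tutorialVersion = input.replace(/(\S)\1(\1)+/g, 'BONK');

console.log(threeOrMore);     // fooBONKbar
console.log(exactlyThree);    // fooBONKBONKbar
console.log(tutorialVersion); // fooBONKbar
```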
I noticed several other sloppy descriptions like that, plus at least one outright error: \B is equivalent to (?!\b) (a position that's not a word boundary), not [^\b] (a character that's not a backspace). For that matter, their description of word boundaries--"the position between a word and a space"--is wrong, too. A word boundary isn't defined by any particular character, like a space--in fact, it can just as well be the absence of any character that creates one. The string:
Word
...starts with a word boundary because 'W' is a word character and, being first, it's not preceded by another word character. Similarly, the 'd' is not followed by another word character, so the end of the string is also a word boundary.
Also, a regex doesn't know from words, only word characters. The definition of a word character can vary depending on the regex flavor and Unicode or locale settings, but it always includes [A-Za-z0-9_] (ASCII letters and digits plus the underscore). A word boundary is simply a position that's between one of those characters and any other character (or no other character, as I explained earlier).
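One way to make word boundaries visible is to replace each zero-width match with a marker character. A small sketch, with variable names chosen just for this example:

```javascript
// \b matches at the start and end of 'Word', although no space is involved.
const boundaries = 'Word'.replace(/\b/g, '|');
// \B matches the positions that are NOT word boundaries.
const nonBoundaries = 'Word'.replace(/\B/g, '|');
// A hyphen creates boundaries just as a space would.
const hyphenated = 'foo-bar'.replace(/\b/g, '|');
// Inside a character class, \b means the backspace character (U+0008).
const backspaceInClass = /[\b]/.test('\b');

console.log(boundaries);       // |Word|
console.log(nonBoundaries);    // W|o|r|d
console.log(hyphenated);       // |foo|-|bar|
console.log(backspaceInClass); // true
```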
If you want to learn about regexes, I suggest you forget that site and start here instead: regular-expressions.info.
Hi, is there a way to find the first letter of the last word in a string? The strings are results of an XML parser function. Inside the each() loop I get all the nodes and put every name inside a variable like this: var person = xml.find("name").find().text()
Now person holds a string, it could be:
Anamaria Forrest Gump
John Lock
As you see, the first string holds 3 words, while the second holds 2 words.
What I need are the first letters of the last words: "G", "L".
How do I accomplish this? Thank you.
This should do it:
var person = xml.find("name").find().text();
var names = person.split(' ');
var firstLetterOfSurname = names[names.length - 1].charAt(0);
This solution will work even if your string contains a single word. It returns the desired character:
myString.match(/(\w)\w*$/)[1];
Explanation: "Match a word character (and memorize it) (\w), then match any number of word characters \w*, then match the end of the string $". In other words : "Match a sequence of word characters at the end of the string (and memorize the first of these word characters)". match returns an array with the whole match in [0] and then the memorized strings in [1], [2], etc. Here we want [1].
Regexps are enclosed in / in javascript : http://www.w3schools.com/js/js_obj_regexp.asp
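Since match returns null when nothing matches, indexing [1] directly throws on an empty string. A slightly more defensive sketch follows; the function name lastWordInitial and the trim() call are my additions, not part of the answer above.

```javascript
// Hypothetical helper: first letter of the last word, or null if there is none.
function lastWordInitial(fullName) {
  // trim() is an added assumption so trailing whitespace does not hide the last word
  const m = fullName.trim().match(/(\w)\w*$/);
  return m ? m[1] : null;
}

console.log(lastWordInitial('Anamaria Forrest Gump')); // G
console.log(lastWordInitial('John Lock'));             // L
console.log(lastWordInitial('Cher'));                  // C
console.log(lastWordInitial('John Lock '));            // L
console.log(lastWordInitial(''));                      // null
```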
You can hack it with regex:
'Marry Jo Poppins'.replace(/^.*\s+(\w)\w+$/, "$1"); // P
'Anamaria Forrest Gump'.replace(/^.*\s+(\w)\w+$/, "$1"); // G
Otherwise Mark B's answer is fine, too :)
edit:
Alsciende's regex+javascript combo myString.match(/(\w)\w*$/)[1] is probably a little more versatile than mine.
regular expression explanation
/^.*\s+(\w)\w+$/
^ beginning of input string
.* followed by any character (.) 0 or more times (*)
\s+ followed by any whitespace (\s) 1 or more times (+)
( group and capture to $1
\w followed by any word character (\w)
) end capture
\w+ followed by any word character (\w) 1 or more times (+)
$ end of input string
Alsciende's regex
/(\w)\w*$/
( group and capture to $1
\w any word character
) end capture
\w* any word character (\w) 0 or more times (*)
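The difference in versatility shows up on inputs with a single word or a one-letter last word, where the replace-based pattern does not match and returns the input unchanged. A sketch with made-up function names:

```javascript
function initialViaReplace(s) {
  return s.replace(/^.*\s+(\w)\w+$/, '$1');
}

function initialViaMatch(s) {
  const m = s.match(/(\w)\w*$/);
  return m ? m[1] : null;
}

console.log(initialViaReplace('Marry Jo Poppins')); // P
console.log(initialViaMatch('Marry Jo Poppins'));   // P

console.log(initialViaReplace('Cher'));   // Cher (no whitespace, so no match and no change)
console.log(initialViaMatch('Cher'));     // C

console.log(initialViaReplace('John L')); // John L (\w+ needs a second letter)
console.log(initialViaMatch('John L'));   // L
```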
summary
Regular expressions are awesomely powerful, or as you might say, "Godlike!" Regular-Expressions.info is a great starting point if you'd like to learn more.
Hope this helps :)