I'm using an API call to populate some fields on my website. These fields are populated with different parts of an address. However, in my first address line field the value is abbreviated. For example, if I had 'Smith Street' it would get inserted as 'Smith St'. To get around this issue I am using JavaScript to replace the value, for example:
value = value.replace("St", "Street");
However, if I then have, for example, a value that is 'Stanley Street', it would return 'Streetanley Street'.
Does anybody know of a method I can use to apply the replace method only to the last word in a string?
You're looking for a regular expression. Get used to them, if you plan on writing much JavaScript.
value = value.replace(/St$/, "Street");
will replace "St" only if it's the end of the string. ($ matches end-of-string)
If we wanted to allow for white space at the end of the string, and still replace, we would say:
value = value.replace(/St\s*$/, "Street");
Where \s means "any white space character" and * means "0 or more times".
And if we want to match both "St" and "St.", we'd say:
value = value.replace(/St\.?\s*$/, "Street");
where \. is just a ".", and ? means "at most once".
To avoid replacing "st" in the middle of a word, use a word boundary (\b):
value = value.replace(/\bSt\.?\s*$/, "Street");
And you probably want to use a case-insensitive match (/i), so "Main st" is converted just as well as "Main St":
value = value.replace(/\bSt\.?\s*$/i, "Street");
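The steps above can be wrapped in a small helper. This is only a sketch: the function name expandTrailingStreet is made up for illustration, and the pattern is the final one from this answer.

```javascript
// Hypothetical helper (the name is an assumption, not part of the original answer).
// Expands a trailing "St" or "St." to "Street", ignoring case and any
// whitespace at the end of the string.
function expandTrailingStreet(value) {
  return value.replace(/\bSt\.?\s*$/i, "Street");
}

console.log(expandTrailingStreet("Smith St"));       // abbreviation at the end
console.log(expandTrailingStreet("Stanley St."));    // "St" inside "Stanley" is left alone
console.log(expandTrailingStreet("Stanley Street")); // already expanded, unchanged
```

Because the pattern is anchored with $, only the last word can ever be replaced, which is what keeps "Stanley" intact.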
value = value.replace(/(\s)St(\S*)$/, "$1Street$2");
/\sSt\S*$/ will match the last word if it begins with St (\s is a whitespace character, followed by St, then \S* is zero or more non-whitespace characters, and $ is the end of the string).
Then wrap any parts you need to reuse in () and refer to them in the replacement with $1, $2, etc.
You can put a word boundary on each side of your word. To do that, wrap the word in \b:
/\bYourWord or words\b/g
value = value.replace(/\bSt\b/g, "Street");
You can use the word boundary expression: \b
var abbreviations= {
"st":"street",
"av":"avenue"
//...
};
for( var i in abbreviations ){
str= str.replace( new RegExp( "\\b" + i + "\\b" ,"i" ) , abbreviations[i] );
}
document.querySelector("input").addEventListener("input",function(evt){
document.querySelector("#output").innerHTML= correctAbbreviations(evt.target.value);
});
function correctAbbreviations(str){
var abbreviations= {
"st":"street",
"av":"avenue"
//...
};
for( var i in abbreviations ){
str= str.replace( new RegExp( "\\b" + i + "\\b" ,"i" ) , abbreviations[i] );
}
return str;
};
#output{
background:#ddd;
width:auto;
}
<input type="text" >
<br/><span id="output"></span>
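The abbreviation map above replaces a match anywhere in the string. If only the last word should be expanded, as the question asks, one possible variation is sketched below. The function name expandLastWord and the capitalised map values are assumptions made for this example.

```javascript
// Hypothetical variation: look up only the final word in the abbreviation map.
// Keys are lower-case abbreviations; values are the full words to insert.
var streetAbbreviations = {
  st: "Street",
  av: "Avenue"
};

function expandLastWord(address, abbreviations) {
  // Capture the last run of letters, an optional dot, and trailing whitespace.
  return address.replace(/\b([A-Za-z]+)\.?(\s*)$/, function (whole, word, trailing) {
    var key = word.toLowerCase();
    var known = Object.prototype.hasOwnProperty.call(abbreviations, key);
    // Leave the text untouched when the last word is not a known abbreviation.
    return known ? abbreviations[key] + trailing : whole;
  });
}

console.log(expandLastWord("Smith St", streetAbbreviations));
console.log(expandLastWord("St Mary Av.", streetAbbreviations));
```

Using a replacer function keeps the lookup in one pass instead of building one RegExp per abbreviation.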
I would like to find all the matches of given strings (divided by spaces) in a string.
(The way for example, iTunes search box works).
For example, both "ab de" and "de ab" should return true on "abcde" (also "bc e a", or any other order, should return true).
If I replace the white space with a wild card, "ab*de" would return true on "abcde", but not "de*ab".
[I use * and not Regex syntax just for this explanation]
I could not find any pure Regex solution for that.
The only solution I could think of is splitting the search term and running multiple regexes.
Is it possible to find a pure regex that covers all these options?
Returns true when all parts (divided by , or ' ') of a searchString occur in text. Otherwise false is returned.
function filter(text, searchString) {
const regexStr = '(?=.*' + searchString.split(/\,|\s/).join(')(?=.*') + ')';
const searchRegEx = new RegExp(regexStr, 'gi');
return text.match(searchRegEx) !== null;
}
I'm pretty sure you could come up with a regex to do what you want, but it may not be the most efficient approach.
For example, the regex pattern (?=.*bc)(?=.*e)(?=.*a) will match any string that contains bc, e, and a.
var isMatch = 'abcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals true
var isMatch = 'bcde'.match(/(?=.*bc)(?=.*e)(?=.*a)/) != null; // equals false
You could write a function to dynamically create an expression based on your search terms, but whether it's the best way to accomplish what you are doing is another question.
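A function that builds such an expression from the search terms could look roughly like this. It is a sketch under a few assumptions: the names escapeRegExp and buildAllTermsRegex are invented here, terms are separated by whitespace, and each term is escaped so that characters like . or * are matched literally.

```javascript
// Escape regex metacharacters so a search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: one lookahead per term, so the order of terms is irrelevant.
function buildAllTermsRegex(searchString) {
  var terms = searchString.split(/\s+/).filter(Boolean);
  var lookaheads = terms.map(function (term) {
    return "(?=.*" + escapeRegExp(term) + ")";
  });
  // Anchoring at ^ means the lookaheads are only evaluated from the start.
  return new RegExp("^" + lookaheads.join(""), "i");
}

console.log(buildAllTermsRegex("ab de").test("abcde"));
console.log(buildAllTermsRegex("bc e a").test("abcde"));
console.log(buildAllTermsRegex("ab x").test("abcde"));
```

Note that . does not match line breaks, so this sketch assumes single-line text.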
Alternations are order insensitive:
"abcde".match(/(ab|de)/g); // => ['ab', 'de']
"abcde".match(/(de|ab)/g); // => ['ab', 'de']
So if you have a list of words to match you can build a regex with an alternation on the fly like so:
function regexForWordList(words) {
return new RegExp('(' + words.join('|') + ')', 'g');
}
'abcde'.match(regexForWordList(['a', 'e'])); // => ['a', 'e']
Try this:
var str = "your string";
str = str.split( " " );
for( var i = 0 ; i < str.length ; i++ ){
// your regexp match
}
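The loop above leaves the per-word check open. One way to fill it in without any regex at all is a plain substring test per term. The function name containsAllTerms and the case-insensitive comparison are assumptions of this sketch.

```javascript
// Hypothetical helper: true when every whitespace-separated term occurs in text.
function containsAllTerms(text, searchString) {
  var haystack = text.toLowerCase();
  return searchString
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean) // drop empty pieces caused by leading or repeated spaces
    .every(function (term) {
      return haystack.indexOf(term) !== -1;
    });
}

console.log(containsAllTerms("abcde", "de ab"));
console.log(containsAllTerms("abcde", "ab x"));
```

Since no pattern is built from user input, there is nothing to escape here.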
This is the script I use. It also works with single-word search strings:
var what="test string with search cool word";
var searchString="search word";
var search = new RegExp(searchString, "gi"); // one-word searching
// multiple search words
if(searchString.indexOf(' ') != -1) {
search="";
var words=searchString.split(" ");
for(var i = 0; i < words.length; i++) {
search+="(?=.*" + words[i] + ")";
}
search = new RegExp(search + ".+", "gi");
}
if(search.test(what)) {
// found
} else {
// notfound
}
I assume you are matching words, or parts of words. You want space-separated search terms to limit search results, and it seems you intend to return only those entries which have all the words that the user supplies. And you intend a wildcard character * to stand for 0 or more characters in a matching word.
For example, if the user searches for the words term1 term2, you intend to return only those items which have both words term1 and term2. If the user searches for the word term*, it would match any word beginning with term.
There are suitable regular expressions which are equivalent to this search language and can be generated from it.
A simple example, the word term, can be asserted in regex by converting to \bterm\b. But two or more words which must match in any order require lookahead assertions. Using extended syntax, the equivalent regex is:
(?= .* \b term1 \b )
(?= .* \b term2 \b )
The asterisk wildcard can be asserted in regex with a character class followed by asterisk. The character class identifies which letters you consider to be part of word. For example, you might find that [A-Za-z0-9]* fits the bill.
In short, you might be satisfied if you convert an expression such as:
foo ba* quux
to:
(?= .* \b foo \b )
(?= .* \b ba[A-Za-z0-9]* \b )
(?= .* \b quux \b )
That is a simple matter of search and replace. But do be careful to sanitize the input string to avoid injection attacks by removing punctuation, etc.
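The conversion described above might be sketched as follows. The function names are made up for this example, and instead of removing punctuation the sketch escapes every regex metacharacter in each term, which is one common way to handle untrusted input.

```javascript
// Escape regex metacharacters so user input cannot change the pattern's meaning.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical converter: "foo ba* quux" becomes one whole-word lookahead per term,
// with * standing for zero or more word characters.
function searchToRegex(query) {
  var parts = query.trim().split(/\s+/).filter(Boolean).map(function (term) {
    var body = term.split("*").map(escapeRegExp).join("[A-Za-z0-9]*");
    return "(?=.*\\b" + body + "\\b)";
  });
  return new RegExp("^" + parts.join(""), "i");
}

var rx = searchToRegex("foo ba* quux");
console.log(rx.test("quux bar foo"));
console.log(rx.test("foobar ba quux"));
```

The second call fails because foo must appear as a whole word, not as a prefix of foobar.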
I think you may be barking up the wrong tree with RegEx. What you might want to look at is the Levenshtein distance of two input strings.
There's a Javascript implementation here and a usage example here.
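For reference, a minimal Levenshtein distance function is sketched below. It is not the implementation the answer links to, only the standard dynamic-programming formulation kept to two rows; the function name is an assumption.

```javascript
// Sketch of the classic dynamic-programming Levenshtein distance:
// the minimum number of single-character insertions, deletions and
// substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  var prev = [];
  for (var j = 0; j <= b.length; j++) {
    prev[j] = j; // distance from the empty prefix of a
  }
  for (var i = 1; i <= a.length; i++) {
    var curr = [i];
    for (var k = 1; k <= b.length; k++) {
      var cost = a.charAt(i - 1) === b.charAt(k - 1) ? 0 : 1;
      curr[k] = Math.min(
        prev[k] + 1,       // deletion
        curr[k - 1] + 1,   // insertion
        prev[k - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting"));
```

A small distance relative to the string length suggests a near match, which is a different notion from the "all terms present" test in the question.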
I have this line in my loop:
var regex1 = new RegExp('' + myClass + '[:*].*');
var rule1 = string.match(regex1)
Where "string" is a string of class selectors, for example: .hb-border-top:before, .hb-border-left
and "myClass" is a class: .hb-border-top
As I cycle through strings, I need to match strings that have "myClass" in them, including :before and :hover but not including things like hb-border-top2.
My idea for this regex is to match hb-border-top, then :* to match zero or more colons, and then the rest of the string.
I need to match:
.hb-fill-top::before
.hb-fill-top:hover::before
.hb-fill-top
.hb-fill-top:hover
but the above returns only:
.hb-fill-top::before
.hb-fill-top:hover::before
.hb-fill-top:hover
and doesn't return .hb-fill-top itself.
So, it has to match .hb-fill-top itself and then anything that follows as long as it starts with :
EDIT:
Picture below: my strings are the contents of {selectorText}.
A string is either a single class, a class with a pseudo-element, or a rule with a few classes in it, divided by commas.
Each string that contains .hb-fill-top ONLY or .hb-fill-top: + something (hover, after, etc.) has to be selected. The class is going to be in the variable "myClass", hence my issue, as I can't be too precise.
I understand you want to get any CSS selector name that contains the value anywhere inside and has EITHER : and 0+ chars up to the end of string OR finish right there.
Then, to get matches for the .hb-fill-top value you need a solution like
/\.hb-fill-top(?::.*)?$/
and the following JS code to make it all work:
var key = ".hb-fill-top";
var rx = RegExp(key.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "(?::.*)?$");
var ss = ["something.hb-fill-top::before","something2.hb-fill-top:hover::before","something3.hb-fill-top",".hb-fill-top:hover",".hb-fill-top2:hover",".hb-fill-top-2:hover",".hb-fill-top-bg-br"];
var res = ss.filter(x => rx.test(x));
console.log(res);
Note that .replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') code is necessary to escape the . that is a special regex metacharacter that matches any char but a line break char. See Is there a RegExp.escape function in Javascript?.
The ^ matches the start of a string.
(?::.*)?$ will match:
(?::.*)? - an optional sequence (the trailing ? quantifier matches 1 or 0 occurrences of the quantified subpattern; (?:...) is a non-capturing group) consisting of
: - a colon
.* - any 0+ chars other than line break chars
$ - end of the string.
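Since the question mentions that a selectorText may hold several selectors divided by commas, the same pattern can be applied to each piece separately. The sketch below assumes a naive split on commas (which would mis-handle commas inside functional pseudo-classes), and the function names are invented for the example.

```javascript
// Escape regex metacharacters, in particular the leading dot of a class name.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true when any comma-separated selector ends with the
// class itself, or with the class followed by ":" and anything else.
function selectorMatchesClass(selectorText, className) {
  var rx = new RegExp(escapeRegExp(className) + "(?::.*)?$");
  return selectorText.split(",").some(function (selector) {
    return rx.test(selector.trim());
  });
}

console.log(selectorMatchesClass(".hb-border-top:before, .hb-border-left", ".hb-border-top"));
console.log(selectorMatchesClass(".hb-fill-top2:hover", ".hb-fill-top"));
```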
var regex1 = new RegExp(`^\\${myClass}(:{1,2}\\w+)*$`)
var passes = [
'.hb-fill-top::before',
'.hb-fill-top:hover::before',
'.hb-fill-top',
'.hb-fill-top:hover',
'.hb-fill-top::before',
'.hb-fill-top:hover::before',
'.hb-fill-top:hover'
];
var fails = ['.hb-fill-top-bg-br'];
var myClass = '.hb-fill-top';
var regex = new RegExp(`^\\${myClass}(:{1,2}\\w+)*$`);
passes.forEach(p => console.log(regex.test(p)));
console.log('---');
fails.forEach(f => console.log(regex.test(f)));
var regex1 = new RegExp('\\' + myClass + '(?::[^\\s]*)?');
var rule1 = string.match(regex1)
This regex selects the class and everything after it, provided that what follows starts with :, and it stops when it meets a whitespace character.
See the regex in action.
Notice also that I added '\\' at the beginning. This is in order to escape the dot in your className. Otherwise it would have matched something else like
ahb-fill-top
.some-other-hb-fill-top
Also be careful with .* because it may match something else afterwards (I don't know your set of strings). You might want to be more precise with :{1,2}[\w()-]+ in the last group. So:
var regex1 = new RegExp('\\' + myClass + '(?::{1,2}[\\w()-]+)?');
I have an input of type text where I return true or false depending on a list of banned words. Everything works fine. My problem is that I don't know how to check against words with diacritics from the array:
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new RegExp('\\b' + bannedWords.join("\\b|\\b") + '\\b', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
alert(valid);
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check'>
Now on the word băţ it returns true instead of false for example.
Chiu's comment is right: 'aaáaa'.match(/\b.+?\b/g) yields the quite counter-intuitive [ "aa", "á", "aa" ], because "word character" (\w) in JavaScript regular expressions is just a shorthand for [A-Za-z0-9_] ('case-insensitive-alpha-numeric-and-underscore'), so word boundary (\b) matches any place between a chunk of alphanumerics and any other character. This makes extracting "Unicode words" quite hard.
For non-unicase writing systems it is possible to identify "word character" by its dual nature: ch.toUpperCase() != ch.toLowerCase(), so your altered snippet could look like this:
var bannedWords = ["bad", "mad", "testing", "băţ", "bať"];
var bannedWordsRegex = new RegExp('-' + bannedWords.join("-|-") + '-', 'i');
$(function() {
$("input").on("input", function() {
var invalid = bannedWordsRegex.test(dashPaddedWords(this.value));
$('#log').html(invalid ? 'bad' : 'good');
});
$("input").trigger("input").focus();
function dashPaddedWords(str) {
return '-' + str.replace(/./g, wordCharOrDash) + '-';
};
function wordCharOrDash(ch) {
return isWordChar(ch) ? ch : '-'
};
function isWordChar(ch) {
return ch.toUpperCase() != ch.toLowerCase();
};
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<input type='text' name='word_to_check' value="ba">
<p id="log"></p>
Let's see what's going on:
alert("băţ".match(/\w\b/));
This is [ "b" ] because word boundary \b doesn't recognize word characters beyond ASCII. JavaScript's "word characters" are strictly [0-9A-Z_a-z], so aä, pπ, and zƶ match \w\b\W since they contain a word character, a word boundary, and a non-word character.
I think the best you can do is something like this:
var bound = '[^\\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]';
var regex = new RegExp('(?:^|' + bound + ')(?:'
+ bannedWords.join('|')
+ ')(?=' + bound + '|$)', 'i');
where bound is a reversed list of all ASCII word characters plus most Latin-esque letters, used with start/end of line markers to approximate an internationalized \b. (The second of which is a zero-width lookahead that better mimics \b and therefore works well with the g regex flag.)
Given ["bad", "mad", "testing", "băţ"], this becomes:
/(?:^|[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe])(?:bad|mad|testing|băţ)(?=[^\w\u00c0-\u02c1\u037f-\u0587\u1e00-\u1ffe]|$)/i
This doesn't need anything like ….join('\\b|\\b')… because there are parentheses around the list (and that would create things like \b(?:hey\b|\byou)\b, which is akin to \bhey\b\b|\b\byou\b, including the nonsensical \b\b – which JavaScript interprets as merely \b).
You can also use var bound = '[\\s!-/:-@[-`{-~]' for a simpler ASCII-only list of acceptable non-word characters. Be careful about that order! The dashes indicate ranges between characters.
You need a Unicode aware word boundary. The easiest way is to use XRegExp package.
Although its \b is still ASCII based, there is a \p{L} (or a shorter \pL version) construct that matches any Unicode letter from the BMP plane. Building a custom word boundary using this construct is easy:
\b word \b
---------------------------------------
| | |
([^\pL0-9_]|^) word (?=[^\pL0-9_]|$)
The leading word boundary can be represented with a (non)capturing group ([^\pL0-9_]|^) that matches (and consumes) either a character other than a Unicode letter from the BMP plane, a digit and _ or a start of the string before the word.
The trailing word boundary can be represented with a positive lookahead (?=[^\pL0-9_]|$) that requires a character other than a Unicode letter from the BMP plane, a digit and _ or the end of string after the word.
See the snippet below that will detect băţ as a banned word, and băţy as an allowed word.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var regex = new XRegExp('(?:^|[^\\pL0-9_])(?:' + bannedWords.join("|") + ')(?=$|[^\\pL0-9_])', 'i');
$(function () {
$("input").on("change", function () {
var valid = !regex.test(this.value);
//alert(valid);
console.log("The word is", valid ? "allowed" : "banned");
});
});
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
<input type='text' name='word_to_check'>
Instead of using a word boundary, you could do it with
(?:[^\w\u0080-\u02af]+|^)
to check for start of word, and
(?=[^\w\u0080-\u02af]|$)
to check for the end of it.
The [^\w\u0080-\u02af] matches any character that is not (^) a basic Latin word character (\w) or a character in the Unicode Latin-1 Supplement, Latin Extended-A, Latin Extended-B and IPA Extensions blocks. This includes some punctuation, but the class would get very long if it matched just letters. It may also have to be extended if other character sets have to be included. See for example Wikipedia.
Since JavaScript didn't support lookbehinds when this was written (they were added in ES2018), the start-of-word test consumes any of the aforementioned non-word characters, but I don't think that should be a problem. The important thing is that the end-of-word test doesn't.
Also, putting these tests outside a non-capturing group that alternates the words makes it significantly more efficient.
var bannedWords = ["bad", "mad", "testing", "băţ", "båt", "süß"],
regex = new RegExp('(?:[^\\w\\u00c0-\\u02af]+|^)(?:' + bannedWords.join("|") + ')(?=[^\\w\\u00c0-\\u02af]|$)', 'i');
function myFunction() {
document.getElementById('result').innerHTML = 'Banned = ' + regex.test(document.getElementById('word_to_check').value);
}
<!DOCTYPE html>
<html>
<body>
Enter word: <input type='text' id='word_to_check'>
<button onclick='myFunction()'>Test</button>
<p id='result'></p>
</body>
</html>
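In engines that implement ES2018, both lookbehind and Unicode property escapes are available, so the hand-written character ranges can be replaced by \p{L} and \p{N}. The following is a sketch under that assumption; the function names are made up, and older browsers will reject the pattern.

```javascript
// Escape regex metacharacters in each banned word.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: a Unicode-aware "word boundary" on both sides,
// expressed as a negative lookbehind and a negative lookahead.
// Requires the "u" flag for \p{...} to be recognised.
function buildBannedWordsRegex(words) {
  var wordChar = "[\\p{L}\\p{N}_]";
  return new RegExp(
    "(?<!" + wordChar + ")(?:" + words.map(escapeRegExp).join("|") + ")(?!" + wordChar + ")",
    "iu"
  );
}

var bannedRegex = buildBannedWordsRegex(["bad", "mad", "testing", "b\u0103\u0163"]);
console.log(bannedRegex.test("un b\u0103\u0163!")); // banned word surrounded by non-letters
console.log(bannedRegex.test("b\u0103\u0163y"));    // longer word, not banned
```

Neither assertion consumes characters, so the pattern also behaves sensibly with the g flag.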
When dealing with characters outside my base set (which can show up at any time), I convert them to an appropriate base equivalent (8-bit, 16-bit, 32-bit) before running any character matching over them.
var bannedWords = ["bad", "mad", "testing", "băţ"];
var bannedWordsBits = {};
bannedWords.forEach(function(word){
bannedWordsBits[word] = "";
for (var i = 0; i < word.length; i++){
bannedWordsBits[word] += word.charCodeAt(i).toString(16) + "-";
}
});
var bannedWordsJoin = []
var keys = Object.keys(bannedWordsBits);
keys.forEach(function(key){
bannedWordsJoin.push(bannedWordsBits[key]);
});
var regex = new RegExp(bannedWordsJoin.join("|"), 'i');
function checkword(word) {
var wordBits = "";
for (var i = 0; i < word.length; i++){
wordBits += word.charCodeAt(i).toString(16) + "-";
}
return !regex.test(wordBits);
};
The separator "-" is there to make sure that unique characters don't bleed together creating undesired matches.
This is very useful, as it brings all the characters down to a common base that everything can interact with. And it can be re-encoded back to its original form without having to ship it in a key/value pair.
For me the best thing about it is that I don't have to know all of the rules for all of the character sets that I might intersect with, because I can pull them all into a common playing field.
As a side note:
To speed things up, rather than running the one large regex that you probably have, which gets slower as the list of banned words grows, I would pass each separate word in the sentence through the filter, and break the filter up into length-based segments, like:
checkword3Chars();
checkword4Chars();
checkword5Chars();
whose functions you can generate systematically and even create on the fly as and when they become required.
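A simpler variation on the word-by-word idea is to split the sentence into words and look each one up in a Set, which avoids building a large alternation at all. This is only a sketch: the function name is invented, the split relies on ES2018 Unicode property escapes, and it does not normalise decomposed accents.

```javascript
// Hypothetical word-by-word check using a Set lookup instead of one big regex.
function containsBannedWord(sentence, bannedWords) {
  var banned = new Set(bannedWords.map(function (word) {
    return word.toLowerCase();
  }));
  // Split on anything that is not a Unicode letter, digit or underscore.
  var words = sentence.toLowerCase().split(/[^\p{L}\p{N}_]+/u);
  return words.some(function (word) {
    return banned.has(word);
  });
}

console.log(containsBannedWord("This is BAD", ["bad", "mad"]));
console.log(containsBannedWord("badly done", ["bad", "mad"]));
```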
I am trying to make an HTML form that accepts a rating through an input field from the user. The rating is to be a number from 0-10, and I want it to allow up to two decimal places. I am trying to use a regular expression, with the following:
function isRatingGood()
{
var rating = document.getElementById("rating").value;
var ratingpattern = new RegExp("^[0-9](\.[0-9][0-9]?)?$");
if(ratingpattern.test(rating))
{
alert("Rating Successfully Inputted");
return true;
}
else
{
return rating === "10" || rating === "10.0" || rating === "10.00";
}
}
However, when I enter any 4 or 3 digit number into the field, it still works. It outputs the alert, so I know it is the regular expression that is failing. 5 digit numbers do not work. I used this previous answer as a basis, but it is not working properly for me.
My current understanding is that the beginning of the expression should be a digit, then optionally, a decimal place followed by 1 or 2 digits should be accepted.
You are using a string literal to create the regex. Inside a string literal, \ is the escape character. The string literal
"^[0-9](\.[0-9][0-9]?)?$"
produces the value (and regex):
^[0-9](.[0-9][0-9]?)?$
(you can verify that by entering the string literal in your browser's console)
\. is not a valid escape sequence in a string literal, hence the backslash is ignored. Here is a similar example:
> "foo\:bar"
"foo:bar"
So you can see above, the . is not escaped in the regex, hence it keeps its special meaning and matches any character. Either escape the backslash in the string literal to create a literal \:
> "^[0-9](\\.[0-9][0-9]?)?$"
"^[0-9](\.[0-9][0-9]?)?$"
or use a regex literal:
/^[0-9](\.[0-9][0-9]?)?$/
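The difference can be checked directly by comparing the source of each resulting regex. The variable names below are made up; String.raw is included as a third option that keeps backslashes intact inside a template literal.

```javascript
// The backslash is dropped by the string literal, so "." matches any character.
var lostBackslash = new RegExp("^[0-9](\.[0-9][0-9]?)?$");
// Doubling the backslash keeps a literal "\." in the pattern.
var doubled = new RegExp("^[0-9](\\.[0-9][0-9]?)?$");
// String.raw leaves backslashes untouched, so no doubling is needed.
var raw = new RegExp(String.raw`^[0-9](\.[0-9][0-9]?)?$`);

console.log(lostBackslash.source);
console.log(doubled.source);
console.log(lostBackslash.test("123")); // the stray "." accepts the digit "2"
console.log(doubled.test("123"));
```

This also explains the symptom in the question: with the unescaped dot the pattern accepts any string of two to four characters that starts with a digit and ends in one or two digits, so three and four digit numbers pass while five digit numbers do not.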
The regular expression you're using will be parsed to
/^[0-9](.[0-9][0-9]?)?$/
Here . will match any character except newline.
To make it match the . literal, you need to add an extra \ for escaping the \.
var ratingpattern = new RegExp("^[0-9](\\.[0-9][0-9]?)?$");
Or, you can simply use
var ratingPattern = /^[0-9](\.[0-9][0-9]?)?$/;
You can also use \d instead of the class [0-9].
var ratingPattern = /^\d(\.\d{1,2})?$/;
Demo
var ratingpattern = new RegExp("^[0-9](\\.[0-9][0-9]?)?$");
function isRatingGood() {
var rating = document.getElementById("rating").value;
if (ratingpattern.test(rating)) {
alert("Rating Successfully Inputted");
return true;
} else {
return rating === "10" || rating === "10.0" || rating === "10.00";
}
}
<input type="text" id="rating" />
<button onclick="isRatingGood()">Check</button>
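If the special cases for 10 should live in the pattern as well, instead of in the else branch, one possible single-regex version is sketched here. The names RATING_PATTERN and isValidRating are assumptions for this example.

```javascript
// Sketch: accept 0-9 with up to two decimals, or 10 with optional zero decimals.
var RATING_PATTERN = /^(?:10(?:\.0{1,2})?|\d(?:\.\d{1,2})?)$/;

// Hypothetical validator taking the raw text of the input field.
function isValidRating(text) {
  return RATING_PATTERN.test(text);
}

console.log(isValidRating("9.99"));
console.log(isValidRating("10.00"));
console.log(isValidRating("10.01"));
console.log(isValidRating("123"));
```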
Below is a regex candidate for your task. Note that it also accepts values from 11 to 19.99 and a trailing dot, as in '2.':
^[0-1]?\d(\.\d{0,2})?$
Demo with explanation
var list = ['03.003', '05.05', '9.01', '10', '10.05', '100', '1', '2.', '2.12'];
var regex = /^[0-1]?\d(\.\d{0,2})?$/;
for (var index in list) {
var str = list[index];
var match = regex.test(str);
console.log(str + ' : ' + match);
}
This should also do the job. You don't need to escape dots inside square brackets:
^((10|\d{1})|\d{1}[.]\d{1,2})$
The parts of the pattern, with 10 as the maximum rating:
10| ---- accepts 10
\d{1})| ---- accepts whole numbers from 0 to 9 (replace \d with [1-9]{1} if you don't want 0)
\d{1}[.]\d{1,2} ---- accepts a digit from 0 to 9 followed by one or two digits after the decimal point
LIVE DEMO: https://regex101.com/r/hY5tG4/7
From a regex reference on character classes: any character except ^-]\ is a literal character that adds itself to the character class. For example, [abc] matches a, b or c.
Just answered this myself.
You need to put the decimal point in square brackets, so the regular expression looks like
var ratingpattern = new RegExp("^[0-9]([\.][0-9][0-9]?)?$");
I've been trying to find a way to match a number in a JavaScript string that is surrounded by parentheses at the end of the string, then increment it.
Say I have a string:
var name = "Item Name (4)";
I need a RegExp to match the (4) part, and then I need to increment the 4 then put it back into the string.
This is the regex I have so far:
\b([0-9]+)$\b
This regex does not work. Furthermore, I do not know how to extract the integer retrieved and put it back in the same location in the string.
Thanks.
The replace method can take a function as its second argument. It gets the match (including submatches) and returns the replacement string. Others have already mentioned that the parentheses need to be escaped.
"Item Name (4)".replace(/\((\d+)\)/, function(fullMatch, n) {
return "(" + (Number(n) + 1) + ")";
});
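Since the question asks for the number at the end of the string, the same replacer-function technique can be anchored with $. The sketch below is one way to do that; the function name incrementTrailingCounter is made up, and names without a trailing counter are returned unchanged.

```javascript
// Hypothetical helper: increments a "(n)" counter only when it ends the string.
function incrementTrailingCounter(name) {
  return name.replace(/\((\d+)\)\s*$/, function (fullMatch, digits) {
    // Number() converts the captured digits before adding, avoiding "4" + 1 = "41".
    return "(" + (Number(digits) + 1) + ")";
  });
}

console.log(incrementTrailingCounter("Item Name (4)"));
console.log(incrementTrailingCounter("Item (2) Name (9)"));
console.log(incrementTrailingCounter("Item Name"));
```

Any whitespace after the closing parenthesis is dropped by this pattern; remove the \s* if that is not wanted.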
I can only think of a way of doing it in three steps: extract, increment and replace.
// Tested on rhino
var name = "Item Name (4)";
var re = /\((\d+)\)/;
var match = re.exec(name);
var number = parseInt(match[1], 10) + 1;
name = name.replace(re, "(" + number + ")");
The important parts of the pattern:
You need to escape the parens to match literal parens
You also need to use parens to capture the number so that you can extract it from the match.
\d matches a digit and is shorter and more common than writing out [0-9].
For this pattern to work you should escape the parentheses. In addition, \b and $ are unneeded. Thus
var s = "Item Name (4)";
var match = /\((\d+)\)/.exec( s );
var n = Number(match[1])+1;
alert( s.replace( /\(\d+\)/, '('+n+')' ) );
Solution by david.clarke (tested)
"Item Name (4)".replace(/\(([0-9]+)\)/, '('+(1+RegExp.$1) + ')');
But I think it is too concise
UPD: It turned out that RegExp.$1 can't be used as part of replace parameter, because it works only in Opera
var name = "Item Name (4)";
name = name.replace(/\((\d+)\)/, function (m, n) { return "(" + (Number(n) + 1) + ")"; });
(untested)