Regex: Character set with special characters - javascript

Is it possible for a regex character set ([abc]) to have characters with special meanings (like $ for end of the line)? For example, is it possible to make a character set that matches either a / or the start of the line ^?
I tried /[^\/]/g but it just checks for a literal ^ or a /.
Note: I'm using JavaScript.

No, you can't give characters inside a set a special meaning, but you can be clever about the way you use the regex.
For complicated regexes, I first check which of a few rarely used marker characters are absent from the string I'm testing:
const str = '¡¢my string <>some content <>between</> fragment</> and other data...'
const markers = ['\u00A1','\u00A2','\u00A4']; // ['¡','¢','¤'] you can add others
const safeMarker = markers.find(marker => str.indexOf(marker) === -1) // ¤ - this is not in the string so I can use it as a marker
if (safeMarker) {
  const replaced = str.replace(/<\/>/g, safeMarker); // replaced = '¡¢my string <>some content <>between¤ fragment¤ and other data...'
  // do something with this regex and so on...
  // then replace the marker back with the original characters
}
The beauty of this is that you can convert any combination of characters into a single marker, which means you can use it in a negated character set like this:
/<>[^\u00A4]*\u00A4/g
This does what the following pattern tries to express (get the content between the tags) but cannot, because inside a set </> is read as three separate characters:
<>[^</>]*</>
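For the specific case in the question (a / or the start of the line), an anchor cannot go inside a set, but an alternation in a non-capturing group does the same job. A minimal sketch, where startOrSlash and samplePath are names made up for illustration:

```javascript
// Inside [...] the ^ is either a negation sign or a literal, never an anchor,
// so "start of line OR slash" is written as an alternation instead.
const startOrSlash = /(?:^|\/)(\w+)/gm;

const samplePath = "home/products/details";
// Collect the word that follows each start-of-line or slash.
const words = Array.from(samplePath.matchAll(startOrSlash), m => m[1]);
console.log(words); // [ 'home', 'products', 'details' ]
```

The m flag makes ^ match at the start of every line rather than only at the start of the string.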

Related

Having hard time with jQuery and replace string value

I'm currently developing a posting feature (like "What's on your mind") in which I use the twemoji plugin for emojis.
For security reasons, I have to convert each emoji into its alt code/image filename before it is stored in the database,
and convert it back to an image when it is displayed in the feeds.
In my case I use [emoji=filename.png]
For example, I have this string:
var string = "[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png]";
string.replace(/-fe0f.png/g, '.png')
.replace(/\[emoji=(.*?)\]/g,'<img src="https://example.net/images/$1">');
The snippet above works, but the problem is that it removes every -fe0f.png from the filenames, which causes some broken images.
What I want to achieve is to remove the -fe0f part only when the filename length is <= 14, or when the filename looks like (char)-fe0f.png. If it has more parts, like (char)-(char)-(char)-fe0f.png, it should remain the same.
The result should be:
from
[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png]
to
[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9.png]
UPDATE:
I just noticed that there are filenames like 30-fe0f-20e3.png,
where the -fe0f in the middle needs to be removed.
So instead of [emoji=30-fe0f-20e3.png],
I need to have [emoji=30-20e3.png]
The file name length limit is fourteen, so the check looks at the nine characters before the "-fe0f".
[^=] means any character except "=".
(?<![^=])a means the character just before the "a" must not be a non-"=" character, i.e. it must be "=" (or the start of the string).
(?<![^=]{9})a means the "a" must not be preceded by nine consecutive non-"=" characters, i.e. an "=" must occur within the nine characters before it.
(?<![^=]{9})-fe0f\.png applies the same condition to "-fe0f.png": it matches only when an "=" occurs within the nine characters before it.
So your new code should look like this (the lookbehind needs an engine that supports ES2018 regular expressions):
var string = "[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png]";
string.replace(/(?<![^=]{9})-fe0f\.png/g, '.png')
.replace(/\[emoji=(.*?)\]/g,'<img src="https://example.net/images/$1">');
Replacing the data in the example string:
const regex = /(\[emoji=[^\s\]\[]{0,13})-fe0f(\.png)/g;
let string = "[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png]";
string = string.replace(regex, '$1$2');
console.log(string);
You can do the replacement in one replace call with a match and a capture group, matching 0-13 characters after emoji=
\[emoji=([^\s\]\[]{0,13})-fe0f\.png]
The pattern matches:
\[emoji= Match [emoji=
( Capture group 1
[^\s\]\[]{0,13} Match 0-13 times a non whitespace char except for [ and ]
) Close group 1
-fe0f\.png] Match literally (note to escape the dot)
const regex = /\[emoji=([^\s\]\[]{0,13})-fe0f\.png]/g;
let string = "[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png]";
string = string.replace(regex, '<img src="https://example.net/images/$1.png">');
console.log(string);
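The patterns above require .png directly after -fe0f, so they do not cover the case from the UPDATE, where -fe0f sits in the middle (30-fe0f-20e3.png). A replacer function can cover both cases. The sketch below is not from the original answers: normalizeEmojiTags is a hypothetical helper, and the rule "keep -fe0f only when the name contains a 200d part" is an assumption that happens to fit all four sample filenames in the question.

```javascript
// Hypothetical helper, not part of twemoji.
function normalizeEmojiTags(text) {
  return text.replace(/\[emoji=([^\s\[\]]+)\.png\]/g, (tag, name) => {
    const parts = name.split("-");
    // Assumed rule: names containing the 200d joiner are left untouched.
    if (parts.includes("200d")) return tag;
    // Otherwise drop every fe0f part, wherever it appears in the name.
    const cleaned = parts.filter(part => part !== "fe0f").join("-");
    return `[emoji=${cleaned}.png]`;
  });
}

const tags = "[emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9-fe0f.png] [emoji=30-fe0f-20e3.png]";
console.log(normalizeEmojiTags(tags));
// [emoji=1f938.png] [emoji=1f938-200d-2642-fe0f.png] [emoji=26f9.png] [emoji=30-20e3.png]
```

The second replace from the question (turning [emoji=...] into an img tag) can then run on the result unchanged.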
This should do it if you just want to skip the replacement for file names longer than 14 characters (here string must hold a single file name, not the whole text).
if (string.length <= 14) {
// do your replace here
}
I'm not sure whether you also want to skip the replacement when there is more than one "-".

Regexp to explode url

I have a string url like "home/products/product_name_1/details/some_options"
I want to parse it with a regexp into the array ["home", "products", "product", "details", "some"].
So the rule is: split into words at each slash, but if a word contains underscores, keep only the part before the first underscore.
The equivalent JavaScript without a regex is
str.split("/").map(item => item.indexOf("_") > -1 ? item.split("_")[0] : item)
Please help!
You can use this pattern:
(?<!\w)[^/_]+
Result:
['home', 'products', 'product', 'details', 'some']
Python code:
import re
url = "home/products/product_name_1/details/some_options"
print(re.findall(r'(?<!\w)[^/_]+', url))
# ['home', 'products', 'product', 'details', 'some']
Try this:
const input = ["home/products/product_name_1/details/some_options",
  "company/products/cars_all/details/black_color",
  "public/places/1_cities/disctricts/1234_something"];
const pattern = /([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi;
input.forEach(el => {
  const matches = el.matchAll(pattern);
  for (const match of matches) {
    console.log(match[1]);
  }
});
Remove \d from the pattern if you don't want digits in the result. Note that a segment is captured only if it is followed by a / or contains an underscore, so a final segment without an underscore is skipped.
I have used matchAll here. matchAll returns an iterator; each match object it yields holds the full match as its first element and the required group as its second element (index 1).
/([a-zA-Z\d]*)(?:\/|_.*?(?:\/|$))/gmi
/
([a-zA-Z\d]*) capture group to match letters and digits
(?:\/|_.*?(?:\/|$)) non capture group to match '/' or '_' and everything till another '/' or end of the line is found
/gmi
You can test this regex here: https://regex101.com/r/B5Bo74/1
You can use:
\b[^\W_]+
\b A word boundary to prevent a partial match
[^\W_]+ Match 1+ word characters except for _
See a regex demo.
const s = "home/products/product_name_1/details/some_options";
const regex = /\b[^\W_]+/g;
console.log(s.match(regex));
If there has to be a leading / or the start of the string before the match, you can use an alternation (?:^|\/) and use a capture group for the values that you want to keep:
const s = "home/products/product_name_1/details/some_options";
const regex = /(?:^|\/)([^\W_]+)/g;
console.log(Array.from(s.matchAll(regex), m => m[1]));
Given input:
string "home/products/product_name_1/details/some_options"
Expected output:
array ["home", "products", "product", "details", "some"]
Note: ignore/exclude name, 1, options (because word occurs after 1st underscore).
Task:
split URI by slash into a set of path-segments (words)
(if the path-segment or word contains underscores) remove the part after first underscore
Regex to match
With a regex \/|_\w+ you could match the URL-path separator (slash) and excluded word-part (every word after an underscore).
Then use this regex
either as separator to split the string into its parts(excluding the regex matches): e.g. in JS split(/\/|_\w+/)
or as search-pattern in replace to prepare a string that can be easily split: e.g. in JS replaceAll(/\/|_\w+/g, ',') to obtain a CSV row which can be easily split by comma with split(',')
Beware: The regular-expression itself (flavor) and functions to apply it depend on your environment/regex-engine and script-/programming-language.
Regex applied in Javascript
split by regex
For example in Javascript use url.split(/\/|_\w*/) where:
/pattern/: everything inside the slashes is the regex-pattern
\/: an escaped slash (URL-path-separator)
|: the alternate junction, interpreted as boolean OR
_\w*: zero or more (*) word-characters (w, i.e. letter from alphabet, numeric digit or underscore) following an underscore
See also:
Use of capture groups in String.split()
However, this returns also empty strings (as empty split-off second parts inside underscore-containing path-segments). We can remove the empty strings with a filter where predicate s => s returns true if the string is non-empty.
Demo to solve your task:
const url = "home/products/product_name_1/details/some_options";
let firstWordsInSegments = url.split(/\/|_\w*/).filter(s => s);
console.log(firstWordsInSegments);
const urlDuplicate = "home/products/product_name_1/details/some_options/_/home";
console.log(urlDuplicate.split(/\/|_\w*/).filter(s => s)); // contains duplicates in output array
replace into CSV, then split and exclude (map,replace,filter)
The CSV containing path-segments can be split by comma and resulting parts (path-segments) can be filtered or replaced to exclude unwanted sub-parts.
using:
replaceAll to transform to CSV or remove empty strings. Note: global flag required when calling replaceAll with regex
map to remove unwanted parts after underscore
filter(s => s) to filter out empty strings
const url = "home/products/product_name_1/details/some_options";
// step by step
let pathSegments = url.split('/');
console.log('pathSegments:', pathSegments);
let firstWordsInSegments = pathSegments.map(s => s.replaceAll(/_\w*/g,''));
console.log(firstWordsInSegments);
// replace to obtain CSV and then split
let csv = "home/products/product_name_1/details/some_options/_/home".replaceAll(/\/|_\w+/g, ',');
console.log('csv:', csv);
let parts = csv.split(',');
console.log('parts:', parts); // contains empty parts
let nonEmptyParts = parts.filter(s => s);
console.log('nonEmptyParts:', nonEmptyParts); // filtered out empty parts
Bonus Tip
Try your regex online (e.g. regex101 or regexplanet). See the demo on regex101.
You could split the url with this regex
(_\w*)+|(\/)
This matches the /, _name_1 and _options.
But depending on what you are trying to do and which language you use, there are much better options for this.
You can try a pattern like \/([^\/_]+){1,} (assuming that the path starts with '/' and the components are separated by '/'); depending on language you might get an array or iterator that will give the components.
Try ^[[:alpha:]]+|(?<=\/)[[:alpha:]]+, or ^[a-zA-Z]+|(?<=\/)[a-zA-Z]+ if [[:alpha:]] is not supported (as in JavaScript). It matches one or more letters at the beginning of the string or after a slash, up to the first non-letter.
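In JavaScript, the [a-zA-Z] form of that pattern can be used directly with the g flag (the lookbehind needs an engine with ES2018 regex support). A minimal sketch, with firstWords as an illustrative name:

```javascript
const url = "home/products/product_name_1/details/some_options";
// A run of letters at the start of the string, or directly after a slash.
const firstWords = url.match(/^[a-zA-Z]+|(?<=\/)[a-zA-Z]+/g);
console.log(firstWords); // [ 'home', 'products', 'product', 'details', 'some' ]
```

Because only letters are matched, a segment that starts with a digit (such as 1_cities) produces no entry.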

How to write regexp for finding :smile: in javascript?

I want to write a regular expression, in JavaScript, for finding the string starting and ending with :.
For example "hello :smile: :sleeping:" from this string I need to find the strings which are starting and ending with the : characters. I tried the expression below, but it didn't work:
^:.*\:$
My guess is that you not only want to find the string, but also replace it. For that you should look at using a capture in the regexp combined with a replacement function.
const emojiPattern = /:(\w+):/g
function replaceEmojiTags(text) {
  return text.replace(emojiPattern, function (tag, emotion) {
    // The emotion will be the captured word between your tags,
    // so either "smile" or "sleeping" in your example
    //
    // In this function you would take that emotion and return
    // whatever you want based on the input parameter and the
    // whole tag would be replaced
    //
    // As an example, let's say you had a bunch of GIF images
    // for the different emotions:
    return '<img src="/img/emoji/' + emotion + '.gif" />';
  });
}
With that code you could then run your function on any input string and replace the tags to get the HTML for the actual images in them. As in your example:
replaceEmojiTags('hello :smile: :sleeping:')
// 'hello <img src="/img/emoji/smile.gif" /> <img src="/img/emoji/sleeping.gif" />'
EDIT: To support hyphens within the emotion, as in "big-smile", the pattern needs to be changed since it is only looking for word characters. For this there is probably also a restriction such that the hyphen must join two words so that it shouldn't accept "-big-smile" or "big-smile-". For that you need to change the pattern to:
const emojiPattern = /:(\w+(-\w+)*):/g
That pattern is looking for any word that is then followed by zero or more instances of a hyphen followed by a word. It would match any of the following: "smile", "big-smile", "big-smile-bigger".
The ^ and $ are anchors (start and end respectively). These cause your regex to explicitly match an entire string which starts with : has anything between it and ends with :.
If you want to match characters within a string you can remove the anchors.
Your * indicates zero or more so you'll be matching :: as well. It'll be better to change this to + which means one or more. In fact if you're just looking for text you may want to use a range [a-z0-9] with a case insensitive modifier.
If we put it all together we'll have regex like this /:([a-z0-9]+):/gmi
This matches a string beginning with :, followed by one or more alphanumeric characters, and ending in :, with the modifiers g (global), m (multi-line) and i (case-insensitive, for things like :FacePalm:).
Using it in JavaScript we can end up with:
var mytext = 'Hello :smile: and jolly :wave:';
var matches = mytext.match(/:([a-z0-9]+):/gmi);
// matches = [':smile:', ':wave:'];
You'll have an array with each match found.
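With the g flag, match returns the whole tags, colons included. If only the names are wanted, matchAll exposes the capture group. A small sketch, where emojiNames is a hypothetical helper:

```javascript
function emojiNames(text) {
  // m[0] is the full tag (":smile:"), m[1] is the captured name ("smile").
  return Array.from(text.matchAll(/:([a-z0-9]+):/gi), m => m[1]);
}

console.log(emojiNames("Hello :smile: and jolly :wave:")); // [ 'smile', 'wave' ]
```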

Allowed Characters Regex (JavaScript)

I'm trying to build a regex which allows the following characters:
A-Z
a-z
1234567890
!##$%&*()_-+={[}]|\:;"'<,>.?/~`
All other characters are invalid. This is the regex I built, but it is not working as I expect it to. I expect the .test() to return false when an invalid character is present:
var string = 'abcd^wyd';
function isValidPassword () {
var regex = /[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]+[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]*/g
return regex.test(string);
}
In this case, the test is always returning "true", even when "^" is present in the string.
Your regex only checks that at least one of the allowed characters is present. Add start and end anchors to your regex - /^...$/
var string = 'abcd^wyd';
function isValidPassword () {
var regex = /^[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]+[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]*$/g
return regex.test(string);
}
Another approach: instead of checking that all characters are good, look for a bad character. This is more efficient, because the search can stop as soon as one is found.
// return true if string does not (`!`) match a character that is not (`^`) in the set...
return !/[^0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]/.test(string)
Instead of searching allowed characters search forbidden ones.
var string = 'abcd^wyd';
function regTest (string) {//[^ == not
var regex = /[^0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]/g
return !regex.test(string);//false if found
}
console.log(regTest(string));
The regex, as you've written is checking for the existence of the characters in the input string, regardless of where it appears.
Instead you need to anchor your regex so that it checks the entire string.
By adding ^ and $, you are instructing your regex to match only the allowed characters for the entire string, rather than any subsection.
function isValidPassword (pwd) {
var regex = /^[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]+[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]*$/g;
return regex.test(pwd);
}
alert(isValidPassword('abcd^wyd'));
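To see what the anchors change, the same character set can be tried with and without them. This is only a sketch: the names unanchored and anchored and the sample strings are made up for illustration, and the second, redundant copy of the set is dropped.

```javascript
// Same character set as in the question, once unanchored and once anchored.
const unanchored = /[0-9A-Za-z!##$%&*()_\-+={[}\]|:;"'<,>.?\/\\~`]+/;
const anchored = /^[0-9A-Za-z!##$%&*()_\-+={[}\]|:;"'<,>.?\/\\~`]+$/;

for (const candidate of ["abcd^wyd", "abcd_wyd", ""]) {
  // Prints the candidate followed by the unanchored and anchored results.
  console.log(JSON.stringify(candidate), unanchored.test(candidate), anchored.test(candidate));
}
// "abcd^wyd" true false
// "abcd_wyd" true true
// "" false false
```

Neither regex uses the g flag here, so repeated test calls do not depend on lastIndex.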
Your regexp matches the first part of your string, i.e. "abcd", so the result is true. You need to anchor it to the start (using ^ at the beginning) and the end of the string (using $ at the end), so your regexp should look like:
^[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]+[0-9A-Za-z!##$%&*()_\-+={[}\]|\:;"'<,>.?\/\\~`]*$
That way it will need to match the entire string.
You can visualize it in the following link:
regexper_diagram
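This regex works too: it lists every allowed character as an alternative and is anchored so that the whole string must match.
var str = 'eefdooasdc23432423!##$%&*()_-+={[}]|:;"\'<,>.?/~\`';
var reg = /^(?:[a-z]|\d|!|#|\$|%|&|\*|\(|\)|_|-|\+|=|{|\[|}|]|\||:|;|"|'|<|,|>|\.|\?|\/|~|`)+$/i;
// test it.
reg.test(str); //true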
I use this site to test my regex.
Regex 101

Matching special characters and letters in regex

I am trying to validate a string, that should contain letters numbers and special characters &-._ only. For that I tried with a regular expression.
var pattern = /[a-zA-Z0-9&_\.-]/
var qry = 'abc&*';
if(qry.match(pattern)) {
alert('valid');
}
else{
alert('invalid');
}
While using the above code, the string abc&* is valid. But my requirement is to show this invalid. ie Whenever a character other than a letter, a number or special characters &-._ comes, the string should evaluate as invalid. How can I do that with a regex?
Add them to the allowed characters, but you'll need to escape some of them, such as -]/\
var pattern = /^[a-zA-Z0-9!##$%^&*()_+\-=\[\]{};':"\\|,.<>\/?]*$/
That way you can remove any individual character you want to disallow.
Also, you want to include the start and end of string placemarkers ^ and $
Update:
As elclanrs understood (and the rest of us didn't, initially), the only special characters needing to be allowed in the pattern are &-._
/^[\w&.\-]+$/
[\w] is the same as [a-zA-Z0-9_]
Though the dash doesn't need escaping when it's at the start or end of the list, I prefer to do it in case other characters are added. Additionally, the + means you need at least one of the listed characters. If zero is ok (ie an empty value), then replace it with a * instead:
/^[\w&.\-]*$/
Well, why not just add them to your existing character class?
var pattern = /[a-zA-Z0-9&._-]/
If you need to check whether a string consists of nothing but those characters you have to anchor the expression as well:
var pattern = /^[a-zA-Z0-9&._-]+$/
The added ^ and $ match the beginning and end of the string respectively.
Testing for letters, numbers or underscore can be done with \w which shortens your expression:
var pattern = /^[\w&.-]+$/
As mentioned in the comment from Nathan, if you're not using the results from .match() (it returns an array with what has been matched), it's better to use RegExp.test() which returns a simple boolean:
if (pattern.test(qry)) {
// qry is non-empty and only contains letters, numbers or special characters.
}
Update 2
In case I have misread the question, the below will check if all three separate conditions are met.
if (/[a-zA-Z]/.test(qry) && /[0-9]/.test(qry) && /[&._-]/.test(qry)) {
// qry contains at least one letter, one number and one special character
}
Try this regex:
/^[\w&.-]+$/
Also you can use test.
if ( pattern.test( qry ) ) {
// valid
}
let pattern = /^(?=.*[0-9])(?=.*[!##$%^&*])(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9!##$%^&*]{6,16}$/;
// test() returns true if the password contains an uppercase letter, a lowercase letter, a number and a special character, and false otherwise
let result = pattern.test("helLo123#"); // true, as it contains all of the above
I tried a bunch of these but none of them worked for all of my tests. So I found this:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9])(?!.*\s).{8,15}$
from this source: https://www.w3resource.com/javascript/form/password-validation.php
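When a pattern like this rejects a password, it does not say which requirement failed. One option is to write each lookahead as its own check. The following is a rough sketch of that idea, not the linked article's code: passwordRules and failedPasswordRules are hypothetical names, and the checks are only approximately equivalent to the single regex (for example, length is counted in UTF-16 code units).

```javascript
// One named check per condition in the combined pattern above.
const passwordRules = [
  { name: "digit", test: s => /\d/.test(s) },
  { name: "lowercase letter", test: s => /[a-z]/.test(s) },
  { name: "uppercase letter", test: s => /[A-Z]/.test(s) },
  { name: "special character", test: s => /[^a-zA-Z0-9]/.test(s) },
  { name: "no whitespace", test: s => !/\s/.test(s) },
  { name: "length 8-15", test: s => s.length >= 8 && s.length <= 15 },
];

// Returns the names of all rules the password violates (empty if it passes).
function failedPasswordRules(password) {
  return passwordRules.filter(rule => !rule.test(password)).map(rule => rule.name);
}

console.log(failedPasswordRules("helLo123#")); // []
console.log(failedPasswordRules("hello 12"));  // [ 'uppercase letter', 'no whitespace' ]
```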
Try this regex, which matches letters together with the special characters commonly used in paragraphs:
Javascript : /^[a-zA-Z]+(([\'\,\.\-_ \/)(:][a-zA-Z_ ])?[a-zA-Z_ .]*)*$/.test(str)
.test(str) returns a boolean: true if the string matches, false if it does not.
c# : ^[a-zA-Z]+(([\'\,\.\-_ \/)(:][a-zA-Z_ ])?[a-zA-Z_ .]*)*$
Here is how you can check whether a string contains a special character:
function containsSpecialChars(str) {
const specialChars = /[`!##$%^&*()_+\-=\[\]{};':"\\|,.<>\/?~]/;
return specialChars.test(str);
}
console.log(containsSpecialChars('hello!')); // 👉️ true
console.log(containsSpecialChars('abc')); // 👉️ false
console.log(containsSpecialChars('one two')); // 👉️ false
