How do you sanitize a string to pass a regex?

I have a regex validation I need my string to pass.
/^[0-9a-zA-Z-]+$/
I want to create a function that sanitizes the string for it to pass the regex.
I thought of doing something like
string.replace(/^[0-9a-zA-Z-]+$/,"");
Except I need to invert the above regex.
I tried to look up how to invert a regex but nothing seems to show up.

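Try string.replace(/\W/g,""). Be aware that \W is not an exact inverse of your character class: it keeps underscores and removes hyphens. Also check this web site; I always use it to test regular expressions, and it has hints at the bottom right.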

Negate the collection using ^ inside the []
const str = `abc*ç%ABC&(/())12345=?`
const newString = str.replace(/[^0-9a-zA-Z-]/g,"");
console.log(newString)
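
The negated class above can be wrapped in a small helper and checked against the original validation regex. This is only a minimal sketch: the names sanitize and VALID are made up for illustration, and it assumes the caller handles an empty result, since the validation regex requires at least one character.

```javascript
// Hypothetical helper: strip every character the validation regex would reject.
const VALID = /^[0-9a-zA-Z-]+$/;

function sanitize(input) {
  // [^...] negates the class; the g flag replaces all occurrences, not just the first.
  return input.replace(/[^0-9a-zA-Z-]/g, "");
}

const cleaned = sanitize("abc*ç%ABC&(/())12345=?");
console.log(cleaned);             // abcABC12345
console.log(VALID.test(cleaned)); // true
```

If every character is invalid, sanitize returns an empty string, which still fails the validation because of the + quantifier.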

Related

Replace params in javascript

I have tried a lot to replace a query parameter using JavaScript, but it's not working. Can you please share a solution for replacing the parameter?
Below is the example
console.log("www.test.com?x=a".replace(new RegExp(`${"x=a"}&?`),''));
The output I am getting is www.test.com? . Is there any way to also remove the ? and get only www.test.com?

If you want to remove everything from the question mark onward, including the question mark itself, try this instead:
console.log("www.test.com?x=a".split("?")[0]);
That way you get only what's before the question mark.
I hope that helps you out.

You can remove all query strings using the following regex:
\?(.*)
const url = "www.test.com?x=1&b=2"
console.log(url.replace(/\?(.*)/, ''));

You could brutally replace the '?x=a' string with the JavaScript replace function or, even better, you could split the string in two (based on the index of ?) with the JavaScript split function and take the first part, e.g.:
let str = 'www.test.com?x=a';
console.log(str.replace('?x=a', ''));
console.log(str.split('?')[0]);
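
If only one parameter should go while the others stay, the query string can be parsed instead of pattern-matched. The sketch below uses the built-in URLSearchParams; the helper name removeParam is hypothetical, and it assumes the URL has no #fragment and that re-encoding of the remaining parameters by toString() is acceptable.

```javascript
// Hypothetical helper: remove one query parameter and drop the "?" if nothing is left.
function removeParam(url, name) {
  const index = url.indexOf("?");
  if (index === -1) return url; // no query string at all

  const base = url.slice(0, index);
  const params = new URLSearchParams(url.slice(index + 1));
  params.delete(name);

  const rest = params.toString();
  return rest ? `${base}?${rest}` : base;
}

console.log(removeParam("www.test.com?x=a", "x"));     // www.test.com
console.log(removeParam("www.test.com?x=a&b=2", "x")); // www.test.com?b=2
```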

Regex - Match only two digits after substring

I'm trying to replace all elements like \u00XY in a string that can contain multiple entries like that.
'"\u00bfIdade del titular?"'
This can be a short string or a string containing json objects inside... (I know... but... old code)
I tried calling normalize on the string but it didn't work, so I was instructed to replace all of those Unicode escape elements with a '?' char.
Any ideas on a simple way to do this? I haven't been able to find the right regex for it.

[0-9]{1,2}[\w]{2}?
Try this or modify it.

I made a small function that decodes all the Unicode escape sequences.
function replace_unicode_escape_sequence($string) {
    // Replace every \uXXXX escape with the corresponding UTF-8 character
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
        return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    }, $string);
}
Hope this will help you!
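
For the replacement the question actually asks for (turning each \u00XY element into a '?'), a JavaScript sketch could look like the following. It assumes the text contains the escape literally, that is, a backslash followed by u00 and two hex digits, rather than the already decoded character; the function name maskUnicodeEscapes is made up for illustration.

```javascript
// Replace every literal \u00XY escape (backslash, "u00", two hex digits) with "?".
function maskUnicodeEscapes(text) {
  return text.replace(/\\u00[0-9a-fA-F]{2}/g, "?");
}

// String.raw keeps the backslash, so the input contains the six characters \u00bf.
const input = String.raw`"\u00bfIdade del titular?"`;
console.log(maskUnicodeEscapes(input)); // "?Idade del titular?"
```

Because the pattern only looks for the escape text, it works the same on a short string and on a larger string with JSON embedded in it.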

Split string into array between two characters?

So I'm trying to split a string that I have in a somewhat irregular way. I tried to do something with regular expressions, but I'm not good enough with regex to come up with something like that.
Basically I have a string that looks like this:
var links = "<a>4</a><b><c><d><e><f>";
And I want to use the .split() JS method to receive them in an array like so:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
So obviously I need to split them on the >< characters, but if I do this:
links.split("><");
They split the way I want, but I lose the > and < signs. Example:
["<a>4</a", "b", "c", "d", "e", "f>"]
This is not a good solution for me.
So basically my question is: is it possible to come up with some kind of solution, with regex or something else, to get the array result I imagined?

A quick and clean method would be to use match instead of split :
var matches = links.match(/<.+?>(?=<|$)/g)
For your string it gives
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Be aware that using regular expressions with a non-regular language like HTML is dangerous. It's often OK for simple things, but it's also limited and you may encounter surprises, for example if you want to apply it to nested elements.

A quick'n dirty method would be to add a delimiter between the ><, with a regex replace, and then to split on that delimiter:
var links = "<a>4</a><b><c><d><e><f>";
links.replace(/></g, '>|<').split('|');
Result:
["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
Just make sure the delimiter you chose doesn't occur in the string itself.
(You can use longer strings as delimiters, like: "|-|".)
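
A variant that needs no delimiter is to split on a zero-width position using lookarounds. This is a sketch that assumes a JavaScript engine with lookbehind support (ES2018), which older browsers lack.

```javascript
const links = "<a>4</a><b><c><d><e><f>";

// Split at every position that has ">" on its left and "<" on its right.
// Both lookarounds are zero-width, so no characters are consumed.
const parts = links.split(/(?<=>)(?=<)/);

console.log(parts); // ["<a>4</a>", "<b>", "<c>", "<d>", "<e>", "<f>"]
```

Since nothing is inserted into the string, there is no risk of the delimiter clashing with the content.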

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get returned looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!

Remove the parentheses in the regex if you do not need to capture the .org (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;

What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
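
The trailing ,.org appears when that array is converted to a string, because the elements are joined with commas. A short sketch, using a shortened input string chosen for illustration:

```javascript
const text = "q=thisisthepartiwanttofind.org&ie=UTF-8";

const withGroup = /[\w\d]+(\.org)/i.exec(text);
const withoutGroup = /[\w\d]+\.org/i.exec(text);

console.log(withGroup.length);     // 2: whole match plus one group
console.log(String(withGroup));    // thisisthepartiwanttofind.org,.org
console.log(withGroup[0]);         // thisisthepartiwanttofind.org
console.log(String(withoutGroup)); // thisisthepartiwanttofind.org
```

Reading index 0 of the result gives the whole match regardless of how many groups the pattern defines.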

You should escape the . (dot) in the (.org) group, otherwise it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$

Regex to match part of a string

Regex fun again...
Take for example http://something.com/en/page
I want to test for an exact match on /en/ including the forward slashes, otherwise it could match 'en' from other parts of the string.
I'm sure this is easy, for someone other than me!
EDIT:
I'm using it for a string.match() in javascript

Well it really depends on what programming language will be executing the regex, but the actual regex is simply
/en/
For .Net the following code works properly:
string url = "http://something.com/en/page";
bool MatchFound = Regex.Match(url, "/en/").Success;
Here is the JavaScript version:
var url = 'http://something.com/en/page';
if (url.match(/\/en\//)) {
alert('match found');
}
else {
alert('no match');
}
DUH
Thank you to Welbog and Chris Ballance for making what should have been the most obvious point. This does not require regular expressions to solve; it is simply a contains check. Regex should only be used where it is needed, and that should have been my first consideration, not the last.

If you're trying to match /en/ specifically, you don't need a regular expression at all. Just use your language's equivalent of contains to test for that substring.
If you're trying to match any two-letter part of the URL between two slashes, you need an expression like this:
/../
If you want to capture the two-letter code, enclose the periods in parentheses:
/(..)/
Depending on your language, you may need to escape the slashes:
\/..\/
\/(..)\/
And if you want to make sure you match letters instead of any character (including numbers and symbols), you might want to use an expression like this instead:
/[a-z]{2}/
Which will be recognized by most regex variations.
Again, you can escape the slashes and add a capturing group this way:
\/([a-z]{2})\/
And if you don't need to escape them:
/([a-z]{2})/
This expression will match any string in the form /xy/ where x and y are letters. So it will match /en/, /fr/, /de/, etc.
In JavaScript, you'll need the escaped version: \/([a-z]{2})\/.
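
As a sketch of the JavaScript case with string.match(), using the example URL from the question:

```javascript
const url = "http://something.com/en/page";

// The slashes are escaped because "/" also delimits a regex literal.
const match = url.match(/\/([a-z]{2})\//);

console.log(match ? match[1] : null); // en
```

The match result is null when no two-letter segment exists, hence the guard before reading the group.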

You may need to escape the forward-slashes...
/\/en\//

Any reason /en/ would not work?

/\/en\// or perhaps /http\w*:\/\/[^\/]*\/en\//

You don't need a regex for this:
location.pathname.substr(0, 4) === "/en/"
Of course, if you insist on using a regex, use this:
/^\/en\//.test(location.pathname)
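
Outside a browser there is no location object, so the same regex-free checks can be sketched on a plain string with includes and startsWith. The variable name pageUrl is made up for illustration, and the global URL class is assumed to be available (browsers and current Node.js versions).

```javascript
const pageUrl = "http://something.com/en/page";

// Plain substring check anywhere in the URL.
console.log(pageUrl.includes("/en/")); // true

// Stricter: the path itself must begin with /en/.
const { pathname } = new URL(pageUrl);
console.log(pathname);                    // /en/page
console.log(pathname.startsWith("/en/")); // true
```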
