Regex to match part of a string

Regex fun again...
Take for example http://something.com/en/page
I want to test for an exact match on /en/ including the forward slashes, otherwise it could match 'en' from other parts of the string.
I'm sure this is easy, for someone other than me!
EDIT:
I'm using it for a string.match() in JavaScript

Well it really depends on what programming language will be executing the regex, but the actual regex is simply
/en/
For .Net the following code works properly:
string url = "http://something.com/en/page";
bool MatchFound = Regex.Match(url, "/en/").Success;
Here is the JavaScript version:
var url = 'http://something.com/en/page';
if (url.match(/\/en\//)) {
    alert('match found');
} else {
    alert('no match');
}
DUH
Thank you to Welbog and Chris Ballance for making what should have been the most obvious point. This does not require regular expressions to solve. It is simply a contains check. Regex should only be used where it is needed, and that should have been my first consideration, not the last.

If you're trying to match /en/ specifically, you don't need a regular expression at all. Just use your language's equivalent of contains to test for that substring.
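In JavaScript (the language from the question), the substring test could look something like this. It is only a minimal sketch; the helper name hasEnSegment is made up for illustration.

```javascript
// Hypothetical helper: true when the URL contains the literal substring "/en/".
function hasEnSegment(url) {
  // indexOf works in every engine; url.includes('/en/') is the ES2015 equivalent.
  return url.indexOf('/en/') !== -1;
}

console.log(hasEnSegment('http://something.com/en/page'));     // true
console.log(hasEnSegment('http://something.com/garden/page')); // false
```

The second URL contains "en" but not "/en/", so the slashes do the disambiguation the question asks for.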
If you're trying to match any two-letter part of the URL between two slashes, you need an expression like this:
/../
If you want to capture the two-letter code, enclose the periods in parentheses:
/(..)/
Depending on your language, you may need to escape the slashes:
\/..\/
\/(..)\/
And if you want to make sure you match letters instead of any character (including numbers and symbols), you might want to use an expression like this instead:
/[a-z]{2}/
Which will be recognized by most regex variations.
Again, you can escape the slashes and add a capturing group this way:
\/([a-z]{2})\/
And if you don't need to escape them:
/([a-z]{2})/
This expression will match any string in the form /xy/ where x and y are letters. So it will match /en/, /fr/, /de/, etc.
In JavaScript, you'll need the escaped version: \/([a-z]{2})\/.
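As a rough sketch of how the capturing version might be used in JavaScript (getLanguageCode is a hypothetical name, not something from the question):

```javascript
// Capture a two-letter code between slashes, e.g. "en" from ".../en/page".
var langPattern = /\/([a-z]{2})\//;

// Hypothetical helper: returns the captured code, or null when nothing matches.
function getLanguageCode(url) {
  var m = langPattern.exec(url);
  return m ? m[1] : null; // m[1] is the first capturing group
}

console.log(getLanguageCode('http://something.com/en/page')); // "en"
console.log(getLanguageCode('http://something.com/page'));    // null
```

Note that this takes the first two-letter segment it finds, so a two-letter host name or directory earlier in the URL would be captured instead.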

You may need to escape the forward-slashes...
/\/en\//

Any reason /en/ would not work?

/\/en\// or perhaps /http\w*:\/\/[^\/]*\/en\//

You don't need a regex for this:
location.pathname.substr(0, 4) === "/en/"
Of course, if you insist on using a regex, use this:
/^\/en\//.test(location.pathname)
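If the URL is held in a string rather than in location, the same anchored test can be applied to the parsed path. A minimal sketch, assuming the URL constructor is available (modern browsers and Node) and that the input is an absolute URL; pathStartsWithEn is a made-up name:

```javascript
// Hypothetical helper: does the path of an absolute URL start with "/en/"?
function pathStartsWithEn(url) {
  var pathname = new URL(url).pathname; // e.g. "/en/page"
  return /^\/en\//.test(pathname);
}

console.log(pathStartsWithEn('http://something.com/en/page'));      // true
console.log(pathStartsWithEn('http://something.com/docs/en/page')); // false
```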

Related

regex replace all backward slashes before '\?'

I have a string such as
'frontend\less\defaults\layout.css?file=\foo'
I want a regex that replaces it with
'frontend/less/defaults/layout.css?file=\foo'
I tried /\\/g, but it keeps matching stuff after a \?, which I want to avoid somehow

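The following will work; it uses a lookahead in the regexp:
var myString = "path\\to\\file.php?query=\\something";
// If there is a "?", replace only backslashes that have a "?" somewhere after them; otherwise replace them all.
var r = /\?/.test(myString) ? /\\(?=.*\?)/g : /\\/g;
var result = myString.replace(r, "/");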

You can do this with String.replace, with a replacement function:
str.replace(/^([^?]*)/, function (_, $1) {
    return $1.replace(/\\/g, '/');
});
This will work regardless of whether the query string exists or not.
Explanation
/^([^?]*)/
([^?]*) will match and capture everything before ? (if any).
I assume the URL is valid, so there is no validation done here.
(Thanks to @Pumbaa80 for the suggestion. There is no need to match the query string part if it is going to stay the same after the replacement.)

Unless you know the number of \'s in advance, I doubt you can do this with a comprehensible regex. I would:
split the string in two parts: the part before the ?, and after it
use your regex on the first part
put the two strings back together.
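The three steps above can be sketched roughly as follows. The function name fixSlashesBeforeQuery is made up, and the sketch assumes the first ? marks the start of the query string.

```javascript
// Hypothetical helper: turn backslashes into forward slashes,
// but only in the part of the string before the first "?".
function fixSlashesBeforeQuery(str) {
  var q = str.indexOf('?');
  if (q === -1) {
    // No query string: every backslash gets replaced.
    return str.replace(/\\/g, '/');
  }
  var before = str.slice(0, q); // path part
  var after = str.slice(q);     // "?" and everything after it, left untouched
  return before.replace(/\\/g, '/') + after;
}

// Backslashes are doubled here only because this is a string literal.
var input = 'frontend\\less\\defaults\\layout.css?file=\\foo';
console.log(fixSlashesBeforeQuery(input));
// frontend/less/defaults/layout.css?file=\foo
```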

What's wrong with this regular expression to find URLs?

I'm working on a JavaScript to extract a URL from a Google search URL, like so:
http://www.google.com/search?client=safari&rls=en&q=thisisthepartiwanttofind.org&ie=UTF-8&oe=UTF-8
Right now, my code looks like this:
var checkForURL = /[\w\d](.org)/i;
var findTheURL = checkForURL.exec(theURL);
I've run this through a couple of regex testers and it seems to work, but in practice the string I get back looks like this:
thisisthepartiwanttofind.org,.org
So where's that trailing ,.org coming from?
I know my pattern isn't super robust but please don't suggest better patterns to use. I'd really just like advice on what in particular I did wrong with this one. Thanks!

Remove the parentheses in the regex if you do not need to capture the .org part (unlikely, since it is a literal). As per @Mark's comment, add a + to match one or more characters of the class [\w\d]. Also, I would escape the dot:
var checkForURL = /[\w\d]+\.org/i;

What you're actually getting is an array of 2 results, the first being the whole match, the second - the group you defined by using parens (.org).
Compare with:
/([\w\d]+)\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl"]
/[\w\d]+\.org/.exec('thisistheurl.org')
→ ["thisistheurl.org"]
/([\w\d]+)(\.org)/.exec('thisistheurl.org')
→ ["thisistheurl.org", "thisistheurl", ".org"]
The result of an .exec of a JS regex is an Array of strings, the first being the whole match and the subsequent representing groups that you defined by using parens. If there are no parens in the regex, there will only be one element in this array - the whole match.
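The comma in the question's output comes from converting that array to a string, which joins the elements with commas. A small sketch to illustrate, using a corrected pattern that keeps the question's capturing group and, as an assumption, only the domain from the question as input:

```javascript
var result = /[\w\d]+(\.org)/i.exec('thisisthepartiwanttofind.org');

console.log(result.length); // 2
console.log(result[0]);     // "thisisthepartiwanttofind.org" (the whole match)
console.log(result[1]);     // ".org" (the first capturing group)

// Coercing the array to a string joins its elements with commas:
console.log(String(result)); // "thisisthepartiwanttofind.org,.org"
```

So if only the whole match is wanted, read result[0] instead of using the array itself as a string.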

You should escape the dot in the (.org) group; otherwise it matches any character. So your regex would become:
/[\w\d]+(\.org)/
To match the url in your example you can use something like this:
https?://([0-9a-zA-Z_.?=&\-]+/?)+
or something more accurate like this (you should choose the right regex according to your needs):
^https?://([0-9a-zA-Z_\-]+\.)+(com|org|net|WhatEverYouWant)(/[0-9a-zA-Z_\-?=&.]+)$

JavaScript negative lookbehind issue

I've got some JavaScript that looks for Amazon ASINs within an Amazon link, for example
http://www.amazon.com/dp/B00137QS28
For this I use the following regex: /([A-Z0-9]{10})
However, I don't want it to match artist links which look like:
http://www.amazon.com/Artist-Name/e/B000AQ1JZO
So I need to exclude any links where there's a '/e' before the slash and the 10-character alphanumeric code. I thought the following would do that: (?<!/e)([A-Z0-9]{10}), but it turns out negative lookbehinds don't work in JavaScript. Is that right? Is there another way to do this instead?
Any help would be much appreciated!
As a side note, be aware there are plenty of Amazon link formats, which is why I want to blacklist rather than whitelist, eg, these are all the same page:
http://www.amazon.com/gp/product/B00137QS28/
http://www.amazon.com/dp/B00137QS28
http://www.amazon.com/exec/obidos/ASIN/B00137QS28/
http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/

In your case an expression like this would work:
/(?!\/e)..\/([A-Z0-9]{10})/
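Here is a quick check of that expression against two of the URLs from the question. It is only a sketch, and extractAsin is a made-up helper name.

```javascript
// The lookahead rejects any start position where the two characters
// before the slash are "/e"; the ASIN itself is captured in group 1.
var asinPattern = /(?!\/e)..\/([A-Z0-9]{10})/;

// Hypothetical helper: returns the captured ID, or null when nothing matches.
function extractAsin(url) {
  var m = asinPattern.exec(url);
  return m ? m[1] : null;
}

console.log(extractAsin('http://www.amazon.com/dp/B00137QS28'));            // "B00137QS28"
console.log(extractAsin('http://www.amazon.com/Artist-Name/e/B000AQ1JZO')); // null
```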

([A-Z0-9]{10}) will work equally well on the reverse of its input, so you can
reverse the string,
use a lookahead in place of the lookbehind (a negative one here, since the original lookbehind was negative),
reverse it back.
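A rough sketch of the reversal trick, under the assumption that the URLs are plain ASCII. A lookbehind on the original string becomes a lookahead on the reversed one, and since the question needs a negative lookbehind, the sketch uses a negative lookahead. The names reverseString and extractAsinReversed are made up for illustration.

```javascript
function reverseString(s) {
  return s.split('').reverse().join('');
}

// On the reversed URL the ID comes first, then "/", and a "/e" that
// preceded the slash in the original now follows it as "e/".
var reversedAsin = /([A-Z0-9]{10})\/(?!e\/)/;

// Hypothetical helper: returns the ID in its original order, or null.
function extractAsinReversed(url) {
  var m = reversedAsin.exec(reverseString(url));
  return m ? reverseString(m[1]) : null;
}

console.log(extractAsinReversed('http://www.amazon.com/dp/B00137QS28'));            // "B00137QS28"
console.log(extractAsinReversed('http://www.amazon.com/Artist-Name/e/B000AQ1JZO')); // null
```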

You need to use a lookahead to filter the /e/* ones out. Then trim the leading three characters (the two characters checked by the lookahead, plus the slash) from each of the matches.
var source = 'http://www.amazon.com/dp/B00137QS28'; // the source you're matching against the RegExp
var matches = source.match(/(?!\/e)..\/[A-Z0-9]{10}/g) || [];
var ids = matches.map(function (match) {
    return match.substr(3);
});
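For what it's worth, engines that implement ES2018 do support lookbehind, so where older browsers are not a concern the expression from the question can be used nearly as written (with the slashes escaped inside a regex literal and the slash before the ID added). A minimal sketch, with asinOrNull as a made-up helper name; on an engine without lookbehind support the regex literal is a syntax error.

```javascript
// Requires an engine with ES2018 lookbehind support.
// Match a slash plus a 10-character ID, unless that slash is preceded by "/e".
var asinRe = /(?<!\/e)\/([A-Z0-9]{10})/;

// Hypothetical helper: returns the captured ID, or null when nothing matches.
function asinOrNull(url) {
  var m = asinRe.exec(url);
  return m ? m[1] : null;
}

console.log(asinOrNull('http://www.amazon.com/Product-Title-Goes-Here/dp/B00137QS28/')); // "B00137QS28"
console.log(asinOrNull('http://www.amazon.com/Artist-Name/e/B000AQ1JZO'));               // null
```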

JS/Jquery, Match not finding the PNG = match('/gif|jpg|jpeg|png/')

I have the following code which I use to match fancybox possible elements:
$('a.grouped_elements').each(function(){
    var elem = $(this);
    // Convert everything to lower case to match smart
    if(elem.attr('href').toLowerCase().match('/gif|jpg|jpeg|png/') != null) {
        elem.fancybox();
    }
});
It works great with JPGs but it isn't matching PNGs for some reason. Anyone see a bug with the code?
Thanks

A couple of things.
match expects a RegExp object, not a string. A string argument is converted with new RegExp(string), so the slashes inside the quotes become literal characters in the pattern rather than delimiters.
"gif".match('/gif|png|jpg/'); // null
Without the quotes
"gif".match(/gif|png|jpg/); // ["gif"]
Also, you would want to check these at the end of a filename, instead of anywhere in the string.
"isthisagif.nope".match(/(gif|png|jpg|jpeg)/); // ["gif", "gif"]
Only searching at the end of the string with the $ anchor
"isthisagif.nope".match(/(gif|png|jpg|jpeg)$/); // null
No need to make the href lowercase, just do a case-insensitive search with the /i flag.
Look for a dot before the image extension as an additional check.
With the string argument, the alternatives are /gif, jpg, jpeg and png/ (slashes included), which is why JPG links matched and PNG links did not.
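To see what the quoted pattern from the question actually does: a string passed to match is turned into a RegExp whose source is the string itself, so the slashes are matched literally. A small sketch with made-up file names:

```javascript
// The string '/gif|jpg|jpeg|png/' becomes a pattern with four alternatives:
// "/gif", "jpg", "jpeg" and "png/" (the slashes are part of the text to match).
var quoted = '/gif|jpg|jpeg|png/';

console.log('photo.jpg'.match(quoted) !== null);  // true:  the "jpg" alternative matches
console.log('photo.png'.match(quoted) !== null);  // false: "png" is not followed by "/"
console.log('photo.gif'.match(quoted) !== null);  // false: "gif" is not preceded by "/"

// With a regex literal the slashes are only delimiters:
console.log('photo.png'.match(/gif|jpg|jpeg|png/) !== null); // true
```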

I guess the fact that it'll match anywhere in the string (it would match "http://www.giftshop.com/" for instance) could be considered a bug. I'd use
/\.(gif|jpe?g|png)$/i

You are passing a string to the match() function rather than a regular expression. In JavaScript, strings are delimited with quotes (single or double), and regular expression literals are delimited with forward slashes. If you use both, you have a string, not a regex.

This worked perfectly for me: /.+\.(gif|png|jpe?g)$/i
.+ -> any string
\. -> followed by a point.
(gif|png|jpe?g) -> and then followed by any of these extensions. jpeg may or may not have the letter e.
$ -> now the end of the string it's expected
/i -> case insensitive mode: matches both sflkj.JPG and lkjfsl.jpg
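A quick way to try the pattern out, with made-up file names (isImageLink is a hypothetical helper, not part of fancybox or jQuery):

```javascript
var imagePattern = /.+\.(gif|png|jpe?g)$/i;

// Hypothetical helper: true when the href ends in a known image extension.
function isImageLink(href) {
  return imagePattern.test(href);
}

console.log(isImageLink('photo.JPG'));                // true
console.log(isImageLink('picture.jpeg'));             // true
console.log(isImageLink('icon.png'));                 // true
console.log(isImageLink('http://www.giftshop.com/')); // false
console.log(isImageLink('notes.txt'));                // false
```

Because of the $ anchor, an href with a query string after the extension would not match; whether that matters depends on the links in the page.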

Javascript regular expression to strip out content between double quotes

I'm looking for a JavaScript regex that will remove all content wrapped in quotes (and the quotes too) from a string in the Outlook format for listing email addresses. Take a look at the sample below. I'm hopeless with regex and really need some help with this one; any help or resources would be appreciated!
"Bill'sRestauraunt"BillsRestauraunt@comcast.net,"Rob&Julie"robjules@ntelos.net,"Foo&Bar"foobar@cstone.net

Assuming no nested quotes:
mystring.replace(/"[^"]*"/g, '')
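A short sketch of that replacement in use. The sample string below is invented (example.com addresses) and stripQuoted is a made-up name.

```javascript
// Remove every double-quoted run, including the quotes themselves.
// Assumes quotes are never nested or escaped.
function stripQuoted(s) {
  return s.replace(/"[^"]*"/g, '');
}

var sample = '"Bill\'s Place"bill@example.com,"Rob&Julie"rob@example.com';
console.log(stripQuoted(sample)); // bill@example.com,rob@example.com
```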

Try this regular expression:
/(?:"(?:[^"\\]+|\\(?:\\\\)*.)*"|'(?:[^'\\]+|\\(?:\\\\)*.)*')/g

Here's a regex I use to find and decompose the quoted strings within a paragraph. It also isolates several attendant tokens, especially adjacent whitespace. You can string together whichever parts you want.
var re = new RegExp(/([^\s\(]?)"(\s*)([^\\]*?(\\.[^\\]*)*)(\s*)("|\n\n)([^\s\)\.\,;]?)/g);
