Extracting regex group in javascript not working - javascript

I am trying to extract an img's src using JavaScript's RegExp, yet I am not able to use the groups. I have come up with something like this so far:
>str
"xxxxx src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" xxxxx"
>str.match(re)
["src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg""]
>re
/src=\"(.*?)\"/g
>str.match(re)[1]
undefined
Yet I only get the match for the whole pattern

I'm not sure why you'd want to do this with regex.
Simply use:
document.getElementById('idOfImgElement').src;
Or, with jQuery:
$('#idOfImgElement').attr('src');
But if you really want to do this with regex, use:
var str = "<img src=\"http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg\" />";
var matches = (/src=\"(.*?)\"/g).exec(str);
console.log(matches[1]);
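To illustrate why str.match(re)[1] comes back undefined in the question: with the g flag, match returns only the full matches and drops the capture groups. The following is a minimal sketch using the question's string; the variable names are only for illustration, and matchAll assumes a reasonably modern (ES2020+) runtime.

```javascript
const str = 'xxxxx src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" xxxxx';

// With the g flag, match() returns every full match and no capture groups,
// so index 1 would be a second full match (there is none here).
const withGlobal = str.match(/src="(.*?)"/g);
console.log(withGlobal.length); // 1
console.log(withGlobal[1]);     // undefined

// Without the g flag, index 0 is the full match and index 1 is the first group.
const withoutGlobal = str.match(/src="(.*?)"/);
console.log(withoutGlobal[1]);

// matchAll() keeps the groups for every match of a global regex.
const allSrcs = Array.from(str.matchAll(/src="(.*?)"/g), (m) => m[1]);
console.log(allSrcs);
```

If only one src is expected, dropping the g flag is probably the simplest fix.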

How many img tags might be in your string? Assuming one, forget the global flag (and iterating with exec) and simply write a regex that describes the whole string and uses capture groups. Then you can index into the match result to get the capture group you know holds the value. Since these kinds of questions often have a caveat that goes unmentioned, I made the expression tighter against some possibilities, such as other attributes in the element. So in case it is more complex than you let on, you can use this regex:
(?:<img )?[^>]+src=(["'])(.*)\1
The quote capture group is necessary so that the opening and closing quotes match. You show double quotes, but is that guaranteed? In this regex, the second capture group is always going to be the URL (the contents of src, to be more precise).
In code:
var str = '<img src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" />';
var src = str.match(/(?:<img )?[^>]+src=(["'])(.*)\1/)[2]
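As a rough check of the backreference idea, the sketch below runs the regex above against both quote styles. The example.com URLs and variable names are made up for illustration. It also shows one caveat: because (.*) is greedy, a later attribute using the same quote character gets swallowed, and a lazy (.*?) is one possible adjustment (my assumption, not part of the answer above).

```javascript
const re = /(?:<img )?[^>]+src=(["'])(.*)\1/;

const doubleQuoted = '<img src="http://example.com/a.jpg" />';
const singleQuoted = "<img src='http://example.com/a.jpg' />";

// \1 must repeat whichever quote character group 1 captured.
console.log(doubleQuoted.match(re)[2]);
console.log(singleQuoted.match(re)[2]);

// Caveat: a greedy (.*) runs on to the LAST matching quote in the string.
const withAlt = '<img src="http://example.com/a.jpg" alt="photo" />';
const greedyResult = withAlt.match(re)[2];
console.log(greedyResult); // includes the alt attribute text

// A lazy quantifier stops at the first closing quote instead.
const lazyRe = /(?:<img )?[^>]+src=(["'])(.*?)\1/;
const lazyResult = withAlt.match(lazyRe)[2];
console.log(lazyResult);
```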

I have achieved this with a regexp without the g and m flags:
var re_imgsrc = /.*src=\"(.*?)\".*/
imageUrl = str.replace(re_imgsrc, "$1");
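One thing to watch with the replace approach: when the pattern does not match, replace returns the input unchanged, so the "URL" you get back is really the whole string. A small sketch of that behaviour follows; the extractSrc helper and the sample tags are hypothetical names for illustration.

```javascript
const reImgSrc = /.*src="(.*?)".*/;

const tag = '<img src="http://www.omgubuntu.co.uk/wp-content/uploads/2013/12/Multim.jpg" />';
const noSrc = '<img alt="no source here" />';

// The whole string is replaced by group 1 when the pattern matches.
const found = tag.replace(reImgSrc, "$1");
console.log(found);

// No match: replace() hands back the original string untouched.
const notFound = noSrc.replace(reImgSrc, "$1");
console.log(notFound === noSrc); // true

// Hypothetical helper that makes the "no match" case explicit.
function extractSrc(html) {
  const m = html.match(/src="(.*?)"/);
  return m ? m[1] : null;
}
console.log(extractSrc(noSrc)); // null
```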

Related

Extract inner text from anchor tag string using a regular expression in JavaScript

I am new to AngularJS. I have a regex which gets all the anchor tags. My regex is
/<a[^>]*>([^<]+)<\/a>/g
And I am using the match function on a string like this:
var str = 'abc.jagadale#gmail.com'
So now I am using code like
var value = str.match(/<a[^>]*>([^<]+)<\/a>/g);
So here I am expecting the output to be abc.jagadale#gmail.com, but I am getting exactly the same string as the input. Can anyone please help me with this? Thanks in advance.
Why are you trying to reinvent the wheel?
You are trying to parse an HTML string with a regex, which is a very complicated task. Just use the DOM or jQuery to get the link contents; they are made for this.
Put the HTML string in as the HTML of a jQuery/DOM element.
Then search this created DOM element for all the a elements
inside it and return their contents in an array.
This is how your code should look:
var str = 'abc.jagadale#gmail.com';
var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
results.push($(this).text());
});
Demo:
var str = 'abc.jagadale#gmail.com';
var results = [];
$("<div></div>").html(str).find("a").each(function(l) {
results.push($(this).text());
});
console.log(results);
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
You need to capture the group inside the anchor tags. The regular expression already captures the inner text with the group ([^<]+), but there are different ways to extract it when matching.
When the match function is used without the g flag, it returns an array whose first element is the text matched by the whole regular expression and whose following elements are the captured groups.
Try this:
var reg = /<a[^>]*>([^<]+)<\/a>/g
reg.exec(str)[1]
Also, the match function returns the capture groups only if the g flag is not present; with g it returns an array of the full matches.
Check https://javascript.info/regexp-groups for further documentation.
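Here is a small sketch of those differences. The html string is a made-up example (the original markup in the question is not shown), and the variable names are only for illustration.

```javascript
const html = '<a href="#one">first link</a> and <a href="#two">second link</a>';
const anchorRe = /<a[^>]*>([^<]+)<\/a>/g;

// With g: an array of full matches, capture groups are dropped.
const fullMatches = html.match(anchorRe);
console.log(fullMatches);

// Without g: index 0 is the full match, index 1 is the captured inner text.
const firstText = html.match(/<a[^>]*>([^<]+)<\/a>/)[1];
console.log(firstText);

// Calling exec() in a loop on the global regex visits every match
// and still exposes the groups.
const texts = [];
let m;
while ((m = anchorRe.exec(html)) !== null) {
  texts.push(m[1]);
}
console.log(texts);
```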
Brief
Don't use regex for this. Regex is a great tool, don't get me wrong, but it's not what you're looking for. Regex cannot properly parse HTML and should only be used to do so if it's a limited, known set of HTML.
Try, for example, adding content:">" to your style attribute. You'll see your pattern now fails or gives you an incorrect result. I don't like to use this quote all the time, but I think it's necessary to use it in this case:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
Use builtin functions. jQuery makes this super easy to accomplish. See my Code section for a demonstration. It's way more legible than any regex variant.
Code
DOM from page
The following snippet gets all anchors on the actual page.
$("a").each(function() {
console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
abc.jagadale#gmail.com
abc2.jagadale#gmail.com
DOM in string
The following snippet gets all anchors in the string (converted to DOM element)
var s = `email3#domain.com
email4#domain.com`
$("<div></div>").html(s).find("a").each(function() {
console.log($(this).text())
})
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
email1#domain.com
email2#domain.com
Given the use case of parsing a string, instead of having an actual DOM to work with, it does seem like regex is the way to go, unless you want to load the HTML into a document fragment and parse that.
One way to get all of your matches is to make use of split:
var htmlstr = "<p><a href='url'>asdf#bsdf.com</a></p>"
var matches = htmlstr.split(/<a.+?>([A-Za-z.#]+)<\/a>/).filter((t, i) => i % 2)
Using a regex with split returns all of the matches along with the text around them, then filtering by index % 2 will pare it down to just the regex matches.
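To make the odd-index trick easier to see, here is a sketch with two anchors in one string. The sample markup is invented for illustration; the regex is the one from the answer above.

```javascript
const htmlstr = "<p><a href='url1'>asdf#bsdf.com</a> and <a href='url2'>qwer#zxcv.com</a></p>";

// When the separator regex has a capture group, split() inserts the
// captured text between the surrounding pieces.
const parts = htmlstr.split(/<a.+?>([A-Za-z.#]+)<\/a>/);
console.log(parts);

// Surrounding text sits at even indexes, captured text at odd indexes.
const matches = parts.filter((t, i) => i % 2);
console.log(matches);
```

Note that this only works this neatly with exactly one capture group in the separator; more groups would change which indexes hold the captures.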

javascript replace with regexp has strange behaviour

Maybe someone can give me a hint...
I have the following code and experience a strange behaviour in javascript (node.js):
var a = "img{http://my.image.com/imgae.jpg} img{http://my.image.com/imgae.jpg}"
var html = a.replace(/img\{(.*)\}/g, '<img src="$1" class="image">');
//result: <img src="http://my.image.com/imgae.jpg" class="image"">
As you can see, the occurrence in the string (a markup thing) is replaced by an img tag with source as expected.
But now something strange. In the markup are probably several elements of type img{src}
var a = "img{http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg}"
var html = a.replace(/img\{(.*)\}/g, '<img src="$1" class="image">');
//result: <img src="http://my.image.com/imgae.jpghttp://my.image.com/imgae.jpg" class="image"">
The result is strange. in $1 all matches are stored and accumulated... And there is only one image tag.
I am confused...
Try: a.replace(/img\{(.*?)\}/g, '<img src="$1" class="image">');
Adding ? after the quantifier makes it non-greedy.
Use this to stop at the first closing curly bracket.
var html = a.replace(/img{([^}]*)}/g, '<img src="$1" class="image">');
I think it's probably more important that you understand how this is working. .* can be a dangerous pattern if you don't understand what it will do: it is greedy and will consume as much as it can, and some linters will warn against it.
If you break down your regex, you will find that img\{ matches the start of the string, (.*) matches http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg, and the final \} matches the closing }, because this is the largest string that matches the expression.
The best solution is to use ([^}]*), which matches anything except }, because you know that nothing between the braces of an image will be a closing brace.
You can test your regex to see what it is matching:
var reg = /img\{(.*)\}/ // no g flag, so match() returns the capture groups
var a = "img{http://my.image.com/imgae.jpg} img{http://my.image.com/imgae.jpg}"
var groups = a.match(reg)
// groups[0] is the full match; groups[1] is what the first capture group matched
// groups[1] === "http://my.image.com/imgae.jpg} img{http://my.image.com/imgae.jpg"
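For comparison, this sketch runs the greedy, lazy and negated-class versions side by side on the string from the question. The template variable is just a convenience name.

```javascript
const a = "img{http://my.image.com/imgae.jpg} some text between img{http://my.image.com/imgae.jpg}";
const template = '<img src="$1" class="image">';

// Greedy: (.*) runs to the last closing brace, so there is a single match.
const greedy = a.replace(/img\{(.*)\}/g, template);

// Lazy: (.*?) stops at the first closing brace it can.
const lazy = a.replace(/img\{(.*?)\}/g, template);

// Negated class: [^}]* can never cross a closing brace at all.
const negated = a.replace(/img\{([^}]*)\}/g, template);

console.log(greedy);
console.log(lazy);
console.log(negated);
```

The lazy and negated versions give the same output here; the negated class tends to be the more explicit of the two about where a match must end.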

How to extract a particular text from url in JavaScript

I have a url like http://www.somedotcom.com/all/~childrens-day/pr?sid=all.
I want to extract childrens-day. How to get that? Right now I am doing it like this
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
url.match('~.+\/');
But what I am getting is ["~childrens-day/"].
Is there a (definitely there would be) short and sweet way to get the above text without ["~ and /"] i.e just childrens-day.
Thanks
You could use a negated character class and a capture group ( ) and refer to capture group #1. The caret (^) inside of a character class [ ] is considered the negation operator.
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
var result = url.match(/~([^~]+)\//);
console.log(result[1]); // "childrens-day"
Note: If you have many URLs inside a string, you may want to add the ? quantifier for a non-greedy match.
var result = url.match(/~([^~]+?)\//);
Like so:
var url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all"
var matches = url.match(/~(.+?)\//);
console.log(matches[1]);
Working example: http://regex101.com/r/xU4nZ6
Note that your regular expression wasn't written as a regex literal; it still worked because match converts a string argument into a RegExp.
Use non-capturing groups with a captured group then access the [1] element of the matches array:
(?:~)(.+)(?:/)
Keep in mind that you will need to escape your / if using it also as your RegEx delimiter.
Yes, it is.
url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";
url.match('~(.+)\/')[1];
Just wrap what you need in a parenthesized group. No other modifications to your code are needed.
References: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
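A short sketch of why the string form works: match converts a string argument into a RegExp, so the string pattern and the literal behave the same here. The longer URL is an invented variation to show where the greedy (.+) could overshoot; a negated class is one way around that.

```javascript
const url = "http://www.somedotcom.com/all/~childrens-day/pr?sid=all";

// A string argument is turned into a RegExp by match().
const fromString = url.match('~(.+)\/');
const fromLiteral = url.match(/~(.+)\//);
console.log(fromString[1], fromLiteral[1]);

// Hypothetical URL with more path segments after the one we want.
const longer = "http://www.somedotcom.com/all/~childrens-day/pr/more/?sid=all";

// Greedy (.+) runs to the last slash.
const greedy = longer.match(/~(.+)\//)[1];
console.log(greedy);

// [^\/]+ stops at the first slash after the tilde.
const firstSegment = longer.match(/~([^\/]+)\//)[1];
console.log(firstSegment);
```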
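You could just do a string replace on the text you already matched.
var matched = url.match('~.+\/')[0]; // "~childrens-day/"
var result = matched.replace('~', '').replace('/', ''); // "childrens-day"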
http://www.w3schools.com/jsref/jsref_replace.asp

Javascript regex expression to replace multiple strings?

I have a string like this: "http://something.org/dom/My_happy_dog_%28is%29cool!"
How can I remove all the initial domain, the multiple underscores and the percentage stuff?
For now I'm just doing multiple replaces, like
str = str.replace("http://something.org/dom/","");
str = str.replace("_%28"," ");
and so on, but it's really ugly. Any help?
Thanks!
EDIT:
The exact output would be "My happy dog is cool!", so I would like to get rid of the initial address, remove the underscores and percentage escapes, and put the spaces in the right place!
The problem is that when I try to use a regex in Chrome, "something goes wrong". Is it a problem with Chrome or with my regex?
I'd suggest:
var str = "http://something.org/dom/My_happy_dog_%28is%29cool!";
str.substring(str.lastIndexOf('/')+1).replace(/(_)|(%\d{2,})/g,' ');
The reason I took this approach is that RegEx is fairly expensive, and is often tricky to fine tune to the point where edge-cases become less troublesome; so I opted to use simple string manipulation to reduce the RegEx work.
Effectively the above creates a substring of the given str variable, from the index point of the lastIndexOf('/') (which does exactly what you'd expect) and adding 1 to that so the substring is from the point after the / not before it.
The regex: (_) matches the underscores, the | just serves as an or operator, and (%\d{2,}) matches a % sign followed by two or more digit characters.
The parentheses surrounding each part of the regex around the |, serve to identify matching groups, which are used to identify what parts should be replaced by the ' ' (single-space) string in the second of the arguments passed to replace().
References:
lastIndexOf().
replace().
substring().
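Two details of the regex above are worth checking: \d only matches decimal digits, so an escape such as %2F is not touched, and an underscore next to an escape produces two spaces. The sketch below shows both and one possible variation (my assumption, not the answer's regex) that matches hex digits and collapses adjacent matches into a single space.

```javascript
const segment = "My_happy_dog_%28is%29cool!";

// The answer's regex: "_" and "%28" are replaced separately, giving two spaces.
const original = segment.replace(/(_)|(%\d{2,})/g, " ");
console.log(original);

// Variation: hex digits, and a + so runs of matches become one space.
const collapsed = segment.replace(/(?:_|%[0-9A-Fa-f]{2})+/g, " ");
console.log(collapsed);

// %2F has a letter as its second hex digit, so \d{2,} does not match it.
const hexEscape = "a%2Fb";
const untouched = hexEscape.replace(/(_)|(%\d{2,})/g, " ");
const replaced = hexEscape.replace(/(?:_|%[0-9A-Fa-f]{2})+/g, " ");
console.log(untouched, replaced);
```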
You can use unescape to decode the percentages:
str = unescape("http://something.org/dom/My_happy_dog_%28is%29cool!")
str = str.replace("http://something.org/dom/","");
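unescape is a legacy function; decodeURIComponent is the usual choice for decoding percent escapes today. A minimal sketch of that route follows, assuming the goal is the "My happy dog is cool!" output from the question. Treating the decoded parentheses as separators is my reading of the desired result, not something the question states.

```javascript
const input = "http://something.org/dom/My_happy_dog_%28is%29cool!";

// Keep only the part after the last slash.
const lastSegment = input.substring(input.lastIndexOf("/") + 1);

// Decode the percent escapes: %28 becomes "(" and %29 becomes ")".
const decoded = decodeURIComponent(lastSegment);
console.log(decoded);

// Assumption: underscores and parentheses all act as word separators.
const cleaned = decoded.replace(/[_()]+/g, " ");
console.log(cleaned);
```

Keep in mind that decodeURIComponent throws a URIError on malformed escapes, so untrusted input may need a try/catch.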
Maybe you could use a regular expression to pull out what you need, rather than getting rid of what you don't want. What is it you are trying to keep?
You can also chain them together as in:
str.replace("http://something.org/dom/", "").replace("something else", "");
You haven't defined the problem very exactly. To get rid of all stretches of characters ending in %<digit><digit> you'd say
var re = /.*%\d\d/g;
var str = str.replace(re, "");
OK, if you want to replace all of that, I think you would need something like this:
/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g
test
var string = "http://something.org/dom/My_happy_dog_%28is%29cool!";
string = string.replace(/(http:\/\/.*\.[a-z]{3}\/.*\/)|(\%[a-z0-9][a-z0-9])|_/g,"");

Javascript Regex after specific string

I have several Javascript strings (using jQuery). All of them follow the same pattern, starting with 'ajax-', and ending with a name. For instance 'ajax-first', 'ajax-last', 'ajax-email', etc.
How can I make a regex to only grab the string after 'ajax-'?
So instead of 'ajax-email', I want just 'email'.
You don't need RegEx for this. If your prefix is always "ajax-" then you just can do this:
var name = string.substring(5);
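substring(5) silently chops the first five characters even when the prefix is missing. If that matters, a guarded version might look like the sketch below; stripPrefix is a hypothetical helper name.

```javascript
// Hypothetical helper: remove the prefix only when it is actually there.
function stripPrefix(value, prefix) {
  return value.startsWith(prefix) ? value.slice(prefix.length) : value;
}

console.log(stripPrefix("ajax-email", "ajax-"));
console.log(stripPrefix("plain-name", "ajax-")); // left as-is
console.log("plain-name".substring(5));          // blindly cut
```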
Given a comment you made on another user's post, try the following:
var $li = jQuery(this).parents('li').get(0);
var ajaxName = $li.className.match(/(?:^|\s)ajax-(.*?)(?:$|\s)/)[1];
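The same regex can be tried on a plain string, without jQuery. In this sketch the class names are invented, and a null check is added because match returns null when no ajax- class is present (indexing [1] directly would then throw).

```javascript
const classRe = /(?:^|\s)ajax-(.*?)(?:$|\s)/;

// Hypothetical className values for illustration.
const withAjax = "item ajax-email selected";
const withoutAjax = "item selected";

const hit = withAjax.match(classRe);
const ajaxName = hit ? hit[1] : null;
console.log(ajaxName);

const miss = withoutAjax.match(classRe);
const missingName = miss ? miss[1] : null;
console.log(missingName); // null
```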
Below kept for reference only
var ajaxName = 'ajax-first'.match(/(\w+)$/)[0];
alert(ajaxName);
Use the \w (word) pattern and bind it to the end of the string. This will force a grab of everything past the last hyphen (assuming the value consists of only [upper/lower]case letters, numbers or an underscore).
The non-regex approach could also use the String.split method, coupled with Array.pop.
var parts = 'ajax-first'.split('-');
var ajaxName = parts.pop();
alert(ajaxName);
You can try replacing ajax- with an empty string ("").
I like the split method @Brad Christie mentions, but I would just do
function getLastPart(str,delimiter) {
return str.split(delimiter)[1];
}
This works if you will always have only two-part strings separated by a hyphen. If you wanted to generalize it for any particular piece of a multiple-hyphenated string, you would need to write a more involved function that included an index, but then you'd have to check for out of bounds errors, etc.
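A rough sketch of what such a generalized function could look like. The name getPart, the negative-index convention and returning undefined for out-of-range indexes are all my own choices for illustration.

```javascript
// Hypothetical generalization: pick any piece of a delimited string.
// Negative indexes count from the end; out-of-range gives undefined.
function getPart(str, delimiter, index) {
  const parts = str.split(delimiter);
  const i = index < 0 ? parts.length + index : index;
  return i >= 0 && i < parts.length ? parts[i] : undefined;
}

console.log(getPart("ajax-first", "-", 1));
console.log(getPart("ajax-first-name", "-", -1));
console.log(getPart("ajax-first", "-", 5)); // undefined
```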
