regex to select all URL by certain pattern - javascript

I'm trying to use a regex to select everything between the double quotes ("...") in all tags, where the quoted value matches a certain pattern, for example values that start with /desktop/content.
I'm sure this is fairly simple, but I couldn't manage it on my own. Can someone help?
Example:
<img src="/desktop/content/img/illustrations/small-flower2.svg" width="138"/>
selected part should be: /desktop/content/img/illustrations/small-flower2.svg

Do you mean a regex like /"someQuotedString([^"]*)"/gm?
var str = '<img src="/desktop/content/img/illustrations/small-flower2.svg" width="138"/>';
console.dir(str.match(/"\/desktop\/content([^"]*)"/gm));
console.log(str.match(/"\/desktop\/content([^"]*)"/gm)[0]);
https://regex101.com/r/ahjdCZ/1
...if you really want to make sure it's inside an <img ...> tag you could also use:
/<img[^>]*"(\/desktop\/content[^"]*)"/
or within any < > tag:
/<.*"(\/desktop\/content[^"]+)".*>/
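Note that with the g flag, match returns the whole matches, surrounding quotes included, rather than the capture group. A minimal sketch of pulling out only the path, assuming an environment that supports String.prototype.matchAll (ES2020); the helper name extractPaths is hypothetical:

```javascript
// Hypothetical helper: collect every double-quoted attribute value that
// starts with /desktop/content, without the surrounding quotes.
function extractPaths(html) {
  var pattern = /"(\/desktop\/content[^"]*)"/g;
  // matchAll yields one match array per hit; index 1 is the capture group.
  return Array.from(html.matchAll(pattern), function (m) {
    return m[1];
  });
}

var sample = '<img src="/desktop/content/img/illustrations/small-flower2.svg" width="138"/>';
console.log(extractPaths(sample));
```

This returns an array of paths, so it also covers markup with several matching tags.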


How can I select or capture multiple HTML tags using regex? [duplicate]

To get a certain HTML tag and its contents from a html document I am using regex -
html -
<div id="abc">content</div>
<a class="anchorclass">content</a>
<table id="table1">content</table>
<div id="div2">content</div>
<a class="anchorclass2">content</a>
<div class="divclass">content</div>
regex
/<div id="abc"[\s\S]*?<\/div>/
which returns precisely this particular div contents (div with id="abc").
I want to capture multiple HTML elements from the above with a single regular expression. How can I do that? Is there a way to combine conditions, or some kind of "and" operator, to select multiple HTML tags?
I want to write a single regex expression which selects -
div with id="abc"
a with class="anchorclass"
div with class="divclass"
from above html, how would that be ?
HTML tags with attributes, here is my solution to deal with that:
// <TAG(.*?)>(.*?)</TAG>
// Example
var regex = new System.Text.RegularExpressions.Regex("<h1(.*?)>(.*?)</h1>");
var m = regex.Match("Hello <h1 style='color: red;'>World</h1> !!");
Console.Write(m.Groups[2].Value); // will print -> World
If you already know the class/id you can use the following:
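// Sample markup using the ids/classes from the question
var yourHtml = '<html><body><a class="anchorclass">Some more text</a><div id="abc">Some text here</div></body></html>';
// Accept either an id or a class attribute with one of the known values
var regex = /<(?:div|a) (?:id|class)="(?:abc|anchorclass|divclass)">(.*?)<\/(?:div|a)>/g;
var result;
while ((result = regex.exec(yourHtml)) !== null) {
  console.log(result[1]);
}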
(?:div|a) matches divs and links. Just add anything you like.
But I would not recommend using regex for this! It is way too error-prone and no fun at all when debugging. Instead I would propose parsing the HTML as a whole and then searching it.
var yourHtml = '<html><body><div id="yourID">Some text here</div></body></html>';
var parser = new DOMParser();
var parsedHtml = parser.parseFromString(yourHtml, "text/html");
console.log(parsedHtml.getElementById("yourID").innerText);
This way you will be able to use any of the standard js functions on parsedHtml without creating a new regex each time. This is a way more elegant solution.

Replace anything in certain position in string using javascript regular expression

I am trying to remove all empty paragraph tags from a string, regardless of what style attributes might be in the p tag. I want to remove, for example, all of these and replace them with an empty string.
<p style="margin-left:0px"></p>
<p></p>
<p style="margin-left:1cm; margin-right:1cm"></p>
So far, to deal with one situation I have, I am doing this:
str = str.replace(/<p style=\"margin:0cm 0cm 10pt\"><\/p>/g,'')
which is working in that particular situation. How can I write it so it removes
<p AnythingHereInThisTag></p>
and replaces it with an empty string?
Edit - further to answer below - if I do this:
str = str.replace(/<p(.*)><\/p>/g,'')
it is replacing the whole string which might look like
<p>Hello</p><p>Some text in the middle</p><p>Goodbye</p>
It needs to look at each pair of tags
Any character other than a double quote is matched by [^\"], so you can use that for the attribute values:
var reg=/\<p( [a-zA-Z]*=((\"[^\"]*\")|(\'[^\']*\')))*\>\<\/p\>/g;
console.log('<p style=\'margin:0cm 0cm 10pt\'></p>'.replace(reg,''));
console.log('<p style=\"margin:0cm 0cm 10pt\"></p>'.replace(reg,''));
console.log('<p style=\"margin:0cm 0cm 10pt\" class=\"test\"></p>'.replace(reg,''));
console.log('<p></p>'.replace(reg,''));
Something like this?
str = 'asd<p style="margin-left:0px"></p><p></p><p style="margin-left:1cm; margin-right:1cm"></p>'
str.replace(/<p(.*)><\/p>/g,'') // "asd"
Reading the question again, it is unclear if you wanted to remove only the attributes within the tag, or the tag completely. Please clarify.
You can read more about regular expression here.
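The greedy (.*) is what causes the over-matching described in the question's edit: it runs from the first <p to the last ></p>. A minimal sketch of a narrower pattern, assuming attribute values never contain a literal > character; the helper name removeEmptyParagraphs is hypothetical:

```javascript
// Hypothetical helper: strip <p ...></p> pairs that have no content.
function removeEmptyParagraphs(html) {
  // [^>]* cannot run past the end of the opening tag, so each empty
  // pair is matched on its own and paragraphs with text are left alone.
  // \b keeps the pattern from matching other tags such as <pre>.
  return html.replace(/<p\b[^>]*><\/p>/g, '');
}

var input = '<p>Hello</p><p style="margin-left:0px"></p><p></p><p>Goodbye</p>';
console.log(removeEmptyParagraphs(input));
```

Here only the two empty paragraphs are removed, and <p>Hello</p> and <p>Goodbye</p> are kept.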

How to inject JavaScript into an internet forum that parses URLs

Hypothetical scenario
There's an internet forum where typing text that matches a URL regular expression, such as
http://somesite.com
will be turned into a link in the HTML of the forum post once it is submitted. So the above link becomes
http://somesite.com
If possible, I want to exploit this to get JavaScript into the href of an a tag, i.e.
href="javascript:(function(){alert('Yo, dawg');}())"
The key is that I somehow need to get the expression
javascript:(function(){alert('Yo, dawg');}())
to do the equivalent, but to be recognized by the parser as a URL. How can I do that? Is there some way of doing it with escape characters, Unicode or something else?
Not sure if this JS Fiddle is what you are asking for.
HTML:
<div id="test">
http://www.example.com<br>
http://google.com<br>
google<br>
object.property<br>
http://www.ex-am-ple.com<br>
www.test<br>
http://text
</div>
JS:
var div = document.getElementById("test");
var divContent = div.innerHTML;
div.innerHTML = divContent.replace(/http:\/\/(.*)\.(\w+)/g, '$1.$2');
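For illustration, here is a minimal sketch of how a forum-style linkifier might work, assuming it escapes the post text first and only recognizes text that starts with http:// or https://. The function names escapeHtml and linkify are hypothetical and not taken from any real forum software:

```javascript
// Hypothetical helper: make user text safe to place into HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical linkifier: escape the post, then wrap http/https URLs.
function linkify(text) {
  // Only text starting with http:// or https:// is treated as a URL,
  // so a javascript: expression is never placed into an href here.
  return escapeHtml(text).replace(/\bhttps?:\/\/[^\s<]+/g, function (url) {
    return '<a href="' + url + '">' + url + '</a>';
  });
}

console.log(linkify('see http://somesite.com now'));
console.log(linkify("javascript:(function(){alert('Yo, dawg');}())"));
```

Under these assumptions the second call produces escaped text with no a tag at all, which is why a scheme whitelist in the URL pattern matters.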

Javascript regex not working as intended

I have the HTML from a page in a variable as just plain text. Now I need to remove some parts of the text. This is a part of the HTML that I need to change:
<div class="post"><a name="6188729"></a>
<div class="igmline small" style="height: 20px; padding-top: 1px;">
<span class="postheader_left">
RuneRifle
op 24.08.2012 om 21:41 uur
</span>
<span class="postheader_right">
Citaat Bewerken
</span>
<div style="clear:both;"></div>
</div>
<div class="text">Testforum</div>
<!-- Begin Thank -->
<!-- Thank End -->
</div>
These replaces work:
pageData = pageData.replace(/href=\".*?\"/g, "href=\"#\"");
pageData = pageData.replace(/target=\".*?\"/g, "");
But this replace does not work at all:
pageData = pageData.replace(
/<span class=\"postheader_right\">(.*?)<\/span>/g, "");
I need to remove every span with the class postheader_right and everything in it, but it just doesn't work. My knowledge of regex isn't that great so I'd appreciate if you would tell me how you came to your answer and a small explanation of how it works.
The dot doesn't match newlines. Use [\s\S] instead of the dot as it will match all whitespace characters or non-whitespace characters (i.e., anything).
As Mike Samuel says regular expressions are not really the best way to go given the complexity allowed in HTML (e.g., if say there is a line break after <a), especially if you have to look for attributes which may occur in different orders, but that's the way you can do it to match the case in your example HTML.
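A minimal sketch of that [\s\S] fix applied to the failing replace, using a shortened copy of the markup from the question as sample data. It assumes the span contains no nested span elements, since the lazy match stops at the first closing tag:

```javascript
// Shortened sample of the page markup from the question.
var pageData =
  '<span class="postheader_left">\nRuneRifle\n</span>\n' +
  '<span class="postheader_right">\nCitaat Bewerken\n</span>\n' +
  '<div class="text">Testforum</div>';

// [\s\S] matches any character including line breaks; *? keeps the match
// lazy so it stops at the first closing </span>.
var cleaned = pageData.replace(
  /<span class="postheader_right">[\s\S]*?<\/span>/g, '');

console.log(cleaned);
```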
I need to remove every span with the class postheader_right and everything in it, but it just doesn't work.
Don't use regular expressions to find the spans. Using regular expressions to parse HTML: why not?
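// Collect every span, then remove those carrying the postheader_right class.
var allSpans = document.getElementsByTagName('span');
// Iterate backwards because the live collection shrinks as spans are removed.
for (var i = allSpans.length; --i >= 0;) {
  var span = allSpans[i];
  if (/\bpostheader_right\b/.test(span.className)) {
    span.parentNode.removeChild(span);
  }
}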
should do it.
If you only need to work on newer browsers then getElementsByClassName makes it even easier:
// Find all div elements that have a class of 'test'
var tests = Array.prototype.filter.call(document.getElementsByClassName('test'), function (elem) {
  return elem.nodeName == 'DIV';
});

How to check what is after hyphen with jQuery's "Attribute Contains Prefix Selector"

I've got some html like:
<a class="link" href="#">link</a>
<a class="link-0" href="#">link</a>
<a class="link-1 enabled" href="#">link</a>
<a class="link-2" href="#">link</a>
I can select all of those links by:
$('[class|="link"]');
but I find it very difficult to check what comes after the hyphen. I'm thinking about getting the classes with attr('class'), splitting them with split(' '), checking whether each class starts with "link", and then splitting again with split('-').
Does anyone know a better way to do this?
You can use a regular expression:
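// Match "link" or "link-N" as a whole class name, even when other classes are present
var matches = element.attr("class").match(/(?:^|\s)link-?(\d*)(?=\s|$)/);
var whichLink = matches ? matches[1] : null;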
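The same idea can be sketched as a standalone helper that works on the class string itself, splitting on whitespace first so that extra classes such as "enabled" do not get in the way. The name linkSuffix is hypothetical:

```javascript
// Hypothetical helper: given the value of a class attribute, return what
// follows "link-" ("" for a bare "link"), or null if no such class exists.
function linkSuffix(classAttr) {
  var names = classAttr.split(/\s+/);
  for (var i = 0; i < names.length; i++) {
    // Matches "link" alone or "link-" followed by anything.
    var m = /^link(?:-(.*))?$/.exec(names[i]);
    if (m) {
      return m[1] || '';
    }
  }
  return null;
}

console.log(linkSuffix('link'));           // returns ""
console.log(linkSuffix('link-1 enabled')); // returns "1"
```

With jQuery this could be called as linkSuffix($(this).attr('class')) inside an each callback.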
Interesting that other variations of this selector don't match it; perhaps a bug in jQuery with hyphens.
$('a[class^="link-"]')
Turns out others work too, just not "word selectors"
$('a[class*="link-"]')
