I have to programmatically remove all iframes from a string that contains raw HTML, based on the source of each iframe.
Eg: "... "
This is in a javascript string.
I thought about using String.replace with a regex that matches the iframe, or some other approach.
I just need an example or some ideas. I hope someone can help, because I am stuck on this and can't think of a proper solution.
Eg:
const regex = /Some weird regex to select the iframe piece/g
//Which I don't know how to write
// Happy to use something that removes all iFrames from the code as a quickfix regardless of the url
const raw_html = '<html> ... <iframe title="title" src="prohibited_url" width="100%" height="500" frameborder="0"></iframe> ... </html>'
const cleaned_html = raw_html.replace(regex, '')
// This would replace whatever the regex matches with an empty string.
If anyone has any ideas, they are welcome.
Thanks in advance
I've tried something like the example above, but couldn't write a working regular expression.
Try this ↓↓↓
let html = '<html\> ... \<iframe title="title" src="prohibited_url" width="100%" height="500" frameborder="0"\>\</iframe\> <h2>Just another element to test the regex</h2> <p1>hello</p1> ... \</html\>';
let regex = /\n*\s*<iframe.*?\\?>.*?<\/iframe\\?>\s*\n*/gi;
html = html.replace(regex, '');
console.log(html);
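If you ever need to remove a different element, you can use this function:
function removeElementFromHtmlString(element, htmlStr) {
  // [^>]* keeps the opening tag from running past its own ">", and the lazy
  // [\s\S]*? stops at the first matching closing tag (even across newlines).
  const pattern = new RegExp(`\\s*<${element}\\b[^>]*>(?:[\\s\\S]*?<\\/${element}>)?`, 'gi');
  return htmlStr.replace(pattern, '');
}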
let html = '<html\> ... \<iframe title="title" src="prohibited_url" width="100%" height="500" frameborder="0"\>\</iframe\> <h2>Just another element to test the regex</h2> <p1>hello</p1> ... \</html\>';
console.log(removeElementFromHtmlString('h2', html));
console.log(removeElementFromHtmlString('p1', html));
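Since the question asks for removal based on the iframe's source, here is a rough sketch that only drops iframes whose src is on a blocklist. The function name removeIframesBySrc and the prohibitedSrcs parameter are made up for this example, and it assumes reasonably well-formed markup (no ">" inside attribute values). For arbitrary HTML, a real parser such as DOMParser would be more robust.

```javascript
// Hypothetical helper: removes only the iframes whose src is in a blocklist.
function removeIframesBySrc(html, prohibitedSrcs) {
  // Match a whole <iframe ...>...</iframe> element; [\s\S] also spans newlines.
  const iframePattern = /<iframe\b[^>]*>[\s\S]*?<\/iframe\s*>/gi;
  // Capture the src value, whether double-quoted, single-quoted or unquoted.
  const srcPattern = /\ssrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;

  return html.replace(iframePattern, (iframe) => {
    const match = srcPattern.exec(iframe);
    const src = match ? (match[1] ?? match[2] ?? match[3]) : '';
    // Drop the iframe when its src is prohibited, otherwise keep it untouched.
    return prohibitedSrcs.includes(src) ? '' : iframe;
  });
}

const sample = '<p>a</p><iframe title="title" src="prohibited_url" width="100%"></iframe><iframe src="ok_url"></iframe>';
console.log(removeIframesBySrc(sample, ['prohibited_url']));
```

Passing a callback to replace keeps the decision per iframe, so the same pattern can serve as the "remove everything" quick fix by always returning an empty string.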
I'm dealing with a strange formatting bug I can't figure out. With the help of some others, I have a small function that replaces all the #[user] in a string with a link to the actual Twitter user. It's working pretty well, but once I add it to my webpage it starts to freak out a little bit.
Here's an example of the string.
If you want these foams Ima give you the chrome #WillThaRapper
However, once I append it to a <p> element to display on the page, it looks like this.
If you want these foams Ima give you the chrome <a href="https://twitter.com/WillThaRapper" target="_blank">#WillThaRapper</a>
I'm having a lot of trouble figuring this one out.
The issue seems to be that you're adding a string of HTML entities to your HTML. To add actual HTML you can use a DOMParser to convert the entities to HTML which can be rendered on the DOM:
const decodeEntities = str => {
  // Parsing as HTML decodes the entities; textContent returns the decoded string.
  const doc = new DOMParser().parseFromString(str, "text/html");
  return doc.documentElement.textContent;
}
// input = the text you want to append to the page (here, entity-encoded markup)
const input = "If you want these foams Ima give you the chrome &lt;a href=&quot;https://twitter.com/WillThaRapper&quot; target=&quot;_blank&quot;&gt;#WillThaRapper&lt;/a&gt;";
document.getElementById("output").innerHTML += decodeEntities(input);
<p id="output"></p>
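Where DOMParser is not available (for example outside a browser), the same idea can be approximated with a plain string replacement. This is only a sketch: decodeBasicEntities is a hypothetical helper, and it assumes the input uses nothing beyond the five entities listed in the map.

```javascript
// Hypothetical helper: decodes a handful of common HTML entities.
function decodeBasicEntities(str) {
  const entities = { lt: '<', gt: '>', quot: '"', amp: '&', '#39': "'" };
  // A single pass over the string avoids double-decoding input such as "&amp;lt;".
  return str.replace(/&(lt|gt|quot|amp|#39);/g, (_, name) => entities[name]);
}

console.log(decodeBasicEntities('&lt;a href=&quot;https://twitter.com/WillThaRapper&quot;&gt;#WillThaRapper&lt;/a&gt;'));
```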
Hypothetical scenario
There's an internet forum where typing text that a regular expression matches as a URL, such as
http://somesite.com
is made into a link in the HTML of the forum post once it is submitted. So the above link becomes
http://somesite.com
If possible, I want to exploit this to get JavaScript into the href of an a tag, i.e.
href="javascript:(function(){alert('Yo, dawg');}())"
The key is that I somehow need to get the expression
javascript:(function(){alert('Yo, dawg');}())
to do the equivalent, but to be recognized by the parser as a URL. How can I do that? Is there some way of doing it with escape characters, Unicode or something else?
Not sure if this JS Fiddle is what you are asking for.
HTML:
<div id="test">
http://www.example.com<br>
http://google.com<br>
google<br>
object.property<br>
http://www.ex-am-ple.com<br>
www.test<br>
http://text
</div>
JS:
var div = document.getElementById("test");
var divContent = div.innerHTML;
div.innerHTML = divContent.replace(/http:\/\/(.*)\.(\w+)/g, '$1.$2');
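For illustration, a forum's auto-linking step might look roughly like the sketch below. The linkify name and the exact pattern are assumptions, not the forum's real code. Because the pattern only recognizes text that starts with http:// or https://, a javascript: expression is never treated as a URL and so never ends up in an href.

```javascript
// Hypothetical auto-linker: wraps http(s) URLs found in plain text in <a> tags.
function linkify(text) {
  // Stop at whitespace, angle brackets and quotes so the URL cannot break out of the attribute.
  const urlPattern = /\bhttps?:\/\/[^\s<>"']+/g;
  return text.replace(urlPattern, (url) => `<a href="${url}">${url}</a>`);
}

console.log(linkify('see http://somesite.com now'));
console.log(linkify("javascript:(function(){alert('Yo, dawg');}())"));
```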
I am using a markdown parser that works great if I pass it a string like this:
el.innerHTML = marked('#Introduction:\nHere you can write some text.');
But if I have that string inside HTML and send it to parser like
el.innerHTML = marked(otherEl.innerHTML);
it does not get parsed. Why is this? Does the string format of .innerHTML do something I am missing?
jsFiddle: http://jsfiddle.net/5p8be1b4/
My HTML:
<div id="editor">
<div class="contentTarget"></div>
<div class="contentSource">#Introduction:\nHere you can write some text.</div>
</div>
div.contentTarget should receive HTML, i.e. the parsed markdown. But it receives an unparsed string only.
In the image below is the jsFiddle output: an unformatted div.contentTarget, the original div.contentSource whose innerHTML I pass into div.contentTarget, and at the bottom, a working parsed div#tester which received a string directly through the parser.
The issue is around your newlines. When you put \n inside a string in javascript, you're putting an actual newline character in it.
The same \n inside your HTML content is just that, \n. It is not a newline. If you change your HTML to this (with an actual newline), it works as expected:
<div class="contentSource">#Introduction:
Here you can write some text.</div>
Updated fiddle
Alternatively, if you change your javascript string to:
test.innerHTML = marked('#Introduction:\\nHere you can write some text.');
So that the string actually contains \n rather than a newline, you see the same erroneous behaviour.
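The difference between the two strings can be checked directly, without marked or the DOM. A small sketch (the variable names are only for illustration):

```javascript
// In a string literal, \n is an escape sequence for one newline character.
const withNewline = '#Introduction:\nHere you can write some text.';
// \\n is an escaped backslash followed by the letter n: two characters.
const withBackslashN = '#Introduction:\\nHere you can write some text.';

console.log(withNewline.includes('\n'));    // true: contains a real newline
console.log(withBackslashN.includes('\n')); // false: only backslash + n
// The second string is exactly one character longer than the first.
console.log(withBackslashN.length - withNewline.length);
// Turning the two-character sequence into a real newline makes them equal.
console.log(withBackslashN.replace(/\\n/g, '\n') === withNewline);
```

The second string is what innerHTML returns for the HTML in the question, which is why the parser never sees a line break.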
Got it.
In your HTML you have \n where a line break is intended; since this is HTML, you should use br instead.
<div class="contentSource">#Introduction:<br/>Here you can write some text.</div>
instead of:
<div class="contentSource">#Introduction:\nHere you can write some text.</div>
When you debug the code, if you send the innerHTML to marked, it shows this as a function parameter:
#Introduction:\nHere you can write some text.
But when you send the string in the js, it shows the parameter like this:
#Introduction:
Here you can write some text.
Hope this helps.
JsFiddle: http://jsfiddle.net/gbrkj901/11/
Your HTML is rendering differently because JavaScript interprets \n inside a string literal as a newline, whereas in HTML it is just a backslash followed by the letter n.
Consider the following:
alert('a\ntest');
Which will have an alert with 2 lines.
And now, consider the following:
<span>a\ntest</span>
<script>
alert(document.getElementsByTagName('span')[0].innerHTML);
</script>
This will show a\ntest.
To fix it, use this:
el.innerHTML = marked(otherEl.innerHTML.replace(/\\n/g,'\n'));
Or, a more general and secure way:
el.innerHTML = marked(
  otherEl
    .innerHTML
    .replace(
      /\\([btnvfr"'\\])/g,
      function(_,c){
        // map each escaped letter to the character it stands for
        return {
          b:'\b',
          t:'\t',
          v:'\v',
          n:'\n',
          f:'\f',
          r:'\r',
          '"':'"',
          "'":"'",
          '\\':'\\'
        }[c];
      }
    )
);
Or, if you like it minimal and you are ready to have cthulhu knocking on your front door, use this:
el.innerHTML = marked(otherEl.innerHTML.replace(/\\([btnvfr])/g,function(_,c){return eval('"\\'+c+'"');}));
Just to give an idea of what I'm trying to do, here's some example code:
$(function(){
  if ($('.maybe > div > a.link:contains(".JPG, .jpg, .gif, .GIF")').length) {
    alert('hello');
  }
});
I want to check whether the text of some links contains the dot and the letters of any image extension, like
<div class="maybe">
<div>
<a class="link" href="someURL">thisIsAnImage.jpg</a>
<a class="link" href="someURL">thisIsNOTAnImage.pdf</a>
</div>
</div>
<div class="maybe">
<div>
<a class="link" href="someURL">thisIsNOTAnImage.zip</a>
<a class="link" href="someURL">thisIsAnotherImage.png</a>
</div>
</div>
The divs and links are generated dynamically by PHP, so there's no way to know how many links and divs there will be once the page is generated.
How do I write the code properly?
Thanks a lot for helping me to resolve the problem.
Here's my first instinct:
$('.maybe .link').each(function () {
if ($(this).text().toLowerCase().match(/\.(jpg|png|gif)/g)) {
console.log("yay I did it");
}
});
Use toLowerCase() on the link text so you don't have to check both lower and upper case. Then use String.match(regex) with a regex group to match all the file extensions.
Hope this helps!
Edit: here's an example in jsfiddle. Open your javascript console to see the output of the console.log statement. http://jsfiddle.net/9Q5yu/1/
I'd suggest:
// selects all the 'a' elements, filters that collection:
var imgLinks = $('a').filter(function(){
// keeps *only* those element with an href that ends in one of the
// file-types (this is naive, however, see notes):
return ['png','gif','jpg'].indexOf(this.href.split('.').pop()) > -1;
});
// I have no idea what you were doing, trying to do, wanting to do or why,
// but please don't use 'alert()', it's a horrible UI:
if (imgLinks.length) {
console.log('hello');
}
The above is a relatively simple, and naive, check; in that it simply splits the href on the . characters and then tests the last element from the array (returned by split()) is equal to one of the elements of the array. This will fail for any image that has a query string, for example, such as http://example.com/image2.png?postValue=1234
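One way to cope with query strings is to look only at the path part of the URL. The sketch below uses the standard URL constructor; hasImageExtension and IMAGE_EXTENSIONS are names invented for this example, and it assumes an absolute URL (which is what this.href returns for a link in the page).

```javascript
// Hypothetical helper: checks the extension of the URL's path, ignoring ?query and #hash.
const IMAGE_EXTENSIONS = ['png', 'gif', 'jpg'];

function hasImageExtension(href) {
  // pathname excludes the query string and the fragment.
  const { pathname } = new URL(href);
  const lastDot = pathname.lastIndexOf('.');
  if (lastDot === -1) return false;
  // Compare case-insensitively so ".JPG" and ".jpg" are treated alike.
  return IMAGE_EXTENSIONS.includes(pathname.slice(lastDot + 1).toLowerCase());
}

console.log(hasImageExtension('http://example.com/image2.png?postValue=1234'));
console.log(hasImageExtension('http://example.com/file.pdf'));
```

Inside the filter callback this could be used as `return hasImageExtension(this.href);`.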
Given the clarification in the comments, I'd amend the above to:
var fileTypes = ['png','gif','jpg'],
    imgLinks = $('a').filter(function(){
        // escaped dot + grouped alternation, so '$' anchors every extension
        return (new RegExp('\\.(' + fileTypes.join('|') + ')$', 'i')).test($(this).text());
    });
References:
JavaScript:
Array.prototype.indexOf().
RegExp.prototype.test() (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/test).
String.prototype.split().
jQuery:
filter()
I'm trying to work out what regular expression I would need to change this string
html = '<img style="width: 311px; height: 376px;" alt="test" src="/img/1268749322.jpg" />';
to this
html = '<img width="311" height="376" alt="test" src="/img/1268749322.jpg" />';
using JavaScript's String.replace.
This is my start:
html = html.replace(/ style="width:\?([0-9])px*"/g, "width=\"$1\"");
Can anyone help me?
THANKS
It's generally considered a Bad Thing to do HTML parsing with RegExs.
Why not edit the DOM from JavaScript?
I'm not an expert on CSS, but isn't using style a better idea than width/height attributes?
You forgot the whitespace after : (\s*). You don't want ? there since it will miss >1 space or a tab
html = '<img style="width: 311px; height: 376px;" alt="test" src="/img/1268749322.jpg" />';
var d = document.createElement("div");
d.innerHTML = html;
var img = d.firstChild;
var width = parseInt(img.style.width, 10);
var height = parseInt(img.style.height, 10);
img.setAttribute("width", width);
img.setAttribute("height", height);
img.removeAttribute("style");
alert(d.innerHTML);
Of course, things get slightly easier if you don't start with a string ;-)
Gumbo is right, you should use DOM.
But to your regex: What is \? (question mark) supposed to mean? I would insert \s* (any number of whitespaces) at that position.
Yeah, normally I would do it with DOM, but in this special case I need to parse HTML code coming from a WYSIWYG editor which filters style attributes. In general this is good, but for the pictures I would like to keep the set sizes with this html.replace hack.
This is the answer:
html = html.replace(/ style=".*?width:\s?(\d+).*?height:\s?(\d+).*?"/g, " width=\"$1\" height=\"$2\"");
Thanks for your help :-)
I can make it more elaborate if you wish, let me know how this works:
html = html.replace(/style=".*?width:\s?(\d+).*?height:\s?(\d+).*?"/g, "width=\"$1\" height=\"$2\"");
Edit: escaped qq
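Both regexes above assume that width comes before height inside the style attribute. As a rough sketch of an order-independent variant that still works on a plain string, the replacement can be done per img tag with a callback. The name styleToSizeAttributes is hypothetical, and the code assumes double-quoted style attributes with pixel values.

```javascript
// Hypothetical helper: turns style="width: Npx; height: Mpx" on <img> tags
// into width="N" height="M" attributes, regardless of declaration order.
function styleToSizeAttributes(html) {
  return html.replace(/<img\b[^>]*>/gi, (tag) => {
    const style = /\sstyle="([^"]*)"/i.exec(tag);
    if (!style) return tag;
    // (?:^|;) makes sure "max-width" or "line-height" are not picked up.
    const width = /(?:^|;)\s*width:\s*(\d+)px/i.exec(style[1]);
    const height = /(?:^|;)\s*height:\s*(\d+)px/i.exec(style[1]);
    if (!width || !height) return tag; // leave the tag alone if either is missing
    return tag.replace(style[0], ` width="${width[1]}" height="${height[1]}"`);
  });
}

console.log(styleToSizeAttributes('<img style="width: 311px; height: 376px;" alt="test" src="/img/1268749322.jpg" />'));
```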