Can I inject text links within the body of a web-page

Can I inject text links within the body of a web-page - javascript

I am coding a proof of concept for my boss, I am a backend developer and haven't done javascript in years so I don't know much about same origin policy and other obstacles.
He basically has a chrome plugin and wants to replace matching keywords with links to his service, this will happen in any domain the user visits and not just one, similar to in-text advertising.
I was also wondering if it's possible to do this with an iframe, without the need of a chrome extension.

You will need a Chrome extention for that and permisions for content scripts.
It takes a lot of improvement but basically what you want to do is this:
document.onload = function(){
var textToParse = document.body.innerHTML,
linkHTML = 'Your Boss\'s Service Reference',
newText = '';
textToParse.split('Your Boss\'s Service Reference').join(linkHTML);
document.body.innerHTML = newText;
}
WARNING: The above code is a basic example of how to do it but might cause problems if the search in textToParse matches something that is HTML code. You should use jQuery and other tools for an easier/more secure way to edit text within HTML elements.

Related

Injecting HTML into existing web pages

I'm interested in the concept of injecting a bit of HTML into existing web pages to perform a service. The idea is to create an improved bookmarking system - but I digress, the specific implementation is unimportant. I'm quite new to web development and so I have no definite idea as to how to accomplish this, thought I have noticed a couple of possibilities.
I found out I can right click > 'inspect element' and proceed to edit my browser's version of the HTML corresponding with the webpage I'm viewing. I assume that this means I can edit what I see and interact with. Could I possibly create a script that ran from a button on bookmarks bar that injected an Iframe which linked to a web service of my making? (And deleted itself after being used).
Could I possibly use a chrome extension to accomplish this? I have no experience with creating extensions and so I have no clue what they're capable of - though I wouldn't be against learning.
Which of these would be best? If they are even valid ideas. Or is there another way that I've yet to know of?
EDIT: The goal is to have a user click a button in the browser if they would like to save this page. They are then presented an interface visually independent of the rest of the page that allows them to categorize this webpage according to their interests. It would take the current link, add some information such as a comment, rating, etc. and add it to the user's data. This is meant as a sort of side-service to a website whose purpose would be to better organize and display the browsing information of the user.

Yes, you can absolutely do this. You're asking about Bookmarklets.
A bookmarklet is just a bookmark where the URL is a piece of JavaScript instead of a URL. They are very simple, yet can be capable of doing anything to a web page. Full JavaScript access.
A bookmarklet can be engaged on any web page -- the user simply has to click the bookmark(let) to launch it on the current page.
Bookmark = "http://chasemoskal.com/"
Bookmarklet = "javascript:(function(){ alert('I can do anything!') })();"
That's all it is. You can create a bookmarklet link which can be clicked-and-dragged onto a bookmark bar like this:
Bookmarklet
Bookmarklets can be limited in size, however, you can load an entire external script from the bookmarklet.

You can do what you refer to as like an <iframe>, so here are some steps that may help you, simply put:
Create an XMLHttpRequest object and make a request for a page trough it.
Make the innerHTML field of an element to hold the resultString of the previous request, aka the HTML structure.
Lets assume you have an element with the id="Result" on your html. The request goes like this:
var req = new XMLHttpRequest();
req.open('GET', 'http://example.com/mydocument.html', true);
req.onreadystatechange = function (aEvt) {
if (req.readyState == 4 && req.status == 200) {
Result.innerHTML = req.responseText;
}
};
req.send(null);
Here's an improved version in the form of a fiddle.
When you're done, you can delete that injected HTML by simply:
Result.innerHTML = '';
And then anything inside it will be gone.
However, you can't make request to other servers due to request policies. They have to be under the same domain or server. Take a look at this: Using XMLHttpRequest on MDN reference pages for more information.

JS append to child only works on some webpages

Context: I have a Chrome extension that is not an app where chrome.extension.isInstalled (or whatever that code is) will work. Therefore I'm detecting the presence of whether or not my extension is installed by having the extension append a blank div into the webpage. This way, I can check if that div is present, and if so, not allow user to install again.
The problem is, both these variations of code below:
var isInstalled = document.createElement('div');
isInstalled.id = 'extension-is-installed';
document.body.appendChild(isInstalled);
document.body.innerHTML += "<div id='extension-is-installed-2'>"
Work on some pages and not others. They work on Google, REI, etc. but not on eBay, Amazon, etc. Incidentally, the one page I need it to work on http://www.projectborrow.com, it does not.
Any thoughts on why? I included my site above so that someone can try to make the append-ment work.
Thanks!

In the manifest.json file for the Chrome Extension, you have to specify which websites the extention will operate on. Then in the code for the actual extension itself, you need to specify how the extension will operate on each site.

Reading document.links from an IFrame

EDIT:
Just a quick mention as to the nature of this program. The purpose of this program is for web inventory. Drawing different links and other content into a type of hierarchy. What I'm having trouble with is pulling a list of links from a webpage within an IFrame.
I get the feeling this one is gonna bite me hard. (other posts indicate relevance to xss and domain controls)
I'm just trying something with javascript and Iframes. Basically I have a panel with an IFrame inside that goes to whatever website you want it to. I'm trying to generate a list of links from the webpage within the Iframe. Its strictly read only.
Yet I keep coming up against the permission denied problem.
I understand this is there to stop cross site scripting attacks and the resolution seems to be to set the document domain to the host site.
JavaScript permission denied. How to allow cross domain scripting between trusted domains?
However I dont think this will work if I'm trying to go from site to site.
Heres the code I have so far, pretty simple:
function getFrameLinks()
{
/* You can all ignore this. This is here because there is a frame within a frame. It should have no effect ont he program. Just start reading from 'contentFrameElement'*/
//ignore this
var functionFrameElem = document.getElementById("function-IFrame");
console.log("element by id parent frame ");
console.log(functionFrameElem);
var functionFrameData = functionFrameElem.contentDocument;
console.log("Element data");
console.log(functionFrameData);
//get the content and turn it into a doc
var contentFrameElem = functionFrameData.getElementById("content-Frame")
console.log(contentFrameElem);
var contentFrameData = contentFrameElem.contentDocument;
console.log(contentFrameData);
//get the links
//var contentFrameLinks = contentFrameData.links;
var contentFrameLinks = contentFrameData.getElementsByTagName('a');
Goal: OK so due to this being illegal and very similar to XSS. Perhaps someone could point out a solution as to how to locally store the document. I dont seem to have any problems accessing document.links with internal pages in the frame.
Possibly some sort of temp database of cache. The simpler the solution the better.

If you want to read it just for your self and in your browser, you can write a simple proxy with php in your server. the most simple code:
<?php /* proxy.php */ readfile($_GET['url']); ?>
now set your iframe src to your proxy file:
<iframe src="http://localhost/proxy.php?url=http://www.google.com"
id="function-IFrame"></iframe>
now you can access the iframe content from your (local) server.
if you want set the url with a program remember to encode the url (urlencode in php or encodeURIComponent in js)

Here is a bookmarklet you can run on any page (assuming the links are not in an iframe)
javascript:var x=function(){var lnks=document.links,list=[];for (var i=0,n=lnks.length;i<n;i++) {var href = lnks[i].href; list.push(href)};if (list.length>0) { var w=window.open('','_blank');w.document.write(list.length+' links found<br/><ul><li>'+list.sort().join('</li><li>')+'</ul>');w.document.close()}};void(x());
the other way is for you (on Windows) to save your HTML with extension .HTA
Then you can grab whatever lives in the iFrame

You might be interested in using the YQL (Yahoo Query Language) to retrieve filtered results from remote urls..
example of retrieving all the links from the yahoo.com domain

How to get page source of page at another domain (in Javascript!)

I am trying to get the source page of a webpage on a different domain. I know this is easily done with PHP for example but I would like to do it in Javascript because I am getting results from a page and if I use a server-side language, the original website will block the calls since they come from the same IP. However, if the calls are done on the client side, it is like the user request the results each time (different user, different IP, no original site blocking me). Is there a way to do that (even if not in Javascript but client-side).
To clarify the code I want will be applied to an HTML page so I can get the results, style them, add/delete, etc then display them to the user.
Thank you.

Modern browsers support cross-domain AJAX calls but the target site has to allow them by using special headers in the reply. Apart from that, there is no pure Javascript solution AFAIK.

Could you use an iframe? You wont have direct access to the markup to due to cross-domain restrictions, but you can still display the third-party page to the user...
window.onload = function() {
var i = document.createElement("IFRAME");
i.setAttribute("name", "ID_HERE");
i.setAttribute("id", "ID_HERE");
i.setAttribute("src", "URL_HERE");
i.style.maxHeight = "300px";
i.style.width = "100%";
document.body.appendChild(i);
}

On Windows you can use HTA
An HTA can access anything crossdomain in an iframe for example

Align the WMD editor's preview HTML with server-side HTML validation (e.g. no embedded JavaScript code)

There are many Stack Overflow questions (e.g. Whitelisting, preventing XSS with WMD control in C# and WMD Markdown and server-side) about how to do server-side scrubbing of Markdown produced by the WMD editor to ensure the HTML generated doesn't contain malicious script, like this:
<img onload="alert('haha');"
src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" />
But I didn't find a good way to plug the hole on the client side too. Client validation isn't a replacement for scrubbing validation on the server of course, since anyone can pretend to be a client and POST you nasty Markdown. And if you're scrubbing the HTML on the server, an attacker can't save the bad HTML so no one else will be able to see it later and have their cookies stolen or sessions hijacked by the bad script. So there's a valid case to be made that it may not be worth enforcing no-script rules in the WMD preview pane too.
But imagine an attacker found a way to get malicious Markdown onto the server (e.g. a compromised feed from another site, or content added before an XSS bug was fixed). Your server-side whitelist applied when translating markdown to HTML would normally prevent that bad Markdown from being shown to users. But if the attacker could get someone to edit the page (e.g. by posting another entry saying the malicious entry had a broken link and asking someone to fix it), then anyone who edits the page gets their cookies hijacked. This is admittedly a corner case, but it still may be worth defending against.
Also, it's probably a bad idea to allow the client preview window to allow different HTML than your server will allow.
The Stack Overflow team has plugged this hole by making changes to WMD. How did they do it?
[NOTE: I already figured this out, but it required some tricky JavaScript debugging, so I'm answering my own question here to help others who may want to do ths same thing].

One possible fix is in wmd.js, in the pushPreviewHtml() method. Here's the original code from the Stack Overflow version of WMD on GitHub:
if (wmd.panels.preview) {
wmd.panels.preview.innerHTML = text;
}
You can replace it with some scrubbing code. Here's an adaptation of the code that Stack Overflow uses in response to this post, which restricts to a whitelist of tags, and for IMG and A elements, restricts to a whitelist of attributes (and in a specific order too!). See the Meta Stack Overflow post What HTML tags are allowed on Stack Overflow, Server Fault, and Super User? for more info on the whitelist.
Note: this code can certainly be improved, e.g. to allow whitelisted attributes in any order. It also disallows mailto: URLs which is probably a good thing on Internet sites but on your own intranet site it may not be the best approach.
if (wmd.panels.preview) {
// Original WMD code allowed JavaScript injection, like this:
// <img src="http://www.google.com/intl/en_ALL/images/srpr/logo1w.png" onload="alert('haha');"/>
// Now, we first ensure elements (and attributes of IMG and A elements) are in a whitelist,
// and if not in whitelist, replace with blanks in preview to prevent XSS attacks
// when editing malicious Markdown.
var okTags = /^(<\/?(b|blockquote|code|del|dd|dl|dt|em|h1|h2|h3|i|kbd|li|ol|p|pre|s|sup|sub|strong|strike|ul)>|<(br|hr)\s?\/?>)$/i;
var okLinks = /^(<a\shref="(\#\d+|(https?|ftp):\/\/[-A-Za-z0-9+&##\/%?=~_|!:,.;\(\)]+)"(\stitle="[^"<>]+")?\s?>|<\/a>)$/i;
var okImg = /^(<img\ssrc="https?:(\/\/[-A-Za-z0-9+&##\/%?=~_|!:,.;\(\)]+)"(\swidth="\d{1,3}")?(\sheight="\d{1,3}")?(\salt="[^"<>]*")?(\stitle="[^"<>]*")?\s?\/?>)$/i;
text = text.replace(/<[^<>]*>?/gi, function (tag) {
return (tag.match(okTags) || tag.match(okLinks) || tag.match(okImg)) ? tag : ""
})
wmd.panels.preview.innerHTML = text; // Original code
}
Also note that this fix is not in the Stack Overflow version of WMD on GitHub-- clearly the change was made later and not checked back into GitHub.
UPDATE: in order to avoid breaking the feature where hyperlinks are auto-created when you type in a URL, you also will need to make changes to showdown.js, like below:
Original code:
var _DoAutoLinks = function(text) {
text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"$1");
// Email addresses: <address#domain.foo>
/*
text = text.replace(/
<
(?:mailto:)?
(
[-.\w]+
\#
[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+
)
>
/gi, _DoAutoLinks_callback());
*/
text = text.replace(/<(?:mailto:)?([-.\w]+\#[-a-z0-9]+(\.[-a-z0-9]+)*\.[a-z]+)>/gi,
function(wholeMatch,m1) {
return _EncodeEmailAddress( _UnescapeSpecialChars(m1) );
}
);
return text;
}
Fixed code:
var _DoAutoLinks = function(text) {
// use simplified format for links, to enable whitelisting link attributes
text = text.replace(/(^|\s)(https?|ftp)(:\/\/[-A-Z0-9+&##\/%?=~_|\[\]\(\)!:,\.;]*[-A-Z0-9+&##\/%=~_|\[\]])($|\W)/gi, "$1<$2$3>$4");
text = text.replace(/<((https?|ftp):[^'">\s]+)>/gi, '$1');
return text;
}

It is not a security issue to allow the local user to execute scripts in the page context as long as it's impossible for any third party to provide the script.
Without the editor doing it, the user could always enter a javascript: url while on your page or use Firebug or something similar.

We Keep Coding

JavaScript is the programming language of the Web.