I am trying to gather line-ups from football/soccer reports. I decided to scrape the data from a reports provider, but their pages are filled in by JavaScript.
To be more specific, let's take this link to a flashscores.co.uk match.
First, they restrict CORS, so I used allorigins.me to get around it, and then I used this code:
function readurl(url, elementID){
    var url = "http://allorigins.me/get?url=" + encodeURIComponent(url) + "&callback=?";
    var xhttp = new XMLHttpRequest();
    xhttp.onreadystatechange = function() {
        if (this.readyState == 4 && this.status == 200) {
            document.getElementById(elementID).innerHTML = this.responseText;
        }
    };
    xhttp.open("GET", url, true);
    xhttp.send();
}
The result was something like this, and it looks the same all the way down (still \n and \t, not the real content). I guess the problem is that the flashscores website uses JavaScript to load the data, and allorigins.me did not "wait" until the whole page was loaded. Here is another look, where it seems that the content is being loaded with JavaScript.
The desired result is to gather the starting elevens of both teams (Allonso M., Arrizabalaga K., Azpilicueta C.,...). I inspected the website and found that every name is inside an HTML tag: <div class="name">PLAYER'S NAME HERE</div>.
Any idea how to avoid both problems at once?
CORS restriction
The delay before the web is "filled" with data from javascript
I am trying to use client-side languages (no PHP).
Thank you :)
There are a few problems with your question:
CORS is used to protect resources on the server side, while what you need are client-side resources, which are mostly public, so you do not need a way around it.
The problem is not "waiting" until the page loads; the problem is that you need to run the page's scripts yourself.
I recommend using something like JSDom with Node.js for this task; it should be quite simple.
A great blog post about web scraping with Node.js (without script execution): here
official JSDom npm page: here
Good luck!
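Once the rendered HTML is available (for example from JSDom after the page's scripts have run), pulling out the names is the easy part. Here is a minimal, dependency-free sketch, assuming the markup really is <div class="name">...</div> as described in the question. The helper name extractPlayerNames and the sample string are made up for illustration. With a real DOM you would more likely use document.querySelectorAll('div.name') than a regular expression.

```javascript
// Hypothetical helper: collects the text of every <div class="name"> element.
// A regular expression is enough for this flat markup, but it is not a
// general-purpose HTML parser.
function extractPlayerNames(html) {
  var pattern = /<div class="name">([^<]*)<\/div>/g;
  var names = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    names.push(match[1].trim());
  }
  return names;
}

// Sample markup invented for this example.
var sample =
  '<div class="name">Arrizabalaga K.</div>\n\t' +
  '<div class="name">Azpilicueta C.</div>';

console.log(extractPlayerNames(sample));
```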
I'm developing a web application, and since it has access to a database underneath, I need to be able to disable the developer tools in Safari, Chrome, Firefox and Internet Explorer, as well as Firebug in Firefox and all similar applications. Is there a way to do this?
Note: The AJAX framework provided by the database requires that anything given to the database to be in web parameters that can be modified and that anything it returns be handled in JavaScript. Therefore when it returns a value like whether or not a user has access to a certain part of the website, it has to be handled in JavaScript, which developer tools can then access anyway. So this is required.
UPDATE: For those of you still thinking I'm making bad assumptions, I did ask the vendor. Below is their response:
Here are some suggestions for ways of mitigating the risk:
1) Use a javascript Obfuscator to obfuscate the code and only provide
the obfuscated version with the sold application; keep the non
obfuscated version for yourself to do edits. Here is an online
obfuscator:
How can I obfuscate (protect) JavaScript?
http://en.wikipedia.org/wiki/Obfuscated_code
http://javascriptobfuscator.com/default.aspx
2) Use a less descriptive name; maybe 'repeatedtasks.js' instead of
'security.js' as 'security.js' will probably stand out more to anyone
looking through this type of information as something important.
No, you cannot do this.
The developer menu is on the client side and is provided by the user's browser.
Also, the browser's developer tools should have nothing to do with your server-side database code, and if they do, you need some maaaaaajor restructuring.
If your framework requires that you do authorization in the client, then...
You need to change your framework
When you put an application in the wild, where users you don't trust can access it, you must draw a line in the sand.
Physical hardware that you own and can lock behind a strong door. You can do anything you like here; this is a great place to keep your database and to perform the authorization functions that decide who can do what with your database.
Everything else, including browsers on client computers, mobile phones, and convenience kiosks located in the lobby of your office. You cannot trust these! Ever! There's nothing you can do that means you can be totally sure that these machines aren't lying to cheat you and your customers out of money. You don't control them, so you can't ever hope to know what's going on.
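To make the line in the sand concrete, here is a minimal sketch of an authorization check that runs on the server. Everything in it is an assumption for illustration: the permissions table, the action names, and the isAllowed and handleRequest helpers. A real application would look permissions up in its database and take the user from its own server-side session.

```javascript
// Hypothetical in-memory permission table; a real app would query its
// database on the server instead.
var permissions = {
  alice: ["reports:read", "reports:write"],
  bob: ["reports:read"]
};

// Runs on the server. sessionUser must come from the server-side session,
// never from a value the browser sends along with the request.
function isAllowed(sessionUser, action) {
  if (!Object.prototype.hasOwnProperty.call(permissions, sessionUser)) {
    return false;
  }
  return permissions[sessionUser].indexOf(action) !== -1;
}

// The browser only ever sees the outcome, so editing client-side
// JavaScript cannot change the decision.
function handleRequest(sessionUser, action) {
  if (!isAllowed(sessionUser, action)) {
    return { status: 403, body: "Forbidden" };
  }
  return { status: 200, body: "OK" };
}

console.log(handleRequest("bob", "reports:write").status);
```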
In fact, this is somewhat possible (see how-does-facebook-disable-developer-tools), but it is a terribly bad idea for protecting your data. An attacker can always use some other engine (open source or self-written) that you have no control over. Even JavaScript obfuscation only slows down the cracking of your app a bit; it gives practically no security.
The only reasonable way to protect your data is to write secure code on the server side.
And remember that if you allow someone to download some data, they can do whatever they want with it.
There's no way your development environment is this brain-dead. It just can't be.
I strongly recommend emailing your boss with:
A demand for a week or two in the schedule for training / learning.
A demand for enough support tickets with your vendor to figure out how to perform server-side validation.
A clear warning that if the tool cannot do server-side validation, that you will be made fun of on the front page of the Wall Street Journal when your entire database is leaked / destroyed / etc.
No. It is not possible to disable the Developer Tools for your end users.
If your application is insecure if the user has access to developer tools, then it is just plain insecure.
Don't forget about tools like Fiddler. Even if you lock down every browser's console, HTTP requests can still be modified on the client, even over HTTPS. Fiddler can capture requests from the browser, and the user can modify them and replay them with malicious input. That holds unless you can secure your AJAX requests, and I'm not aware of a method for doing that.
Just don't trust any input you receive from any browser.
You cannot disable the developer tools, but you can annoy anyone who tries to use them on your site. Try the JavaScript code below; it makes the debugger break all the time.
(function () {
    (function a() {
        try {
            (function b(i) {
                if (('' + (i / i)).length !== 1 || i % 20 === 0) {
                    (function () { }).constructor('debugger')()
                } else {
                    debugger
                }
                b(++i)
            })(0)
        } catch (e) {
            setTimeout(a, 5000)
        }
    })()
})();
Update: at the time this answer was posted (2015), this trick worked. Browsers have matured since then (2017), and the following trick no longer works!
Yes it is possible. Chrome wraps all console code in
with ((console && console._commandLineAPI) || {}) {
<code goes here>
}
... so the site redefines console._commandLineAPI to throw:
Object.defineProperty(console, '_commandLineAPI',
{ get : function() { throw 'Nooo!' } })
This is the main trick!
$('body').keydown(function(e) {
    if(e.which==123){
        e.preventDefault();
    }
    if(e.ctrlKey && e.shiftKey && e.which == 73){
        e.preventDefault();
    }
    if(e.ctrlKey && e.shiftKey && e.which == 75){
        e.preventDefault();
    }
    if(e.ctrlKey && e.shiftKey && e.which == 67){
        e.preventDefault();
    }
    if(e.ctrlKey && e.shiftKey && e.which == 74){
        e.preventDefault();
    }
});
!function() {
    // Times a debugger statement: with devtools open, execution pauses here,
    // so the elapsed time exceeds the allowed threshold (in milliseconds).
    function detectDevTool(allow) {
        if(isNaN(+allow)) allow = 100;
        var start = +new Date();
        debugger;
        var end = +new Date();
        if(isNaN(start) || isNaN(end) || end - start > allow) {
            console.log('DEVTOOLS detected '+allow);
        }
    }
    if(window.attachEvent) {
        // Old Internet Explorer event model
        if (document.readyState === "complete" || document.readyState === "interactive") {
            detectDevTool();
            window.attachEvent('onresize', detectDevTool);
            window.attachEvent('onmousemove', detectDevTool);
            window.attachEvent('onfocus', detectDevTool);
            window.attachEvent('onblur', detectDevTool);
        } else {
            // Retry until the document is ready
            setTimeout(arguments.callee, 0);
        }
    } else {
        window.addEventListener('load', detectDevTool);
        window.addEventListener('resize', detectDevTool);
        window.addEventListener('mousemove', detectDevTool);
        window.addEventListener('focus', detectDevTool);
        window.addEventListener('blur', detectDevTool);
    }
}();
https://github.com/theajack/disable-devtool
This tool disables devtools by detecting whether they are open and then simply closing the window! A very nice alternative. Kudos to the creator.
I found a way: you can use the debugger keyword to stop the page from working when users open dev tools.
(function(){
debugger
}())
Yeah, this is a horrible design, and you can't disable developer tools. Your client-side UI should sit on top of a REST API designed so that a user can't do anything through it beyond submitting what would have been valid input anyway.
You need server side validation on inputs. Server side validation doesn't have to be verbose and rich, just complete.
So for example, client side you might have a ui to show required fields etc. But server side you can just have one boolean set to true, and set it to false if a field fails validation and then reject the whole request.
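The single-boolean approach could look roughly like the sketch below. The field names, limits, and the isValidRequest helper are assumptions made up for illustration; the point is only that the server checks every field and rejects the whole request if any check fails.

```javascript
// Hypothetical field rules; names and limits are assumptions for this sketch.
var rules = {
  name: function (v) {
    return typeof v === "string" && v.length > 0 && v.length <= 100;
  },
  age: function (v) {
    return Number.isInteger(v) && v >= 0 && v <= 150;
  }
};

// Runs on the server for every incoming request body.
function isValidRequest(body) {
  var valid = true; // the single flag: any failed check flips it to false

  // Every expected field must pass its rule.
  Object.keys(rules).forEach(function (field) {
    if (!rules[field](body[field])) {
      valid = false;
    }
  });

  // Unexpected fields also fail, so the validation stays complete.
  Object.keys(body).forEach(function (field) {
    if (!Object.prototype.hasOwnProperty.call(rules, field)) {
      valid = false;
    }
  });

  return valid;
}

console.log(isValidRequest({ name: "Ann", age: 30 }));
console.log(isValidRequest({ name: "", age: 30 }));
```

No rich error messages are produced here on purpose; the client-side UI can stay responsible for friendly feedback.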
Additionally, your client-side app should be authenticated. You can do that a hundred thousand ways, but one I like is ADFS passthrough authentication. Users log into the site via ADFS, which generates a session cookie for them. That session cookie gets passed to the REST API (all on the same domain), and we authenticate requests to the REST API via that session cookie. That way, no one who hasn't logged in via the login window can call the REST API. It can only be called from their browser context.
Developer-tool-wise, you need to design your app so that anything a user can do in the developer console is either just a feature or something that breaks things for them. Say they fill out all the fields with a JS snippet: it doesn't matter, that's valid input. Say they override the script and try to send bad data to the API calls: it doesn't matter, your server-side validation will reject any bad input.
So basically, design your app in such a way that developer tool muckery either breaks their experience (as it won't work) or lets them make their lives a little easier, like auto-selecting their country every time.
Additionally, you're not even considering extensions... Extensions can do anything and everything the developer console can do....
I am just throwing out a random idea; maybe it will help.
If someone tries to open the developer tools, just redirect them to some other site.
I don't know how effective this will be for you, but at least they can't do anything on your site.
You cannot block developer tools, but you can try to stop the user from opening them. You can customize the right-click menu and block the keyboard shortcuts for developer tools.
You can't disable developer tools
However...
I saw one website that uses a simple trick to make devtools unusable. It worked like this: when the user opens devtools, the whole page turns into a blank page, and the debugger in devtools gets stuck in a loop on a breakpoint. Even a page refresh doesn't get you out of that state.
Yes. No one can control the client's browser or disable its developer tools or debugger.
But you can build a desktop application with electron.js that launches your website, and there you can turn off the debugger and developer tools.
Our team at snippetbucket.com has built plenty of solutions with electron.js where there was a similar requirement, and has also restructured and protected websites with many tools.
Many web solutions have likewise been converted and protected well with electron.js.
You can easily disable Developer tools by defining this:
Object.defineProperty(console, '_commandLineAPI', { get : function() { throw 'Nooo!' } })
I found it here: How does Facebook disable the browser's integrated Developer Tools?
I am currently writing a program that collects information from a sports website (it contains the history of some basketball matches). The problem is that the website uses Angular.js for dynamic HTML binding. Consequently, the HTML source code involves lots of variables.
I need to find out the values of the variables in order to make my program work as I want. Is there any library or framework that could help me?
Edit: I am not limited by anything, but I prefer a web app (MEAN, JS frameworks with node-webkit). If it can't be done, I can also code it in C++ or Java (or extend it further to Android with NDK or SDK)
Disclaimer: This is not grey-hat stuff. I just need to do some web-scraping.
PhantomJS is a headless browser. It will allow you to use JavaScript to get the information you want.
Details:
It will browse to the page you want, execute the JavaScript like any browser, and have access to the page as if it were displayed to a normal user in a normal browser. Using JavaScript DOM traversal, you will be able to get the information you need. This is almost the same as automating the task of opening a console in a browser and executing JavaScript that gets the information from the page.
While the example below is really simple, PhantomJS can do much more than just fetch the page results: it can click buttons, navigate to other pages, extract only the relevant information, or render the page as an image. Do not hesitate to refer to its Quick Start documentation to learn more about it.
Example script returning the complete HTML page after waiting 10 seconds for the AngularJS to have finished calculating the page:
Command line usage: phantomjs-1.9.1 this_script.js
this_script.js (PhantomJS 2.0 may have different syntax in some cases):
var url = phantom.args[0]
function getDocumentElementAsHTML(page) {
    return page.evaluate(function() {
        return document.documentElement.innerHTML
    })
}
var page = new WebPage()
page.settings.userAgent = "PhantomJS"
//page.onConsoleMessage = function (msg) { console.log(msg); }
page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network')
        phantom.exit()
    } else {
        setTimeout(function(){
            console.log(getDocumentElementAsHTML(page))
            phantom.exit()
        },10000)
    }
});
PS: Waiting 10 seconds is not always a great solution. Instead, I used to periodically test for the existence of the elements I wanted to read, to be sure the JavaScript had finished loading.
Source: grey-hat stuff I did in the past
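The periodic test mentioned in the PS could be sketched as a small polling helper. This is an illustration, not the code the answer's author used: waitFor and its parameters are hypothetical names, and the demo condition is a stand-in counter. Inside PhantomJS, the check would typically be a page.evaluate call that returns whether the wanted element exists yet.

```javascript
// Hypothetical helper: calls check() every intervalMs until it returns true,
// then calls onReady(). Gives up after timeoutMs and calls onTimeout().
function waitFor(check, onReady, onTimeout, intervalMs, timeoutMs) {
  var started = Date.now();
  var timer = setInterval(function () {
    if (check()) {
      clearInterval(timer);
      onReady();
    } else if (Date.now() - started >= timeoutMs) {
      clearInterval(timer);
      onTimeout();
    }
  }, intervalMs);
}

// Stand-in condition for the demo: becomes true on the third check.
// In PhantomJS this would be something like
//   page.evaluate(function () { return !!document.querySelector('.wanted'); })
// where '.wanted' is a placeholder selector.
var checks = 0;
waitFor(
  function () { checks += 1; return checks >= 3; },
  function () { console.log("ready after " + checks + " checks"); },
  function () { console.log("timed out"); },
  10,
  1000
);
```

Compared with a fixed 10-second sleep, this returns as soon as the content is there and still has an upper bound when it never appears.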
I'd say you'd want to look at http://phantomjs.org/, http://www.slimerjs.org/, and/or http://casperjs.org/.
Phantom & Slimer give you API access to Webkit and Gecko respectively. Casper adds a more user friendly API over the top.
I'm interested in the concept of injecting a bit of HTML into existing web pages to perform a service. The idea is to create an improved bookmarking system, but I digress; the specific implementation is unimportant. I'm quite new to web development, so I have no definite idea how to accomplish this, though I have noticed a couple of possibilities.
I found out I can right click > 'inspect element' and proceed to edit my browser's version of the HTML corresponding with the webpage I'm viewing. I assume that this means I can edit what I see and interact with. Could I possibly create a script that ran from a button on bookmarks bar that injected an Iframe which linked to a web service of my making? (And deleted itself after being used).
Could I possibly use a chrome extension to accomplish this? I have no experience with creating extensions and so I have no clue what they're capable of - though I wouldn't be against learning.
Which of these would be best? If they are even valid ideas. Or is there another way that I've yet to know of?
EDIT: The goal is to have a user click a button in the browser if they would like to save this page. They are then presented an interface visually independent of the rest of the page that allows them to categorize this webpage according to their interests. It would take the current link, add some information such as a comment, rating, etc. and add it to the user's data. This is meant as a sort of side-service to a website whose purpose would be to better organize and display the browsing information of the user.
Yes, you can absolutely do this. You're asking about Bookmarklets.
A bookmarklet is just a bookmark whose address is a piece of JavaScript instead of a web URL. Bookmarklets are very simple, yet they can do anything to a web page: they get full JavaScript access.
A bookmarklet can be engaged on any web page -- the user simply has to click the bookmark(let) to launch it on the current page.
Bookmark = "http://chasemoskal.com/"
Bookmarklet = "javascript:(function(){ alert('I can do anything!') })();"
That's all it is. You can create a bookmarklet link which can be clicked-and-dragged onto a bookmark bar like this:
Bookmarklet
Bookmarklets can be limited in size, however, you can load an entire external script from the bookmarklet.
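Loading an external script from a bookmarklet could look like the sketch below. makeBookmarklet is a hypothetical helper and the script URL is a placeholder for wherever you host your own code. The generated bookmarklet only creates a script element and appends it to the page. Note that pages with a strict Content Security Policy may refuse to load the external script.

```javascript
// Hypothetical helper: builds a bookmarklet URL that injects an external
// script into whatever page the user is currently viewing.
function makeBookmarklet(scriptUrl) {
  var body =
    "(function(){" +
    "var s=document.createElement('script');" +
    "s.src=" + JSON.stringify(scriptUrl) + ";" +
    "document.body.appendChild(s);" +
    "})();";
  // Percent-encode the code so it survives being stored as a bookmark address.
  return "javascript:" + encodeURIComponent(body);
}

// Placeholder URL; replace it with the script you actually host.
console.log(makeBookmarklet("https://example.com/bookmarker.js"));
```

The bookmarklet itself stays tiny, and the real logic lives in the hosted script, which you can update without asking users to re-create the bookmark.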
You can do something like what you describe with an <iframe>. Here are some steps that may help, simply put:
Create an XMLHttpRequest object and request a page through it.
Set the innerHTML of an element to the response string of that request, i.e. the HTML structure.
Let's assume you have an element with id="Result" in your HTML. The request goes like this:
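var req = new XMLHttpRequest();
req.open('GET', 'http://example.com/mydocument.html', true);
req.onreadystatechange = function (aEvt) {
    // readyState 4 means the request is complete; 200 means it succeeded
    if (req.readyState == 4 && req.status == 200) {
        // Look the element up explicitly instead of relying on the implicit global
        document.getElementById('Result').innerHTML = req.responseText;
    }
};
req.send(null);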
Here's an improved version in the form of a fiddle.
When you're done, you can delete that injected HTML by simply:
Result.innerHTML = '';
And then anything inside it will be gone.
However, you can't make requests to other servers because of the browser's same-origin policy. The page you request has to be on the same domain, unless the other server explicitly allows cross-origin requests via CORS. Take a look at "Using XMLHttpRequest" on the MDN reference pages for more information.
I have multiple <head> references to external js and css resources. Mostly, these are for things like third party analytics, etc. From time to time (anecdotally), these resources fail to load, often resulting in browser timeouts. Is it possible to detect and log on the server when external JavaScript or CSS resources fail to load?
I was considering some type of lazy loading mechanism that when, upon failure, a special URL would be called to log this failure. Any suggestions out there?
What I think happens:
The user hits our page and the server side processes successfully and serves the page
On the client side, the HTML header tries to connect to our 3rd party integration partners, usually by a javascript include that starts with "http://www.someothercompany.com...".
The other company cannot handle our load or has shitty up-time, and so the connection fails.
The user sees a generic IE Page Not Found, not one from our server.
So even though my site was up and everything else is running fine, just because this one call out to the third party servers failed, one in the HTML page header, we get a whole failure to launch.
If your app/page depends on JS, you can load the content with JS (I know it sounds confusing). When you load these files with JS, you get callbacks, so you can enable only the functionality of what actually loaded and not worry about what didn't.
var script = document.createElement("script");
script.type = "text/javascript";
script.src = 'http://domain.com/somefile.js';
script.onload = CallBackForAfterFileLoaded;
document.body.appendChild(script);

function CallBackForAfterFileLoaded (e) {
    //Do your magic here...
}
I usually have this be a bit more complex by having arrays of JS and files that are dependent on each other, and if they don't load then I have an error state.
I forgot to mention, obviously I am just showing how to create a JS tag, you would have to create your own method for the other types of files you want to load.
Hope maybe that helps, cheers
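For the error state and the logging the question asks about, the same idea can be extended with an onerror handler. This is a sketch under assumptions: loadScript and failureLogUrl are hypothetical helpers, and "/log-resource-failure" is an invented endpoint on your own server. The document object is passed in as a parameter only so the function can be exercised outside a browser; in a page you would pass document.

```javascript
// Hypothetical loader: appends a <script> tag and reports success or failure.
// In a browser, pass the global document as doc.
function loadScript(doc, src, onLoaded, onFailed) {
  var script = doc.createElement("script");
  script.src = src;
  script.onload = function () { onLoaded(src); };
  script.onerror = function () { onFailed(src); };
  doc.body.appendChild(script);
  return script;
}

// Builds the URL of a logging endpoint on your own server.
// "/log-resource-failure" is an assumed path, not an existing API.
function failureLogUrl(src) {
  return "/log-resource-failure?resource=" + encodeURIComponent(src);
}

// Browser usage (not run here):
//   loadScript(document, 'http://domain.com/somefile.js',
//     function () { /* file is available */ },
//     function (src) { new Image().src = failureLogUrl(src); });
console.log(failureLogUrl("http://domain.com/somefile.js"));
```

Requesting the log URL through an Image object is just one simple way to notify the server; an XHR to the same endpoint would work as well.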
You can look for the presence of an object in JavaScript, e.g. to see if jQuery is loaded or not...
if (typeof jQuery !== 'function') {
// Was not loaded.
}
jsFiddle.
You could also check for CSS styles missing, for example, if you know a certain CSS file sets the background colour to #000.
if ($('body').css('backgroundColor') !== 'rgb(0, 0, 0)') {
// Was not loaded.
}
jsFiddle.
When these checks fail, you can make an XHR to the server to log the failures.
What about a ServiceWorker? It can intercept all HTTP requests and inspect the response codes, so you can log when an external resource fails to load.
Make a hash of the JS file name and the session cookie, and send both the plain file name and the hash. On the server, compute the same hash; if the two match, log the failure, and if not, assume it's abuse.
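On the server, that check could be sketched with Node's built-in crypto module as below. The function names, the "|" separator, and the choice of SHA-256 are assumptions for illustration; the answer does not specify them. Since the browser has to be able to compute the same hash, this only filters out casual junk requests and is not strong protection.

```javascript
var crypto = require("crypto");

// Hash of the resource name combined with the session cookie.
// The "|" separator and SHA-256 are choices made for this sketch.
function failureToken(resourceName, sessionCookie) {
  return crypto
    .createHash("sha256")
    .update(resourceName + "|" + sessionCookie)
    .digest("hex");
}

// Server side: recompute the hash and log only when it matches what was sent.
function shouldLog(resourceName, sessionCookie, receivedHash) {
  var expected = Buffer.from(failureToken(resourceName, sessionCookie), "hex");
  var received = Buffer.from(String(receivedHash), "hex");
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === received.length &&
    crypto.timingSafeEqual(expected, received);
}

var token = failureToken("somefile.js", "session-abc");
console.log(shouldLog("somefile.js", "session-abc", token));
console.log(shouldLog("otherfile.js", "session-abc", token));
```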