I have a Node.js app that runs a web server and works as a web API, with simple GETs to interact with it. My goal is to make it as accessible as possible from any language or programmatic scenario. My biggest problem is that JavaScript running in the browser can't call it, because the browser (specifically Chrome) blocks cross-origin requests. I'm open to any ideas that allow this. I want any site to be able to make requests against the URL, sort of like how Twitter has a JavaScript API.
I've tried using jQuery's ajax with JSONP, but I had all sorts of problems: either the request wouldn't go through, or it went through and I never got the response.
If there's a pure JavaScript way, I'd prefer that because it has fewer dependencies.
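One common way to open an API like this to any site is to send a permissive CORS header from the server, optionally keeping JSONP for older clients. The following is a minimal sketch under assumptions, not a definitive implementation: getData, buildResponse, createServer, and the callback parameter name are hypothetical names chosen for illustration.

```javascript
const http = require('http');

// Hypothetical payload; a real API would compute this per request.
function getData() {
  return { status: 'ok', items: [1, 2, 3] };
}

// Build headers and body for a GET request URL such as "/api?callback=handle".
// Plain requests get JSON plus a permissive CORS header; requests carrying a
// "callback" parameter get the JSONP form.
function buildResponse(requestUrl, data) {
  const url = new URL(requestUrl, 'http://localhost'); // base is only used for parsing
  const callback = url.searchParams.get('callback');
  const json = JSON.stringify(data);
  const headers = { 'Access-Control-Allow-Origin': '*' };

  // Accept only simple identifiers as callback names to avoid script injection.
  if (callback && /^[A-Za-z_$][\w$]*$/.test(callback)) {
    headers['Content-Type'] = 'application/javascript; charset=utf-8';
    return { headers, body: `${callback}(${json});` };
  }
  headers['Content-Type'] = 'application/json; charset=utf-8';
  return { headers, body: json };
}

function createServer() {
  return http.createServer((req, res) => {
    const { headers, body } = buildResponse(req.url, getData());
    res.writeHead(200, headers);
    res.end(body);
  });
}

// To start the server: createServer().listen(3000);
```

With the header in place, a plain XMLHttpRequest or fetch call from a page on another site should be able to read the response without any library. The wildcard origin only makes sense for a public, unauthenticated API.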
Related
I am trying to access the HaveIBeenPwned web API for breached sites and emails, but I am being blocked by Cloudflare's anti-DDoS protection. I've found that there are ways to get around this with Python and JavaScript, but I haven't been successful with my React/Rails app.
This post has the following quote: "Currently, they check if the client supports JavaScript, which can be spoofed." I haven't been able to find any other documentation of this behavior.
I need this information in the backend, so calling HIBP from the frontend is not ideal. Any idea how to hit the API from Rails?
If they want it to be used on the frontend only, using it from the backend can be tricky. You will need to create your own wrapper with, for example, puppeteer, and have the Rails side execute a command that does the work in the background. But keep in mind that it isn't fast (it can take up to 5 seconds per request), and it will block your Rails process.
I would start with a single Node.js app that accepts command-line parameters. Ruby isn't very good at advanced web scraping, so there are no elegant solutions. Also, keep in mind that you have no guarantees: one day it can just stop working.
I have been trying for some time to figure out how the JS on the YouTube login page works, so that I can send the appropriate form data with Python requests and log in to a YouTube account, but after multiple attempts at reading the code I still can't. Any suggestions?
Modern website logins are increasingly dependent on JavaScript, cookies, and multiple page requests. They're rarely a matter of simply sending form data, and the methods change with time. Without running the JavaScript code, it can be tricky to make this work and keep it working.
For scripting YouTube access with Python, I recommend using Chrome in headless mode (--headless option). If you start it with the Remote Debugging Protocol enabled, it's easy to send commands to the browser from Python.
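The remote debugging interface is language-agnostic, so the same steps apply from Python or anything else that can speak HTTP and WebSocket. A minimal sketch of starting headless Chrome and listing its tabs, shown here in Node.js: the function names are hypothetical, and the Chrome binary name is an assumption that varies by platform.

```javascript
const { spawn } = require('child_process');

// Command-line flags for a headless Chrome with the DevTools protocol exposed.
function chromeArgs(port, startUrl) {
  return ['--headless', `--remote-debugging-port=${port}`, startUrl];
}

// Start Chrome. The binary name is an assumption: it may be "google-chrome",
// "chromium", or a full path, depending on the platform.
function launchChrome(binary, port, startUrl) {
  return spawn(binary, chromeArgs(port, startUrl), { stdio: 'ignore' });
}

// Ask the DevTools HTTP endpoint for the open tabs. Each entry includes a
// webSocketDebuggerUrl over which protocol commands are sent as JSON.
async function listTargets(port) {
  const response = await fetch(`http://127.0.0.1:${port}/json/list`); // global fetch needs Node 18+
  return response.json();
}
```

After launchChrome('google-chrome', 9222, 'about:blank') has had a moment to start, listTargets(9222) should return the tab list to connect to.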
I wonder if it's possible to scrape an external (cross-domain) page through the user's IP?
For a shopping comparison site, I need to scrape pages of an e-com site but several requests from the server would get me banned, so I'm looking for ways to do client-side scraping — that is, request pages from the user's IP and send to server for processing.
No, you won't be able to use the browser of your clients to scrape content from other websites using JavaScript because of a security measure called Same-origin policy.
There should be no way to circumvent this policy and that's for a good reason. Imagine you could instruct the browser of your visitors to do anything on any website. That's not something you want to happen automatically.
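To make the rule concrete: two URLs share an origin only when scheme, host, and port all match, and a script may read a cross-origin response only if the other server opts in. The sketch below is a simplified model of that check (it ignores credentials and preflight requests); sameOrigin and canReadResponse are hypothetical helper names, not browser APIs.

```javascript
// Two URLs share an origin only when scheme, host, and port all match.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Simplified model of the browser's decision: a script on pageUrl may read a
// response from targetUrl if the origins match, or if the target server sent
// an Access-Control-Allow-Origin header naming the page's origin or "*".
function canReadResponse(pageUrl, targetUrl, allowOriginHeader) {
  if (sameOrigin(pageUrl, targetUrl)) return true;
  const pageOrigin = new URL(pageUrl).origin;
  return allowOriginHeader === '*' || allowOriginHeader === pageOrigin;
}
```

An e-commerce site that does not want to be scraped will not send that header, so canReadResponse('https://compare.example/', 'https://shop.example/item/1', undefined) is false in this model.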
However, you could create a browser extension to do that. JavaScript browser extensions can be equipped with more privileges than regular JavaScript.
Adobe Flash has similar security features but I guess you could use Java (not JavaScript) to create a web-scraper that uses your user's IP address. Then again, you probably don't want to do that as Java plugins are considered insecure (and slow to load!) and not all users will even have it installed.
So now back to your problem:
I need to scrape pages of an e-com site but several requests from the server would get me banned.
If the owner of that website doesn't want you to use his service in that way, you probably shouldn't do it. Otherwise you would risk legal implications (look here for details).
If you are on the "dark side of the law" and don't care whether that's illegal or not, you could use something like http://luminati.io/ to use the IP addresses of real people.
Basically browsers are made to avoid doing this…
The solution everyone thinks about first:
jQuery/JavaScript: accessing contents of an iframe
But it will not work in most cases with "recent" browsers (<10 years old)
Alternatives are:
Use the official APIs of the server (if any)
Try to find out whether the server provides a JSONP service (good luck)
Being on the same domain, try cross-site scripting (if possible, not very ethical)
Use a trusted relay or proxy (but this will still use your own IP)
Pretend to be a Google web crawler (why not, but it's not very reliable and there are no guarantees)
Use a hack to set up the relay / proxy on the client itself; I can think of Java or possibly Flash (will not work on most mobile devices, is slow, and Flash has its own cross-site limitations too)
Ask Google or another search engine for the content (you might then have a problem with the search engine if you abuse it…)
Just do the job yourself and cache the answer, in order to take load off their server and decrease the risk of being banned.
Index the site yourself (with your own web crawler), then use your own indexed copy (depends on how frequently the source changes).
http://www.quora.com/How-can-I-build-a-web-crawler-from-scratch
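The "cache the answer" option above can be sketched as a small time-based cache in front of whatever function fetches a page. This is only an illustration under assumptions: createCachedFetcher is a hypothetical name, and fetchPage and now are injected so the sketch does not depend on a particular HTTP library.

```javascript
// Wrap a page-fetching function with a time-based cache so the same URL is
// requested from the remote server at most once per ttlMs milliseconds.
function createCachedFetcher(fetchPage, ttlMs, now = Date.now) {
  const cache = new Map(); // url -> { value, expiresAt }
  return async function get(url) {
    const hit = cache.get(url);
    if (hit && hit.expiresAt > now()) return hit.value; // still fresh
    const value = await fetchPage(url); // miss or expired: go to the source
    cache.set(url, { value, expiresAt: now() + ttlMs });
    return value;
  };
}
```

A longer ttlMs means fewer requests to the source site but staler prices, so the right value depends on how often the source changes.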
[EDIT]
One more solution I can think of is going through a YQL service. This is a bit like using a search engine or a public proxy as a bridge to retrieve the information for you.
Here is a simple example of how to do so. In short, you get cross-domain GET requests.
Have a look at http://import.io, they provide a couple of crawlers, connectors and extractors. I'm not sure how they get around bans, but they do somehow (we have been using their system for over a year now with no problems).
You could build a browser extension with artoo.
http://medialab.github.io/artoo/chrome/
That would allow you to get around the same-origin policy restrictions. It is all JavaScript and runs on the client side.
I have an ASP.NET website with client JavaScript making lots of ajax calls back to the server. Is there any way I can prevent a Google Chrome extension from calling my ajax endpoints, or detect when the calls are being made by the Chrome extension code and not my own JavaScript code? So far I have tested using the referer and httponly cookies, but there is no difference between the 2 calls. Any ideas would be appreciated.
No, there is not.
Chrome extensions have elevated permissions. They 'out-permit' your website JavaScript code and may manipulate and call it.
Even if you add something like an anti-CSRF token, an extension could still read it and bypass that protection. Extensions can run JavaScript code on your site and modify your own code on the fly without notifying you or your users.
The only thing you can do is not trust the client with anything critical, treat all requests you receive as hostile and require clients to authenticate before making requests to your server.
(I'm assuming you mean a chrome extension running on your site)
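As an illustration of "require clients to authenticate", the server can issue a signed, expiring token at login and verify it on every ajax request. This does not stop an extension acting inside an authenticated user's session; it only ensures that each request is tied to a known user. The sketch below is one possible shape under assumptions: the token format and the names sign, issueToken, and verifyToken are hypothetical, and user ids are assumed not to contain a dot.

```javascript
const crypto = require('crypto');

// Sign a payload with a secret that never leaves the server.
function sign(payload, secret) {
  return crypto.createHmac('sha256', secret).update(payload).digest('hex');
}

// Issue a token of the form "<userId>.<expiresAtMs>.<signature>".
function issueToken(userId, ttlMs, secret, now = Date.now()) {
  const payload = `${userId}.${now + ttlMs}`;
  return `${payload}.${sign(payload, secret)}`;
}

// Return the user id when the token is authentic and unexpired, else null.
function verifyToken(token, secret, now = Date.now()) {
  const parts = String(token).split('.');
  if (parts.length !== 3) return null;
  const [userId, expiresAt, signature] = parts;
  const expected = Buffer.from(sign(`${userId}.${expiresAt}`, secret));
  const given = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  return Number(expiresAt) > now ? userId : null;
}
```

Every endpoint would call verifyToken before doing any work and reject the request when it returns null.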
I'm making a mobile app using the PhoneGap framework, which is to say that the entire app is written in HTML, CSS, and JavaScript.
Part of the app requires me to fetch some information from a remote database.
I've spent the last hour reading up on how to make an XMLHttpRequest() to a remote domain, and I can't figure it out for the life of me.
As a bonus, since the goal of the request is to retrieve some database content, I need to send 3 parameters to the server for querying with.
I keep seeing things about the same-origin policy, but I can't find anything clearly saying whether it would apply to a phonegap app which has no actual host. I've also seen about 6 fairly overcomplicated workarounds. Before I go to the trouble of implementing one of those, I'd like to confirm that there isn't nowadays some simple way of doing this. Can anyone show an example, if so?
The same-origin policy does not apply when you are running your XHR from the file:// protocol of the mobile device. Here is a small example I used to show how to make an XHR request to Twitter.
http://simonmacdonald.blogspot.ca/2011/12/on-third-day-of-phonegapping-getting.html
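For the part of the question about sending three parameters, a minimal sketch follows. It assumes the server accepts them as GET query parameters and answers with JSON; buildQueryUrl, getJson, the URL, and the parameter names are hypothetical.

```javascript
// Append query parameters to a base URL, encoding each name and value.
function buildQueryUrl(baseUrl, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + '=' + encodeURIComponent(params[key]);
  });
  return pairs.length ? baseUrl + '?' + pairs.join('&') : baseUrl;
}

// Send a GET request and pass the parsed JSON to the callback.
// XMLHttpRequest is provided by the WebView; it does not exist in plain Node.js.
function getJson(url, onSuccess, onError) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(JSON.parse(xhr.responseText));
    } else {
      onError(new Error('HTTP ' + xhr.status));
    }
  };
  xhr.onerror = function () {
    onError(new Error('Network error'));
  };
  xhr.send();
}
```

Inside the app, a call would look like getJson(buildQueryUrl('https://example.com/query', { city: 'Oslo', type: 'hotel', limit: 10 }), showResults, showError), where showResults and showError are your own callbacks.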