Using JavaScript or Flash, is it possible to log into a website and then pull content from a page on that site?
I am trying to make a phone app that displays content from a site that requires log in.
To the extent that this is possible, it's almost certainly not a good idea. Your phone app then has brittle dependencies on the structure of the site from which you're pulling data, and you have no way except notification of failure to detect when those dependencies change.
Does the site from which you want to pull information provide an API? That would be a better solution.
Theoretically, yes, though depending on the browser, cross-site scripting restrictions may block it. The only ways I can think of to do this on a mobile site are an Ajax call that submits the login credentials and grabs the data, or an iframe. An iframe would probably work better, since Ajax calls would probably not set the right cookies for maintaining the session ID. I don't know how elegant this solution would be.
I've managed to scrape websites that require no login using only JS, with a little help from services that let me get past the CORS issues (like allorigins), but I just couldn't get past the login problem.
I've seen many posts discussing how to do it with Node.js or with Python and BeautifulSoup, but none on how to do it with JavaScript in the browser.
So how do I go about it?
Is it even possible doing it purely on client-side?
I'm willing to do all the learning and searching needed, but I need some direction in this vast subject.
Assuming you meant in-browser JavaScript, how did you get around CORS? And even if you did, once the page refreshed after a successful login your code would stop running anyway, unless it was running as a browser extension.
If you mean on your own computer, then Node is what you're looking for. Unless you use a project like Headless Chrome, though, you'll run into the issue of saving the cookies between requests, which is what keeps track of your session and actually keeps you logged in.
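To illustrate the point about saving cookies between requests, here is a minimal sketch of a cookie jar in JavaScript. The `CookieJar` class and its method names are made up for this example, and it deliberately ignores Path, Domain, Expires and Secure, so treat it as an illustration of the idea rather than a working HTTP client.

```javascript
// Minimal in-memory cookie jar (sketch). It ignores Path, Domain, Expires,
// and Secure, all of which a real client has to honor.
class CookieJar {
  constructor() {
    this.cookies = new Map(); // cookie name -> cookie value
  }

  // Store the name=value pair from each Set-Cookie header value.
  store(setCookieHeaders) {
    for (const header of setCookieHeaders) {
      const pair = header.split(';')[0];
      const eq = pair.indexOf('=');
      if (eq < 1) continue; // skip malformed entries
      this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
    }
  }

  // Build the Cookie request header to send with the next request.
  header() {
    return [...this.cookies].map(([k, v]) => `${k}=${v}`).join('; ');
  }
}

const jar = new CookieJar();
jar.store(['sessionid=abc123; Path=/; HttpOnly', 'theme=dark']);
console.log(jar.header()); // sessionid=abc123; theme=dark
```

The login response's Set-Cookie values go into `store()`, and every later request sends `header()` back as its Cookie header. That round trip is what keeps the session alive.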
Logging in requires direct interaction with your browser, such as saving a cookie or returning a security token.
If you use JavaScript from an HTML page, it would theoretically have to visit the other page, at least inside an iframe. There is a limit to how much you can do with JavaScript inside an iframe.
In other words, you are trying to imitate something like Selenium. Give it a try. It works with Java. You can control your browser, telling it what to do like a real user, fetch the results, and even take screenshots.
When I am using iframes or frames (older sites), as an extra security precaution I use this JavaScript:
<SCRIPT LANGUAGE="JavaScript1.1">
if (top == self) self.location.href = "../index.cfm";
</SCRIPT>
then another hidden check to see if the page is being called correctly....
<cfif CGI.HTTP_REFERER DOES NOT CONTAIN "referer_page.cfm">
<cfabort>
</cfif>
It works great to keep visitors (hackers?) from opening and/or trying to post to the page.
The problem is that the JavaScript displays in source code and the less they know...
I know the JS is client side, but is there any way to create the function on the server side in CF, or otherwise hide it from prying eyes?
I am running CF9 on my own site and most of my clients' sites.
Thanks in advance
No, it is not possible for any server side language to tell if the client that requested a page intends on displaying it inside of a frame. The only way to tell that is to ask the browser once your page reaches it.
What's the concern with the Javascript being visible?
There is literally nothing you can do to permanently prevent clients from seeing your source HTML and/or JavaScript. Any attempt at security on the client side is in the end futile. You will keep out casual users (i.e. those who are not web developers or programmers), but that is all. Anyone with a rudimentary knowledge of HTML and access to Google (or AltaVista or Ask Jeeves for that matter) will be able to circumvent your barriers.
The use of HTTP_REFERER is suspect here as well (I know I know... I'm a negative Nellie :). That CGI var is dependent on the browser and web server working together. It will not be reliable overall because it is dependent on the client side. Someone up to no good will have no problem circumventing your barrier by simply constructing requests with the appropriate referrer.
If you want server side security then you are forced to use some form of authentication and session. This is a growing field, what with OAuth and the use of Google, FB, Twitter etc. as federated authentication services. But plain old usernames and passwords tied to login sessions work too :)
To be clear, @Luke is saying that some users who are using your site properly and viewing iframe content may have problems if they have a security setting, like an anonymity program, that blocks data such as CGI variables.
The only real solution is proper authentication and filtering on every page. If a list shows content for a user and loads details into an iframe, the iframe's page must also check that the user has access. At that point, it doesn't matter if they can get at the url.
For instance, if you get a list of user images like this:
<cfquery name="getImageList">
select imageid,imagefilename_mini
from images
where userid = <cfqueryparam value="#session.userid#">
</cfquery>
and that list loads an iframe to show the full-sized images, you still need the AND clause on userID:
<cfquery name="getThisImage">
select imagefilename from images
where imageID = ...
and userID = ...
</cfquery>
That way, even if someone changes the image id in the url, it still only lets them see content bound to the userID.
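The same ownership check can be sketched outside of SQL. The JavaScript below uses a hypothetical in-memory array in place of the images table and a made-up function name, `getImageForUser`; it only illustrates that the lookup must match both the image ID from the URL and the user ID from the session.

```javascript
// Hypothetical in-memory stand-in for the images table.
const images = [
  { imageId: 1, userId: 10, fileName: 'cat.jpg' },
  { imageId: 2, userId: 20, fileName: 'dog.jpg' },
];

// Return the file name only when the image belongs to the session's user.
// This mirrors "where imageID = ... and userID = ..." in the query above.
function getImageForUser(rows, sessionUserId, imageId) {
  const row = rows.find(
    (r) => r.imageId === imageId && r.userId === sessionUserId
  );
  return row ? row.fileName : null;
}

console.log(getImageForUser(images, 10, 1)); // cat.jpg
console.log(getImageForUser(images, 10, 2)); // null (belongs to user 20)
```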
Also, modern browsers make altering the source of a live page all too easy. I don't mean that browsers can alter the server side file; I mean the contents of the DOM as delivered to the browser. It's an incredibly useful tool for developing and debugging, but it does make mischievous/malevolent activity easier.
In Chrome and Firefox, you can inspect an element, change its attributes, and the page will change before your eyes. Here, that works for an iframe's src, so the page is still within the DOM it expects to be in.
You should regard client side UI as how you'd like the page to be presented, and the way it works best but use server side safeguarding (proper validation) because it's too easy to get around client-controlled data/elements.
I have a web site with the following functionality: a user comes to www.mysite.com/page.php. JavaScript on that page makes an Ajax API call to www.mysite.com/api.php and shows the results on the same page, www.mysite.com/page.php
I'm afraid of a situation where somebody starts to use my api.php from their own software, because each call to www.mysite.com/api.php costs me a bit of money. Therefore I want only users who have visited www.mysite.com/page.php to get valid results from www.mysite.com/api.php . There won't be any way for users to log in to my web site.
What would be the right way to do this? I guess I could start a session when a user comes to page.php, and then somehow check in api.php that a session with a valid session ID exists?
If you just want the user to visit page.php before using api.php, the session is the way to go.
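A rough sketch of that session logic follows, written in JavaScript rather than PHP just to show the flow. The function names, the in-memory `sessions` map and the 30-minute lifetime are all assumptions for illustration; in PHP the built-in session handling would take the place of most of this.

```javascript
const crypto = require('crypto');

const SESSION_TTL_MS = 30 * 60 * 1000; // assumed 30-minute lifetime
const sessions = new Map(); // session id -> expiry timestamp in ms

// Called when page.php is served: create a session and return its id,
// which would be sent to the browser as a cookie.
function startSession(now = Date.now()) {
  const id = crypto.randomBytes(16).toString('hex');
  sessions.set(id, now + SESSION_TTL_MS);
  return id;
}

// Called by api.php: accept only ids that were issued and have not expired.
function isValidSession(id, now = Date.now()) {
  const expiry = sessions.get(id);
  if (expiry === undefined) return false;
  if (now > expiry) {
    sessions.delete(id); // drop expired sessions as they are seen
    return false;
  }
  return true;
}
```

Note that this only proves the caller loaded page.php at some point. A script can load the page first and then reuse the session cookie, so it raises the bar rather than closing the door.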
Typically, if you want a "soft" protection, you use the POST verb to get results from your site. Then, if the user just types the api.php URL into their browser, they will not get a result. This doesn't protect your site, but it keeps search engines and accidental browsing away from that URL reasonably well.
Otherwise, there are lots of authentication plugins for php.
http://www.homeandlearn.co.uk/php/php14p1.html for example.
You can check the request in several ways, such as token validation, session validation, or even the server's 'HTTP_REFERER' variable.
Check the referrer with $_SERVER['HTTP_REFERER']; if it's outside your domain, block it.
Beware that people can alter their REFERER, so it's not secure.
Another, better solution might be a CAPTCHA like this one from Google: https://www.google.com/recaptcha/intro/index.html
Cookies, the HTTP Referer, additional POST data, or form data that you send in a hidden input field aren't secure enough to be sure that the user comes from your site.
All of it can easily be changed by the user, by modifying the HTTP header data (or, if you use cookies, by changing the cookie file on the client machine).
I would prefer a PHP session combined with good protection against bots (e.g. a honeypot), because a session is not so easy to hijack if you use it properly.
Please note: if there is a bot written especially for your site, you've lost anyway. So there is no 100% protection.
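For reference, the honeypot check mentioned above can be as small as this JavaScript sketch. The field name `website` and the function name are hypothetical; the idea is that the form includes an input hidden with CSS, which people never fill in but naive bots often do.

```javascript
// Hypothetical field name; the input is hidden with CSS so people never fill it.
const HONEYPOT_FIELD = 'website';

// Treat a submission as automated when the hidden field carries a value.
function looksLikeBot(fields) {
  const value = fields[HONEYPOT_FIELD];
  return typeof value === 'string' && value.trim() !== '';
}

console.log(looksLikeBot({ query: 'hello', website: '' })); // false
console.log(looksLikeBot({ query: 'hello', website: 'http://spam.example' })); // true
```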
I have a question. I know Facebook accommodates this, but if I am creating an app using JavaScript and want to use an API, what's the best way to deal with the redirect_uri?
I thought of using an iframe, passing the data back to the parent and then closing the iframe, but I'm having problems with that. Obviously the whole cross-domain issue will probably render this useless, but even before that, when I run it as a normal site (with the iframe on the same domain), I get this odd occurrence when using the Instagram API. (I'm presuming it's generic.)
If I'm logged in to Instagram online, it runs and gets the data but doesn't seem to send the data to the parent.
If I'm not logged in, it won't do anything; the PHP redirect to the Instagram login page doesn't go there at all.
I'm far more interested in a possible solution without using iframes though.
Thoughts?
ok... I'm looking to have a good round of brainstorming here...
say i was google... the adword/adsense/analytics division. i would be getting a little worried about the future, when users start to disable cookies (or at least delete them on a regular basis), use private browsing, roam on multiple devices. how could google alternatively track users without the benefits of cookies?
some ideas to get started (please elaborate on these and any others):
-track users using some other persistent local/client side storage
-use user-agent string fingerprinting
-test cache response - if user 304's an image, they were here
-track mac address
-any random/out of the box ideas?
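To make the fingerprinting idea from the list concrete, here is a small JavaScript sketch that hashes a set of client attributes into one identifier. The attribute names are only examples of what a page might collect, and `fingerprint` is a made-up helper, not part of any library.

```javascript
const crypto = require('crypto');

// Combine attributes into a canonical string (sorted by key so that the
// order in which they were collected does not matter), then hash it.
function fingerprint(attrs) {
  const canonical = Object.keys(attrs)
    .sort()
    .map((key) => `${key}=${attrs[key]}`)
    .join('\n');
  return crypto.createHash('sha256').update(canonical).digest('hex');
}

// Example attribute set; the values are invented for illustration.
const id = fingerprint({
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64)',
  language: 'en-US',
  screen: '1920x1080',
  timezone: 'Europe/Berlin',
});
console.log(id.length); // 64
```

A fingerprint like this is only as distinctive as its inputs, and it changes whenever any one of them changes (a browser update, a new monitor), so it tends to work better as one signal among several than as a sole identifier.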
Take a look at http://samy.pl/evercookie/. It's a JS API for ultra-persistent cookies, but you can take ideas from its mechanisms to find storage for your data.
I think you could do it using custom URLs. You would basically encrypt a cookie and attach it as part of the URL you send to the browser. When it comes back, your web server would be smart enough to decode it and identify whoever sent it.
I believe the Spring framework can do this in fact.
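A sketch of the custom URL idea in JavaScript follows. Note that it signs the identifier with an HMAC instead of encrypting it, which is a swapped-in technique: the ID stays readable in the URL, but the server can tell if it was altered. The parameter names `vid` and `sig`, the function names and the secret are all placeholders.

```javascript
const crypto = require('crypto');

const SECRET = 'replace-with-a-server-side-secret'; // placeholder value

function sign(id) {
  return crypto.createHmac('sha256', SECRET).update(id).digest('hex');
}

// Append the visitor id and its signature to a link sent to the browser.
function tagUrl(url, visitorId) {
  const u = new URL(url);
  u.searchParams.set('vid', visitorId);
  u.searchParams.set('sig', sign(visitorId));
  return u.toString();
}

// Recover the visitor id from an incoming URL, or null if the id is
// missing or its signature does not match.
function readVisitorId(url) {
  const u = new URL(url);
  const id = u.searchParams.get('vid');
  const sig = u.searchParams.get('sig');
  if (!id || !sig) return null;
  const expected = Buffer.from(sign(id), 'hex');
  const given = Buffer.from(sig, 'hex');
  if (given.length !== expected.length) return null;
  return crypto.timingSafeEqual(given, expected) ? id : null;
}
```

Every link the server renders would go through `tagUrl`, and every incoming request through `readVisitorId`. The obvious weakness is that the ID travels with the URL, so a copied or shared link carries the same identity along with it.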
If your site requires user tracking, then I would have it fail to work if cookies are disabled. Then focus your time and effort on making it a fantastic site for the vast majority of your visitors, and don't worry about the ones who, for whatever reason, have made the explicit decision to disable cookies.
(Made this a CW answer because this is a subjective question that's likely to be closed.)
Information about the browser/system/display through JS, and the IP address of course;
a Java applet provides a lot of info about the user;
Flash does too (e.g. installed fonts);
modern browsers also provide a lot of information about users (e.g. installed extensions) and offer new ways to save information on the client side (e.g. HTML5 storage).
altogether: http://panopticlick.eff.org/
You can always fall back on the good old way: the hit counter.
On the page, use an <img> tag that links to an external image on your server.
On your server, when the image is fetched, redirect the request to a PHP script through .htaccess and record the header info about the device etc. (similar to the code used to disable image hotlinking).
Now that you have the info, use a PHP session (session_start() and $_SESSION) to keep track of it.
You can always use JS for the same purpose, but using an <img> tag ensures that JS is not required, so the script will run in all browsers.
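Here is a rough sketch of what the image endpoint does, written in JavaScript instead of PHP to keep it short. `handlePixel` and the in-memory `hits` array are invented for this example, and the function takes plain header and address values so it can be wired into whatever server you use.

```javascript
// 1x1 GIF, base64-encoded, served as the "counter" image.
const PIXEL = Buffer.from(
  'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
  'base64'
);

const hits = []; // in-memory log; a real counter would persist this

// Record what the request reveals about the client, then answer with the image.
function handlePixel(headers, remoteAddress) {
  hits.push({
    ip: remoteAddress,
    userAgent: headers['user-agent'] || '',
    referer: headers['referer'] || '',
    at: new Date().toISOString(),
  });
  return {
    status: 200,
    // no-store asks the browser not to cache, so every page view hits the server
    headers: { 'Content-Type': 'image/gif', 'Cache-Control': 'no-store' },
    body: PIXEL,
  };
}
```

The Referer header on the image request is normally the page that embedded the image, which is how the counter knows which page was viewed.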