Webscraping an internal site that requires authentication w/ Javascript alert

I've been trying to scrape some raw XML data from an internal company site (URL excluded for security purposes). I am currently using Selenium and BeautifulSoup to do so (but am open to any other options). When accessing the site manually, I am prompted with a JavaScript browser alert for a username and password. My attempt to supply the credentials automatically is below (it does not pass authentication):
def main():
    # gets specified list of direct reports
    # credentials are embedded in the URL as username:password@host
    url = "http://{username}:{password}@myURL.com"
    driver.get(url)
    html = driver.page_source
    soup = BeautifulSoup(html, "lxml")
    # parsing logic follows ...
However, when the script runs I still have to manually enter the username and password in the browser window controlled by chromedriver, and then the rest of the program runs as expected.
Is there a way to avoid this manual entry? I've also tried solutions around driver.alert and sending keys and credentials to the browser, to no avail. (I know this may be difficult because the site is not accessible outside of the network; any insight is appreciated!)
Edit: I should mention this method was working a couple of weeks ago, but following a Chrome update it no longer does.

Your login process is likely returning an access token of some kind, either a value in the response body or a header with a token, possibly an Authorization header or a Set-Cookie header.
In most cases, you will need to send that token with every request, either as an authorization header, a body parameter, or whatever the page expects.
Your job is to find that token by inspecting the response from the server when you authenticate, store it somewhere, and send it back each time you make a page request to the server.
How you send it back is dictated by the requirements of the server in question. It may want a request body parameter or a header; those are the two most likely cases.
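A minimal sketch of that find, store, and send-back cycle might look like the following. The response shape ({ headers, body }), the `token` body field, and the `Bearer` scheme are assumptions made for illustration; the real server may use different names entirely.

```javascript
// Hypothetical helper: look for a token in a login response.
// `response` is assumed to look like { headers: { name: value }, body: { ... } }.
function extractToken(response) {
  // Normalise header names, since HTTP header names are case-insensitive.
  const headers = {};
  for (const [name, value] of Object.entries(response.headers || {})) {
    headers[name.toLowerCase()] = value;
  }
  // Case 1: the server hands back an Authorization value directly.
  if (headers['authorization']) {
    return { kind: 'header', value: headers['authorization'] };
  }
  // Case 2: a session cookie; keep only the leading "name=value" pair.
  if (headers['set-cookie']) {
    const raw = Array.isArray(headers['set-cookie'])
      ? headers['set-cookie'][0]
      : headers['set-cookie'];
    return { kind: 'cookie', value: String(raw).split(';')[0] };
  }
  // Case 3: a token in the response body (field name is an assumption).
  if (response.body && response.body.token) {
    return { kind: 'header', value: 'Bearer ' + response.body.token };
  }
  return null;
}

// Build the headers to attach to every later request.
function buildRequestHeaders(token) {
  if (!token) return {};
  return token.kind === 'cookie'
    ? { Cookie: token.value }
    : { Authorization: token.value };
}
```

The stored result of `extractToken` would then be passed to `buildRequestHeaders` before each page request.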

Related

CORS fetch authentication using Browser's session cookie

I have a server that stores session cookies and you can log onto it using a page (foo.com/login.html) that runs in the browser. The browser then stores a session cookie for this domain.
Now I want another page (bar.com), upon initialization, to make a GET request using JavaScript to the first site (foo.com/authenticate), which should check if a session cookie exists in the browser and validate it. If it is valid, the server should respond with the session's username (however this is retrieved from the cookie). Of course I cannot check in bar.com's JavaScript whether a session cookie exists for foo.com.
Trying to solve this I ran into a few problems, one of which is of course CORS. I managed to avoid this problem by placing a reverse proxy in front of foo.com that adds all required CORS headers to the response. Besides adding the headers, the proxy only tunnels requests through (e.g. rev-proxy.com/authenticate -> foo.com/authenticate).
Now when I call the handler through the reverse proxy directly from another browser window (e.g. rev-proxy.com/authenticate), I get the correct response. The handler in foo.com's backend finds the session cookie, reads out the username and passes it back. BUT when I try to make the same call from JavaScript inside bar.com (fetch("rev-proxy.com/authenticate")), I receive null, meaning the backend did not find the cookie (note that the request itself has status 200, meaning it did reach the backend of foo.com).
I have the feeling I am missing a crucial point in how cookies are used by browsers but I cannot find any useful information on my specific problem since I believe it is a rather unusual one.

See the MDN documentation:
fetch won’t send cookies, unless you set the credentials init option. (Since Aug 25, 2017. The spec changed the default credentials policy to same-origin. Firefox changed since 61.0b13.)
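As a rough sketch of what that means here: the page on bar.com has to opt in with `credentials: 'include'`, and the proxy has to answer with CORS headers that permit credentials. The helper names and the allow-list are hypothetical, and the `https://` scheme on the proxy URL is an assumption.

```javascript
// Options for a cross-origin fetch that should carry the browser's cookies.
function crossOriginInit() {
  return { method: 'GET', credentials: 'include' };
}

// Headers the proxy would need to add for a credentialed request.
// With credentials, Access-Control-Allow-Origin must name the exact origin;
// the wildcard "*" is rejected by browsers in that case.
function corsHeadersFor(origin, allowedOrigins) {
  if (!allowedOrigins.includes(origin)) return {};
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Credentials': 'true',
    'Vary': 'Origin',
  };
}

// In the browser, on bar.com:
// fetch('https://rev-proxy.com/authenticate', crossOriginInit())
//   .then((res) => res.json())
//   .then((user) => console.log(user));
```

Recent browsers may additionally require the session cookie itself to be set with `SameSite=None; Secure` before they attach it to a cross-site request.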

I need a more detailed understanding of precisely how cookies work

I can build a full stack app using Ruby on Rails, JavaScript, React, HTML and CSS. Yet, I feel I don't understand completely how cookies actually work and what they are precisely. Below I write what I think they are, and ask that someone confirm or correct what is written.
An HTTP request contains an HTTP method, a path, the HTTP protocol version, headers, and a body.
An HTTP response contains the HTTP protocol version, a status code, a status message, headers, and a body.
Both are simply text (which means that they are simply sequences of encoded characters), but when this text is parsed it contains useful structure. Is there one single structure that an HTTP request is usually parsed into (an array, a hash)? What about an HTTP response?
Cookies represent some content associated with a specific header in an HTTP request, specifically the "Cookie" header.
When building an HTTP response, the server sets the 'Set-Cookie' header. This header needs the following information: a name for the cookie, a path, and the actual content of the cookie. The path is a description of the range of URLs for which this cookie should be sent from client to server.
Does the browser keep a list of cookies (ie, a list of elements that are each text of some sort), and it only sends the right ones to the right sites (say a google cookie to google.com)?
Let's say I visit site A and then site B and authenticate on both. Session management just adds a specific element in the cookies (perhaps a hash named Session inside another hash that corresponds to the totality of the cookie stored in Cookie), correct? How do sites alter my cookies? Do they append new information, do they ask my browser to append information?

A cookie is a string (with a specific format) that your browser stores. A server sets it by including a 'Set-Cookie' header in an HTTP response. Each HTTP request your browser sends that matches the cookie's domain and path will then contain that cookie in the 'Cookie' header.
The server cannot tell the browser to append data to a cookie. It can only read the current cookie value, add the new information to it, and set the cookie again with the combined value.
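To make the read, extend, and set-again cycle concrete, here is a small sketch. The function names are made up for illustration, and real frameworks handle more attributes (Domain, Expires, Secure, and so on) than this does.

```javascript
// Parse a request "Cookie" header ("a=1; b=2") into a plain object.
function parseCookieHeader(header) {
  const cookies = {};
  for (const part of (header || '').split(';')) {
    const index = part.indexOf('=');
    if (index === -1) continue; // skip malformed pieces
    const name = part.slice(0, index).trim();
    const value = part.slice(index + 1).trim();
    if (name) cookies[name] = decodeURIComponent(value);
  }
  return cookies;
}

// Build a "Set-Cookie" header value that replaces the cookie wholesale.
function buildSetCookie(name, value, path) {
  return name + '=' + encodeURIComponent(value) + '; Path=' + (path || '/');
}

// "Appending" from the server's side: read the old value, extend it,
// and send the whole thing back in a new Set-Cookie header.
function appendToCookie(cookieHeader, name, extra) {
  const current = parseCookieHeader(cookieHeader)[name] || '';
  return buildSetCookie(name, current + extra, '/');
}
```

The browser never merges values itself; it simply overwrites the stored cookie that has the same name, domain, and path.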

How to find out if user is still logged in using session based authentication?

I know this has been asked countless times, but none of the answers I found described the actual connection to backend.
I have a one-page JS app that communicates with a small backend (Django) API. I use session-based authentication. User info is cached on first load. If the session expires, I need to change the page header and flush user info from the cache. However, most of my API resources are public and always return 200. Several other resources are private and return 403 if the user isn't logged in, which is great as this gives me exactly the information I need. The problem is that some pages access public resources only. If the session is suddenly deleted on the backend and the user navigates to a URL that accesses only public resources, user info isn't flushed and I have a UX problem.
My initial idea was to request private user resource (let's call it /users/self/) on every url change which returns 200 in case user is authenticated and 403 in case they aren't. This however requires 1 extra request before every other request for each url change, which isn't really ideal.
Are there any easier techniques I could use in this case? I don't mind even switching to other type of authentication if that would solve the problem.

What I have done and seen for such scenarios is to use an HTTP interceptor that intercepts all HTTP requests made by Angular and, when it finds a response status of 401, raises an event using $rootScope.
See one library here https://github.com/witoldsz/angular-http-auth
To use it, one needs to subscribe to the raised events in some type of root controller, which can then redirect the user to the login page.
See an example here https://medium.com/opinionated-angularjs/7bbf0346acec

Instead of sending an additional auth request, just check in your backend on every request whether the session has expired. If the user is not authenticated, return a status code that signals this.
In AngularJS we used an HTTP response interceptor, which intercepts every response and checks for this status code.

Your backend could add a header to the response if the user is still logged in, regardless if the requested resource is public or not. The client can then check the presence of that header and act accordingly.
On both sides this is done with some kind of filter or interceptor. In angular this would be a $http interceptor.
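A framework-neutral sketch of the client-side half could look like this. The header name `x-session-active`, the response shape, and the assumption that the user starts out logged in are all illustrative; in AngularJS the `inspect` function would be called from an $http interceptor's response handlers.

```javascript
// Hypothetical header the backend adds to every response while the
// session is still valid, for public and private resources alike.
const SESSION_HEADER = 'x-session-active';

// `response` is assumed to look like
// { status: number, headers: { 'lower-case-name': value } }.
function isSessionActive(response) {
  if (response.status === 401 || response.status === 403) return false;
  return response.headers[SESSION_HEADER] === 'true';
}

// Returns a function to run on every response. It calls `onLoggedOut`
// once, at the moment the session flips from active to inactive.
// Assumes the user is logged in when the watcher is created.
function createSessionWatcher(onLoggedOut) {
  let wasActive = true;
  return function inspect(response) {
    const active = isSessionActive(response);
    if (wasActive && !active) onLoggedOut();
    wasActive = active;
    return response; // pass through so it can sit in an interceptor chain
  };
}
```

Because the check piggybacks on responses the app already receives, no extra request per URL change is needed.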

We at work do what others have already told you: use an HttpInterceptor.
We have every response sent from our backend structured in the same way: an object with two fields: a responseCode and the actual response. We vary the responseCode according to what happened in the backend, being success, security alert, or authentication required for that given action the most common cases.
Then the interceptor reacts in the appropriate way according to each responseCode we have defined. In the case of an authentication required, we redirect to the login page, you could do whatever you need. It's working great for us.
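The envelope approach described above can be sketched roughly as follows. The code names (`SUCCESS`, `AUTH_REQUIRED`) and the handler table are hypothetical, since the answer does not list the actual values used.

```javascript
// Dispatch on the envelope's responseCode.
// `envelope` is assumed to look like { responseCode: string, response: any }.
function handleEnvelope(envelope, handlers) {
  const handler = handlers[envelope.responseCode];
  if (!handler) {
    throw new Error('Unhandled responseCode: ' + envelope.responseCode);
  }
  return handler(envelope.response);
}

// Example handler table; the code names are made up for illustration.
const exampleHandlers = {
  SUCCESS: (payload) => payload,
  AUTH_REQUIRED: () => {
    // In a real app this is where the redirect to the login page would go.
    return { redirectTo: '/login' };
  },
};
```

Keeping the reactions in one table means a new responseCode only needs a new entry, not a change to every call site.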

Ruby on Rails: Difference of Authenticity Token being in Header or POST

I've just noticed it doesn't matter where I put my Authenticity Token when submitting a request via AJAX. I can either append it to the form as POST data, or put it into the Header.
Is there any difference? Especially regarding security?
Additionally:
I didn't encode the Token in Javascript. Am I exposed to something now?
Thanks in advance.
EDIT:
form.on("sending", function(file, xhr, formData) {
  xhr.setRequestHeader('X-CSRF-Token', AUTH_TOKEN);
  // formData.append('authenticity_token', AUTH_TOKEN);
});
This is my Javascript adding the token to the Header or (commented out) to the POST data. AUTH_TOKEN is the raw key. I did not encode it in any way.

Part one
There is totally no difference if you pass authenticity token via GET params, POST data or request headers (POST/GET params are virtually the same in Rails).
Let's look at the code (not the best code I've ever seen but...)
def verified_request?
  !protect_against_forgery? || request.get? || request.head? ||
    form_authenticity_token == params[request_forgery_protection_token] ||
    form_authenticity_token == request.headers['X-CSRF-Token']
end
The request is valid if any of the following holds:
protect_against_forgery? is false
request is GET
request is HEAD
token in params equals one stored in session
token in headers equals one stored in session
I should add that token is generated for every request and stored in session for later inspection (if subsequent request is POST/PUT/PATCH/DELETE)
So as you see both ways of passing authenticity token are valid.
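For readers less familiar with Ruby, here is a JavaScript paraphrase of the same check. It is only an illustration of the logic, not Rails code; the request shape, the `authenticity_token` parameter name, and the extra guard for a missing session token are assumptions.

```javascript
// `request` is assumed to look like
// { method: 'POST', params: { ... }, headers: { 'x-csrf-token': '...' } }.
function verifiedRequest(request, sessionToken, protectionEnabled) {
  if (!protectionEnabled) return true;
  if (request.method === 'GET' || request.method === 'HEAD') return true;
  // Guard added for this sketch: without a session token nothing can match.
  if (!sessionToken) return false;
  // The token is accepted from either place, which is why both work.
  return request.params.authenticity_token === sessionToken ||
    request.headers['x-csrf-token'] === sessionToken;
}
```

Either location passes because the two comparisons are joined with a logical OR; the server never checks where the token came from.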
Part two
Is passing raw auth token in AJAX dangerous? No, as much as passing it in a form is totally not dangerous. To explain further I will quote an excellent answer in another SO question
Why this happens: Since the authenticity token is stored in the
session, the client can not know its value. This prevents people from
submitting forms to a rails app without viewing the form within that
app itself. Imagine that you are using service A, you logged into the
service and everything is ok. Now imagine that you went to use service
B, and you saw a picture you like, and pressed on the picture to view
a larger size of it. Now, if some evil code was there at service B, it
might send a request to service A (which you are logged into), and ask
to delete your account, by sending a request to
http://serviceA.com/close_account. This is what is known as CSRF
(Cross Site Request Forgery).
original answer: https://stackoverflow.com/a/1571900/2422778
I still consider this question laziness/lack of patience on your side as all I wrote is very well explained both in Rails Guides and on Stack Overflow. Hope next time you will be more persistent in looking for answers before posting here.
Anyway I am glad I could help.

You can see the difference when you use a tool like https://www.owasp.org/index.php/Category:OWASP_WebScarab_Project or http://www.charlesproxy.com/
These are proxies that you can run locally to inspect and modify your HTTP requests and responses.
Very useful for web development.
Good luck.

How can I suppress the browser's authentication dialog?

My web application has a login page that submits authentication credentials via an AJAX call. If the user enters the correct username and password, everything is fine, but if not, the following happens:
1. The web server determines that although the request included a well-formed Authorization header, the credentials in the header do not successfully authenticate.
2. The web server returns a 401 status code and includes one or more WWW-Authenticate headers listing the supported authentication types.
3. The browser detects that the response to my call on the XMLHttpRequest object is a 401 and the response includes WWW-Authenticate headers. It then pops up an authentication dialog asking, again, for the username and password.
This is all fine up until step 3. I don't want the dialog to pop up; I want to handle the 401 response in my AJAX callback function. (For example, by displaying an error message on the login page.) I want the user to re-enter their username and password, of course, but I want them to see my friendly, reassuring login form, not the browser's ugly, default authentication dialog.
Incidentally, I have no control over the server, so having it return a custom status code (i.e., something other than a 401) is not an option.
Is there any way I can suppress the authentication dialog? In particular, can I suppress the Authentication Required dialog in Firefox 2 or later? Is there any way to suppress the Connect to [host] dialog in IE 6 and later?
Edit
Additional information from the author (Sept. 18):
I should add that the real problem with the browser's authentication dialog popping up is that it gives insufficient information to the user.
The user has just entered a username and password via the form on the login page, he believes he has typed them both correctly, and he has clicked the submit button or hit enter. His expectation is that he will be taken to the next page or perhaps told that he has entered his information incorrectly and should try again. However, he is instead presented with an unexpected dialog box.
The dialog makes no acknowledgment of the fact he just did enter a username and password. It does not clearly state that there was a problem and that he should try again. Instead, the dialog box presents the user with cryptic information like "The site says: '[realm]'." Where [realm] is a short realm name that only a programmer could love.
Web browser designers take note: no one would ask how to suppress the authentication dialog if the dialog itself were simply more user-friendly. The entire reason that I am doing a login form is that our product management team rightly considers the browsers' authentication dialogs to be awful.

I encountered the same issue here, and the backend engineer at my company implemented a behavior that is apparently considered good practice: when a call to a URL returns a 401 and the client has set the header X-Requested-With: XMLHttpRequest, the server drops the WWW-Authenticate header from its response.
The side effect is that the default authentication popup does not appear.
Make sure that your API call has the X-Requested-With header set to XMLHttpRequest. If so there is nothing to do except changing the server behavior according to this good practice...
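On the server, that behavior might be sketched like this. The function name, the lower-cased header keys, and the Basic scheme with a realm are assumptions; the answer does not say which framework or scheme was used.

```javascript
// Decide which headers accompany a 401 response.
// `requestHeaders` is assumed to use lower-case names, as Node's http module does.
function unauthorizedHeaders(requestHeaders, realm) {
  const isAjax = requestHeaders['x-requested-with'] === 'XMLHttpRequest';
  // For AJAX callers, leave out the challenge so the browser shows no dialog
  // and the 401 reaches the page's own error handler instead.
  if (isAjax) return {};
  return { 'WWW-Authenticate': 'Basic realm="' + realm + '"' };
}
```

Ordinary page navigations still get the challenge, so the native dialog keeps working where no custom login form exists.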

The browser pops up a login prompt when both of the following conditions are met:
HTTP status is 401
WWW-Authenticate header is present in the response
If you can control the HTTP response, then you can remove the WWW-Authenticate header from the response, and the browser won't popup the login dialog.
If you can't control the response, you can setup a proxy to filter out the WWW-Authenticate header from the response.
As far as I know (feel free to correct me if I'm wrong), there is no way to prevent the login prompt once the browser receives the WWW-Authenticate header.
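The two conditions, and the proxy-side filter, can be written down as a short sketch. Both function names are hypothetical, and the headers are assumed to be a plain object of name/value pairs.

```javascript
// True when a response meets both conditions listed above.
function browserWouldPrompt(status, headers) {
  const names = Object.keys(headers).map((name) => name.toLowerCase());
  return status === 401 && names.includes('www-authenticate');
}

// What a filtering proxy would do: copy every header except the challenge.
function stripAuthChallenge(headers) {
  const filtered = {};
  for (const [name, value] of Object.entries(headers)) {
    if (name.toLowerCase() !== 'www-authenticate') filtered[name] = value;
  }
  return filtered;
}
```

After filtering, the page still receives the 401 status and can show its own error message.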

I don't think this is possible -- if you use the browser's HTTP client implementation, it will always pop up that dialog. Two hacks come to mind:
Maybe Flash handles this differently (I haven't tried yet), so having a flash movie make the request might help.
You can set up a proxy for the service that you're accessing on your own server, and have it modify the authentication headers a bit, so that the browser doesn't recognise them.

I realize that this question and its answers are very old. But, I ended up here. Perhaps others will as well.
If you have access to the code for the web service that is returning the 401, simply change the service to return a 403 (Forbidden) in this situation instead of a 401. The browser will not prompt for credentials in response to a 403. 403 is the correct code for an authenticated user who is not authorized for a specific resource, which seems to be the situation of the OP.
From the IETF document on 403:
A server that receives valid credentials that are not adequate to
gain access ought to respond with the 403 (Forbidden) status code

In Mozilla you can achieve it with the following script when you create the XMLHttpRequest object:
var xmlHttp = new XMLHttpRequest();
xmlHttp.mozBackgroundRequest = true;
xmlHttp.open("GET", URL, true, USERNAME, PASSWORD);
xmlHttp.send(null);
The second line prevents the dialog box.

What server technology do you use and is there a particular product you use for authentication?
Since the browser is only doing its job, I believe you have to change things on the server side to not return a 401 status code. This could be done using custom authentication forms that simply return the form again when the authentication fails.

In Mozilla land, setting the mozBackgroundRequest parameter of XMLHttpRequest (docs) to true suppresses those dialogs and causes the requests to simply fail. However, I don't know how good cross-browser support is (including whether the quality of the error info on those failed requests is very good across browsers).

jan.vdbergh is right: if you can change the 401 on the server side to another status code, the browser won't catch it and show the pop-up.
Another solution could be to replace the WWW-Authenticate header with a custom header. I don't understand why the different browsers can't support this. In a few versions of Firefox we can make the XHR request with mozBackgroundRequest, but what about the other browsers? There is an interesting link about this issue in Chromium.

I have this same issue with MVC 5 and VPN where whenever we are outside the DMZ using the VPN, we find ourselves having to answer this browser message. Using .net I simply handle the routing of the error using
<customErrors defaultRedirect="~/Error" >
  <error statusCode="401" redirect="~/Index"/>
</customErrors>
Thus far it has worked because the Index action under the home controller validates the user. The view in this action, if logon is unsuccessful, has login controls that I use to log the user in using an LDAP query passed into Directory Services:
DirectoryEntry entry = new DirectoryEntry("LDAP://OurDomain");
DirectorySearcher Dsearch = new DirectorySearcher(entry);
Dsearch.Filter = "(SAMAccountName=" + UserID + ")";
Dsearch.PropertiesToLoad.Add("cn");
While this has worked fine thus far, and I must let you know that I am still testing it and the above code has had no reason to run so it's subject to removal... testing currently includes trying to discover a case where the second set of code is of any more use. Again, this is a work in progress, but since it could be of some assistance or jog your brain for some ideas, I decided to add it now... I will update it with the final results once all testing is done.

I'm using Node, Express & Passport and was struggling with the same issue. I got it to work by explicitly setting the www-authenticate header to an empty string. In my case, it looked like this:
(err, req, res, next) => {
  if (err) {
    // Blank the challenge header so the browser does not show its login dialog
    res.setHeader('www-authenticate', '')
    return res.json(err)
  }
}
I hope that helps someone!

I recently encountered a similar situation while developing a web app for a Samsung Tizen Smart TV. It was required to scan the complete local network, but a few IP addresses were returning a "401 Unauthorized" response with a "www-authenticate" header attached. This popped up a browser authentication dialog requiring the user to enter a "Username" and "Password" because of the "Basic" authentication type (https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication).
To get rid of this, the simple thing that worked for me was setting credentials: 'omit' for the Fetch API call (https://developer.mozilla.org/en-US/docs/Web/API/Fetch_API/Using_Fetch). The official documentation says:
To instead ensure browsers don’t include credentials in the request, use credentials: 'omit'
fetch('https://example.com', {
  credentials: 'omit'
})

For those using C#, here's an ActionAttribute that returns 400 instead of 401 and 'swallows' the Basic auth dialog.
public class NoBasicAuthDialogAuthorizeAttribute : AuthorizeAttribute
{
    protected override void HandleUnauthorizedRequest(AuthorizationContext filterContext)
    {
        base.HandleUnauthorizedRequest(filterContext);
        filterContext.Result = new HttpStatusCodeResult(400);
    }
}
Use it like the following:
[NoBasicAuthDialogAuthorize(Roles = "A-Team")]
public ActionResult CarType()
{
    // your code goes here
}
Hope this saves you some time.
