I wrote an Android app that should 'connect' to a (private) forum using HTTP GET (and sometimes POST) requests. The basic idea is as such:
Login page where users submit their credentials. Login is performed by doing an HTTP POST (tried GET too, same result) to the Login page of the forum, with their username and password as the parameters. The request should return some cookies that I store in a BasicCookieStore.
Every page of the forum they want to visit is retrieved using HTTP GET. I parse the HTML source that I obtain and show them only the relevant info. In order to authenticate the users, the same BasicCookieStore that I used for login (step 1) is set as the cookiestore for the HttpClient.
This method has been working all the time during my testing, and has worked for my beta testers too. Now that I released the app, it became apparent that many users were having issues, especially on mobile connections (Wifi seems to be no problem).
By logging the HTML source that was returned in all the HTTP GET requests, I have a strong suspicion that the actual login works fine, but somehow the cookies don't get returned or stored or something in that direction. The problem is that the HTML source of the first page they will receive should be the list of forums. In the case of users with problems however, they get served a page that basically reads "You must enable Javascript to view this page".
The strange thing is, I don't receive that page when testing, nor do many of my users. Even worse: some users are now reporting it worked fine for them for days or weeks, and has now stopped working. Others have the exact opposite: not working for days, suddenly working now. One user has reported he was in Greece for 2 weeks, where it worked flawlessly, then he got back to Germany, and it stopped working again.
There seems to be a random component at play here.
I have tried various things, mostly with the way I do the HTTP GET requests. I started out using the normal DefaultHttpClient, with various settings, such as this:
HttpClient httpClient = new DefaultHttpClient();
// Define parameters
HttpParams httpParams = httpClient.getParams();
HttpConnectionParams.setConnectionTimeout(httpParams, TIMEOUT);
HttpConnectionParams.setSoTimeout(httpParams, TIMEOUT);
HttpProtocolParams.setVersion(httpParams, HttpVersion.HTTP_1_1);
// Set cookiestore (getCookieStore returns the same cookiestore)
HttpContext localContext = new BasicHttpContext();
localContext.setAttribute(ClientContext.COOKIE_STORE, getCookieStore());
HttpGet http = new HttpGet(url);
http.addHeader("Accept", ACCEPT_STRING);
http.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
// Execute
HttpResponse response = httpClient.execute(http, localContext);
//... Process result (omitted)
Now I have switched to using AndroidHttpClient instead, with the rest of the code basically unchanged, and seem to get the same result.
I have also tried using the AsyncHttpClient library, which works quite differently, but once again the same result. I tried using its PersistentCookieStore as well, and you guessed it - same result.
I am clueless at this point. Am I looking in the wrong direction? The fact that a website would respond with "you need to enable Javascript" for some users but not for all seems to indicate an issue with cookies. I don't know how a website determines if javascript is enabled, but surely with a HTTP GET request there is no javascript at play. So why do I (and many other users) get to the page without any problems, while others get the 'no javascript' message? The only reason I can think of is cookies, but I have no clue what the problem exactly is.
Any help would be much appreciated!
I doubt the problem is cookies. A network configuration problem is more likely.
For example, your user might have connected to a wifi hotspot with a captive portal page (which uses javascript to make you sign in before you can use the hotspot). In this case they should first open the browser, try to browse to (e.g.) http://google.com, get redirected, sign in, and then launch your app.
Or, your user might be connecting through a proxy. Many mobile carriers around the world will proxy their users' HTTP connections, sometimes doing horrible things to the content. Switching to HTTPS might help with that.
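One way to confirm this from inside the app is to check each response for a marker of the expected forum page before parsing it, and to log the responses that look like an injected page. The following is only a sketch, written in JavaScript for brevity; the marker strings and the function name are hypothetical, and the same check ports directly to the Java client.

```javascript
// Hypothetical markers: replace with strings that really occur in the forum's HTML.
const EXPECTED_MARKER = 'class="forum-list"';
const INTERSTITIAL_HINTS = ['enable javascript', '<noscript', 'captive'];

// Classify a response body as the expected page, a likely interstitial
// (captive portal or carrier proxy page), or unknown.
function classifyResponse(html) {
  const body = String(html || '').toLowerCase();
  if (body.includes(EXPECTED_MARKER.toLowerCase())) return 'expected';
  if (INTERSTITIAL_HINTS.some((hint) => body.includes(hint))) return 'interstitial';
  return 'unknown';
}
```

Logging the classification together with the connection type (Wifi or mobile) would show whether the failures cluster on particular networks.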
I'm having an issue getting Login Kit to work. Similar to the question asked here I have the correct redirect domain listed in tiktok settings and the redirect_uri is basically just "domain/tiktok" but no matter what I do I get the same error message:
Below is my backend code - it's basically exactly the same as what is listed in the tiktok docs. Any help on this would be much appreciated!
const CLIENT_KEY = 'my_key'
const DOMAIN = 'dev.mydomain.com'
const csrfState = Math.random().toString(36).substring(2);
res.cookie('csrfState', csrfState, { maxAge: 60000 });
const redirect = encodeURIComponent(`https://${DOMAIN}/tiktok`)
let url = 'https://www.tiktok.com/auth/authorize/';
url += '?client_key=' + CLIENT_KEY;
url += '&scope=user.info.basic,video.list';
url += '&response_type=code';
url += '&redirect_uri=' + redirect;
url += '&state=' + csrfState;
res.redirect(url);
UPDATE 8/13/2022
I submitted the app for review and was approved so the status is now "Live in production" instead of "staging". The issue is still there - still showing error message no matter what domain / callback URL I use
UPDATE 8/16/2022
OK so I've made some progress on this.
First off - I was able to get the authentication/login screen to finally show up. I realized to do this you need to:
Make sure that the status of your app is "Live in production" and not "Staging". Even though when you create a new app you may see client_key and client_secret show up don't let that fool you - Login Kit WILL NOT WORK unless your app is submitted and approved
The redirect_uri you include in your server flow must match EXACTLY to whatever value you entered in "Registered domains" in the Settings page. So if you entered "dev.mydomain.com" in Settings then redirect_uri can only be "dev.mydomain.com" not "dev.mydomain.com/tiktok".
I think I might know what the issue is. My guess is that the Settings page used to require the FULL redirect URL (not just the domain), and the redirect URI in the authorization query was checked against that value as saved in TikTok's database. At some point recently, the front-end logic was changed so that you can only enter a domain (e.g., mydomain.com) on the Settings page, without protocol or path. However, TikTok's backend logic was never updated, so during the Login flow they still check for an EXACT match against whatever was saved in their DB as the redirect URI. This would explain why an app that registered a redirect URI WITH a protocol before the change (e.g., for Later.com the redirect URI is https://app.later.com/users/auth/tiktok/callback) continues to work, and why any app that now tries to use a redirect URI WITH a protocol gets the error message screen. My gut feeling is that the error is not on my part and that this is actually a bug in TikTok's API. My guess is it can be addressed either by changing the front-end on the Settings page to allow path/protocol (I think this is the ideal approach) or by changing their backend so that any redirect URI only has to include one of the listed redirect domains.
I've been emailing with the TikTok team - their email is tiktokplatform@tiktok.com - and proposed the two solutions I mentioned above. I suggest if you're having the same issue you email them as well and maybe even link this StackOverflow question so that maybe it will get higher priority if enough people message them about it.
If you're looking for a short-term hack I'd recommend creating a dedicated app on AWS or Heroku with a clean domain (e.g., https://mydomain-tiktok.herokuapp.com) and then redirecting to either your dev or production environment by adding a prefix to the "state" query parameter (e.g., "dev_[STATE_ID]"). I'll just reiterate that I consider this a very "hacky" approach to handling callbacks and would definitely not want to use something like this in production.
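The state-prefix idea could be sketched roughly as follows. The environment names, callback URLs, and function names are hypothetical and only illustrate the routing; the relay app would redirect the browser to `target`, passing along the code and the original state ID.

```javascript
// Hypothetical environments and callback URLs, for illustration only.
const CALLBACKS = {
  dev: 'https://dev.mydomain.com/tiktok',
  prod: 'https://www.mydomain.com/tiktok',
};

// Prefix the CSRF state with the environment that started the login.
function encodeState(env, stateId) {
  return `${env}_${stateId}`;
}

// Split the state returned in the callback into environment and state ID.
// Returns null when the prefix is missing or not a known environment.
function decodeState(state) {
  const sep = state.indexOf('_');
  if (sep === -1) return null;
  const env = state.slice(0, sep);
  const stateId = state.slice(sep + 1);
  if (!Object.prototype.hasOwnProperty.call(CALLBACKS, env)) return null;
  return { env, stateId, target: CALLBACKS[env] };
}
```

Rejecting unknown prefixes keeps the relay from redirecting to an arbitrary destination.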
In my case, the integration worked after doing following steps:
In TikTok developers page:
Like @eugene-blinn said: make sure your app is in Live in production status (I couldn't find anything in the documentation about why Staging apps don't work);
Add the Login Kit product to your app and set the Redirect domain field with your host domain, for example: mywebsite.com.
In your code:
From my tests, I could add whatever URL path I wanted; the only constraint was that the domain should match the one set in step 2. So, yes, you can use https://mywebsite.com/whatever/path/you/want in the redirect_uri parameter.
That's it. It should work with these 3 steps.
Additionally, I got other issue related to use specific features in the scope property (like upload or read videos, etc), so here the solution as well:
Adding only the Video Kit product to the TikTok app and setting video.upload or video.list in the scope of the authorize request won't work unless you also add the TikTok API product to your TikTok app. By the way, it needs to be approved too.
TikTok fixed the bug that caused a redirect_uri with a path to be rejected as a mismatch with the redirect domain. However, they fixed it only for paths (e.g., /auth/tiktok); adding a PORT still results in an error, so www.domain.com:8080/auth/tiktok won't work but www.domain.com/auth/tiktok WILL work.
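A small pre-flight check can catch these cases before the user is sent to TikTok. This is a sketch based only on the behaviour reported in this thread (host must equal the registered domain, any path is accepted, explicit ports are rejected); it is not TikTok's documented validation, and the function name is made up.

```javascript
// Returns a list of problems with a redirect_uri, given the domain registered
// in the dashboard (e.g. "www.domain.com"). An empty list means no known problem.
function checkRedirectUri(redirectUri, registeredDomain) {
  let url;
  try {
    url = new URL(redirectUri);
  } catch (err) {
    return ['not an absolute URL (the request needs a scheme)'];
  }
  const problems = [];
  if (url.protocol !== 'https:' && url.protocol !== 'http:') {
    problems.push('scheme must be http or https');
  }
  if (url.hostname !== registeredDomain.toLowerCase()) {
    problems.push('host does not match the registered domain');
  }
  if (url.port !== '') {
    problems.push('explicit port numbers are rejected');
  }
  return problems;
}
```

For example, `checkRedirectUri('https://www.domain.com:8080/auth/tiktok', 'www.domain.com')` reports only the port problem.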
UPDATE 10/3/2022
Got the following response directly from TikTok engineering team:
At this point, we only support production integrations with TikTok for Developers and require that you have a URL without port number. However, we understand from your communication that this makes it harder for you to build, test, and iterate your integration with us. Unfortunately, at this time, we do not have a timeline for when this additional support for development servers will be added. We request that you only redirect to URLs without port numbers. Thank you for the feedback.
The frontend of the developer's dashboard still rejects protocol and path in validation. However, the backend skips the path validation.
To be able to update the "Redirect domain" simply:
Open dev tools in chrome and go to the "Network" tab.
Click on the "Save changes" button on the dashboard.
Right-click on the "publish" request that appeared and copy as cURL.
Modify the "redirect_domains" field in the request before pasting it in the terminal.
I believe the app still needs to be approved and in production to get it to work. I'm still waiting for approval and it has been a couple of weeks.
UPDATE 9/17/2022
Just like @mauricio-ribeiro, my app worked after it was approved to production. Setting up the redirect domain without path and scheme works just fine.
I had the same problem, my solution:
1.- In my TikTok App dashboard, the “redirect_uri” is: mydomain.com, without http/https and without path (/my-redirect-url). Also you can add subdomains using this rule
2.- In my code, I have to add http or https to the redirect_uri, and feel free to use path (/my-redirect-uri)
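The split between the dashboard value (bare domain) and the request value (scheme plus optional path) might look like this in code. The sketch reuses the authorize endpoint and parameter names from the question; the function name and option names are hypothetical. Note that URLSearchParams percent-encodes the comma in the scope list, unlike the hand-built URL in the question.

```javascript
// registeredDomain is exactly what was entered in the dashboard, e.g. "mydomain.com".
// path is any callback path on that domain, e.g. "/my-redirect-uri".
function buildAuthorizeUrl({ clientKey, registeredDomain, path, scopes, state }) {
  const redirectUri = `https://${registeredDomain}${path}`;
  const url = new URL('https://www.tiktok.com/auth/authorize/');
  url.search = new URLSearchParams({
    client_key: clientKey,
    scope: scopes.join(','),
    response_type: 'code',
    redirect_uri: redirectUri,
    state: state,
  }).toString();
  return url.toString();
}
```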
I hope this helps.
I've read quite a lot of documentation about Web Push, and so far I've understood that a push subscription has a read-only property expirationTime. I also understand how I should react if the browser decides that the subscription is outdated (handle the event in the service worker, etc.). But is it possible to somehow set the expiration date manually, without implementing complex client-side logic? I guess that this is an ordinary problem for apps that have authentication.
My problem is that if user gets logged out automatically, webpush endpoint stays valid. I know multiple ways this can be solved with workarounds, but I guess that's not the optimal way for a relatively basic problem.
It's been a long time since I fixed this, but I guess sharing my solution can be helpful.
The solution was to make a HTTP request from service worker to the app using fetch('/path') , because all cookies from the app are also attached to requests made from SW.
So, if user is not logged in, you are redirected to login page.
My code:
fetch('/path', {method: 'GET', redirect:'error'}).then(function(result) {
... //some code specific for my app
}).catch(function(e) {
registration.unregister(); //error on redirect to login
});
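A variation on the same idea is to drop only the push subscription rather than unregister the whole service worker. The sketch below is an assumption about how that could look, not the code from the answer: `fetchFn` and `registration` are passed in so the logic can be exercised outside a service worker (inside one, pass `fetch` and `self.registration`), and the function name is made up. A plain network failure also makes the fetch reject, so real code may want to tell the two cases apart.

```javascript
// Returns true when the subscription was dropped, false when the user is still logged in.
async function dropPushIfLoggedOut(fetchFn, registration, probeUrl) {
  try {
    // redirect: 'error' makes the fetch reject when the app redirects to the login page.
    await fetchFn(probeUrl, { method: 'GET', redirect: 'error' });
    return false;
  } catch (err) {
    const subscription = await registration.pushManager.getSubscription();
    if (subscription) {
      await subscription.unsubscribe();
    }
    return true;
  }
}
```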
I am trying to set up authentication and http sender script for Open API scanning project.
At some point I reached a state where the authentication script for oauth2 was working correctly (it produced a valid token obtained from the remote endpoint) and http_sender appended the authorization header to requests. It turned out later that I had mistyped the required header, so I changed its name, saved the script and reran it (via a scan). Now both headers are appended to outgoing http requests: the mistyped version and the correct one. The behavior does not change after restarting ZAP and reloading the session, but the mistyped header disappears when I create a new session. I can't find where I can clean it up, and it does not seem right to recreate the session when one minor change is needed in the script.
The second issue that I have is that authentication script just stopped working without any modifications to it. I switch between environments occasionally but code remains the same. I even moved hardcoded values from context to script and it still does not work. I have set up a parallel script in python to fetch the token and it works (all parameters being the same), but in ZAP I get authentication failure (recreating session does not help). I don't own the oauth endpoint so it's not possible for me to take a look at it directly, but I suspect that both problems have something in common. Looks like some data is residing in the shadows and affects how the scripts are run.
First version of sender script:
function sendingRequest(msg, initiator, helper) {
var loginToken = org.zaproxy.zap.extension.script.ScriptVars.getGlobalVar("logintoken");
msg.getRequestHeader().setHeader("Autentication", "Bearer " + loginToken);
}
Second version of sender script:
function sendingRequest(msg, initiator, helper) {
var loginToken = org.zaproxy.zap.extension.script.ScriptVars.getGlobalVar("logintoken");
msg.getRequestHeader().setHeader("Authorization", "Bearer " + loginToken);
}
Authentication function is just tuned version of the zap template to send oauth2 parameters in the body of the POST request and actually worked for some time. It would really help to have some troubleshooting capabilities during scripting.
Re the first issue, it all depends on how the script is being used. The ZAP session is a record of all of the requests and responses. For some of the old requests you used the wrong header. That happened, you can't take it back. If you reuse those requests then ZAP will send the wrong header unless you remove it. If you create new requests that are not based on historic ones then the header should not be present.
Re the second issue, authentication is hard and can fail for what seems like minor differences :( One good option would be to proxy your python script through ZAP. Hopefully it will still work, and then you can compare the working request with the failing one.
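For the first issue, the sender script itself can strip the stale header from requests that are based on historic ones. This is a sketch that assumes, as I understand ZAP's HttpHeader API, that calling setHeader with a null value removes the header; "Autentication" is the mistyped name from the first version of the script.

```javascript
// ZAP HTTP Sender script (JavaScript engine).
function sendingRequest(msg, initiator, helper) {
  var loginToken = org.zaproxy.zap.extension.script.ScriptVars.getGlobalVar("logintoken");
  var header = msg.getRequestHeader();
  // Assumption: a null value removes the header if it is present.
  header.setHeader("Autentication", null);
  header.setHeader("Authorization", "Bearer " + loginToken);
}

function responseReceived(msg, initiator, helper) {
  // Nothing to do on responses.
}
```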
I'm using Puppeteer for Web Scraping and I have just noticed that sometimes, the website I'm trying to scrape asks for a captcha due to the amount of visits I'm doing from my computer. The captcha form looks like this one:
So, I would need help about how to handle this. I have been thinking about sending the captcha form to the client-side since I use Express and EJS in order to send the values to my index website, but I don't know if Puppeteer can send something like that.
Any ideas?
This is a reCAPTCHA (version 2, check out demos here), which is shown to you as the owner of the page does not want you to automatically crawl the page.
Your options are the following:
Option 1: Stop crawling or try to use an official API
As the owner of the page does not want you to crawl that page, you could simply respect that decision and stop crawling. Maybe there is a documented API that you can use.
Option 2: Automate/Outsource the captcha solving
There is an entire industry which has people (often in developing countries) filling out captchas for other people's bots. I will not link to any particular site, but you can check out the other answer from Md. Abu Taher for more information on the topic or search for captcha solver.
Option 3: Solve the captcha yourself
For this, let me explain how reCAPTCHA works and what happens when you visit a page using it.
How reCAPTCHA (v2) works
Each page has an ID, which you can check by looking at the source code, example:
<div class="g-recaptcha form-field" data-sitekey="ID_OF_THE_WEBSITE_LONG_RANDOM_STRING"></div>
When the reCAPTCHA code is loaded it will add a response textarea to the form with no value. It will look like this:
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="... display: none;"></textarea>
After you solved the challenge, reCAPTCHA will add a very long string to this text field (which can then later be checked by the server/reCAPTCHA service in the backend) when the form is submitted.
How to solve the captcha yourself
By copying the value of the textarea field you can transfer the "solved challenge" from one browser to another (this is also what the solving services do for you). The full process looks like this:
Detect if the page uses reCAPTCHA (e.g. check for .g-recaptcha) in the "crawling" browser
Open a second browser in non-headless mode with the same URL
Solve the captcha yourself
Read the value from: document.querySelector('#g-recaptcha-response').value
Put that value into the first browser: document.querySelector('#g-recaptcha-response').value = '...'
Submit the form
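With Puppeteer, the read and write steps above could be sketched as follows. The function takes two already-open pages (the visible one where a human solves the challenge and the crawling one), so launching the browsers and submitting the form are left out; the function name is hypothetical. Whether the target site accepts a token obtained in another browser is not guaranteed.

```javascript
const RESPONSE_SELECTOR = '#g-recaptcha-response';

// solverPage: page in the non-headless browser where the captcha was solved.
// crawlerPage: page in the crawling browser that needs the token.
async function transferCaptchaToken(solverPage, crawlerPage) {
  // Read the token that reCAPTCHA wrote into the hidden textarea.
  const token = await solverPage.$eval(RESPONSE_SELECTOR, (el) => el.value);
  if (!token) {
    throw new Error('captcha has not been solved yet');
  }
  // Write the same token into the textarea of the crawling browser.
  await crawlerPage.$eval(RESPONSE_SELECTOR, (el, value) => { el.value = value; }, token);
  return token;
}
```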
Further information/reading
There is not much public information from Google how exactly reCAPTCHA works as this is a cat-and-mouse game between bot creators and Google detection algorithms, but there are some resources online with more information:
Official docs from Google: Obviously, they just explain the basics and not how it works "in the back"
InsideReCaptcha: This is a project from 2014 which tries to "reverse-engineer" reCAPTCHA. Although this is quite old, there is still a lot of useful information on the page.
Another question on stackoverflow: This question contains some useful information about reCAPTCHA, but also many speculative (and very likely) outdated approaches on how to fool a reCAPTCHA.
You should use combination of following:
Use an API if the target website provides that. It's the most legal way.
Increase wait time between scraping request, do not send mass request to the server.
Change/rotate IP frequently.
Change user agent, browser viewport size and fingerprint.
Use third party solutions for captcha.
Resolve the captcha by yourself, check the answer by Thomas Dondorf. Basically you need to wait for the captcha to appear on another browser and solve it from there. Third party solutions do this for you.
Disclaimer: Do not use anti-captcha plugins/services to misuse resources. Resources are expensive.
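As an illustration of the wait-time advice, here is a minimal throttling sketch. The delay values and function names are arbitrary assumptions, and `visit` stands for whatever async function loads one URL (for example, one that drives a Puppeteer page).

```javascript
// Base delay plus random jitter, so requests are neither rapid nor perfectly periodic.
function nextDelayMs(baseMs, jitterMs, random = Math.random) {
  return baseMs + Math.floor(random() * jitterMs);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Visit URLs one at a time, pausing after each request.
async function crawlPolitely(urls, visit, baseMs = 5000, jitterMs = 5000) {
  const results = [];
  for (const url of urls) {
    results.push(await visit(url));
    await sleep(nextDelayMs(baseMs, jitterMs));
  }
  return results;
}
```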
Basically the idea is to use anti-captcha services like (2captcha) to deal with persisting recaptcha.
You can use this plugin called puppeteer-extra-plugin-recaptcha by berstend.
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// add recaptcha plugin and provide it your 2captcha token
// 2captcha is the builtin solution provider but others work as well.
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
puppeteer.use(
RecaptchaPlugin({
provider: { id: '2captcha', token: 'XXXXXXX' },
visualFeedback: true // colorize reCAPTCHAs (violet = detected, green = solved)
})
)
Afterwards you can run the browser as usual. It will pick up any captcha on the page and attempt to resolve it. You have to find the submit button which varies from site to site if it exists.
// puppeteer usage as normal
puppeteer.launch({ headless: true }).then(async browser => {
const page = await browser.newPage()
await page.goto('https://www.google.com/recaptcha/api2/demo')
// That's it, a single line of code to solve reCAPTCHAs 🎉
await page.solveRecaptchas()
await Promise.all([
page.waitForNavigation(),
page.click(`#recaptcha-demo-submit`)
])
await page.screenshot({ path: 'response.png', fullPage: true })
await browser.close()
})
PS:
There are other plugins, even I made a very simple one because captcha is getting harder to solve even for a human like me. You can read the code here.
I am strongly not affiliated with 2Captcha or any other third party services mentioned above.
I had created my own solution which is similar to the other answer by Thomas Dondorf, but gave up soon since Captcha is getting more ridiculous and I do not have mental energy to resolve them.
Proxy servers can be used so that the destination site does not detect a load of responses from a single IP address.
(Translated with Google Translate)
I tried @Thomas Dondorf's suggestion, but I think the problem with the steps described in the "How to solve the captcha yourself" section is that the CAPTCHA token is valid only once.
I'll try to explain everything in detail below.
WHAT I'M USING
I'm using as first browser (the one that will not solve the captcha) Google Chrome, and as a second browser (the one where i solve the captcha and i take the token) Firefox.
STEPS
I manually solve the captcha on this site https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php
I type the following code document.querySelector('#g-recaptcha-response').value in the Google Chrome console, but I get an error (VM22:1 Uncaught TypeError: Cannot read property 'value' of null at :1:48), so I just search for the token by opening Elements in Google Chrome and searching for g-recaptcha-response with CTRL+F
I copy the token of the recaptcha (here is an image to show where the token is, after the text highlighted in green)
I type the following code document.querySelector('#g-recaptcha-response').value = '...' in the Firefox console, replacing the "..." with the reCAPTCHA token just copied
I get the following error. If you then click on the linked documentation, you'll read that the error occurs because a token can be used only once, and it has of course already been used for the CAPTCHA you just solved to obtain the token itself. So it seems that the only purpose of the token is to say that the CAPTCHA has already been solved; it appears to be a defense measure to prevent replay attacks, as stated in the official reCAPTCHA documentation.
My users, when on a SPA page, are getting logged out after a couple of hours. Though, if they use the older postback forms, they never time out. So you have context, I have included enough code to provide context for the description of the issue on the bottom.
Web.config for authentication
<authentication mode="Forms">
<forms loginUrl="~/Account/Login" timeout="480" slidingExpiration="true" defaultUrl="~" ticketCompatibilityMode="Framework40"/>
</authentication>
My api controller
namespace my.Controllers
{
public class ApiMotionController : ApiController
{
[Authorize(Roles = "Mover")]
public IQueryable<Motions> Get()
JavaScript code
(function () {
'use strict';
angular.module('app')
.controller('MotionManager', ['$scope', '$http', buildMotionManager]);
function buildMotionManager($scope, $http) {
/*Static Members*/
$scope._whoami = 'MotionManager'; //Used for troubleshooting controller
/*Initialization Code*/
getMotions($scope, $http)();
/*Scope methods*/
$scope.refreshMotionsList = getMotions($scope, $http);
$scope.addMotion = addMotion($scope, $http);
$scope.playMotion = playMotion($scope, $http);
}
function getMotions($scope, $http){
return function(){
$http.get('/api/getMotions')
.success(function(data){
$scope.motionList = data;
})
.error(function(data){
console.log('FAIL', data);
});
};
}
function addMotion($scope, $http){
//stub. Code not shown here.
};
function playMotion($scope, $http){
//stub. Code not shown here.
};
})();
There may be typos in the above code, since I retyped it from my original while sanitizing.
The code does work as expected, but the problem is that after hours of working, suddenly all web API calls are failing with a 401 error. That is, they are all acting like the user is now de-authenticated.
As above, I cannot duplicate this issue when I am using web forms, or even MVC forms, and re-posting whole pages. It is only when I am using SPA style coding. I haven't tried other SPA frameworks, since I have 6 months of angular directed code in this project, switching isn't an option.
I have considered putting in an iframe, with a timer to fire off in the background against a form object, just to trick the browser into generating a proper form postback. I want to avoid doing that, because it seems too hacky.
The only other key issue I have found is that I am seeing a bunch of schannel errors being logged into my application event log on the IIS server. They are all 10,10 which isn't well documented. The 10 series is well documented outside of 10,10. But none of those suggestions seem to work, or are even relevant.
Server is IIS 7.5 and I have tried this on IIS 8.
Application Log Errors:
A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 10. The Windows SChannel error state is 10.
Error State: 10, Alert Description: 10
A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 40. The Windows SChannel error state is 1205.
An TLS 1.2 connection request was received from a remote client application, but none of the cipher suites supported by the client application are supported by the server. The SSL connection request has failed.
Discovery
Error Code 40 means that there is a handshake issue. Since State Management is custom for my platform, I decided to change it to inproc. So far, I have seen the frequency of new errors in the log go down, but not disappear. However, I am still testing for the 401 issue.
Post discovery follow up
Had the certs re-issued, and the schannel errors cleared, but the problem remained.
I had started exploring the header information with a fine tooth comb, even if it means that I had to add custom header information to accompany my server calls.
I have now included withCredentials: true in all $http calls, which has brought my failure rate down to around 15%. That means that the failures are down to once or twice a day.
I started watching my 'auth' cookie on the client, and something confusing happens occasionally. The cookie will change without prompt, then it has changed back. Almost like the session is bouncing from current, to a new one, then back to current. So I have killed my cleanup process on the session table on the server, and see what I am getting there.
I had also been checking the system logs for exceptions, or SQL timeouts, and nothing.
Started to convert all controllers to MVC controllers, but I have hit conversion problem after conversion problem, including the use of the JSON serializer. I still don't understand the decision to stick with the MS serializer when the JSON.NET one works so much better.
Current Status
The last change I made was to add filters.Add(new AuthorizeAttribute()); to my FilterConfig.RegisterGlobalFilters function.
Everything is still failing. After investigating the IIS logs I am still seeing everything getting de-authenticated.
FF on Windows - Fail
Chrome on Windows - Fail
Chrome on Droid - Fail
Safari on iPad - Fail
IE on Windows - Fail
12/10 Discovery
I have found the real problem. The authentication in MVC controllers is just not compatible with the web API controllers. So when I authenticate with the MVC controller, the web API controllers basically ignore it, and eventually time out on the authentication.
Latest Discovery
Apparently when the asp.net worker process shut down and restarted, it would get a false flag that the database schema didn't exist. So I removed the check, and all reads and writes started working fine. It is interesting that the api controller would forge a new cookie when the mvc controller would fail the authentication. It was like it was creating a new provider instance. However, I couldn't find a 2nd instance, so I have to assume the existing provider was being duplicated.
Fix that is being tested
Now that I have removed the DB test, I am now testing the issue in long run tests. Each long run is longer than the worker process stays alive, but shorter than the session timeout.
Cornerstone of finding this bug
Apparently IIS Express was hiding the bug in that it seems to act without an external worker process. So I moved the test environment to my local IIS server.
It looks like there are several issues that were causing my problem, each one broken down here:
IIS Express wasn't closing sessions the same way that full IIS would.
So I moved the application to my local IIS, and added logging to everything.
ASP.NET worker process would launch new provider instance every time the API Controllers were called.
This would cause a new schema check per call. MVC controllers would only cause this check once per initial launch.
Since my provider is married to my application schema, I just disabled the schema check.
Angular must be told to marshal the cookies.
So I added: cfg: { withCredentials: true, responseType: "json" }
the response type was to cover the occasional issue where I would see 'text/text'. Now I always see 'application/json'. This seems to be a browser issue, mostly with IE.
I also had to add config.MapHttpAttributeRoutes(); to my register method of my WebApiConfig class.
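Instead of passing withCredentials on every call, AngularJS also lets you set it once as a default on $httpProvider. The following is a sketch of that alternative, not the configuration actually used above; the function name is made up and the module name 'app' is taken from the question's code.

```javascript
// Config function: send cookies with every $http request by default.
function configureHttp($httpProvider) {
  $httpProvider.defaults.withCredentials = true;
}

// Register only when AngularJS is loaded, so the function can also be exercised on its own.
if (typeof angular !== 'undefined') {
  angular.module('app').config(['$httpProvider', configureHttp]);
}
```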
Using all of this, I was able to discover that the core of the problem was that every API call caused my security provider to re-test the schema, whereas my MVC controllers are set to suppress that test after first load. The test always fails, because I had to expand a couple of tables but didn't need the models changed.
Resolution: I removed the test from the provider. Since the provider is strongly tied into the rest of the application, it didn't seem logical to keep treating it as a typical ASP.NET Membership provider. And that was the top feature that I didn't need.
Second benefit, I gained back a little bit of performance.