I need to get my account approved for production access to the Unsplash API, which requires visiting certain links as part of the approval process. Since replies from the support team have taken more than a few days, I would like to ask for additional help here with retrieving the access_token needed for new GET/POST request submissions.
The original website was working perfectly until I wanted to prepare for the production-stage submission and handle the expected increase in requests to the Unsplash API.
However, the approval process entails certain setup criteria, which I completely missed during development and wanted to iron out as soon as possible. One of the key components is fixing your UTM attribution links; the ideal reference is here: https://help.unsplash.com/en/articles/2511315-guideline-attribution.
My challenge is that I have been using the official JavaScript library, unsplash-js (https://github.com/unsplash/unsplash-js#authorization), to simplify the authentication and request flow for my web app.
Most GET requests do work, but triggering a download requires hitting the photo's specific "download_location" URL (https://help.unsplash.com/en/articles/2511258-guideline-triggering-a-download), and that endpoint requires an authenticated request for every new submission the web app makes.
The final challenge is that it is not clear how the official unsplash-js library actually performs the "authenticated" request; I was unable to find this documented anywhere, so I cannot retrieve the current access_token for use in my own requests.
The basic code I am using with the library is below. However, I am confused about the actual maximum number of results I can pull per page: I am hoping to get 100 images' details back, but I only ever get a maximum of 30 at a time. Can anyone confirm whether there is a workaround to increase this 30 to 100?
Retrieving a Collection of Photos
import Unsplash, { toJson } from "unsplash-js"; // unsplash-js v6-style API
const unsplash = new Unsplash({ accessKey: "YOUR_ACCESS_KEY" });
const collectionId = 123456; // your collection's numeric ID, not a URL

// Arguments: (collectionId, page, perPage, orderBy)
unsplash.collections.getCollectionPhotos(collectionId, 1, 100, "popular")
  .then(toJson)
  .then(jsonData => {
    console.log("jsonData", jsonData);
  });
So my website has been unable to launch for over a week now, as I am still awaiting final confirmation or additional help from the official Unsplash support team.
I am hopeful someone can help clarify the code so that I can get one step closer to sorting out this "official authenticated" process, and one step closer to my production approval.
Thank you in advance!
After multiple tries, I was not able to retrieve the access_token reply; there is a pre-authorization step for which I could not find any working solution.
The current, clear limitations of the API are:
A maximum of 30 images per GET request.
The official JavaScript library, unsplash-js (https://github.com/unsplash/unsplash-js#authorization), works, but there is no clear or easy way to retrieve the access_token for use during a session.
Multiple async axios/fetch requests may not have completed when a React context provider runs before the first render, so an empty array is shown on the initial render instead.
Ultimately, my chosen solution is to break the image list down by priority, within the limitation of 30 images per retrieval, while still storing them in the original collection and retrieving from it. A paging sketch follows.
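For reference, here is a minimal paging sketch built on that idea, assuming the v6-style unsplash-js client from the question (the access key and collection ID are placeholders): since per_page is capped at 30, the collection is fetched 30 photos at a time until 100 have accumulated.

import Unsplash, { toJson } from "unsplash-js";

const unsplash = new Unsplash({ accessKey: "YOUR_ACCESS_KEY" });

// Collect up to `total` photos, `perPage` (max 30) at a time.
async function getCollectionPhotoPages(collectionId, total = 100, perPage = 30) {
  const photos = [];
  for (let page = 1; photos.length < total; page++) {
    const batch = await unsplash.collections
      .getCollectionPhotos(collectionId, page, perPage, "popular")
      .then(toJson);
    if (batch.length === 0) break; // collection exhausted
    photos.push(...batch);
  }
  return photos.slice(0, total);
}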
The other alternative is to download the images and serve them from your own server, which may also be the faster route.
Sadly, the Unsplash API team does not respond very frequently to requests for assistance. My last contact was roughly a month ago; I have since updated my app to meet their requirements, but have received no feedback.
So it will probably be better for you to build an alternative solution than to rely on the team for feedback, unless you are a paying client.
Good luck to the others on this! Cheers!
I'm using Puppeteer for web scraping, and I've just noticed that the website I'm trying to scrape sometimes asks for a captcha because of the number of visits coming from my computer. The captcha form looks like this one:
So I need help with how to handle this. I have been thinking about sending the captcha form to the client side, since I use Express and EJS to send values to my index page, but I don't know whether Puppeteer can send something like that.
Any ideas?
This is a reCAPTCHA (version 2; check out the demos here), which is shown to you because the owner of the page does not want the page crawled automatically.
Your options are the following:
Option 1: Stop crawling or try to use an official API
As the owner of the page does not want you to crawl that page, you could simply respect that decision and stop crawling. Maybe there is a documented API that you can use.
Option 2: Automate/Outsource the captcha solving
There is an entire industry which has people (often in developing countries) filling out captchas for other people's bots. I will not link to any particular site, but you can check out the other answer from Md. Abu Taher for more information on the topic or search for captcha solver.
Option 3: Solve the captcha yourself
For this, let me explain how reCAPTCHA works and what happens when you visit a page using it.
How reCAPTCHA (v2) works
Each page has an ID, which you can check by looking at the source code, example:
<div class="g-recaptcha form-field" data-sitekey="ID_OF_THE_WEBSITE_LONG_RANDOM_STRING"></div>
When the reCAPTCHA code is loaded it will add a response textarea to the form with no value. It will look like this:
<textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="... display: none;"></textarea>
After you solve the challenge, reCAPTCHA adds a very long string to this text field, which the server/reCAPTCHA service can then check in the backend when the form is submitted.
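For context, a rough sketch of what that backend check looks like (Node 18+, where fetch is global; SECRET_KEY and tokenFromForm are placeholders): the server forwards the submitted g-recaptcha-response value to Google's siteverify endpoint.

// Verify a submitted reCAPTCHA token server-side
async function verifyRecaptcha(SECRET_KEY, tokenFromForm) {
  const res = await fetch("https://www.google.com/recaptcha/api/siteverify", {
    method: "POST",
    body: new URLSearchParams({ secret: SECRET_KEY, response: tokenFromForm }),
  });
  const result = await res.json(); // e.g. { success: true, ... }
  return result.success;
}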
How to solve the captcha yourself
By copying the value of the textarea field, you can transfer the "solved challenge" from one browser to another (this is also what the solving services do for you). The full process looks like this (a code sketch follows the steps):
Detect if the page uses reCAPTCHA (e.g. check for .g-recaptcha) in the "crawling" browser
Open a second browser in non-headless mode with the same URL
Solve the captcha yourself
Read the value from: document.querySelector('#g-recaptcha-response').value
Put that value into the first browser: document.querySelector('#g-recaptcha-response').value = '...'
Submit the form
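Here is a minimal Puppeteer sketch of those steps (TARGET_URL and the final form submission are placeholders; note that the token must be copied before the helper browser's form is submitted, since submitting consumes it):

const puppeteer = require('puppeteer');

(async () => {
  // 1. The "crawling" browser
  const crawler = await puppeteer.launch({ headless: true });
  const crawlPage = await crawler.newPage();
  await crawlPage.goto(TARGET_URL);

  // 2. Detect reCAPTCHA and open a visible helper browser
  if (await crawlPage.$('.g-recaptcha')) {
    const helper = await puppeteer.launch({ headless: false });
    const helperPage = await helper.newPage();
    await helperPage.goto(TARGET_URL);

    // 3./4. Wait until a human has solved the challenge, then read the token
    await helperPage.waitForFunction(() => {
      const el = document.querySelector('#g-recaptcha-response');
      return el && el.value.length > 0;
    }, { timeout: 0 });
    const token = await helperPage.evaluate(
      () => document.querySelector('#g-recaptcha-response').value
    );
    await helper.close();

    // 5. Put the token into the crawling browser
    await crawlPage.evaluate(t => {
      document.querySelector('#g-recaptcha-response').value = t;
    }, token);
    // 6. ...submit the form here
  }
  await crawler.close();
})();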
Further information/reading
There is not much public information from Google about how exactly reCAPTCHA works, since this is a cat-and-mouse game between bot creators and Google's detection algorithms, but there are some resources online with more information:
Official docs from Google: obviously, they only explain the basics, not how it works behind the scenes.
InsideReCaptcha: This is a project from 2014 which tries to "reverse-engineer" reCAPTCHA. Although this is quite old, there is still a lot of useful information on the page.
Another question on Stack Overflow: this question contains some useful information about reCAPTCHA, but also many speculative (and very likely outdated) approaches to fooling a reCAPTCHA.
You should use a combination of the following:
Use an API if the target website provides one. It's the most legitimate way.
Increase the wait time between scraping requests; do not send mass requests to the server.
Change/rotate IPs frequently.
Change the user agent, browser viewport size, and fingerprint.
Use third-party solutions for the captcha.
Solve the captcha yourself; check the answer by Thomas Dondorf. Basically you wait for the captcha to appear in another browser and solve it from there. Third-party solutions do this for you.
Disclaimer: do not use anti-captcha plugins/services to misuse resources. Resources are expensive.
Basically the idea is to use an anti-captcha service (like 2captcha) to deal with persistent reCAPTCHAs.
You can use this plugin called puppeteer-extra-plugin-recaptcha by berstend.
// puppeteer-extra is a drop-in replacement for puppeteer,
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra')
// add recaptcha plugin and provide it your 2captcha token
// 2captcha is the builtin solution provider but others work as well.
const RecaptchaPlugin = require('puppeteer-extra-plugin-recaptcha')
puppeteer.use(
RecaptchaPlugin({
provider: { id: '2captcha', token: 'XXXXXXX' },
visualFeedback: true // colorize reCAPTCHAs (violet = detected, green = solved)
})
)
Afterwards you can run the browser as usual. The plugin will pick up any captcha on the page and attempt to solve it. You still have to find the submit button yourself, if there is one; it varies from site to site.
// puppeteer usage as normal
puppeteer.launch({ headless: true }).then(async browser => {
const page = await browser.newPage()
await page.goto('https://www.google.com/recaptcha/api2/demo')
// That's it, a single line of code to solve reCAPTCHAs 🎉
await page.solveRecaptchas()
await Promise.all([
page.waitForNavigation(),
page.click(`#recaptcha-demo-submit`)
])
await page.screenshot({ path: 'response.png', fullPage: true })
await browser.close()
})
PS:
There are other plugins; I even made a very simple one myself, because captchas are getting harder to solve even for a human like me. You can read the code here.
To be clear, I am not affiliated with 2captcha or any other third-party service mentioned above.
I had created my own solution, similar to the one in Thomas Dondorf's answer, but gave up soon after, since captchas are getting more ridiculous and I do not have the mental energy to keep solving them.
Proxy servers can be used so that the destination site does not see a flood of requests from a single IP address.
(Translated with Google Translate.)
I tried @Thomas Dondorf's suggestion, but I think the problem with the steps described in the "How to solve the captcha yourself" section is that the CAPTCHA token is valid only once.
I'll try to explain everything in detail below.
WHAT I'M USING
I'm using Google Chrome as the first browser (the one that will not solve the captcha) and Firefox as the second browser (the one where I solve the captcha and take the token).
STEPS
I manually solve the captcha on this site https://recaptcha-demo.appspot.com/recaptcha-v2-checkbox.php
I type document.querySelector('#g-recaptcha-response').value into the Google Chrome console, but I get an error (VM22:1 Uncaught TypeError: Cannot read property 'value' of null at :1:48), so I instead find the token by opening Elements in Google Chrome and searching for g-recaptcha-response with CTRL+F.
I copy the token of the recaptcha (here is an image to show where the token is, after the text highlighted in green)
I type document.querySelector('#g-recaptcha-response').value = '...' into the Firefox console, replacing the "..." with the reCAPTCHA token just copied.
I get the following error, and if you click the documentation linked there, you'll read that the error occurs because a token can be used only once, and it has of course already been used by the very CAPTCHA you solved to obtain it. So it seems the token's only purpose is to say that the CAPTCHA has already been solved; it looks like a defense measure against replay attacks, as stated here in the official reCAPTCHA documentation.
I am building a simple support chat for my website using Ajax. I would like to check whether the user I am currently chatting with has left the browser.
At the moment I have built that feature by setting an interval function on the customer side that creates a file named userId.txt.
In the admin area I have an interval function that checks whether userId.txt exists and deletes it if so. If the file is not recreated by the customer's interval function, then the next time the admin function finds the file missing it marks the customer with that userId as inactive.
Abstract representation:
customer -> interval Ajax function -> php [if no file - create a new file]
admin -> interval Ajax function -> php [if file exists - delete the file] -> return state to Ajax function and do something
I was wondering if there is any better way to implement this feature that you can think of?
My solution is to use jQuery's ready and beforeunload events to trigger an Ajax POST request that notifies the server when the user arrives and when they leave.
This solution is "light" because it only logs twice per user.
support.html
<!DOCTYPE html>
<html>
<head>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.3/jquery.min.js"></script>
<script>
//log user that just arrived - Page loaded
$(document).ready(function() {
$.ajax({
type: 'POST',
url: 'log.php',
async:false,
data: {userlog:"userid arrived"}
});
});
//log user that is about to leave - window/tab will be closed.
$(window).bind('beforeunload', function(){
$.ajax({
type: 'POST',
url: 'log.php',
async:false,
data: {userlog:"userid left"}
});
});
</script>
</head>
<body>
<h2>Your support html code...</h2>
</body>
</html>
log.php
<?php
//code this script in a way that you get notified in real time
//in this case, I just log to a txt file
$userLog = $_POST['userlog'];
file_put_contents("userlog.txt", $userLog."\n", FILE_APPEND );
//userid arrived
//userid left
Notes:
1 - Tested on Chrome, FF and Opera. I don't have a Mac, so I couldn't test it on Safari, but it should work there too.
2 - I've tried the unload method but it wasn't as reliable as beforeunload.
3 - Setting async to false on the Ajax request means the call has to complete before the next statement runs; this ensures you get notified before the window/tab is closed.
@Gonzalon makes a good point, but constantly updating user movement through a normal DB table or the filesystem would hammer most hard disks. That would be a good reason to use PHP's shared memory functions instead.
You have to differentiate a bit between the original question, "How do I check in real time whether a user is logged in?", and "How can I make sure the user is still on the other side (in my chat)?".
For a login system, I would suggest working with PHP sessions.
For the "is the user still there?" question, I would suggest updating a field of the active session named LAST_ACTIVITY: write a timestamp of the last contact with the client into a store (database) and test whether it is older than X seconds.
I'm suggesting sessions because you have not mentioned them in your question, and it looks like you are creating the userID.txt file manually on each Ajax request, right? That's not needed, unless working cookie- and session-less is a development requirement.
Now, for the PHP sessions, I would simply change the session handler (backend) to whatever scales for you and makes requesting the information easy.
By default PHP creates session files in its session temp folder, but you can change that so the underlying session handler becomes a MariaDB database, Memcache, or Redis (e.g. via Rediska).
When user sessions are stored in a database, you can query them: "How many users are logged in right now?", "Who is where?".
The answer to "How can I check in real time if a user is logged in?" is: the user is logged in from the moment the session is created and the user has successfully authenticated.
For real-time chat applications there are a lot of technologies out there, from PHP comet and HTML5 EventSource + WebSockets / long polling, to message queues like RabbitMQ/ActiveMQ with publish/subscribe on specific channels.
If this is a simple or restricted environment, maybe a VPS, then you can still stick to your solution of intervalic Ajax requests. Each request might then update $_SESSION['LAST_ACTIVITY'] with a server-side timestamp. Referencing: https://stackoverflow.com/a/1270960/1163786
A modification to this idea would be to stop doing Ajax requests, when the mouse movement stops. If the user doesn't move the mouse on your page for say 10 minutes, you would stop updating the LAST_ACTIVITY timestamp. This would fix the problem of showing users who are idle as being online.
Another modification is to reduce the size of the "I am still here" request to the server by using small GET or HEAD requests: a short HEAD "ping" is often enough, instead of sending long messages or JSON via POST. A small sketch combining both modifications follows.
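Here is a minimal client-side sketch of those two modifications (the /ping.php endpoint, the intervals, and the idle cutoff are assumptions): the page sends a tiny HEAD request every 30 seconds, but stops once the user has been idle for 10 minutes.

let lastActivity = Date.now();
['mousemove', 'keydown', 'scroll'].forEach(evt =>
  document.addEventListener(evt, () => { lastActivity = Date.now(); })
);

setInterval(() => {
  if (Date.now() - lastActivity < 10 * 60 * 1000) {
    // HEAD keeps the request small; the server bumps LAST_ACTIVITY
    fetch('/ping.php', { method: 'HEAD' });
  }
}, 30 * 1000);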
You might find a complete "How to create an Ajax Web Chat with PHP, jQuery" over here. They use a timeout of 15 seconds for the chat.
Part 1 http://tutorialzine.com/2010/10/ajax-web-chat-php-mysql/
Part 2 http://tutorialzine.com/2010/10/ajax-web-chat-css-jquery/
You can do it this way, but it'll be slow, inefficient, and probably highly insecure. Using a database would be a noticeable improvement, but even that wouldn't be particularly scalable, depending on how "real-time" you want this to be and how many conversations you want it to be able to handle simultaneously.
You'd be much better off using a NoSQL solution such as Redis for any actions that you'll need to run frequently (ie: "is user online" checks, storing short-term conversation updates, and checking for conversation updates at short intervals).
Then you'd use the database for more long-term tasks like storing user information and saving active conversations at regular intervals (maybe once per minute, for example).
Why Ajax and not WebSockets? A WebSocket would give you a considerably faster chat system, wouldn't require generating and checking a text file, wouldn't involve a database lookup, and would tell you instantly when the connection drops.
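For illustration, a minimal browser-side sketch of that WebSocket idea (the URL and message shape are assumptions): no files or polling are needed, because the close event fires as soon as the connection drops.

const socket = new WebSocket('wss://example.com/chat');
socket.addEventListener('open', () => {
  socket.send(JSON.stringify({ type: 'hello', userId: 'xyz' }));
});
socket.addEventListener('close', () => {
  console.log('chat connection dropped'); // user left or network died
});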
I would install the https://github.com/nrk/predis library, so that at the time the user authenticates, a message is published to the Redis server.
Then you can set-up a little node server on the back-end - something simple like:
var server = require('http').Server();
var io = require('socket.io')(server);
var Redis = require('ioredis');
var redis = new Redis();
var authenticatedUsers = [];
// Subscribe to the authenticatedUsers channel in Redis
redis.subscribe('authenticatedUsers');
// Logic for what to do when a message is received from Redis
redis.on('message', function(channel, message) {
authenticatedUsers.push(message);
io.emit('userAuthenticated', message);
});
// What happens when a client connects
io.on('connection', function(socket) {
console.log('connection', socket.id);
socket.on('disconnect', function(a) {
console.log('user disconnected', a);
});
});
server.listen(3000);
Far from complete, but something to get you started.
Alternatively, take a look at Firebase (https://www.firebase.com/) if you don't want to bother with the server side.
I would suggest using the browser's built-in HTML5 session storage for this purpose. It is supported by all modern browsers, so compatibility will not be an issue.
It lets us recognize quickly and efficiently whether the user is online: whenever the user moves the mouse or presses a key, update session storage with the current date and time; check it periodically, and if it is empty, null, or stale, conclude that the user has left the site. A sketch follows.
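A minimal sketch of that idea (the event list, the thresholds, and what to do on timeout are assumptions):

// Record the last activity time on user input
['mousemove', 'keydown'].forEach(evt =>
  document.addEventListener(evt, () =>
    sessionStorage.setItem('lastSeen', String(Date.now()))
  )
);

// Periodically decide whether the user has gone away
setInterval(() => {
  const last = Number(sessionStorage.getItem('lastSeen') || 0);
  if (Date.now() - last > 5 * 60 * 1000) {
    console.log('user appears to have left'); // e.g. report to the server
  }
}, 60 * 1000);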
Depending on your resources you may opt for WebSockets or the older technique of long polling. Both provide bidirectional communication between server and client, but they can be expensive in terms of resources.
Here is a good tutorial on WebSockets:
http://www.binarytides.com/websockets-php-tutorial/
I would use a callback that you (the admin) can trigger. I use this technique in web and mobile apps (all of this is set on the user side from the server) to:
Send a message to user (like: "behave or I ban you").
Update user status/location (e.g. for events, to know when attendees are arriving).
Terminate user connections (e.g. force log out if maintenance).
Set the user's report time (e.g. how often the user should report back).
The callback for the web app is usually in JavaScript, and you define when and how you want the user to call home. Think of it as a service channel.
Instead of creating and deleting files, you can do the same thing with cookies. The benefits of using cookies are (a short sketch follows this list):
You do not need an Ajax request to create a file on the server, as cookies are accessible from JavaScript/jQuery.
Cookies can be given an expiry interval, so they delete themselves automatically; you won't need a PHP script to delete anything.
Cookies are accessible from PHP, so whenever you need to check whether the user is still active, you can simply check whether the cookie exists.
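A minimal sketch of that cookie variant (the cookie name, user ID, and intervals are assumptions): the client keeps refreshing a short-lived cookie, and if the user leaves, the cookie simply expires on its own, so a PHP check like isset($_COOKIE['active_USER_ID']) sees it gone.

// Refresh a 60-second cookie every 30 seconds while the page is open
function markActive(userId) {
  document.cookie = 'active_' + userId + '=1; max-age=60; path=/';
}
setInterval(() => markActive('USER_ID'), 30 * 1000);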
If this were ASP.NET I would say SignalR... but for PHP, perhaps you could look into Ratchet. It might help with a lot of what you are trying to accomplish, since messages can be pushed to the clients instead of the clients polling.
Imo, there is no need to set up solutions with bidirectional communication. You only want to know whether a user is still logged in or attached to the system. If I understand you correctly, you only need communication from server to client, so you can try SSE (server-sent events); the link gives you an idea of how to implement this with PHP.
The idea is simple: the server knows whether the user is attached. It could send something like "hey, user xyz is still logged in" or "hey, user xyz seems not to be logged in any more", and the client just listens for those messages and reacts to them (e.g. via JavaScript).
The advantage: SSE is really good for real-time applications, because the server only has to send data and the client only has to listen; see also the specification.
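For illustration, the listening side of SSE is only a few lines in the browser (the /presence endpoint and message format are assumptions; the PHP side would keep the connection open and push the events):

const source = new EventSource('/presence');
source.onmessage = (e) => {
  const msg = JSON.parse(e.data); // e.g. { userId: 'xyz', online: false }
  console.log('user ' + msg.userId + (msg.online ? ' is online' : ' left'));
};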
If you really need bidirectional communications or can't go with the two dependencies mentioned in the specs, it's not the best decision to use SSE, of course.
Here is a later update with a nice chat example (written in Java). It is probably also useful for getting an idea of how to implement this in PHP.
I wrote an Android app that should 'connect' to a (private) forum using HTTP GET (and sometimes POST) requests. The basic idea is as follows:
A login page where users submit their credentials. Login is performed with an HTTP POST (I tried GET too, same result) to the forum's login page, with the username and password as parameters. The request should return some cookies, which I store in a BasicCookieStore.
Every forum page they want to visit is retrieved with HTTP GET. I parse the HTML source I obtain and show them only the relevant info. To authenticate the users, the same BasicCookieStore used for login (step 1) is set as the cookie store for the HttpClient.
This method worked the whole time during my testing and worked for my beta testers too. Now that I have released the app, it has become apparent that many users are having issues, especially on mobile connections (Wi-Fi seems to be no problem).
By logging the HTML source returned by all the HTTP GET requests, I have a strong suspicion that the actual login works fine, but that somehow the cookies don't get returned or stored, or something in that direction. The first page users receive should be the list of forums; users with problems, however, get served a page that basically reads "You must enable Javascript to view this page".
The strange thing is, I don't receive that page when testing, nor do many of my users. Even worse: some users report that it worked fine for days or weeks and has now stopped working; others report the exact opposite: not working for days, then suddenly working now. One user reported that he was in Greece for two weeks, where it worked flawlessly; then he got back to Germany and it stopped working again.
There seems to be a random component at play here.
I have tried various things, mostly with the way I do the HTTP GET requests. I started out using the normal DefaultHttpClient, with various settings, such as this:
HttpClient httpClient = new DefaultHttpClient();
// Define parameters
HttpParams httpParams = httpClient.getParams();
HttpConnectionParams.setConnectionTimeout(httpParams, TIMEOUT);
HttpConnectionParams.setSoTimeout(httpParams, TIMEOUT);
HttpProtocolParams.setVersion(httpParams, HttpVersion.HTTP_1_1);
// Set cookiestore (getCookieStore returns the same cookiestore)
HttpContext localContext = new BasicHttpContext();
localContext.setAttribute(ClientContext.COOKIE_STORE, getCookieStore());
HttpGet http = new HttpGet(url);
http.addHeader("Accept", ACCEPT_STRING);
http.addHeader("Content-Type", "application/x-www-form-urlencoded; charset=utf-8");
// Execute
HttpResponse response = httpClient.execute(http, localContext);
//... Process result (omitted)
Now I have switched to using AndroidHttpClient instead, with the rest of the code basically unchanged, and seem to get the same result.
I have also tried using the AsyncHttpClient library, which works quite differently, but once again the same result. I tried using its PersistentCookieStore as well, and you guessed it - same result.
I am clueless at this point. Am I looking in the wrong direction? The fact that a website responds with "you need to enable Javascript" for some users but not for all seems to point at cookies. I don't know how a website determines whether JavaScript is enabled, but surely with a plain HTTP GET request there is no JavaScript at play. So why do I (and many other users) get the page without any problems, while others get the 'no javascript' message? The only cause I can think of is cookies, but I have no clue what exactly the problem is.
Any help would be much appreciated!
I doubt the problem is cookies. More likely it is a network configuration problem.
For example, your user might have connected to a wifi hotspot with a captive portal page (which uses javascript to make you sign in before you can use the hotspot). In this case they should first open the browser, try to browse to (e.g.) http://google.com, get redirected, sign in, and then launch your app.
Or, your user might be connecting through a proxy. Many mobile carriers around the world will proxy their users' HTTP connections, sometimes doing horrible things to the content. Switching to HTTPS might help with that.
I'm making a simple application where users can rate items.
I want to make the application very easy to use and would like to avoid a login, even if it means less accurate ratings.
I found this article on recognizing a user based on browser characteristics:
http://www.mediapost.com/publications/?fa=Articles.showArticle&art_aid=128563
How can I implement something like that in JS/Node.js?
Rather than doing a lot of trickery based on browser characteristics that may or may not be available, you could just use a cookie. Browsers change and upgrade over time, and you won't be able to prevent a browser change from creating a "new" user in either case; a cookie, however, survives browser upgrades. Just set the cookie to some (semi-)unique value (such as time in milliseconds + IP address) and you're all set. By the time you have so many users that the (semi-)unique values collide, you'll be re-architecting your site anyway (and probably have a team of people working for you). A sketch follows.
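A minimal Node/Express sketch of that cookie approach (the route, cookie name, and lifetime are assumptions): each browser gets a semi-unique ID on its first visit, which you then use to attribute ratings.

const express = require('express');
const cookieParser = require('cookie-parser');
const crypto = require('crypto');

const app = express();
app.use(cookieParser());

// Assign a visitor ID on first visit, reuse it afterwards
app.use((req, res, next) => {
  if (!req.cookies.visitorId) {
    const id = crypto.randomBytes(16).toString('hex');
    res.cookie('visitorId', id, { maxAge: 365 * 24 * 60 * 60 * 1000 });
    req.cookies.visitorId = id; // make it visible to this first request too
  }
  next();
});

app.post('/rate', (req, res) => {
  res.send('Rating recorded for visitor ' + req.cookies.visitorId);
});

app.listen(3000);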
If for some reason you want to avoid cookies, you could use PHP to get the client's IP address:
<?php
echo ' Client IP: ';
if ( isset($_SERVER["REMOTE_ADDR"]) ) {
echo '' . $_SERVER["REMOTE_ADDR"] . ' ';
} else if ( isset($_SERVER["HTTP_X_FORWARDED_FOR"]) ) {
echo '' . $_SERVER["HTTP_X_FORWARDED_FOR"] . ' ';
} else if ( isset($_SERVER["HTTP_CLIENT_IP"]) ) {
echo '' . $_SERVER["HTTP_CLIENT_IP"] . ' ';
}
?>
You could add a function that asks for a user name if the IP address isn't on file, and associate new IPs with old user names, etc. Cookies work much better, of course :)
Another option, easier than cookies would be localStorage:
Give the client a UUID:
localStorage.setItem('user', UUID); // UUID: any unique string you generate
Get the client's UUID:
localStorage.getItem('user');
This is a bit better than using cookies; for example, in Firefox (as per MDC):
DOM Storage can be cleared via "Tools -> Clear Recent History -> Cookies" when the time range is "Everything" (via nsICookieManager::removeAll), but not when another time range is specified (bug 527667).
It does not show up in "Tools -> Options -> Privacy -> Remove individual cookies" (bug 506692).
DOM Storage is not cleared via "Tools -> Options -> Advanced -> Network -> Offline data -> Clear Now".
It doesn't show up in the "Tools -> Options -> Advanced -> Network -> Offline data" list unless the site also uses the offline cache. If the site does appear in that list, its DOM Storage data is removed along with the offline cache when clicking the Remove button.
But it only works with HTML5.
I agree with Evan: it is much easier to do with cookies.
If you wanted to build something like that, you would need to gather data from the server and the browser (IP, browser, Flash, Java, cookies, ...), weight that data, create rules for changes (browser upgrades, Flash upgrades) that increase or decrease the weights, then build a neural network, gather loads of training data, and train it. (You could also take an approach that does not use neural networks.)
This is a nice project, but it seems like using a tank or a battleship to kill a mouse.
I think the difference between using simple cookies and this browser-characteristics gathering would be around 10%, so go for cookies.
You can take a look here:
http://www.w3schools.com/js/js_browser.asp
But I strongly recommend using cookies for this purpose.
Also keep in mind that cookies may be modified by the user.
If you can, just use something like a PHP $_SESSION.
I would look for detection of particular objects in JS instead of browser sniffing... check this link out