The problem:
I have a jQuery AJAX (POST) based website, where the page doesn't refresh every time the user navigates somewhere, which means I have to pull data with AJAX and present it to the user. For pulling small amounts of text, this system works great. However, once the text data is huge (say, over 200,000 words), the load time is quite high, especially for mobile users. What I mean is, AJAX loads the full text and only displays it after everything has finished loading, so the user has to wait quite a bit to get the information.
Compare this to a different scenario, say Wikipedia. There are big pages on Wikipedia, yet a user doesn't feel they have to wait long, because the page loads step by step (top to bottom). So even if the page is large, the user is already kept busy with some information, and while they are reading it, the rest of the page keeps loading.
Question:
So is it possible, via AJAX, to display information as it loads, in real time? Meaning, keep showing whatever has been loaded rather than waiting for the full document?
AJAX (XMLHttpRequest) is a really great browser feature. For a one-off, non-persistent connection, AJAX is better than a socket; but as soon as the connection needs to be persistent (impossible with XMLHttpRequest), a socket is fastest.
The simplest way to use WebSockets is socket.io, but you need a JavaScript server to use this library; Heroku is one host where you can get one for free with admin tools.
You can use a PHP server with the socketme library if you don't want to run a JavaScript server, but it is a bit more complex.
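To show the shape of the socket.io approach, here is a minimal sketch; the 'chunk' event name and the getDocumentChunks() helper are illustrative, not part of socket.io, and the browser page must load the socket.io client script:

// Server (Node.js): streams the large document piece by piece.
const io = require('socket.io')(3000);

io.on('connection', (socket) => {
    for (const piece of getDocumentChunks()) { // hypothetical helper
        socket.emit('chunk', piece);
    }
});

// Client (browser, with the socket.io client script loaded):
// appends each piece as soon as it arrives.
const socket = io('http://localhost:3000');
socket.on('chunk', (piece) => {
    document.getElementById('content').textContent += piece;
});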
Also, you can think about it differently: you are trying to send a lot of data.
200,000 words is something like 70 kB (I tried it with lorem ipsum), and the transfer time depends on the amount of data and on the connection speed/ping. You can compress the data in some way before sending it and decompress it on the other side. There are probably a thousand ways to do this, but I think the simplest is to find a JavaScript library to compress/decompress data and simply use your jQuery AJAX to transfer what's compressed.
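As a minimal sketch of that idea, assuming the pako library (a zlib port for JavaScript) and an illustrative /big-text endpoint that serves the article as gzip-compressed bytes:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/big-text');     // illustrative endpoint
xhr.responseType = 'arraybuffer'; // we expect raw compressed bytes
xhr.onload = function () {
    // Decompress in the browser, then display the text.
    var text = pako.ungzip(new Uint8Array(xhr.response), { to: 'string' });
    document.getElementById('content').textContent = text;
};
xhr.send();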
EDIT — 21/03/14:
I misunderstood the question: you want to display the current loading progress?
Yes, it is possible by using the onprogress event; with jQuery you can follow this simple example: jQuery ajax progress
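A minimal sketch of that approach, where '/huge-article', '#progress' and '#content' are illustrative names:

$.ajax({
    url: '/huge-article',
    xhr: function () {
        var xhr = new window.XMLHttpRequest();
        xhr.addEventListener('progress', function (e) {
            if (e.lengthComputable) {
                // Update a progress indicator as bytes arrive.
                $('#progress').text(Math.round(100 * e.loaded / e.total) + '%');
            }
        });
        return xhr;
    }
}).done(function (data) {
    $('#content').html(data);
});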
I have spent the last two days perfecting a list of URLs from a site that I want to crawl. My script (which is basically identical to the example for CheerioCrawler except for the data extraction) is working, but there is a problem. Some of the documents saved locally in the data store are incomplete. In terms of the example script, this would mean that, for instance, the title field is blank in some of the saved data, while in others everything is saved. The only field that gets saved every time is url: request.url.
My best guess is that the domain I'm crawling is very slow, with multiple scripts loaded from other domains, and Cheerio is just blasting through, extracting whatever data it can find without waiting for the whole page to fully load, and moving on.
The total number of pages to crawl is about 2500, so I don't mind if the process is slow, but I'd like to make sure it's complete.
How can I ensure the page is fully loaded before it's extracted? I thought that the async function would do that automatically.
The likely problem is that the webpage loads some content using asynchronous XHR calls made with JavaScript. With CheerioCrawler you only get the data from the first request to that site. If you want to load asynchronous content, you need to use a browser to open the page.
You can do that simply by using PuppeteerCrawler. It has quite a similar interface to CheerioCrawler, and it opens the webpage for each request. You can then use the various waitFor functions from Puppeteer's page interface to wait for the content you want to get.
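A minimal sketch of that, using the older Apify SDK interface this question implies; the start URL and the 'h1' selector are assumptions for illustration:

const Apify = require('apify');

Apify.main(async () => {
    const requestQueue = await Apify.openRequestQueue();
    await requestQueue.addRequest({ url: 'https://example.com/start' }); // illustrative

    const crawler = new Apify.PuppeteerCrawler({
        requestQueue,
        handlePageFunction: async ({ page, request }) => {
            // Wait until the dynamically loaded element exists in the DOM.
            await page.waitForSelector('h1');
            await Apify.pushData({
                url: request.url,
                title: await page.title(),
            });
        },
    });

    await crawler.run();
});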
CheerioCrawler uses the Cheerio library, which is a simple HTML parser. It cannot execute JavaScript, download additional assets or make AJAX requests to fetch additional data.
If you're seeing incomplete results, it means the page you're trying to scrape loads data dynamically, so the data is not available in the initial HTML that Cheerio parses. Sadly, this is a limitation of the technology. To render pages and wait for them to load, you can use a browser that will do the heavy lifting for you. See PuppeteerCrawler.
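To make the limitation concrete, a tiny sketch: Cheerio only sees the initial HTML string, and any script tags in it never run:

const cheerio = require('cheerio');

// In a real browser the script would fill #app; Cheerio never executes it.
const $ = cheerio.load(
    '<div id="app"></div><script>document.getElementById("app").textContent = "hello";</script>'
);

console.log($('#app').text()); // prints '' (still empty: the script never ran)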
I'm kind of new to this and looking to learn how to pull API results and display them on a page, whether it's from a blog resource or content generation.
For example, I want to pull from VirusTotal's API and display the returned content. What is the best way to capture that from an input tag and display it in a div? And what if I wanted the option to pull from different APIs based on a drop-down selection?
An example of the API to pull content would be here https://developers.virustotal.com/reference#api-responses under the /file/report section.
To get the data from the API, you need to send a request. However, there is a problem with CORS. Basically, you can't call the API from inside your web browser from a page on your local machine, because your browser blocks the request. The browser only allows calls to the same origin the page came from, with a few exceptions.
There are two ways to approach this.
The simplest one is to make a program that calls the API and outputs an HTML file. You can then open that HTML file to read the contents. If you want to update the info, you need to run the program again manually. You could easily do this by building off the Python example they provide.
The other, slightly more complex way is to host a server on your PC. When you go to a webpage on that server, it sends a request to the API and serves the latest information. There are tons of frameworks and ways to do this; for an absolute beginner, ExpressJS is a good start. Make a hello-world program first, and once that works you can figure out how to call the API whenever a page is loaded and display the results.
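A minimal sketch of such a server, assuming Node 18+ (for the global fetch) and the VirusTotal v2 /file/report endpoint from the linked docs; the /report/:hash route name is illustrative:

const express = require('express');
const app = express();

const API_KEY = process.env.VT_API_KEY; // your VirusTotal API key

app.get('/report/:hash', async (req, res) => {
    // The server makes the call, so the browser never hits CORS.
    const url = 'https://www.virustotal.com/vtapi/v2/file/report'
        + '?apikey=' + API_KEY
        + '&resource=' + encodeURIComponent(req.params.hash);
    const response = await fetch(url);
    res.json(await response.json());
});

app.listen(3000, () => console.log('Listening on http://localhost:3000'));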
So, I'm creating this application that sometimes requires pulling a feed, and it always times out on Heroku because the XML parsing takes time. So I changed it to load asynchronously via AJAX every time the page is loaded, but I still get an H12 error from my AJAX call. Now I'm thinking of using Resque to run the job in the background. I can do that, no problem, but how would I know that the job is finished so I can pull the processed feed onto the HTML page via AJAX?
Not sure if my question is clear, but: how would the web layer know that the job is done, so it can signal (e.g. an onComplete in JavaScript) to populate the content on the page?
There are a number of ways to do this:
The JavaScript can use AJAX to poll the server, asking for the results; the server responds with 'not yet' until the results are ready. You keep asking until you get them (see the sketch below, after these options).
You could take a look at Juggernaut (http://juggernaut.rubyforge.org/), which lets your server push to the client.
Web Sockets are the HTML5 way to deal with the problem. There are a few gems around to get you started: see Best Ruby on Rails WebSocket tool.
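For the first option, a minimal polling sketch; it assumes a hypothetical /jobs/:id/status endpoint that returns {"done": false} until the Resque job finishes, then {"done": true, "result": ...}:

function pollJob(jobId, onComplete) {
    var timer = setInterval(function () {
        $.getJSON('/jobs/' + jobId + '/status', function (data) {
            if (data.done) {
                clearInterval(timer);
                onComplete(data.result); // fires once the job has finished
            }
        });
    }, 5000); // ask again every 5 seconds
}

pollJob(42, function (result) {
    $('#feed').html(result); // populate the page, as the question describes
});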
You have an architecture problem here. The reason for the H12 is to make sure the user is not left waiting for more than 30 seconds.
By moving the long-running task into a Resque queue, you are disconnecting it from the front-end web process; there is no direct way for the two to communicate, due to process isolation.
Therefore you need to look at what you are doing and how. For instance, if you are pulling a feed, can you do this at some point before the user needs to see the output and cache the results in some way? Or can you take the user's request for the feed and then email them when you have the data for them to look at, etc.?
The underlying problem is that your users are asking for something that takes longer than a reasonable amount of time to complete, so you need to take a good look at both the what and the how.
I'm developing a kind of image/profile search application that is based almost exclusively on AJAX. The main page basically displays profile images and allows the user to filter/search and paginate through them.
Pagination happens as the user scrolls, so the interface has to be very, very fast. Only 6 (maybe 9, but definitely not more) images are displayed on the main page at a time, so users will scroll a lot. I'm currently using a very simple JS cache to store the results of all requests in case the user decides to go back; in that case, I simply pull everything out of the cache instead of querying the server.
Client cache
One option I thought of is to pre-load, say, 10 pages ahead and store them in the cache.
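A minimal sketch of such a cache, keyed by the full set of filters plus the page number; the /profiles endpoint and parameter names are illustrative:

var cache = {};

function fetchPage(params, callback) {
    var key = $.param(params); // e.g. "country=US&page=3"
    if (cache[key]) {
        callback(cache[key]); // served instantly, no server round-trip
        return;
    }
    $.getJSON('/profiles', params, function (data) {
        cache[key] = data;    // remember for back-navigation
        callback(data);
    });
}

// Pre-load the next few pages in the background.
function preload(params, pagesAhead) {
    for (var i = 1; i <= pagesAhead; i++) {
        fetchPage($.extend({}, params, { page: params.page + i }), function () {});
    }
}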
But my biggest issue is filtering/searching, since that completely changes the type of query that goes to the server. My filters aren't very complex, only around 6-7 string/number/enum attributes.
Now if I wanted to do all the filtering in the cache, I would have to duplicate all the search logic on the client and fetch all the data from the server (not just the data I'm displaying), so I could filter the results client-side.
This raises a question: should I make the cache persistent somehow? Store it in a cookie, maybe?
Server cache?
One suggestion might be to use memcached on the server and just store everything there. I'm definitely going to cache all the results I can, but that doesn't save the server from handling loads and loads of AJAX requests.
I'm developing this application on Rails 3, and even though I love it, I wouldn't say it's the fastest thing in the world. Another option this gives me is to create a separate Rack/Sinatra application to handle only the AJAX requests; by this I mean requests from the main query, not all AJAX.
S3 for images?
A big part of this application is images, even though they're mostly small thumbnails (unless the user wants to display one bigger).
At the moment, I don't have problems with bandwidth; my VPS host provides me with 200 GB, which should be more than enough (I hope). The problem is loading speed. Would it help if I uploaded all the images to S3 and loaded them from there, or is this worth doing only for larger files? I'm going to load a lot of 100x150px images, which are generally under 50 kB.
Have you looked at SlickGrid? It has an interesting approach of only building the list as users scroll down, and then removing items as users scroll out of that range.
I'm currently fooling around with AJAX. Right now, I've created a Markdown previewer that updates on change of a textarea. (I guess you know that from somewhere... ;-) )
Now I'm trying to figure out how to update a page when an event is fired by another client: an asynchronous message board, so to speak. A user writes something, an event is fired, and the post is written.
But on the other clients' pages, the new post is of course not available until they reload and get the updated list of posts from the database.
Now, how can you get this to work asynchronously, so that the moment one client does something, all the other clients get to know about it?
I don't think this can be done entirely in AJAX, but I also have no idea whatsoever how to implement this on the server side, as it would seem to require a page reload to inform the other clients of the event.
I'm thinking of creating a file or database entry that hashes the current state of the data. Whenever a client loads the page, it saves this hash. Then a timer (does this exist in JavaScript?) checks the hash every few seconds.
As soon as anyone changes the database, the hash is recalculated. If the script sees that the hash has changed and is different from the one saved, it reloads the contents from the database and saves the new hash.
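Roughly, that idea would look like this on the client (the /state-hash endpoint and the reloadPosts() helper are illustrative names):

var lastHash = null;

setInterval(function () {                  // yes, timers exist in JavaScript
    $.get('/state-hash', function (hash) {
        if (lastHash !== null && hash !== lastHash) {
            reloadPosts();                 // hypothetical helper: re-fetch and redraw
        }
        lastHash = hash;
    });
}, 5000);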
Is that even going to work?
Polling that is as light as possible is really the best solution here. Even if you used a socket or something, that's still basically a live connection waiting around, and it will likely have to poll itself (albeit in a more efficient way).
20 queries in 10 minutes with responses like {"updates": false} shouldn't even put a dent in your application. Imagine someone browsing your site requesting 20 pages plus the related images/scripts/etc. (even with some caching involved): that could easily be hundreds of requests, requiring all sorts of database queries for information on pages they don't actually care about.
You could use polling. For example, each client might send a recurring AJAX request to the server, say every 30 seconds, to see if new posts are available and, if so, show them (the endpoint and helper names below are illustrative):
setInterval(function () {
    // Send an AJAX request to the server and fetch new posts
    // (the endpoint, lastPostId and renderPost are illustrative names).
    $.getJSON('/posts/new', { since: lastPostId }, function (posts) {
        posts.forEach(renderPost); // if new posts are available, update the DOM
    });
}, 30 * 1000);
On the other hand, when someone decides to write a new post, you send an AJAX (or non-AJAX) request to the server to store the post in the database.
Another, less commonly used approach is the concept of Comet, and the HTML5 WebSockets implementation, which allows clients to be notified of changes by the server via push.
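A minimal sketch of the push approach with a native browser WebSocket; the server URL and the appendPost() helper are illustrative:

var ws = new WebSocket('wss://example.com/posts');

ws.onmessage = function (event) {
    // The server pushes each new post the moment it is written.
    appendPost(JSON.parse(event.data)); // hypothetical DOM helper
};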