I have a django webapp that displays stock data.
How it works:
I make requests to a stock data API and get data every 15 minutes while the US stock market is open. This is a background periodic task using Celery.
I have a database where I update this data as soon as I get it from the API.
Then I send the updated data from the database to a view, where I can visualize it in an HTML table.
Using jQuery I refresh the table every 5 minutes to give it a feel of "real time", although it is not.
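Roughly, the refresh looks like this (the view URL and table id below are placeholders for what my app actually uses):

```javascript
// Poll a server-rendered fragment and swap it into the page.
setInterval(function () {
    $.get('/stocks/table/', function (html) {
        $('#stock-table').html(html);  // replace the whole table
    });
}, 5 * 60 * 1000);  // every 5 minutes
```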
My goal is to get the HTML table updated (as a whole, or item by item) as soon as the database gets updated, making it 100% real time.
The website will have registered users (2,500 to 5,000 of them) who will be viewing this data at the same time.
I have googled around and didn't find much info. There's Django Channels (WebSockets), but all the tutorials I've seen are focused on building real-time chats. I'm not sure how efficient WebSockets are, since I have zero experience with them.
The website is hosted on Heroku Hobby, for what it's worth.
My goal is to make a fully real-time webapp and make it as efficient as possible.
I am fairly new to JavaScript; I know the basics. I am looking to build my own JavaScript library (from scratch), just like Google's analytics.js, that will track user behavior on websites. Basically I'm looking to collect data like:
Click-through data
Dwell time
Page hits, etc.
I spent a lot of time trying to find websites/tutorials to get me started on this, but I keep ending up on Google's analytics.js or some proprietary tools.
What I am looking for:
Is there any good starting point/resource/website that can help me build this JS library?
Are there references for the architecture of an end-to-end system, including the back-end?
Any open-source library that I can directly use?
Some things I already looked into:
Chaoming build your own analytics tool
Splunk BYO analytics
At its most basic, the architecture of such an application only requires a client, a server, and a database.
You can use basic JavaScript functions to record specific user actions on the frontend and then push them to your server. To identify your users, you can set a cookie with a unique id. Then, every time you send data to your server, the cookie comes along with the request, so you can tie the actions to that specific user. (Be careful about privacy laws first, though.)
For page hits, simply send a request to the server every time someone opens your site - so call this function as soon as your JavaScript loads. On the server, increment the appropriate value in your database.
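A minimal sketch of that (the /collect endpoint and the cookie helper are assumptions, not an existing API):

```javascript
// Give each visitor a persistent anonymous id (hypothetical helper).
function getOrCreateUserId() {
    let id = document.cookie.match(/uid=([^;]+)/)?.[1];
    if (!id) {
        id = Math.random().toString(36).slice(2);
        document.cookie = 'uid=' + id + '; max-age=31536000; path=/';
    }
    return id;
}

// Record a page hit as soon as the script loads.
fetch('/collect', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
        type: 'pageview',
        uid: getOrCreateUserId(),
        url: location.pathname,
    }),
});
```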
For user dwell time, write a function that records the time when the user first hits your site and then counts how long they stay. Push the data to the server every so often and update the user record by adding the new time spent to the current total. You could also watch for when a user is about to leave the site and send all the data at once - although this method is more fragile.
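For example (again, the endpoint name is made up; navigator.sendBeacon is used because normal requests are often dropped while a page unloads):

```javascript
const visitStart = Date.now();

// Periodically report time on page so data survives abrupt exits.
setInterval(() => {
    navigator.sendBeacon('/collect', JSON.stringify({
        type: 'dwell',
        seconds: Math.round((Date.now() - visitStart) / 1000),
    }));
}, 30 * 1000);

// Best-effort final report when the tab is closing (the fragile path).
window.addEventListener('beforeunload', () => {
    navigator.sendBeacon('/collect', JSON.stringify({
        type: 'dwell-final',
        seconds: Math.round((Date.now() - visitStart) / 1000),
    }));
});
```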
For clicks and hovers, set up onclick and mouseover event handlers on your links or whatever elements you want to track. Then push the URL of the link they clicked, or whatever data you want - like "clicked navbar after 200 seconds on site, after hovering over logo".
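A hypothetical way to wire those handlers up (the #logo selector is just an example element):

```javascript
const pageLoadedAt = Date.now();  // or reuse visitStart from the sketch above

// Track every link click with the target URL and time on site.
document.addEventListener('click', (e) => {
    const link = e.target.closest('a');
    if (!link) return;
    navigator.sendBeacon('/collect', JSON.stringify({
        type: 'click',
        href: link.href,
        secondsOnSite: Math.round((Date.now() - pageLoadedAt) / 1000),
    }));
});

// Track hovers over a specific element, e.g. the site logo.
document.querySelector('#logo')?.addEventListener('mouseover', () => {
    navigator.sendBeacon('/collect', JSON.stringify({ type: 'hover', target: 'logo' }));
});
```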
If you want suggestions on specific technologies, then I suggest Node.js for your server-side code and MongoDB for your database. There are many tutorials out there on how to use these technologies together. Look up JavaScript events for a list of the different things you can watch for on the frontend.
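To give a rough idea of the server side - a sketch only, where the database name, collection name, and the /collect route are all placeholders:

```javascript
// Minimal event collector: Express receives the beacons and the
// official mongodb driver stores them.
const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
app.use(express.json());

MongoClient.connect('mongodb://localhost:27017').then((client) => {
    const events = client.db('analytics').collection('events');

    // One endpoint receives every tracked event from the snippets above.
    app.post('/collect', async (req, res) => {
        await events.insertOne({ ...req.body, receivedAt: new Date() });
        res.sendStatus(204);  // no body needed; keep the round trip cheap
    });

    app.listen(3000);
});
```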
These are the building blocks you need. Now you just have to work on defining the data you want and using these technologies to get it.
I've built an API that delivers live data all at once when a user submits a search for content. I'd like to take this API to the next level by delivering the API content to a user as the content is received instead of waiting for all of the data to be received before displaying.
How does one go about this?
The easiest way to do this in Django is using Django Endless Pagination.
I think a better way is to set a limit on your query. For example, if you have 1,000 records in your database, retrieving all the data at once takes time. So if a user searches for the word 'apple', you initially send the database request with a limit of 10. Then you add pagination or an infinite-scroll feature on your front-end: when the user clicks the next page or scrolls, you send another request for the next 10 records, so each database read stays fast.
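On the client side, that might look something like this (the endpoint and parameter names are assumptions):

```javascript
// Fetch one page of results at a time instead of the whole set.
async function fetchPage(query, page, pageSize = 10) {
    const params = new URLSearchParams({
        q: query,
        limit: pageSize,
        offset: page * pageSize,
    });
    const res = await fetch('/api/search?' + params);
    return res.json();  // only pageSize records per round trip
}
```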
From your explanation:
"We're pulling our data from multiple sources with each user search. Being directly connected to the scrapers for those sources, we display the content as each scraper completes content retrieval. I was originally looking to mimic this in the API, which is obviously quite different from traditional pagination - hope this clarifies."
So, in your API, you want to:
take the query from the user
initiate live scrapers
get the data back to the user when the scrapers finish the job
(correct me if I'm wrong)
My Answer
This might feel a little complicated, but it is the best approach I can think of.
When the user submits the query:
1. Initiate the live scrapers into a Celery queue (take care of the priority).
2. Once the queue is finished, get back to the user with the information you have via sockets (this is how Facebook or any other website sends users notifications). In your case, though, you will send the resulting HTML data over the socket - see the sketch below.
3. Since you will already have the data in the DB, moved there as you scraped, you can paginate it like a normal DB query.
But this approach gives you a lag of a few seconds to a minute before you can reply to the user, so in the meantime keep the user busy with something on the UI front.
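For reference, the browser side of step 2 can be quite small (the WebSocket URL and message shape here are assumptions):

```javascript
// Listen for the "scrapers finished" push and render the results.
const socket = new WebSocket('wss://example.com/ws/search-results/');

socket.onmessage = (event) => {
    const msg = JSON.parse(event.data);
    if (msg.type === 'results_ready') {
        document.querySelector('#results').innerHTML = msg.html;
    }
};
```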
I am trying to write a web application that displays a hierarchical tree to users. Users can add, delete, and update the tree, but the tree should look the same for all users. My first thought was to save the state of the tree (I'm using JSON to represent the data in the tree) in a database, but what happens if there are a million/billion/etc. people using the application? How do you make sure that all users are seeing the same thing if additions/updates/deletes could be going on simultaneously?
Something like SignalR would help:
http://signalr.net/
What can you do with ASP.NET SignalR? SignalR can be used to add any sort of "real-time" web functionality to your ASP.NET application. While chat is often used as an example, you can do a whole lot more. Any time a user refreshes a web page to see new data, or the page implements Ajax long polling to retrieve new data, it is a candidate for using SignalR. It also enables completely new types of applications that require high-frequency updates from the server, e.g. real-time gaming.
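For the tree case specifically, the JavaScript client ends up quite small. A sketch using the modern @microsoft/signalr browser client - the hub path, the TreeUpdated broadcast, and renderTree are all made-up names standing in for your own:

```javascript
// Every client subscribes to the same hub; the server broadcasts
// the new tree state after any add/update/delete.
const connection = new signalR.HubConnectionBuilder()
    .withUrl('/treeHub')
    .build();

connection.on('TreeUpdated', (tree) => {
    renderTree(tree);  // re-render so all users see the same state
});

connection.start();
```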
I'm trying to build a single-page web app using Backbone. The app looks and behaves like a mobile app running on a tablet.
The web app is built to help event organizers manage their lists of people attending their events, and this includes the ability to search and filter those lists of attendees.
I load the full attendee list when the user opens the attendees screen, and whenever the user starts to search or filter the attendees, the operation happens on the client side.
This approach works perfectly when the event has about ~400 attendees or fewer, but when the number of attendees gets bigger than that (~1000), the initial download takes longer (which makes sense). After all the data is loaded, though, searching and filtering are still relatively fast.
I originally decided to go with the option of fully loading all the data each time the app is loaded; to do all search operations on the client side and save my servers the headache and make search results show up faster to the user.
I don't know if this is the best way to build a web/mobile app that processes a lot of data or not.
I wish there were a known pattern for dealing with these kinds of apps.
In my opinion, your approach of processing the data on the client side makes sense.
But what do you mean with "fully loading all the data each time the app is loaded"?
You could load the data only once at the beginning and then work with this data throughout the app lifecycle without reloading this data every time.
What you could also do is store the data you initially fetched in HTML5 localStorage. Then you only have to refetch the data from the server if something has changed. This should reduce your startup time.
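A minimal sketch of that idea (the /attendees endpoint and cache key are assumptions; a real app would also need some change detection, e.g. a version number or ETag, to know when to invalidate the cache):

```javascript
// Serve the list from localStorage when we have it; hit the
// server only on first load (or after the cache is invalidated).
async function loadAttendees() {
    const cached = localStorage.getItem('attendees');
    if (cached) return JSON.parse(cached);  // instant startup

    const res = await fetch('/attendees');
    const data = await res.json();
    localStorage.setItem('attendees', JSON.stringify(data));
    return data;
}
```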
Sorry for the somewhat confusing title; I'm not sure how else to title this. My situation is this: I have an academic simulation tool that I am in the process of developing a web front-end for. While the C++-based simulator is computationally quite efficient (several hundredths to a tenth of a second of runtime) for small systems, it can generate a significant (in web app terms) amount of data (~4-6 MB).
Currently the setup is as follows:
User accesses index.html file. This page on the left side has an interactive form where the user can input simulation parameters. On the right side is a representation of the system they are creating, along with some greyed out tabs for various plots of the simulation data.
User clicks "Run simulation." This submits the requested sim parameters to a runSimulation.php file via an AJAX call. runSimulation.php creates an input file based on the submitted data, then runs the simulator using this input file. The simulator spits out 4-6mb of data in various output files.
Once the simulation is done running, the response to the browser is another JavaScript function which calls a file returnData.php. This PHP script packages the data in the output files as JSON data, returns the JSON data to the browser, then deletes the data files.
This response data is then fed to a few plotting objects in the browser's javascript, and the plot tabs become active. The user can then open and interact with the plotted data.
This setup is working OK, however I am running into two issues:
Returning the data is slow - 4-6 MB of data coming back can take a while to load. (The data is being gzipped, which reduces its size considerably, but it can still take 20+ seconds on a slower connection.)
The next goal is to allow the user to plot multiple simulation runs so that they can compare the results.
My thought is that I might want to keep the data files on the server while the user's session is active. This would make it possible to load only the data for the plot the user wants to view (and perhaps load other data in the background while they view the current plot). For multiple runs, I can have multiple data sets sitting on the server, ready for the user to download if/when they are needed.
However, I have a big issue with this line of thinking: how do I recognize (in PHP) that the user has left the site, so I can delete the data? I don't want users to take over the drive space on the machine. Any thoughts on best practices for this kind of web app?
For problem #1, you don't really have many options. You are already gzipping the data and using JSON, which is a relatively lightweight format; 4-6 MB is simply a lot of data. BTW, if you think PHP is taking too long to package the data, you can have your C++ program generate the JSON directly and serve it via PHP. You can use exec() to do that.
However, I am not sure how your simulations work, but JavaScript is a Turing-complete language, so you could possibly generate some/most/all of this data on the client side (whatever makes more sense). In that case, you would save lots of bandwidth and decrease loading times significantly - but mind that JS can be really slow.
For problem #2, if you leave data on the server you'll need to keep track of active sessions (i.e., when the user last interacted with the server) and set a timeout that makes sense for your application. After the timeout, you can delete the data.
To keep track of interaction, you can use JS to check whether a user is still active (by sending heartbeats or something like that).
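For instance (heartbeat.php is a hypothetical endpoint that would just update a last-seen timestamp for the session; a cron job can then delete the data files of sessions idle past the timeout):

```javascript
// Cheap fire-and-forget ping while the tab stays open; when the
// pings stop, the server-side timeout eventually reclaims the files.
setInterval(() => {
    navigator.sendBeacon('/heartbeat.php');
}, 60 * 1000);  // once a minute
```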