Update and fetch the latest data from scraped website on Heroku app

I have built an API with Express.js that scrapes a website using cheerio, and I have deployed it on Heroku. I want my web application to fetch the latest data from the scraped website, but the deployed app does not do so: it still shows the old data. How can I make it fetch live data continuously?

Hi, I got stuck on a similar problem. The way out was to use cron jobs together with a database.
I configured a cron job to visit my website twice a day to wake the server (you don't need this if you have a good number of active users).
When the server restarts, my app checks whether the data stored in my db (i.e. the last data scraped from the source while the server was previously active) is the same as the data currently on the target website (the one I scrape). If it is the same, the app does nothing; otherwise it updates the db with the latest data and displays that on the website.
The drawbacks of this approach are:
1. It updates the data only twice a day.
2. If your site has many active users throughout the day, their constant visits won't let your app's server go idle. So at the time the cron job is configured to visit your site, the server may already be online, and the data may not be updated.
However, for sites with fewer active users this approach works perfectly fine.
You can configure cron jobs here: https://cron-job.org/en/members/jobs/
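The "compare with what is stored, update only if different" step can be sketched roughly as below. This is a minimal illustration, not the code from my app: the in-memory `store` stands in for a real database, and the names `fingerprint` and `updateIfChanged` are made up for the example. It assumes the scraped data can be serialized with `JSON.stringify`.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory stand-in for the db row that holds the last scrape.
const store = { hash: null, data: null };

// Fingerprint the scraped payload so old and new data are cheap to compare.
function fingerprint(data) {
  return crypto.createHash('sha256').update(JSON.stringify(data)).digest('hex');
}

// Returns true if the store was updated, false if the data was unchanged.
function updateIfChanged(scraped) {
  const hash = fingerprint(scraped);
  if (hash === store.hash) {
    return false; // same as the last scrape: do nothing
  }
  store.hash = hash;
  store.data = scraped;
  return true;
}
```

You would call `updateIfChanged(result)` with whatever your cheerio scraping code returns, on server start and each time the cron job hits the site.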

Related

Why does Node reset the values in NEDB databases?

I have built an app that connects to an API. For this discussion, let's just call it API X. API X provides data that is updated on a monthly basis, but the exact date when new data is published varies. My Node.js application is built to check API X twice per day (this is a cron job) and see whether new data has been published. If it has, my app downloads the latest data and stores it in an NeDB database.
https://github.com/louischatriot/nedb
My app is deployed on Heroku, where I have the basic dyno "web", for $7/month. This app allows me to always check the latest data in a quick and easy way and mostly I have no trouble with it.
But something weird is happening. Sometimes the database seems to "reset". When I uploaded the app to Heroku from my local PC, the database was up to date through December. The app quickly self-updates to include January and February, as these are the latest published data. But sometimes when I check my app to view the data, it only has data up to December. It is as if the app has been reset to the state it was in when I uploaded it. The app of course quickly gets the January and February data back, as the cron job is running. But it's a problem for me that this reset keeps happening. I don't know how often it happens, because I haven't observed many instances of it.
Does anyone know why this might be happening and what I can do about it?

How can I reduce the load on the main page of a GPS tracking system?

I'm developing a GPS tracking system with ASP.NET MVC and SQL Server (database first). I have a question about the main page, which shows all devices on a map. This page fetches a lot of data from the database and processes it, and I have to refresh the page every 15 seconds.
How can I reduce the load on this page?
If you have an idea or a specific way to reduce the load on this page, please guide me.

You could use SignalR, which uses WebSockets. What you're doing now is polling (repeatedly requesting data from the server every 15 seconds or so); with WebSockets the client doesn't have to initiate a request again and again, because the server is the one that sends data to every connected client whenever there is data to send.
It's a popular C# library so you'll find a lot of tutorials here and there.
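To illustrate the push idea in plain JavaScript (this is not SignalR code, just a rough sketch of the principle): the server remembers the last position it sent for each device and notifies listeners only when a position actually changes. The `updates` emitter stands in for the WebSocket hub, and `reportPosition` and the device ids are hypothetical names.

```javascript
const { EventEmitter } = require('events');

const updates = new EventEmitter(); // stands in for the WebSocket hub
const lastPositions = new Map();    // deviceId -> "lat,lng" last pushed

// Called whenever a device reports in; emits only when the position changed.
// Returns true if an update was pushed, false otherwise.
function reportPosition(deviceId, lat, lng) {
  const key = `${lat},${lng}`;
  if (lastPositions.get(deviceId) === key) {
    return false; // nothing new: connected clients are not bothered
  }
  lastPositions.set(deviceId, key);
  updates.emit('position', { deviceId, lat, lng });
  return true;
}
```

Each connected client would then receive only the devices that moved, instead of reloading every device every 15 seconds.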

Diagram of the main tables: I insert the row data from the GPS devices into another table, but the last-log table holds the live data used to display the last state of each carrier.

A method for updating REST API data periodically?

I don't know how to put it; I hope the title fits my problem or scenario.
I want to build a REST API with data coming from many RSS feeds. Right now I'm able to fetch the data using a JavaScript script and save it in my database. To fetch that data, I have to open a page so that the script runs and reloads every minute. The REST API is still on localhost, by the way.
The question is: if I want to host it, do I need one PC running 24 hours a day that only keeps a browser open on a REST API address, so that the script keeps running and the data is always up to date?
Right now this is the only method I can think of. Is there a method that doesn't require a PC running 24 hours a day, seven days a week?

The best solution to your problem is to set up a scheduler that runs at a predefined interval, fetches the data, and stores it in the DB internally. You don't need to open a page to do that if you are not modifying the response returned from the RSS feed.
You can go through these: Post, Tutorial, Node-Schedule, Parse. These are some examples that you can use based on your requirements.
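As a rough sketch of such a scheduler using only Node's built-in `setInterval` (a library such as Node-Schedule would also work): `startScheduler` is a made-up helper name, and the job you pass in is assumed to be your own function that fetches the feeds and saves them to the database.

```javascript
// Runs `job` once immediately and then every `intervalMs` milliseconds.
// Returns a function that stops the schedule.
function startScheduler(job, intervalMs) {
  let running = false;

  async function tick() {
    if (running) return; // skip this tick if the previous run is still going
    running = true;
    try {
      await job();
    } catch (err) {
      // A failed run is logged; the next tick will try again.
      console.error('scheduled job failed:', err.message);
    } finally {
      running = false;
    }
  }

  tick();
  const timer = setInterval(tick, intervalMs);
  return () => clearInterval(timer);
}
```

For example, `const stop = startScheduler(fetchFeedsAndSave, 60 * 1000);` would refresh the data every minute inside the server process, where `fetchFeedsAndSave` is a hypothetical name for your existing fetch-and-store logic. No browser page has to stay open.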

Tornado, Django, and push notifications

I've searched Stack Overflow and all I can find is how to use Tornado as an HTTP server.
Now, my question is: how do I start doing push notifications using this system?
Let me give you some context...
The database
I have a database on some server far away that I know nothing about, other than that it's a PostgreSQL database and that a piece of software on that server updates the database every so often (anywhere from every couple of seconds to every couple of days).
Currently
I have a Django app that displays these database rows. It gets the rows from a different app, called api, using an AJAX call every 5 seconds. As we all know, this method is wasteful.
What I'd like to do
Well I'll bullet point it:
- I'd like my Django app to stay the same in structure.
- The Django app will contain, in its view, JS code for connecting to a separate server.
- This separate server will check the database for changes every 60 seconds. If the database has changed, it notifies the clients with a message such as "new data available".
Hopefully that's not too vague.
Thanks,
Andy.

I found that the django-websocket-redis package suits my needs, which are very much comparable to yours, as it can easily be implemented on top of an existing project.
Mind that there are a few dependencies (uWSGI and Redis, primarily), and I had to switch to a Linux development environment to get everything to work properly.

Patterns for building web/mobile apps that process a lot of data on the client side

I'm trying to build a single-page web app using Backbone. The app looks and behaves like a mobile app running on a tablet.
The web app is built to help event organizers manage the lists of people attending their events, and this includes the ability to search and filter those lists of attendees.
I load the full list of attendees when the user opens the attendees screen, and whenever the user starts to search or filter the attendees, the operation happens on the client side.
This works perfectly when the event has about 400 attendees or fewer, but when the number of attendees gets bigger than that (around 1000), the initial download takes longer (which makes sense). After all the data is loaded, though, searching and filtering are still relatively fast.
I originally decided to fully load all the data each time the app is loaded, so that all search operations happen on the client side; this saves my servers the headache and makes search results show up faster for the user.
I don't know whether this is the best way to build a web/mobile app that processes a lot of data.
I wish there were a known pattern for dealing with these kinds of apps.

In my opinion your approach to process the data on the client side makes sense.
But what do you mean by "fully loading all the data each time the app is loaded"?
You could load the data only once at the beginning and then work with it throughout the app lifecycle, without reloading it every time.
You could also store the initially fetched data in HTML5 localStorage. Then you only have to refetch the data from the server if something has changed. This should reduce your startup time.
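A minimal sketch of the localStorage idea, assuming your server can tell the client a version (or last-modified stamp) of the attendee list so the client knows whether "something changed". The key name, the `version` field, and the helper names are assumptions for the example. `storage` is anything with `getItem`/`setItem`, which in the browser is `window.localStorage`.

```javascript
const CACHE_KEY = 'attendees'; // hypothetical storage key

// Returns the cached list if it matches `serverVersion`, otherwise null.
function readCache(storage, serverVersion) {
  const raw = storage.getItem(CACHE_KEY);
  if (raw === null) return null; // nothing cached yet
  try {
    const cached = JSON.parse(raw);
    return cached.version === serverVersion ? cached.list : null;
  } catch (err) {
    return null; // corrupt entry: treat it as a cache miss
  }
}

// Stores the list together with the version it was fetched at.
function writeCache(storage, version, list) {
  storage.setItem(CACHE_KEY, JSON.stringify({ version, list }));
}
```

On startup the app would ask the server for the current version, call `readCache(localStorage, version)`, and download the full list (followed by `writeCache`) only when that returns `null`. Keep in mind that localStorage is limited in size, so this fits lists of this scale rather than very large data sets.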
