In Chrome, you can go to developer tools > network tab to see all the requests the website is making.
What would be a good way to get the list of these requests programmatically? I guess I can grab the content of the site, grab all the URLs that are in the page and parse them, but that seems a bit tedious, especially if the requests are being made from a JS file.
Is there an easier way?
On the server side, there's the access log that will have request information for all requests made against it. It might take some configuration to save all the fields you want to use (method, POST data, cookies, time, remote client address) but this is more efficient than trying to do things from a browser. Each web server has a different way of configuring its log files.
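To illustrate working with an access log programmatically, here is a minimal sketch in Node that parses one line of the common Apache/Nginx "combined" log format into the kinds of fields mentioned above. The sample line and the function name are made up for illustration.

```javascript
// Regex for the Apache/Nginx "combined" log format:
// remote - - [time] "METHOD path PROTO" status bytes "referer" "user-agent"
const COMBINED = /^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$/;

function parseLogLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null; // line is not in combined format
  return {
    remoteAddr: m[1],
    time: m[2],
    method: m[3],
    path: m[4],
    protocol: m[5],
    status: Number(m[6]),
    bytes: m[7] === '-' ? 0 : Number(m[7]),
    referer: m[8],
    userAgent: m[9],
  };
}

// Made-up example line for demonstration:
const sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /app/main.js HTTP/1.1" 200 2326 "https://example.com/" "Mozilla/5.0"';
console.log(parseLogLine(sample).method); // "GET"
```

In practice you would stream the log file line by line and aggregate whatever fields you need (method, status, client address, etc.).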
I am serving an Angular app as static content from an Express server. When serving static files, Express adds an ETag header by default, so each subsequent request first checks whether the ETag matches and, if it does, the server does not send the files again. I know that a Service Worker works similarly by trying to match a hash. Does anyone know the main difference between these two approaches (caching with ETags and caching with Service Workers), and when should we use one over the other? Which would be most efficient in terms of performance:
Server side caching and serving Angular app static files
Implementing Angular Service Worker for caching
Do both 1 and 2
To give a better perspective, I'll address a third cache option as well, to clarify the differences.
Types of caching
Basically, we have 3 possible layers of caching, listed in the priority order in which the client checks them:
Service Worker cache (client-side)
Browser Cache, also known as HTTP cache (client-side)
Server side cache (CDN)
PS: Some browsers, like Chrome, have an extra memory-cache layer in front of the service worker cache.
Characteristics / differences
The service worker cache is the most reliable of the client-side options, since it defines its own rules for how caching is managed and provides extra capabilities and fine-grained control over exactly what is cached and how.
Browser caching is controlled by HTTP headers on the asset response (Cache-Control and Expires), but the main issue is that there are many conditions under which those headers are ignored.
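The headers that drive the browser (HTTP) cache can be sketched as follows. The header names and directives are standard; the helper function itself is just illustrative:

```javascript
// Build the standard response headers that control the browser's HTTP cache.
function cacheHeaders(maxAgeSeconds, { immutable = false } = {}) {
  const directives = ['public', `max-age=${maxAgeSeconds}`];
  if (immutable) directives.push('immutable'); // "never revalidate this URL"
  return {
    'Cache-Control': directives.join(', '),
    // Expires is the legacy equivalent; Cache-Control wins when both are present.
    'Expires': new Date(Date.now() + maxAgeSeconds * 1000).toUTCString(),
  };
}

console.log(cacheHeaders(31536000, { immutable: true })['Cache-Control']);
// "public, max-age=31536000, immutable"
```

In Express you would set these with `res.set(...)` or via the options of `express.static`.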
For instance, I've heard that files bigger than 25 MB are normally not cached, especially on mobile, where memory is limited (and I believe the limits have been getting stricter lately, due to the increase in mobile usage).
So between those 2 options, I'd always choose the Service Worker cache for more reliability.
Now, turning to the 3rd option: the CDN checks the HTTP headers, looking at the ETag to decide when to bust the cache.
The idea of the Server-side caching is to only call the origin server in case the asset is not found on the CDN.
Now, between the 1st and 3rd, the main difference is that Service Workers work best for slow or failing network connections and offline use, since the cache lives client-side: if the network is down, the service worker serves the last cached response, allowing for a smooth user experience.
Server-side caching, on the other hand, only works when the server is reachable, but the caching happens off the user's device, saving local space and reducing the application's memory consumption.
So as you can see, there are no right or wrong answers, just what works best for your use case.
Some Sources
MDN Cache
MDN HTTP caching
Great article from web.dev
Facebook study on caching duration and efficiency
Let's answer your questions:
what is the main difference between these two approaches (caching with ETag and caching with Service Workers)
Both solutions cache files; the main difference is whether you need to reach the server or can stay local:
With an ETag, the browser hits the server asking for a file and sends along the hash it has (the ETag). Depending on the file stored on the server, the server answers either "the file was not modified, use your local copy" with a 304 (Not Modified) response and no body, or "here is a new version of that file" with a 200 response and the new file. In both cases the server decides, and the user waits for a round trip.
With the Service Worker approach, you decide locally what to do. You can write logic to control what and when to serve from the local (cached) copy, and when to go to the server. This is very useful for offline capability, since the logic runs in the client and there is no need to hit the server.
when we should use one over the other?
You can use both together. For example, you can define logic in the service worker: if there is no connection, return the local copies; otherwise, go to the server.
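That "try the network, fall back to the cache" logic can be sketched as below. The dependencies are injected so the strategy is testable outside a browser; in a real service worker, `fetchFn` would be `fetch()` and `cacheGet`/`cachePut` would wrap the Cache API (`caches.open`, `cache.match`, `cache.put`):

```javascript
// Network-first strategy: prefer a fresh response, fall back to the cache offline.
async function networkFirst(request, { fetchFn, cacheGet, cachePut }) {
  try {
    const response = await fetchFn(request);
    await cachePut(request, response); // keep the cache fresh for later offline use
    return response;
  } catch (networkError) {
    const cached = await cacheGet(request); // offline: serve the last cached copy
    if (cached) return cached;
    throw networkError; // nothing cached either: surface the failure
  }
}
```

In an actual service worker you would wire this into the fetch event, roughly `self.addEventListener('fetch', e => e.respondWith(networkFirst(e.request, ...)))`.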
What would be the most efficient when it comes to performance:
Server side caching and serving Angular app static files
Implementing Angular Service Worker for caching
Do both 1 and 2
My recommendation is to use both approaches, but treat your files differently. The 'index.html' file can change, so for it use the service worker when there is no internet access, and when there is, let the web server answer with the ETag. All the other static files (CSS and JS) should be immutable files, meaning you can be sure the local copy is valid; for those, add a hash to the files' names (so each version is a unique file) and cache them. When you release a new version of your app, you modify 'index.html' to point at the new immutable files.
I’ve thought about making a server with routes/models that will be hosted on Heroku so that it can process requests. Then I would make incremental requests to the server until they can be qualified as Big Data.
I think there is a limit to how many consecutive requests one PC can make to a Mongo database, although I couldn't find any proof of that.
I need results of Big Data flow so that I could further analyze and include them in my essay.
Is it the right way to do it?
You can use JMeter. JMeter is a load-testing tool: load testing is the process of putting load on a software system through calls (HTTP, HTTPS, WebSocket, etc.) to determine its behavior under normal and high-load conditions. A load test helps identify the maximum number of requests a software system can handle.
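If you want a quick script rather than a full JMeter test plan, the core idea of load generation can be sketched in a few lines of Node: fire a fixed number of requests through a pool of concurrent workers and count successes and failures. `makeRequest` is an injected placeholder; in practice it would be an `http`/`https` request to your Heroku endpoint:

```javascript
// Run `total` requests with at most `concurrency` in flight at once.
async function runLoad(makeRequest, total, concurrency) {
  let ok = 0, failed = 0, next = 0;
  async function worker() {
    while (next < total) {
      next += 1; // claim a slot (safe: single-threaded event loop, no await between check and increment)
      try { await makeRequest(); ok += 1; }
      catch { failed += 1; }
    }
  }
  await Promise.all(Array.from({ length: concurrency }, worker));
  return { ok, failed };
}
```

JMeter gives you much more (ramp-up profiles, latency percentiles, reports), but a harness like this is often enough to collect raw numbers for an essay.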
I'm developing a small app using Ajax and http requests.
Currently I'm sending one request per second to the server to check whether there are updates and, if there are, to fetch and download the data.
This timing profile changes when a user interacts with the app, but that effect is negligible.
I'm just wondering: I could send an infinite loop of requests to the server; the more frequent the requests, the more responsive the app. But then doesn't the server get too many requests?
So how long should the interval between one request and the next be?
I've read something about tokens, but I can't work out the best way to check whether the server has updates. Thanks in advance.
Long polling is one of the main options here. You should look into your server and make sure it has good support for persistent HTTP connections if you expect a large number of users to be persistently connected.
For example, the Apache web server on its own opens a thread per connection, which can be a notable challenge with many persistently connected users: you end up with lots of threads (there are approaches to address this in Apache which you can research further if need be). Jetty, a Java-based web server (among my personal favorites), uses a more advanced network library that maps connections to threads far more efficiently, essentially putting connections to sleep until traffic in or out is detected.
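The client side of long polling is a simple loop: the server holds each request open until it has an update (or times out), and the client immediately re-issues the request when one returns. A minimal sketch, with `fetchUpdate`, `onUpdate`, and `shouldStop` as injected placeholders:

```javascript
// Long-polling client loop: one request in flight at a time, re-issued as soon
// as the previous one completes.
async function longPoll(fetchUpdate, onUpdate, shouldStop) {
  while (!shouldStop()) {
    try {
      const update = await fetchUpdate(); // server blocks until there's news (or times out)
      if (update !== null) onUpdate(update);
    } catch (err) {
      // Network hiccup: back off briefly instead of hammering the server.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}
```

Compared to polling every second, the server decides when to respond, so you get updates with low latency without a fixed request rate.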
Here are a few links:
http://en.wikipedia.org/wiki/Push_technology
http://en.wikipedia.org/wiki/Comet_(programming)
http://www.einfobuzz.com/2011/09/reverse-ajax-comet-server-push.html
Is it possible to configure the browser's timeout for pending requests via JavaScript?
By browser requests I mean requests that the browser makes in order to fetch static resources referenced by URL in HTML files, such as images, fonts, etc. (I'm not talking about XMLHttpRequests made intentionally by the application to fetch dynamic content from a back-end server; I'm talking about requests I'm unable to control, made automatically by the browser, like the ones above).
Sometimes the requests for such resources stay pending, occupying a connection (and the number of connections is limited). I'd like to time out (or cancel) those requests so they don't keep occupying connections.