I need to extract about 300,000 final lines, roughly 30 MB, from thousands of online JSON files.
Being a beginner in coding, I'd prefer to stick to JS: use $.getJSON to fetch the data, cut it down, append the interesting parts to my <body>, and loop over the thousands of online JSON files. But I wonder:
Can my web browser handle 300,000 $.getJSON queries and the resulting 30-50 MB webpage without crashing?
Is it possible to use JS to write the results to a file, so the script's work is constantly saved?
I expect my script to run for about 24 hours. These numbers are estimates.
Edit: I don't have server side knowledge, just JS.
A few things aren't right about your approach for this:
If what you are doing is fetching (and processing) data from another source then displaying it for a visitor, processing of this scale should be done separately and beforehand in a background process. Web browsers should not be used as data processors on the scale you're talking about.
If you try to display a 30-50MB webpage, your user is going to experience lots of frustrating issues - browser crashes, lack of responsiveness, timeouts, long load times, and so on. If you expect any users on older IE browsers, they might as well give up without even trying.
My recommendation is to pull this task out and do it using your backend infrastructure, saving the results in a database which can then be searched, filtered, and accessed by your user. Some options worth looking into:
Cron
Cron will allow you to run a task on a repeated and regular basis, such as daily or hourly. Use this if you want to continually update your dataset.
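For example, a crontab entry like the following (the script path, log path, and schedule are placeholders for your own) would run a Node script at the start of every hour and append its output to a log:

# min hour day month weekday  command
0 * * * * /usr/bin/node /path/to/fetch-data.js >> /var/log/fetch-data.log 2>&1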
Worker (Heroku)
If you're running on Heroku, take this work out of the web dyno and run it in a separate worker so it doesn't clog up existing traffic on your app.
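On Heroku that means declaring a separate worker process type in your Procfile, something like this (the script names are placeholders):

web: node web.js
worker: node process-data.js

You can then scale the worker dynos independently of the web dynos, so heavy jobs never compete with request traffic.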
Related
I'm developing an app that receives a .CSV file, saves it, scans it, inserts the data of every record into the DB, and finally deletes the file.
With a file of about 10,000 records there are no problems, but with a larger file the PHP script actually runs to completion and all the data is saved into the DB, yet the page prints ERROR 504 The server didn't respond in time.
I'm scanning the .CSV file with the PHP function fgetcsv().
I've already edited settings in the php.ini file (max execution time set to 120, etc.), but nothing changes; after 1 minute the error is shown.
I've also tried using a JavaScript function to show an alert every 10 seconds, but the error still appears in that case too.
Is there a solution to avoid this problem? Is it possible to pass some data from the server to the client every few seconds to avoid the error?
Thanks
It's typically when scaling issues pop up that you need to start evolving your system architecture, and at that point your application will need to work asynchronously. The problem you are having is very common (some of my team are dealing with one as I write this), and everyone needs to deal with it eventually.
Solution 1: Cron Job
The most common solution is to create a cron job that periodically scans a queue for new work to do. I won't dwell on the nature of the queue since everyone has their own; some are alright and others are really bad. Typically it involves a DB table with the relevant information and a job status (<-- one of the bad solutions), or a solution involving Memcached; MongoDB is also quite popular.
The "problem" with this solution is ultimately again "scaling". Cron jobs run periodically at fixed intervals, so if a task takes a particularly long time jobs are likely to overlap. This means you need to work in some kind of locking or utilize a scheduler that supports running the job sequentially.
In the end, you won't run into the timeout problem, and you can typically dedicate an entire machine to running these tasks so memory isn't as much of an issue either.
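As a minimal sketch of that locking idea in Node (the lock path is a placeholder, and the queue lookup is left as a comment because it depends entirely on which queue you chose above):

const fs = require('fs');

const LOCK = '/tmp/queue-runner.lock';

// Skip this interval entirely if the previous run is still holding the lock.
if (fs.existsSync(LOCK)) process.exit(0);
fs.writeFileSync(LOCK, String(process.pid));

async function drainQueue() {
  // Hypothetical queue: swap in your DB table, Memcached, or MongoDB lookup,
  // e.g. SELECT * FROM jobs WHERE status = 'pending', process each job,
  // and update its status as you go.
}

drainQueue().finally(() => fs.unlinkSync(LOCK));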
Solution 2: Worker Delegation
I'll use Gearman as an example for this solution, but there are other tools built on standards like AMQP, such as RabbitMQ. I prefer Gearman because it's simpler to set up and it's designed more for work processing than for messaging.
This kind of delegation has the advantage of running immediately after you call it. The server basically sits waiting for something to do (not unlike an Apache server); when it gets a request, it shifts the workload from the client onto one of your "workers": scripts you've written that run indefinitely, listening to the server for work.
You can have as many of these workers as you like, each running the same or different types of tasks. This means scaling is determined by the number of workers you have, and this scales horizontally very cleanly.
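The shape of such a worker, sketched in Node; the queue object here is a stand-in for a real Gearman or RabbitMQ client, not an actual API, and importCsv is a stub for your own task:

// Stub standing in for your real CSV-import routine.
async function importCsv(payload) {
  console.log('importing', payload);
}

// Runs forever: block until the server hands over a job, process it,
// report the outcome, then go back to waiting.
async function runWorker(queue) {
  while (true) {
    const job = await queue.nextJob();   // hypothetical blocking call
    try {
      await importCsv(job.payload);
      await queue.markDone(job);
    } catch (err) {
      await queue.markFailed(job, err);
    }
  }
}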
Conclusion:
Crons are fine, in my opinion, for automated maintenance, but they run into problems when jobs need to run concurrently, which makes workers the ideal choice.
Either way, you are going to need to change the way users receive feedback on their requests. They will need to be informed that their request is processing and to check back later for the result; alternatively, you can periodically poll the status of the running task to provide real-time feedback to the user via Ajax. That's a little tricky with cron jobs, since you'll need to persist the state of the task during its execution, but Gearman has a nice built-in solution for doing just that.
http://php.net/manual/en/book.gearman.php
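A sketch of that Ajax polling with jQuery; the /task-status endpoint and the two UI helpers are assumptions you'd replace with your own:

function pollStatus(taskId) {
  var timer = setInterval(function () {
    $.getJSON('/task-status/' + taskId, function (res) {
      updateProgress(res.progress);  // hypothetical UI helper
      if (res.done) {
        clearInterval(timer);
        showResult(taskId);          // hypothetical UI helper
      }
    });
  }, 5000); // ask the server every 5 seconds
}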
We have built a webapp using Meteor.JS and we would like to find out:
how it will perform in real production when thousands or millions of users are sending requests and receiving responses;
how efficient the framework will be when it comes to volume and response time.
We would like to know if there are any tools or best practices we can use.
Thank you for your time in advance.
What you need to do is look at load testing your app. There are many ways to do this, but the premise is that you script what a user interaction consists of and you run that script multiple times, ramping up the volume until you get to the concurrency numbers you are looking for.
As mentioned, there are LOTS of tools to do this. Some I have personally used:
LoadRunner
Apache JMeter
Rational Performance Tester
Simple shell scripts with curl
There is a list on Wikipedia that would give you a good start on what other tools may be available.
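To give a feel for the "simple script" end of that spectrum, here is a minimal sketch in Node that fires a wave of concurrent requests and reports the average response time; the URL and concurrency are placeholders for your own scenario:

const https = require('https');

const TARGET = 'https://example.com/';
const CONCURRENCY = 50;

// Fire one request and resolve with its elapsed time in ms (-1 on error).
function hit() {
  return new Promise((resolve) => {
    const start = Date.now();
    https.get(TARGET, (res) => {
      res.resume(); // drain the body so 'end' fires
      res.on('end', () => resolve(Date.now() - start));
    }).on('error', () => resolve(-1));
  });
}

async function wave() {
  const times = await Promise.all(Array.from({ length: CONCURRENCY }, hit));
  const ok = times.filter((t) => t >= 0);
  const avg = ok.length ? Math.round(ok.reduce((a, b) => a + b, 0) / ok.length) : 0;
  console.log(ok.length + '/' + CONCURRENCY + ' succeeded, avg ' + avg + ' ms');
}

wave();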
The next point I'd like to make is that load testing is not just ramping up the concurrent users until the system breaks. A good load test will attempt to mimic the type of traffic you see or expect to see. To say it another way, two users do not do the exact same thing each time they hit your system, and not all requests to your server produce the same load on your system. Your users may even use the system differently at different times of the day.
The ultimate goal would be to have several "types" of users and transactions scripted, and have your load tool of choice weight the requests such that they match the expected percentages for each type. This way you can better see how your application will perform over time with loads that match what you really expect to see.
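Weighting can be as simple as a random pick that matches your expected traffic mix; a toy sketch, where the percentages and the three scenario functions are assumptions standing in for your scripted user journeys:

const browseScenario = () => {};   // stand-ins for your scripted journeys
const searchScenario = () => {};
const checkoutScenario = () => {};

// Pick a scripted scenario according to the expected traffic mix.
function pickScenario() {
  const r = Math.random();
  if (r < 0.6) return browseScenario;   // 60% of sessions just browse
  if (r < 0.9) return searchScenario;   // 30% search
  return checkoutScenario;              // 10% go through checkout
}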
Finally, you need to consider your infrastructure. In the enterprise environments I have worked in, we try to make our QA environment as close to Production as possible: the exact same server hardware, the exact same configuration (both physical and software), everything possible to make it a mirror. Where we sometimes deviate is size. For instance, I worked in one environment where we had 4 app servers in Prod, each with 4 nodes, for a total of 16 nodes in the cluster. In QA we had the same bare-metal hardware, but everything was cut in half: 2 physical servers, each with 4 nodes, totaling 8 nodes in the QA cluster. When we tested for load, we would then halve the expected totals, so if we expected 200 concurrent users in Prod, a successful QA test would be 100 concurrent users. We would also copy and obfuscate data from the Production databases to the QA databases to make sure we were operating on data of the same size.
The process for load testing is this:
Grab a "baseline" result
Make whatever changes you need to make
Rerun the EXACT same tests as #1, only adding any new functionality to your tests
The baseline is important when making changes over time to ensure that your change doesn't affect performance. I can't tell you how many times I have seen SQL statements that ran perfectly fine in Dev completely take down a system because of differences in the amount of data.
I'm building my first site using this framework. I'm remaking a website I had done in PHP+MySQL, and I'd like to know a few things about performance. On my website, I have two kinds of content:
Blog posts (for 2 sections of the site): these will tend to grow to thousands of records over time and are updated more often.
Static (sort of) data: information I keep in the database, like each site section's data (title, meta tags, header image URL, fixed HTML content, JavaScript and CSS filenames to include in that section), which is rarely updated and very small in size.
While I was learning the basics of Node.js, I started thinking of a way to improve the performance of the website that I couldn't manage with PHP. So, what I'm doing is:
When I run the app, the static content is all loaded into memory. I have a "model" object for each kind of content that stores the data in an array and has a method to refresh it; i.e., when the administrator updates something, I call refresh() to fetch the new data from the database into that array. This way, on every page load, instead of querying the database, the app queries the object in memory directly.
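Roughly like this (a sketch using the Node MongoDB driver; 'sections' is just an example collection name):

function SectionModel(db) {
  this.db = db;
  this.data = []; // held in memory and read on every page load
}

// Re-query the database and swap the fresh rows into memory; called
// from the admin handler whenever something is updated.
SectionModel.prototype.refresh = function (callback) {
  var self = this;
  this.db.collection('sections').find().toArray(function (err, rows) {
    if (!err) self.data = rows;
    if (callback) callback(err);
  });
};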
What I would like to know is whether there should be any performance gain from working with objects directly in memory, or whether constant queries to the database would work just as well or even better.
Any documentation supporting your answer will be much appreciated.
Thanks
In terms of the general database performance, MongoDB will keep your working set in memory - that's its basic method of operation.
So, as long as there is no memory contention to cause the data to get swapped out, and it is not too large to fit into your physical RAM, then the queries to the database should be extremely fast (in the sub millisecond range once you have your data set paged in initially).
Of course, if the database is on a different host then you have network latency to think about and such, but theoretically you can treat them as the same until you have a reason to question it.
I don't think there will be any performance difference. First, this static data is probably not very big (up to 100 records?), and querying the DB for it is not a big deal. Second (and more important), most DB engines (including MongoDB) have caching systems built in (although I'm not sure how they work in detail). Third, holding query results in memory does not scale well (for big websites) unless you use a storage engine like Redis. That's just my opinion, though; I'm not an expert.
I've got a webpage that calls Oracle, then does some processing, and then runs a lot of JavaScript.
The problem is that all of this makes it slow for the user. I have to use Internet Explorer 6, so the JavaScript takes very long to load, around 15 seconds.
How can I make my server do all of this every minute, for example, and save the page, so that when a user requests it they are served the page that has already been calculated?
I'm using a Tomcat server; my webpage is mainly JavaScript and HTML.
Edit:
By the way, I cannot rewrite my webpage; it would have to remain as it is.
I'm looking for something that would give the user a snapshot of the webpage that the server loaded.
YSlow recommendations would tell you that you should put all your CSS in the head of your page and all JavaScript at the bottom, just before the closing body tag. This will allow the page to fully load the DOM and render it.
You should also minify and compress your JavaScript to reduce download size.
To do that, you'd need to have your server build up the DOM, run the JavaScript in an environment that looks (enough) like a web browser, and then serialize the result as HTML.
There have been various attempts to do that, Jaxer is one of them (it was originally a product from Aptana, now an Apache project). Another related answer here on SO pointed to the jsdom project, which is a DOM implementation in JavaScript (video here).
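As a tiny sketch of the jsdom approach (using its current API; the markup and text are placeholders), the server builds a DOM, mutates it the way your page's scripts would, and serializes the result:

const { JSDOM } = require('jsdom');

const dom = new JSDOM('<body><div id="out"></div></body>');
// Stand-in for the work your page's JavaScript normally does in the browser.
dom.window.document.getElementById('out').textContent = 'pre-rendered on the server';
console.log(dom.serialize()); // complete HTML, ready to serve as a snapshot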
Re
By the way, I cannot rewrite my webpage; it would have to remain as it is
That's very unlikely to be successful. There is bound to be some modification involved. At the very least, you're going to have to tell your server-side framework what parts it should process and what parts should be left to the client (e.g., user-interaction code).
Edit:
You might also look for "website thumbnail" services like shrinktheweb.com and similar. Their "pro" account allows full-size thumbnails (what I don't know is whether it's an image or HTML). But I'm not specifically suggesting them, just a line you might pursue. If you can find a project that does thumbnails, you may be able to adapt it to do what you want.
But again, take a look at Jaxer, you may find that it does what you need or very similar (and it's open-source, so you can modify it or extract the bits you want).
"How can i make my server do all of this every minute for example"
If you are asking how you can make your database server 'pre-run' a query, then look into materialized views.
If the Oracle query is responsible for (for example) 10 seconds of the delay, there may be other things you can do to speed it up, but we'd need a lot more information on what the query does.
I came across a site that does something very similar to Google Suggest. When you type in 2 characters in the search box (e.g. "ca" if you are searching for "canon" products), it makes 4 Ajax requests. Each request seems to get done in less than 125ms. I've casually observed Google Suggest taking 500ms or longer.
In either case, both sites are fast. What are the general concepts/strategies that should be followed in order to get super-fast requests/responses? Thanks.
EDIT 1: By the way, I plan to implement an autocomplete feature for an e-commerce site search where it 1) provides search suggestions based on what is being typed and 2) shows a list of potential product matches based on what has been typed so far. I'm aiming for something similar to SLI Systems search (see http://www.bedbathstore.com/ for example).
This is a bit of a "how long is a piece of string" question and so I'm making this a community wiki answer — everyone feel free to jump in on it.
I'd say it's a matter of ensuring that:
The server / server farm / cloud you're querying is sized correctly according to the load you're throwing at it and/or can resize itself according to that load
The server /server farm / cloud is attached to a good quick network backbone
The data structures you're querying server-side (database tables or what-have-you) are tuned to respond to those precise requests as quickly as possible
You're not making unnecessary requests (HTTP requests can be expensive to set up; you want to avoid firing off four of them when one will do); you probably also want to throw in a bit of hysteresis management: delay the request while people are typing, only send it shortly after they stop, and reset that timeout if they start again (see the debounce sketch after this list)
You're sending as little information across the wire as can reasonably be used to do the job
Your servers are configured to re-use connections (HTTP 1.1) rather than re-establishing them (this will be the default in most cases)
You're using the right kind of server; if a server has a large number of keep-alive requests, it needs to be designed to handle that gracefully (NodeJS is designed for this, as an example; Apache isn't, particularly, although it is of course an extremely capable server)
You can cache results for common queries so as to avoid going to the underlying data store unnecessarily
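Here's the debounce point from the list, sketched with jQuery; the #search selector, the /suggest endpoint, and the renderSuggestions callback are assumptions you'd replace with your own:

var debounceTimer = null;

$('#search').on('keyup', function () {
  var term = $(this).val();
  clearTimeout(debounceTimer);              // user is still typing: reset
  debounceTimer = setTimeout(function () {
    if (term.length >= 2) {                 // don't bother below 2 characters
      $.getJSON('/suggest', { q: term }, renderSuggestions); // your own callback
    }
  }, 300);                                  // fire ~300ms after typing stops
});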
You will need a web server that is able to respond quickly, but that is usually not the problem. You will also need a database server that is fast and can very quickly find which popular searches start with 'ca'. Google doesn't use a conventional database for this at all; it uses large clusters of servers and a Cassandra-like database, and most of that data is also kept in memory for quicker access.
I'm not sure if you will need this, because you can probably get pretty good results using only a single server running PHP and MySQL, but you'll have to make some good choices about the way you store and retrieve the information. You won't get these fast results if you run a query like this:
select q.search
from previousqueries q
where q.search LIKE 'ca%'
group by q.search
order by count(*) DESC
limit 1
This will probably work as long as fewer than 20 people have used your search, but it will likely fail on you well before you reach 100,000.
This link explains how they made instant previews fast. The whole site highscalability.com is very informative.
Furthermore, you should store everything in memory and avoid retrieving data from the disk (slow!). Redis, for example, is lightning fast!
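A toy illustration of the in-memory idea, with sample data; in production you'd keep a structure like this in Redis rather than in your own process:

// Precompute suggestion lists per prefix so a lookup never touches the disk.
const suggestions = new Map();

for (const term of ['canon', 'camera', 'candle']) { // sample data
  for (let i = 1; i <= term.length; i++) {
    const prefix = term.slice(0, i);
    if (!suggestions.has(prefix)) suggestions.set(prefix, []);
    suggestions.get(prefix).push(term);
  }
}

console.log(suggestions.get('ca')); // ['canon', 'camera', 'candle']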
You could start by building a fast search engine for your products. Check out Lucene for full-text searching; it is available for PHP, Java, and .NET, amongst others.