I came across a site that does something very similar to Google Suggest. When you type in 2 characters in the search box (e.g. "ca" if you are searching for "canon" products), it makes 4 Ajax requests. Each request seems to get done in less than 125ms. I've casually observed Google Suggest taking 500ms or longer.
Either way, both sites are fast. What are the general concepts/strategies that should be followed in order to get super-fast requests/responses? Thanks.
EDIT 1: By the way, I plan to implement an autocomplete feature for an e-commerce site search where it 1.) provides search suggestions based on what is being typed and 2.) shows a list of potential product matches based on what has been typed so far. I'm aiming for something similar to SLI Systems search (see http://www.bedbathstore.com/ for example).
This is a bit of a "how long is a piece of string" question and so I'm making this a community wiki answer — everyone feel free to jump in on it.
I'd say it's a matter of ensuring that:
The server / server farm / cloud you're querying is sized correctly according to the load you're throwing at it and/or can resize itself according to that load
The server / server farm / cloud is attached to a fast network backbone
The data structures you're querying server-side (database tables or what-have-you) are tuned to respond to those precise requests as quickly as possible
You're not making unnecessary requests (HTTP requests can be expensive to set up; you want to avoid firing off four of them when one will do); you probably also want to throw in a bit of hysteresis management (delaying the request while people are typing, only sending it a couple of seconds after they stop, and resetting that timeout if they start again; see the debounce sketch after this list)
You're sending as little information across the wire as can reasonably be used to do the job
Your servers are configured to re-use connections (HTTP 1.1) rather than re-establishing them (this will be the default in most cases)
You're using the right kind of server; if a server has a large number of keep-alive requests, it needs to be designed to handle that gracefully (NodeJS is designed for this, as an example; Apache isn't, particularly, although it is of course an extremely capable server)
You can cache results for common queries so as to avoid going to the underlying data store unnecessarily
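To illustrate the hysteresis point above, here is a minimal debouncing sketch for the client side. The /suggest endpoint, the element id and the 300 ms delay are assumptions for the example, not anything from the question.

// Minimal debounce for an autocomplete box: only fire the request
// once the user has stopped typing for a short pause.
// The '/suggest' endpoint and the element id are hypothetical.
const input = document.getElementById('search');
let timer = null;

input.addEventListener('input', () => {
  clearTimeout(timer);                      // reset the timeout while the user is still typing
  const term = input.value.trim();
  if (term.length < 2) return;              // wait for at least two characters

  timer = setTimeout(async () => {
    const res = await fetch('/suggest?q=' + encodeURIComponent(term));
    const suggestions = await res.json();   // e.g. ["canon", "canon camera", ...]
    console.log(suggestions);               // a real UI would render these into a dropdown
  }, 300);                                  // 300 ms pause; tune to taste
});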
You will need a web server that is able to respond quickly, but that is usually not the problem. You will also need a fast database server that can very quickly find which popular searches start with 'ca'. Google doesn't use a conventional database for this at all; it uses large clusters of servers and a Cassandra-like database, and most of that data is kept in memory as well for quicker access.
I'm not sure if you will need this, because you can probably get pretty good results using only a single server running PHP and MySQL, but you'll have to make some good choices about the way you store and retrieve the information. You won't get these fast results if you run a query like this:
select
q.search
from
previousqueries q
where
q.search LIKE 'ca%'
group by
q.search
order by
count(*) DESC
limit 1
This will probably work as long as fewer than 20 people have used your search, but it will likely fall over well before you reach 100,000.
This link explains how they made instant previews fast. The whole site highscalability.com is very informative.
Furthermore, you should store everything in memory and avoid retrieving data from disk (slow!). Redis, for example, is lightning fast!
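To make the in-memory idea concrete, here is a rough sketch of prefix lookups against a Redis sorted set, using the ioredis client in Node. The key name, the seed data and the choice of ioredis/ZRANGEBYLEX are my assumptions for illustration; it is one possible approach, not what those sites actually do.

const Redis = require('ioredis');
const redis = new Redis(); // assumes a local Redis instance

// Seed a sorted set with candidate suggestions, all with score 0, so that
// ZRANGEBYLEX can do purely lexicographic (prefix) lookups. Key name is made up.
async function seed() {
  await redis.zadd('suggestions', 0, 'canon', 0, 'canon camera', 0, 'canvas', 0, 'cable');
}

// Return up to 10 suggestions starting with the given prefix, straight from memory.
async function suggest(prefix) {
  return redis.zrangebylex('suggestions', '[' + prefix, '[' + prefix + '\xff', 'LIMIT', 0, 10);
}

seed()
  .then(() => suggest('ca'))
  .then(results => console.log(results)) // e.g. [ 'cable', 'canon', 'canon camera', 'canvas' ]
  .finally(() => redis.quit());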
You could start by building a fast search engine for your products. Check out Lucene for full-text searching. It is available for PHP, Java and .NET, amongst others.
I'm creating a statistics web page that displays sensitive information.
The page contains a sort of table holding a large amount of data; the table is editable and stored in the server's database. It needs to be hidden until the user has passed proper authentication (such as logging in) -- the table itself and its code too. Most of the questions on Stack Overflow say this is basically impossible, but lots of well-known websites seem to hide such content well, so I assume there are solutions to the problem.
To start, I built a full stack with a React - Express - Node - MariaDB toolchain.
The React client is responsible for rendering the page content and the editable tables, and for submitting edited content back to the server.
Node with Express is responsible for retrieving data from the DB and updating the DB (it only provides data for the client side to manipulate -- that's all).
The problem comes when I consider the security of the client-side code. I want to hide all content of the page (not just the data from the server, but also its logic and features).
To achieve this, I have considered several approaches, but I'm not sure whether they are right or would work well if I built them.
Using server-side rendering -- can't be used due to performance reasons and a lack of available resources
Server-side rendering can hide logic from the user because the server emits only HTML; all actions are submitted to the server, which handles them and returns the result.
So I could serve only the login page at first and, if the login succeeds, send the rest of the HTML and its logic from the server.
The problem is that the page content is massive and the user interacts with it very often, and since I apply virtualization to the table (for performance reasons), its data and rendering logic have to be handled by the browser.
Combining SSR and Client-Side Rendering
I'm not sure about this one; I doubt whether it is possible.
Use SSR to hide the site's content from unauthorized users; once authorized, the browser renders the full content on demand. (Code and logic should stay hidden before authorization; an unauthorized user can only see the login page.)
Is it possible to do it?
Get code on demand.
This is also just my own idea, but it is what I'm looking for. I strongly doubt whether it is possible.
The workflow would be as follows:
If a user is not logged in: they can only see the login page and its code.
If the user is logged in: they can see the page's features, such as management, statistics, etc.
If the user accesses a specific feature: the rendering logic and HTTP request interface are downloaded from the server (or some less performance-hindering logic, or similar) and render what the user wants to see and do.
It is okay if none of the ideas above work out. Can you provide an outline for implementing this kind of web page? I'm quite new to web programming, so I can't find the proper approach. I want to know how I can achieve this, and with what kinds of solutions, libraries, and structure.
What lib or package should I use for this?
How can I implement this?
Or can you describe how modern websites achieve this? (I think the SAP system quite closely resembles what I want to achieve.)
Foreword
Security is a complex topic in which it is not possible to reach zero threat. I'll try to craft an answer that fulfils what you are looking for.
Back end: Token, credentials, authentication
So, you are currently using Express for your back end, hence the need to protect access to that part. Many solutions exist; I favour token authentication, but you can also do something with username/password to let users access the back end.
From what you are describing, you would use some sort of API (REST, GraphQL, etc.) to connect to the back end and make your queries (fetch, cross-fetch, apollo-link, etc.), adding the token to the call to the back end, usually in the headers.
If a user doesn't have the proper token, they have no data. Many sites use that method to block the consumption of data from the users (e.g. Twitter, Instagram). This should cover the security of the data for your back end, and no code is exposed.
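As a minimal sketch of what such token checking could look like in Express, assuming the jsonwebtoken package, a placeholder secret and a made-up route; the answer doesn't prescribe a specific library, so treat this only as an illustration.

const express = require('express');
const jwt = require('jsonwebtoken');

const app = express();
const SECRET = process.env.JWT_SECRET || 'change-me'; // placeholder secret

// Middleware: reject any request that doesn't carry a valid Bearer token.
function requireToken(req, res, next) {
  const header = req.headers.authorization || '';
  const token = header.startsWith('Bearer ') ? header.slice(7) : null;
  if (!token) return res.status(401).json({ error: 'missing token' });
  try {
    req.user = jwt.verify(token, SECRET); // throws if the token is invalid or expired
    next();
  } catch (err) {
    res.status(401).json({ error: 'invalid token' });
  }
}

// Example protected route; the path and payload are made up.
app.get('/api/statistics', requireToken, (req, res) => {
  res.json({ rows: [] }); // a real application would query MariaDB here
});

app.listen(3000);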
Front-end: WebPack and application code splitting
Now the tricky part, so you want the client side not to have access to all the front-end at once but in several parts. This has 2 caveats:
It will be a bit slower than in normal use
Once the client has logged in once, they will have access to the application
The only workaround I see in this situation is to use only server-side rendering, if you want to limit the amount of your front end the client holds to the bare minimum. Granted, it is slow, but if you want maximum protection that is the only solution.
Yet, if you still want to keep some interactivity and have a faster front end while keeping a bit of security, you could use some code splitting with WebPack. I am not familiar with C so I can't say, but WebPack's multiple-page application setup, as I was mentioning in the comment, should give you a good start towards building something more secure.
First, you would have, for example, two HTML files for entering the front end: one for the login and one for the application. The login page contains only the JavaScript modules needed to enter the application and shouldn't load the other JavaScript modules.
All in all, entry points are the way you enter the application; this is a very broad topic that I can't cover in this answer, but I would recommend following WebPack's tutorial and finding out how you can work this out.
I particularly recommend the part on code splitting, but the whole tutorial is worth a look.
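For what it's worth, a rough sketch of a multi-entry WebPack configuration along those lines, using html-webpack-plugin; the file names and paths are made up, and WebPack's own documentation covers the details.

// webpack.config.js -- two entry points so login.html only pulls in the login bundle.
// File names and paths are made up for the example.
const path = require('path');
const HtmlWebpackPlugin = require('html-webpack-plugin');

module.exports = {
  entry: {
    login: './src/login.js', // minimal code: login form + token request
    app: './src/app.js',     // the real application, loaded after authentication
  },
  output: {
    filename: '[name].[contenthash].js',
    path: path.resolve(__dirname, 'dist'),
    clean: true,
  },
  plugins: [
    // Each generated HTML file only references the chunks listed here.
    new HtmlWebpackPlugin({ filename: 'login.html', chunks: ['login'] }),
    new HtmlWebpackPlugin({ filename: 'app.html', chunks: ['app'] }),
  ],
};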
Second, you will have to tweak the optimisation module. It is usually a module that tries to reduce the size of the application by merging methods that are used by different parts or that are redundant: you don't want this.
In your case, you don't want unauthenticated users to have access, so you will probably have to change things there (again too broad a topic to cover in a single answer, since you would have to decide what to keep for optimisation and what to remove for security). Here is the link to the optimisation module, and a heads-up: you will have to configure the SplitChunksPlugin not to perform this optimisation.
I hope this helps; there are many solutions at hand, and while this is not a comprehensive guide, it should give you enough material to get to what you need.
Update: I am getting the impression that this is not even the right website to post this. If someone can point me in the right direction, I'd be appreciative...
I have an existing PHP+MySQL application that wasn't built to render "real-time" or similarly live-style data. But now I need to build in a way to pull nearly real-time data into the application and keep the data on the page fresh. This live data is only for 1 page in the application.
I've looked at things like socket.io and PHP-based WebSocket libraries, but they seemed like overkill because the data is basically coming from one source and being delivered to one person (the client). Multiple other users could have this process running, but each one would bring their own data endpoint. That's... like a year down the road. But good to think about. Ideally there would be hundreds, or thousands, of users on the system, pulling their live-ish data, so I want this to be as streamlined and low-impact as possible.
Users must be authenticated and authorized to consume the data. This is already baked into the current system.
The API to get the data (which has already been built by another vendor) is also NOT streaming. It's set on a 20-second cron, so the new data is available every 20 seconds, which satisfies the client's needs.
My current plan is to do something like this...
1. Data is pulled on a cron every 20 seconds, organized, and stored in the database (complete)
2. Adjust #1 so it also does any additional proprietary calculations on the data AND compiles + writes a JSON file on the server (unique to the user) containing exactly the data needed for the front end (the DB data is needed for other pages)
3. Create a small PHP-based service which validates a client-provided JWT and reads the JSON file out
4. Write an AJAX front end that polls the endpoint from #3 every X seconds, using the JWT for authorization (see the sketch after this list)
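A small sketch of what step 4 could look like on the front end, with a made-up endpoint, a 20-second interval to match the cron, and a placeholder for wherever the application stores its JWT.

// Poll the PHP endpoint from step 3 every 20 seconds, sending the JWT
// in the Authorization header. The endpoint path and storage key are assumptions.
const ENDPOINT = '/live-data.php';
const POLL_MS = 20000;

async function pollOnce() {
  const token = localStorage.getItem('jwt'); // however the app stores its token
  const res = await fetch(ENDPOINT, {
    headers: { Authorization: 'Bearer ' + token },
  });
  if (!res.ok) throw new Error('poll failed: ' + res.status);
  const data = await res.json();  // the per-user JSON produced by the cron
  render(data);                   // hypothetical function that updates the page
}

function render(data) {
  document.getElementById('live').textContent = JSON.stringify(data);
}

setInterval(() => pollOnce().catch(console.error), POLL_MS);
pollOnce().catch(console.error); // also fetch immediately on page load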
This all seems sort of like I might be reinventing the wheel, or missing something. The fact that this is an existing PHP-based application (LAMP) does have some limiting factors, but I feel like there's got to be a more efficient way to handle this... it's pretty new to me. Also, I'm open to other technologies that'll run on the LAMP stack, if it'll make things better.
I would say go for the polling API solution in the beginning :) since it fits the architecture better and is for sure the least amount of work. Also, if there is a problem with the "live" feeling of the data, you can fix it by polling more often or introducing long polling, assuming you also change the cron job interval.
In the end it is all about impact for the time spent; don't start implementing features that customers don't care about :)
The biggest problem to solve is implementing it in a way that fits your requirements and is somewhat future-extensible. You still have to deal with issues like resolution, timeouts, reducing server processing when requesting data, and so on!
For me, if you need to maintain global service state because a single client's request could affect all other connected clients' requests, then most server-side scripting languages are not the best choice! To add to that, if you plan on implementing something like this with PHP, you will be setting yourself up for a living nightmare. Why? Because, simply put, PHP's socket implementation is that bad!
I have been developing a PHP application for quite a while. The (basic) idea is as follows: users can build web pages using blocks. These blocks can contain images, text, etc. Each of these blocks has its own options. The blocks are defined with Domain-Driven Design in PHP.
I've built the application to use a PHP-based controller that handles the requests from a jQuery/JavaScript front end. Each time the user edits an option, it is sent to this controller, which unserialises a collection of blocks (PHP objects) from Redis and/or the PHP session, and sets the attributes of the edited blocks or adds/removes one of the blocks. This is to enforce the domain logic.
This was fine while developing for myself; I never kept race conditions and the like in mind. While moving forward with the product, I notice that people lose data. I'll explain what happens:
The user edits an option of a block
and presses save.
A request is made to the controller, which
unserialises the collection,
sets the blocks based on their UUID,
puts the blocks back in the collection, and
serialises the collection again.
There are scenarios where two concurrent requests are created and the edits from one of them override the edits from the other.
I know I need to rewrite this part of the application. The question is what the best approach is. I could:
Implement some JavaScript library, which would take a lot of work because it would require me to rewrite that entire part of the application. I also don't have much experience implementing JavaScript-based solutions, but I don't mind stepping into something new. I do want JavaScript testing to prevent future problems from occurring and to enable cross-browser testing.
Apply Redis/session locking so that the controller only processes a single request at a time and concurrent requests can't overwrite the data set by a previous request. This will lower the chance of concurrent requests causing data loss, but not eliminate it. People with a really slow internet connection might see their connections drop while they produce a lot of concurrent requests.
I'm curious what other approaches I might be missing, or if one of the two I mentioned above will suffice.
As far as I understand your problem, what you may want to implement is optimistic locking.
A simple way to implement it is to version your aggregate.
Every time someone edits your object, increment its version.
When you POST your edited blocks, you send back the version on which you are trying to apply your changes.
Then, when getting your object back from your persistence store, you compare the versions and ensure you are actually working on an up-to-date object.
If it is up to date, save it; if it is not, reject the modification, notify the user, reload the object, and take the appropriate action (this depends on your needs).
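To make the idea concrete, here is a small sketch of the version check in JavaScript, with a plain in-memory map standing in for Redis/the session; the field names are invented, and how you merge or notify on conflict is up to the application.

// In-memory stand-in for the real persistence layer (Redis / session).
const store = new Map(); // uuid -> { version, blocks }

function saveCollection(uuid, editedBlocks, expectedVersion) {
  const current = store.get(uuid) || { version: 0, blocks: [] };

  // Optimistic check: the client must be editing the version we currently hold.
  if (expectedVersion !== current.version) {
    // Conflict: someone saved in between. Reject and let the caller reload.
    return { ok: false, currentVersion: current.version };
  }

  store.set(uuid, { version: current.version + 1, blocks: editedBlocks });
  return { ok: true, newVersion: current.version + 1 };
}

// Example: two "concurrent" saves both based on version 0.
console.log(saveCollection('page-1', ['block A'], 0)); // { ok: true, newVersion: 1 }
console.log(saveCollection('page-1', ['block B'], 0)); // { ok: false, currentVersion: 1 }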
I'm not exactly sure how to formulate the question, but I think it's more of a request for suggestions than a question per se.
We are building an HTML5 service in which users get credited (rewarded, in social-gaming lingo) for completing a series of offers. Most of these offers involve watching video ads. We already have an implementation of this built in Flash, but for HTML5 I'm running into more issues around how to make the request calls that validate legitimately watched video ads. In the Flash interface, we have a series of HTTP requests that the SWF makes, some when video playback starts, some in the middle and some at the end; each of those requests is related to the others, meaning the response of one is needed for the next request, and so on. Most of the logic that "hides" this "algorithm" is lightly hidden inside the SWF binary, and it pretty much serves its purpose.
However, for HTML5 we have to rely on world-visible JavaScript, and that "hidden" logic is wide open. So I guess this is a call for suggestions on how these cases are usually handled so that a skilled person could not (so easily) get access to it and exploit the service to get credited programmatically. Obfuscating the JavaScript seems like something that could help, but it in no way protects fully.
There's of course some extra security on the back end (like frequency capping, per-user capping, etc.), but since our capping resets every day, a skilled person could still find a way to get credit for all available offers even without completing them.
It sounds like you want to ensure that your server can distinguish requests that happened as the result of the user interacting with your UI in ways you approve of from requests that did not happen that way.
There are a number of points of attack on such a system.
Inspect the JavaScript to find the event handlers and invoke them via Firebug or another tool.
Inspect any keys from your code, and generate the HTTP requests without involving the browser.
Run code in the browser to programmatically generate events.
Use a 3rd-party tool that instruments the browser to generate clicks.
If you've got reasonable solutions to instrumentation attacks (3 and 4), then you can look at Is there any way to hide javascript functions from end user? for ways to get secrets into the client to allow you to sign your requests. Beyond that, obfuscation is the only (and imperfect) way to stop a not-too-determined attacker from any exploitation, and rate-limiting and UI event logging are probably your best bets for stopping determined attackers from benefiting from wide-scale fraud.
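As a rough illustration of "signing your requests" once some secret has been delivered to the client, here is a sketch using the browser's Web Crypto API. The endpoint, the payload shape and the way the key reaches the client are all assumptions, and, as noted above, a determined attacker can still extract the key.

// Sign the body of a reward request with an HMAC so the server can check
// that it was produced by code holding the (client-delivered) key.
// Endpoint and payload shape are made up for the example.
async function sendSignedEvent(keyString, payload) {
  const enc = new TextEncoder();
  const key = await crypto.subtle.importKey(
    'raw', enc.encode(keyString),
    { name: 'HMAC', hash: 'SHA-256' },
    false, ['sign']
  );
  const body = JSON.stringify(payload);
  const sig = await crypto.subtle.sign('HMAC', key, enc.encode(body));
  const hex = [...new Uint8Array(sig)].map(b => b.toString(16).padStart(2, '0')).join('');

  return fetch('/ad-progress', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Signature': hex },
    body,
  });
}

// Usage: report that playback reached the halfway mark (hypothetical fields).
// sendSignedEvent(deliveredKey, { offerId: 42, checkpoint: 'midpoint', ts: Date.now() });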
You will not be able to prevent a determined attacker (even with SWF, though it's more obfuscated). Your best bet is to make sure that:
Circumventing your measures is expensive in terms of effort, perhaps by using a computationally expensive crypto algorithm so they can't just set up a bunch of scripts to do it.
The payoff is minimal (user-capping is an example of how to reduce payoff; if you're giving out points, it's fine; if you're mailing out twenty dollar bills, you're out of luck)
Cost-benefit.
I am working on a simple notification service that will be used to deliver messages to the users surfing a website. The notifications do not have to be sent in real time, but it might be a better user experience if they happened more frequently than, say, every 5 minutes. The data being sent to and from the client is not very large, and it is a straightforward database query to retrieve it.
From reading other conversations on the topic, it would appear that AJAX push can result in higher server loads. Since I can tolerate longer server delays, is it worthwhile to have the server push notifications, or should I simply poll?
It is not much harder to implement the push scenario and so I thought I would see what the opinion was here.
Thanks for your help.
EDIT:
I have looked into a simple AJAX Push and implemented a simple demo based on this article by Mike Purvis.
The client load is fairly low at around 5k for the initial version and expected to stay that way for quite some time.
Thank you everyone for your responses. I have decided to go with the polling solution but to wrap it all within a utility library so that if they want to change it later it is easier.
I'm surprised no one here has mentioned long polling. Long polling means keeping an open connection for a longer period (say 30-60 seconds); once it's closed, you re-open it again, and the socket/connection simply listens for responses. This results in fewer connections (but longer ones), and means that responses are almost immediate (some may have to wait for a new polling connection). I'd like to add that in combination with technologies like NodeJS, this results in a very efficient and resource-light solution that is 100% compatible across all major browsers and versions, and does not require any additional tech like Comet or Flash.
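A minimal long-polling loop on the client might look like the sketch below, assuming a hypothetical /notifications/poll endpoint that holds the request open until there is something to say or its own timeout passes.

// Keep one request open at a time; the server replies as soon as it has a
// notification, or with an empty result after its own timeout (~30-60 s).
// The endpoint and response shape are assumptions for the example.
async function longPoll() {
  while (true) {
    try {
      const res = await fetch('/notifications/poll');
      if (res.ok) {
        const messages = await res.json();        // e.g. [] or [{ text: '...' }]
        messages.forEach(m => console.log('notification:', m.text));
      }
    } catch (err) {
      await new Promise(r => setTimeout(r, 5000)); // back off briefly on network errors
    }
    // The loop immediately re-opens the connection, as described above.
  }
}

longPoll();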
I realize this is an old question, but thought it might still be useful to provide this information :)
Definitely use push; it's much cooler. If you just want simple notifications, I would use something like StreamHub Push Server to do the heavy lifting for you. Developing your own Ajax push functionality is an extremely tricky and rocky road: you have to get it working in all browsers and then handle firewalls and proxies killing keep-alive connections, etc... Why re-invent the wheel? Also, it has a similarly low footprint of less than 10K, so it should suit you if that is a priority.
Both have different requirements and address different scenarios.
If you need real-time updates, as in an online chat, push is a must.
But if the refresh period is long, as it is in your case (5 minutes), then polling is the appropriate solution. Push, in this case, would require a lot of resources from both the client and the server.
Tip! Try to make the page that checks the pool fast and lean, so it doesn't consume a lot of server resources on each request. What I usually do is keep a flag in memory (for example in a session variable) that says whether the pool is empty or not, so I only do the heavy lookup in the pool when it is not empty. When the pool is empty, which is most of the time, the page request runs extremely fast.
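The same flag idea, sketched in Node/Express with an in-memory map standing in for the session variable the answer mentions; the answer's context is PHP, so this only illustrates the shape of it, and the route and identifiers are made up.

const express = require('express');
const app = express();

// In-memory stand-in for "is there anything waiting for this user?".
// In the answer's PHP setup this flag would live in the session.
const hasPending = new Map(); // userId -> boolean

app.get('/poll', (req, res) => {
  const userId = req.query.user;             // however the app identifies the user
  if (!hasPending.get(userId)) {
    return res.json([]);                     // cheap path: nothing to do, return fast
  }
  const messages = fetchMessagesFor(userId); // heavy lookup only when the flag is set
  hasPending.set(userId, false);
  res.json(messages);
});

// Placeholder for the expensive database/pool lookup.
function fetchMessagesFor(userId) {
  return [{ text: 'example notification for ' + userId }];
}

app.listen(3000);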
Because using a push requires an open HTTP connection to be maintained between your server and each client, I'd go for poll as well - not only is that going to consume a lot of server resources but it's also going to be significantly more tricky to implement as matt b mentioned.
My experience with polling is that if you have a frequent enough polling interval on a busy enough site your web server logs can get flooded with poll requests real quickly.
Edit (2017): I'd say your choices now are between WebSockets and long polling (mentioned in another answer). Sounds like long polling might be the right choice: given the way the question mentions that the notifications don't need to be received in real time, an infrequent polling period would be pretty easy to implement and shouldn't be very taxing on your server. WebSockets are cool and a great choice for many applications these days, but they sound like overkill in this case.
I would implement a poll just because it sounds simpler to write, and keeping it simple is very valuable.
Not sure if you have taken a look at some of the COMET implementations out there (is that what you mean by AJAX push?).
If the user is surfing the site, won't that in effect be requesting information from the server that this notification can piggy-back on?
It's impossible to say whether polling will be more expensive than pushing without knowing how many clients you'll have. I'd recommend polling because:
It sounds like you want to update data about once per minute. Unless notifications are able to arrive at a much faster rate than that, pushing would mean you're keeping an HTTP connection open but seeing very little activity on it.
Polling is built on top of existing HTTP conventions, so any server that talks to web browsers is already ready to respond to ordinary Ajax requests. A Comet– or Flash socket–based solution has different requirements; you'll need something like cometd on the server side and a client-side library that groks server-side push.
So if you needed something heavy-duty to manage a torrent of data and a crapload of clients, I'd recommend Comet. But that doesn't seem to be the case.
There's now a service, http://pusherapp.com, that is trying to solve this problem once and for all, in a blink. Might be worth checking out. (Disclaimer: I am in no way associated with them.)
I haven't tried it myself, but some say COMET works and is easier than you think. There's also a Ruby on Rails plug-in called Juggernaut that I've heard talked about highly. Again, I haven't used it, so YMMV, but my understanding is that it takes far fewer resources compared to polling. I believe (can someone confirm?) that COMET is how MacRumorsLive.com delivers live blogging of WWDC Stevenotes.