I'm building a tool whose core structure is: make an AJAX request to a Cloudflare Worker, which fetches HTML data and then returns it.
So the steps are:
Send request from client
Worker receives the request and makes another one, which returns a response as a typical HTML document.
And on the third step I have two options:
to return the obtained HTML back via AJAX response and then parse it on client
to parse HTML first, and then return processed data via AJAX response
The first one is straightforward: I receive the response from my worker, insert the returned HTML into a hidden <div>, and then parse it.
The reason I would prefer the second one, though, is to avoid wasting bandwidth delivering HTML from the Cloudflare Worker back to the client, because the original page has a lot of irrelevant bloat. For example, the original page looks something like this:
<div class="very-much-bloat" id="some-other-bloat" useful_parameter="value">
<div class="some-other-irrelevant-info" id="really-great-id">
something that I need
</div>
</div>
And all that I need from this is, for example
{
"really-great-id" : "something that I need",
"useful_parameter" : "value"
}
If I go with the first option, it would be pretty straightforward to parse it in-browser; however, I'll waste bandwidth delivering a lot of information that is later disposed of.
However, if the second one would involve using complex libraries, it probably wouldn't be the way to go, since the max execution time per request is 10ms (that's the free plan on Cloudflare, which is otherwise plenty: 100,000 requests per day is more than I'll probably ever need with this app).
The question is: is there any efficient way to parse HTML on a Cloudflare Worker without breaking the 10ms time limit? The page size obtained by the worker is around 10-100 KB; the parsed data is around 1-10 KB (roughly ten times less than the original). While I understand that 100 KB may not sound like a lot, it's still mostly garbage that's better to filter out as soon as possible.
Cloudflare Workers currently does not support the DOM API. However, it supports an alternative HTML parsing API that might work for you: HTMLRewriter
https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/
This API is different from DOM in that it operates in a streaming fashion: JavaScript callbacks are invoked as the HTML data streams in from the server, without ever holding the entire document in memory at one time. If it fits your use case, it may allow you to respond faster and use fewer resources than a DOM-based solution would. The CPU time used by HTMLRewriter itself does not even count against the 10ms limit -- only the time spent by your callbacks counts. So if you design your callbacks carefully, you should have no problem staying within the limit.
Note that HTMLRewriter is primarily designed to support modifying an HTML document as it streams through. However, it should not be too hard to have it consume the document and generate a completely different kind of data, like JSON. Essentially, you would set up the rewriter so that the "rewritten" HTML is discarded, and you'd have your callbacks separately populate some other data structure or write to some other stream that represents the final result.
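For the question's example markup, a minimal sketch of this approach might look like the following. The selectors and the attribute name are taken from the example above; the actual HTMLRewriter wiring (shown in comments) only runs inside the Workers runtime, while the callback logic itself is plain JavaScript:

```javascript
// Sketch: collect data into a plain object while discarding the rewritten HTML.
// The selectors and "useful_parameter" are assumptions based on the example
// markup in the question.
function makeCollector() {
  const result = {};
  return {
    result,
    // Handler for the outer element: copy one attribute into the result.
    outer: {
      element(el) {
        const v = el.getAttribute("useful_parameter");
        if (v !== null) result.useful_parameter = v;
      },
    },
    // Handler for the inner element: accumulate its text content, which may
    // arrive split across several chunks.
    inner: {
      text(chunk) {
        result["really-great-id"] =
          (result["really-great-id"] || "") + chunk.text;
      },
    },
  };
}

// Inside a Worker you would wire it up roughly like this (Workers runtime only):
// const c = makeCollector();
// await new HTMLRewriter()
//   .on("div[useful_parameter]", c.outer)
//   .on("#really-great-id", c.inner)
//   .transform(upstreamResponse)
//   .arrayBuffer(); // drain the stream so all callbacks fire
// return new Response(JSON.stringify(c.result),
//   { headers: { "content-type": "application/json" } });
```

The `.arrayBuffer()` call is just one way to consume (and discard) the rewritten output so that every callback runs before you serialize the collected data.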
Related
Consider a JSP application with a couple of JavaScript files. The backend is fully localized, using several .properties files, one for each language. Rendered HTML contains strings in the correct language - all of this is the usual stuff and works perfectly.
Now however, from time to time I need to use some localized string in a JavaScript resource. Suppose e.g.:
function foo() {
alert('This string should be localized!');
}
Note that this is somewhat similar to the need to refer some AJAX endpoints from JavaScript, a problem well solved by a reverse JS router. However the key difference here is that the backend does not use the router, but it does use the strings.
I have come up with several approaches, but none of them seems to be good enough.
Parameters
JSP that renders the code to invoke foo() will fetch the string:
foo('<%= localize("alert.text") %>');
function foo(alertText) {
alert(alertText);
}
Pros: It works.
Cons: Method signatures are bloated.
Prototypes
JSP renders a hidden span with the string, JS fetches it:
<span id="prototype" class="hidden">This string should be localized!</span>
function foo() {
alert($('#prototype').text());
}
Pros: Method signatures are no longer bloated.
Cons: Must make sure that the hidden <span>s are always present.
AJAX
There is an endpoint that localizes strings by their key, JS calls it. (The code is approximate.)
function foo() {
$.ajax({ url : '/ajax/localize', data : { key : 'alert.text' } })
.done(function(result) {
alert(result);
} );
}
Pros: Server has full control over the localized result.
Cons: One HTTP call per localized string! If any of the AJAX calls fails, the logic breaks.
This can be improved by fetching multiple strings at once, but the roundtrip problem is an essential one.
Shared properties files
Property file containing the current language is simply exposed as an additional JS resource on the page.
<script src="/locales/en.js"></script> // brings in the i18n object
function foo() {
alert(i18n.alert.text);
}
Pros: Fast and reliable.
Cons: All the strings are pulled in - also the ones we don't need or want to expose to the user.
This can be improved by keeping a separate set of strings for JS, but that violates the DRY principle.
Now what?
So that's it, those are the ideas I've had. None of them is ideal; all have their own share of problems. I am currently using the first two approaches, with mixed success. Are there any other options?
Your idea with a shared properties file is the neatest of the four you suggested. A popular CMS I use called Silverstripe actually does the same thing: it loads a localised JS file that adds the strings to a dictionary, allowing a common API for retrieving the strings.
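A minimal sketch of that dictionary pattern could look like this (the method names are illustrative, not Silverstripe's actual API):

```javascript
// A tiny i18n dictionary: locale files register strings, application code
// looks them up by key, with an optional fallback.
const i18n = {
  strings: {},
  addStrings(dict) {
    Object.assign(this.strings, dict);
  },
  t(key, fallback) {
    // Fall back to the key itself so a missing string is visible, not fatal.
    return key in this.strings ? this.strings[key] : (fallback ?? key);
  },
};

// A per-locale file (e.g. /locales/en.js) would just call:
i18n.addStrings({ "alert.text": "This string should be localized!" });

// and application code reads:
// alert(i18n.t("alert.text"));
```

Each locale then ships as one cacheable script, and the lookup API stays the same everywhere.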
One point made in the comments is about including localised strings for a particular view. While this can have some uses under particular situations where you have thousands of strings per localisation (totaling more than a few hundred KBs), it can also be a little unnecessary.
Client-side
Depending on how many other JS resources you are loading at the same time, you may not want another request per view just to add a few more strings for that locale. Each view's localisation would need to be requested separately, which can be a little inefficient. Give the browser all the localisations in one request and let it read from its cache for each view.
The browser caching the complete collection of locale strings can lead to a better user experience with faster page load times with one less request per view. For mobile users, this can be quite helpful as even with faster mobile internet, not every single request is lightning fast.
Server-side
If you go by the method suggested in the comments, having a locale.asp file generate the JS locale strings on the fly, you are giving the server a little more work per user. This won't be bad if each user requests it once; however, if it is requested per view, it might start adding up.
If the user views 5 different pages, that is 5 times the server is executing the JSP, building the data for the particular view. While your code might be basic if-statements and loading a few files from the filesystem, there is still overhead in executing that code. While it might not be a problem say for 10 requests per minute, it could lead to issues with 1,000 requests per minute.
Again, that extra overhead can be small but it just simply isn't necessary unless you really want many small HTTP requests instead of few larger HTTP requests and little browser caching.
Additional Thoughts
While this might sound like premature optimisation, I think it is a simple and important thing to consider. We don't know whether you have 5 users or 5,000, whether your users go see 5 different views or 500, whether you will have many/any mobile users, how many locales you want to support, how many different strings per locale you have.
Because of this I think it is best to see the larger picture of what the choice of having locale strings downloaded per view would do.
I have some data that I want to display on a web page. There's quite a lot of data so I really need to figure out the most optimized way of loading and parsing it. In CSV format, the file size is 244K, and in JSON it's 819K. As I see it, I have three different options:
Load the web page and fetch the data in CSV format as an Ajax request. Then transform the data into a JS object in the browser (I'm using a built-in method of the D3.js library to accomplish this).
Load the web page and fetch the data in JSON format as an Ajax request. Data is ready to go as is.
Hard code the data in the main JS file as a JS object. No need for any async requests.
Method number one has the advantage of reduced file size, but the disadvantage of having to loop through all (2700) rows of data in the browser. Method number two gives us the data in the end-format so there's no need for heavy client-side operations. However, the size of the JSON file is huge. Method number three has the advantage of skipping additional requests to the server, with the disadvantage of a longer initial page load time.
What method is the best one in terms of optimization?
In my experience, data processing times in Javascript are usually dwarfed by transfer times and the time it takes to render the display. Based on this, I would recommend going with option 1.
However, what's best in your particular case really does depend on your particular case -- you'll have to try. It sounds like you have all the code/data you need to do that anyway, so why not run a simple experiment to see which one works best for you.
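For what it's worth, option 1's client-side step (CSV text to row objects) is small; here is a naive sketch without quoted-field handling, which D3's built-in CSV parser does support:

```javascript
// Naive CSV-to-objects transform: first line is the header, each subsequent
// line becomes an object keyed by the header names. All values stay strings.
function parseCsv(text) {
  const [headerLine, ...lines] = text.trim().split("\n");
  const headers = headerLine.split(",");
  return lines.map((line) => {
    const cells = line.split(",");
    const row = {};
    headers.forEach((h, i) => {
      row[h] = cells[i];
    });
    return row;
  });
}
```

Looping over a few thousand rows like this is typically fast compared to transferring the extra ~575K that the JSON encoding costs here.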
I want to create an AJAX search to find and list topics in a forum (just topic link and subject).
The question is: Which one of the methods is better and faster?
GET the threads list as a JSON string, convert it to an object, then loop over the items, create an <li> or <tr>, write the data (link, subject), and append it to the threads list. (jQuery powered)
GET the threads list already wrapped in HTML tags and print it (or use innerHTML and $(e).html())
Thanks...
I prefer the second method.
I figure server-side you have to convert your data to either JSON or HTML format, so why not go directly to the one the browser understands and avoid having to reprocess it client-side. Also, you can easily adapt the second method to degrade gracefully for users who have disabled JavaScript (so that they still see the results via standard non-JS links).
I'm not sure which way is better (I assume the second method, as it would seem to touch the data less), but a definitive way to find out is to try both ways and measure which one does better.
'Faster' is probably the second method.
'Better' is probably subjective.
For example, I've been in situations (as a front-end dev) where I couldn't alter the HTML the server was returning, and I wished they had just delivered a JSON object so I could design the page how I wanted.
Also, (perhaps not specific to your use case), serving up all the html on initial page load could increase the page size and load time.
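For scale, the JSON route from the question's first method needs only a few lines. A sketch, with the field names assumed and server-side escaping applied so thread subjects can't inject markup:

```javascript
// Escape the characters HTML treats specially, so user-supplied subjects
// render as text rather than markup.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render a JSON thread list into <li> markup ready for innerHTML.
function renderThreads(threads) {
  return threads
    .map(
      (t) =>
        `<li><a href="${escapeHtml(t.link)}">${escapeHtml(t.subject)}</a></li>`
    )
    .join("");
}
```

The choice between this and server-rendered HTML is then mostly about payload size and where you want the templating to live, not about code complexity.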
Server-generated HTML is certainly faster if the JavaScript takes a long time to process the JSON and populate the HTML.
However, for maintainability, JS is better. You can change HTML generation just by changing JS, not having to update server side code, making a delta release etc etc.
Best is to measure how slow it really is. Sometimes we think it is slow, but then you try it out in the real world and don't see a big difference. You might have the major delay in transmitting the JSON object. That delay will still be there, and in fact increase, if you send an HTML representation from the server.
So, if your bottleneck really is parsing JSON and generating HTML, not the transmission from the server, then sending HTML from the server makes sense.
However, you can do a lot of optimization in producing the html and parsing JSON. There are so many tricks to make that faster. Best if you show me the code and I can help you make a fast JS based implementation or can tell you to do it on the server.
I'm starting to do some JS/HTML/CSS. From looking around, it seems that it's not unusual to return HTML from the back-end (for example, an Ajax response) and directly display it (such as by assigning it to an element's innerHTML). For example, I believe that the jQuery load() method basically is a shortcut to do this.
Taking the approach worries me for a couple of reasons, but I'm not sure if it's just that I'm not familiar with the approaches and idioms in these areas and I am just behind the times or whether these are legitimate concerns. My concerns specifically are:
1) It seems insecure to directly assign HTML to an element. Or, at a minimum, dangerous at least if there's a possibility of any user content (or even third-party content).
2) Sending presentation information (HTML) directly seems like it could likely lead to presentation/model mixing that is best avoided. Of course, it would be possible to have these cleanly separated on the back-end and still return HTML, but on the handful of projects that I've seen that hasn't been the case.
So, my question is, is returning HTML a legitimate form of HTTP response in an Ajax app or is it best avoided?
I don't see a right or wrong way to do this; it depends on the amount of data you are sending and how fast you want it rendered. Inserting HTML directly is faster than building elements from JSON or XML. XSS should not be an issue, because you should be escaping user data regardless of the format you're sending it in.
If you take a look at Facebook, all XHR responses (as far as I saw, I only started looking when I saw your question :) are something like:
for (;;);{"__ar":1,"payload":"\u003cdiv class=\"ego_column\">\u003cdiv
class=\"ego_section\">\u003cdiv class=\"uiHeader uiHeaderTopAndBottomBorder
mbs uiSideHeader\">\u003cdiv class=\"clearfix uiHeaderTop\">\u003ca
class=\"uiHeaderActions rfloat\" href=\"http:\/\/www.facebook.com\/campaign\
/landing.php?placement=advf2&campaign_id=368901427978&extra_1=auto\">
Create an Ad\u003c\/a>\u003cdiv>\u003ch4 class=\"uiHeaderTitle\">Sponsored
\u003c\/h4> [...]" }
Their AJAX is content-heavy, so it probably pays off for them to send HTML. Probably their architecture deals with structure-presentation separation.
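Incidentally, the `for (;;);` prefix in that response is a guard against JSON hijacking; the client strips it before parsing. A sketch of that client-side step, with the guard string taken from the response above:

```javascript
// Strip an anti-JSON-hijacking prefix (like Facebook's "for (;;);") before
// parsing. Responses without the guard are parsed as-is.
function parseGuardedJson(body, guard = "for (;;);") {
  const text = body.startsWith(guard) ? body.slice(guard.length) : body;
  return JSON.parse(text);
}
```

The prefix makes the response a syntax error (or an infinite loop) if a hostile page tries to execute it as a plain script, while a cooperating XHR client can remove it and parse the remainder normally.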
I think it depends on the use case, to be honest. There is a fairly hefty penalty to be paid on the client if it has to construct a lot of HTML based on some JSON or XML data.
Personally I use a mixture of both - if it's just a small bit of data (an integer or a small string) I'll use JSON or even just the raw data on its own.
If it's a complicated set of data (say, a bunch of user comments) that I'd otherwise have to format on the client side, then I'll just send the HTML and save the client the work.
Personally I wouldn't worry about security, at least not users injecting malicious HTML - you should be dealing with that when it's submitted anyway.
Edit: There is an exception to this - when bandwidth is a concern (mobile web, for instance) then sending as little data over the wire is nearly always best.
It's a simple case of a JavaScript that continuously asks "are we there yet?", like a four-year-old on a car ride. But, much like parents, if you do this too often, or with too many kids at once, the server will buckle under the pressure.
How do you solve the issue of having a webpage that looks for new content in the order of every 5 seconds and that allows for a larger number of visitors?
Stack Overflow does it some way; I don't know how, though.
The more standard way would indeed be the javascript that looks for new content every few seconds.
A more advanced way would use a push-like technique, by using Comet techniques (long-polling and such). There's a lot of interesting stuff under that link.
I'm still waiting for a good opportunity to use it myself...
Oh, and here's a link from stackoverflow about it:
Is there some way to PUSH data from web server to browser?
In Java I used an Ajax library (DWR) with Comet technology -- I think you should search for a PHP library that does the same.
The idea is that the server sends one very long HTTP response, and when it has something to send to the client, it ends it and sends a new response with the updated data.
With this, the client doesn't have to ping the server every x seconds to get new data -- I think it could help you.
You could make the poll time variable depending on the number of clients. Using your metaphor, the kid asks "Are we there yet?" and the driver responds "No, but maybe in an hour". Thankfully, Javascript isn't a stubborn kid so you can be sure he won't bug you until then.
You could consider polling every 5 seconds to start with, but after a while start to increase the poll interval time - perhaps up to some upper limit (1 minute, 5 minute - whatever seems optimal for your usage). The increase doesn't have to be linear.
A more sophisticated spin (which could incorporate monzee's suggestion to vary by number of clients) would be to allow the server to dictate the interval before the next poll. The server could then increase the interval over time, and you can even change the algorithm on the fly, or in response to network load.
You could take a look at the 'Twisted' framework in python. It's event-driven network programming framework that might satisfy what you are looking for. It can be used to push messages from the server.
Perhaps you can send a query to a really simple script that doesn't need to make a real DB query, but only uses a simple timestamp to tell whether there is anything new.
And then, if the answer is true, you can do a real query, where the server has to do real work !-)
I would have a single instance calling the DB, and if a newer timestamp exists, put that new timestamp in an application variable. Then let all sessions check against that application variable, or something like that. That way only one instance is calling the SQL server, and the number of clients doesn't matter.
I haven't tried this, and it's just the first idea off the top of my head, but I think caching the timestamp and letting the clients check the cache is the way to do it; how best to implement the cache (SQL-server cache, application variable, and so on) I don't know.
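A sketch of that cached-timestamp idea: one background job refreshes a shared "last modified" value, and every client request checks the cache instead of hitting the database. The names are illustrative, not a real framework API:

```javascript
// Application-scoped cache standing in for an "application variable".
const appCache = { lastModified: 0 };

// Called only by the single polling instance, e.g. every few seconds,
// with the latest timestamp it read from the database.
function refreshCache(latestTimestampFromDb) {
  if (latestTimestampFromDb > appCache.lastModified) {
    appCache.lastModified = latestTimestampFromDb;
  }
}

// Called per client request -- a single in-memory comparison, no DB work.
function hasNewContent(clientLastSeen) {
  return appCache.lastModified > clientLastSeen;
}
```

Only when `hasNewContent` returns true does a client go on to make the expensive request for the actual changes.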
Regarding how SO does it, note that it doesn't check for new answers continuously, only when you're typing into the "Your Answer" box.
The key then, is to first do a computationally cheap operation to weed out common "no update needed" cases (e.g., entering a new answer or checking a timestamp) before initiating a more expensive process to actually retrieve any changes.
Alternately, depending on your application, you may be able to resolve this by optimizing your change-publishing mechanism. For example, perhaps it might be feasible for changes (or summaries of them) to be put onto an RSS feed and have clients watch the feed instead of the real application. We can assume that this would be fairly efficient, as it's exactly the sort of thing RSS is designed and optimized for, plus it would have the additional benefit of making your application much more interoperable with the rest of the world at little or no cost to you.
I believe the approach should be based on a combination of server-side sockets and client-side AJAX/Comet. Like:
Assume a chat application with several logged on users, and that each of them is listening via a slow-load AJAX call to the server-side listener script.
Whatever browser receives the just-entered data submits it to the server with an AJAX call to a writer script. That script updates the database (or storage system) and posts a socket write to the noted listener script. The latter then gets the fresh data and posts it back to the client browser.
Now I haven't yet written this, and right now I don't know whether/how the browser limit of two concurrent connections would break the above logic.
Will appreciate hearing from anyone with thoughts here.
AS