The anatomy of uploading - javascript

I am wondering what the general consensus is for uploading moderately large files. I have a web app, and every time a user uploads a file (typically larger than 5 MB), the web server tends to hang until the file upload is finished.
The above seems normal, because a single upload can take up a single HTTP request handler. Do web devs take this into consideration and either:
a) Pay for more HTTP handlers
b) Use some other method, such as AJAX or another approach, to overcome this
I've heard that it is quite normal for web apps to have a few HTTP request handlers to take care of this, which will cost quite a bit more. On the other hand, if cost is an issue, some have suggested uploading directly to the web server or a storage service (e.g. Amazon S3) via Flash + AJAX. The latter method takes a bit of scripting and is a bit messy.
My second concern:
When using AJAX to upload files to a server, does this still take up a whole HTTP request handler? That is, does the server hang until the upload is finished?
Even with Flash, I would still need to specify a URL to upload to, and that URL would be one of the actions on my controller, which means processing still takes place on the server side. Is this right so far?
If, on the other hand, I were to use one of the upload scripts (Plupload, Uploadify, SWFUpload, etc.) to upload directly to Amazon S3, then the processing is handled on the S3 servers instead of the local web server, which won't hang the web app at all. Am I understanding this correctly?
Would like to hear your feedback.

For large uploads you should use non-blocking, evented servers like Node.js, Twisted on Python, AnyEvent on Perl, or EventMachine on Ruby. The thread-per-connection model is just too expensive for long-running connections.
It is not uncommon for Node.js users to have so many simultaneous connections that they actually hit their operating system's limits while still not exhausting their resources - for example, see this question asked by someone who was concerned about having only 30 thousand simultaneous connections and who then managed to reach over 60 thousand connections on a single server with 4 GB of RAM.
The point is that if you are concerned about connections blocking your server from serving new requests, then you shouldn't use a blocking server in the first place.
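To make that concrete, here is a minimal sketch of an evented upload handler in Node.js; the route, file path, and port are arbitrary placeholders:

    // Node streams the request body in chunks and keeps serving other
    // requests in between, so an upload never monopolizes a thread.
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      if (req.method === 'POST' && req.url === '/upload') {
        // Pipe the incoming body straight to disk; the event loop stays
        // free to handle other connections while chunks arrive.
        req.pipe(fs.createWriteStream('/tmp/upload-' + Date.now()));
        req.on('end', () => res.end('upload received\n'));
      } else {
        res.end('hello\n'); // still answered instantly during an upload
      }
    }).listen(8080);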

I'm currently developing a web app which handles multiple image uploads simultaneously. I researched far and wide, and the best option I found was SWFUpload. It is super easy to implement and highly customizable. Users can select multiple files from the dialogue box, add them to a queue, and get actual progress feedback from the browser, so the lag isn't such a big deal to the user.
Though, bah... it uses Flash to initialize the dialogue box, but everything else is handled with good old JavaScript.
A great working example is carbonmade.com

Thanks for the responses so far.
Unfortunately, our host Heroku does not support non-blocking, evented servers. I've also tried Flash + JavaScript based uploaders like SWFUpload and Uploadify. Some variations of the mentioned plugins worked, and some didn't. I spent countless hours of trial and error, but didn't like how the code was being integrated into my Rails app.
In the end, I went with manually uploading the file directly to S3, following this link. That approach also enables a response back from the S3 server notifying us that an upload was successful and giving us the path to the uploaded file, so that we can then create a background job (via Redis + Resque) to process the file.
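For anyone curious what the browser side of a direct-to-S3 upload looks like, here is a rough sketch using S3's browser-based POST upload. The bucket name and endpoint are placeholders, and the policy/signature are assumed to have been generated server-side with your AWS secret key:

    // Direct browser-to-S3 upload via a signed POST policy (sketch).
    function uploadToS3(file, creds) {
      var form = new FormData();
      form.append('key', 'uploads/' + file.name);
      form.append('AWSAccessKeyId', creds.accessKeyId);
      form.append('acl', 'private');
      form.append('policy', creds.policy);       // base64 policy document
      form.append('signature', creds.signature); // HMAC of the policy
      form.append('success_action_status', '201');
      form.append('file', file);                 // must be the last field
      var xhr = new XMLHttpRequest();
      xhr.open('POST', 'https://my-bucket.s3.amazonaws.com/');
      xhr.onload = function () {
        // On 201, S3 returns XML containing the uploaded file's location,
        // which you can then hand to a background job for processing
        console.log(xhr.status, xhr.responseText);
      };
      xhr.send(form);
    }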

In the future if you are going to do direct uploading to S3 via Rails, please check out my sample projects below. You will save yourself many, many headaches and it's not very "messy" :)
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
By the way, you can do post-processing with Paperclip using something like this blog post describes:
http://www.railstoolkit.com/posts/fancyupload-amazon-s3-uploader-with-paperclip

Related

The speed of scraping tweets on a remote server depends on what?

I am working on my first webapp project which I plan to publish using a remote server. I have a question about the architecture.
My webapp is to scrape tweets using the twitterscraper Python package. A user who visits the website enters some keywords and clicks the "Scrape" button. A Python backend scrapes the tweets containing the keywords, runs them through some Natural Language Processing analysis, and visualises the results in charts. The twitterscraper package lets you scrape tweets using Beautiful Soup, so you don't need to create API credentials. The scraping speed depends on the bandwidth of the internet connection you are using.
I made a Python script, a JavaScript file, an HTML file, and a CSS file. In my local environment the webapp works perfectly.
So the question is: after I put these files on the hosting server and publish the webapp, when a user clicks the "Scrape" button, what does the scraping speed depend on? The bandwidth of the internet connection the user is using? Or is there some "bandwidth" that the server relies on?
As I said, I am very new to this kind of architecture, so it would also be nice if you could suggest an alternative way of structuring this kind of webapp. Thank you!
Where the bottleneck is depends on a bunch of different variables.
If you're doing a lot of data manipulation, but you don't have a lot of CPU time allocated to the program (i.e. there are too many users for your processor to handle), it could slow down there.
If you don't have sufficient memory, and you're trying to parse and return a lot of data, it could slow down there.
Because you're also talking to Twitter, whatever bandwidth restrictions exist between your server and the Twitter servers will affect the speed at which you can retrieve results from their API, and therefore the time it takes your program to respond to a user.
There's also the connection between yourself and the user. If that's slow, it could affect your program.

A question about how web applications work and how server-client is implemented

This is kind of a weird question to ask, I think, but I have been browsing about for some time and cannot find a clear, definite answer.
I understand that a client connects to its own server and communicates with the web server through sockets, and I kind of see how that works in PHP (I have never used PHP, but I have used sockets before, so I understand the concept).
The issue is I'm trying to get a real view of this.
The question is: do websites generally use sockets and contact a web server to fetch data or the actual HTML? Or is it a rare choice made in some areas?
If it is generally used, then is the "real" JS usually on the server? Or is it client-side (for performance's sake)?
Context:
Let me explain a bit where I'm coming from, I'm not a web expert, but I am a computer engineering student so most concepts are easy to understand. A "real"-er view of this would be very helpful.
Now, onto why I'm asking this. I'm developing a web app as part of a project and have made a fair bit of progress on it, but everything was done on a local dev server (so basically a client?).
I've started wondering about this because I want to use a database for my website, and since I want to connect to something, I will need to connect to a web server first (for security's sake).
My question's intent is to guide me on how and, most importantly, where to set up this server.
I don't think showing any code would help here, but assume I have my client running on localhost:1234 and my database on localhost:3306. I think I should have a web server on another port so I can establish this communication, but I want to do it in a clean and legitimate way so that all of my current solutions can be ported online with little to no changes (except the obvious).
There's a bunch to unpack here.
First of all, servers can be distant or local. Usually they are distant; local servers are mostly used for development purposes.
Even if your server is on your local machine, it still isn't the client. The client is the part that connects to your server. For web development it is usually the user's browser.
JavaScript is a language that can be used server-side, with a Node.js server, but more often client-side, in your user's browser.
Your website, or web application, communicates with your server through various means. The most common one is the HTTP protocol, used to make server requests such as data requests to populate your page (in the case of an API server, REST or otherwise), or simply to request the actual page to display in the browser. The HTTP protocol works by resolving URLs and making requests to the server registered at that URL using methods such as GET, POST, DELETE, etc.
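As a concrete illustration, here is a minimal sketch of that request/response cycle using Node's built-in http module (port and routes are made up for the example):

    const http = require('http');

    const server = http.createServer((req, res) => {
      if (req.method === 'GET' && req.url === '/api/data') {
        // An API-style response: JSON data to populate a page
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({ message: 'hello' }));
      } else {
        // Otherwise serve the actual page to display in the browser
        res.writeHead(200, { 'Content-Type': 'text/html' });
        res.end('<html><body><h1>Home</h1></body></html>');
      }
    });

    server.listen(3000);
    // From the browser, the client would call e.g.:
    //   fetch('/api/data').then(r => r.json()).then(console.log);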
Sockets are used to create a persistent connection with your server that works both ways. They are mostly used for realtime updates, such as a live chat, as they allow you to push updates from the server instead of having the client request everything.
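And here is what that persistent, two-way channel looks like from the browser side, using the standard WebSocket API (the server URL is hypothetical):

    const socket = new WebSocket('ws://localhost:3000');
    socket.addEventListener('open', () => socket.send('hello from client'));
    socket.addEventListener('message', (e) => {
      // The server can push updates at any time - no polling needed
      console.log('server pushed:', e.data);
    });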
In most cases the database can be found on the same server as the one serving the website or application, as that is a lot easier to handle, and often faster since it avoids extra network requests to get the data. However, it can be placed on another server, with its own API to get the data (not necessarily web related).
Ports such as 1234 or 3306 are often used for local development; however, once you move your project to a hosting service, they are usually replaced by URLs, and the hosting service will provide you with a config to access the associated database. If you are building your own server you might still use ports. It is heavily dependent on your server config.
Hope this clears some things up.
In addition to @Morphyish's answer: in the simplest case, a web browser (the client) requests a URL from a server. The URL contains the domain name of the server and some parameters. The server responds with HTML code. The browser interprets the code and renders the webpage.
The browser and the server communicate using the HTTP protocol. HTTP is stateless and closes the connection after each request.
The server can respond with static HTML, e.g. by serving a static HTML file, or with dynamic HTML. Serving dynamic HTML requires some kind of server-side language (e.g. Node.js, PHP, Python) that essentially concatenates strings to build the HTML code. Usually, the HTML is created by filling templates with data from the database (e.g. MySQL, Postgres).
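In its simplest form, "filling a template with data" is just string building, something like this sketch (the data is hard-coded where a real app would query MySQL/Postgres):

    const user = { name: 'Ada' }; // pretend this row came from the database
    const html = '<html><body><h1>Hello, ' + user.name + '</h1></body></html>';
    // The server sends `html` as the body of its HTTP response.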
There are countless languages, frameworks, and libraries that help to achieve this.
In addition to HTML, the server can also serve JavaScript that is interpreted in the browser and adds dynamic behavior to the webpage. However, there are two kinds of JavaScript that should not be mixed: Node.js code runs on the server and formats the server response, while client JavaScript runs in the browser. Remember, client and server are completely isolated and can communicate only through an HTTP connection.
That said, there are ways to make persistent connections between client and server, e.g. with WebSockets, and to add all kinds of exotic solutions. The core principle remains the same.
It does not matter if the server software (e.g. Apache, nginx) is running on your local machine or anywhere else. The browser makes a request to an address, and the DNS and network stack figure out how to reach the server and make it work.

Angular - how to test Internet upload speed without backend?

I want to upload a file into the folder from which my Angular app is served while running on localhost. I'm not able to find any solution that doesn't use a backend.
For example, I just want to upload an image file and have that file copied into a specified folder of the project. This should be done only with Angular, without using any backend script or hitting any API endpoint.
Depending on your webhost, you can make your assets folder accessible via FTP.
Making an FTP call from JavaScript (Angular is JavaScript) isn't that difficult, and there are plenty of examples and questions about it on the internet (like this).
Why you wouldn't do that:
The credentials for your FTP connection will be accessible in the compiled JavaScript code. With a little bit of effort, anyone can find them.
Each gate you open through the webhost's firewall is an extra vulnerability. That's why everybody will recommend adding an API endpoint for uploading files, so that you keep control over what may be uploaded.
Edit:
As I read your question again, along with all the sub-answers, I figured out (I think) that you are building a native-like app with no backend, just an Angular single-page front-end application. And I can understand why (you can run this on every platform in an application that supports JavaScript), but the problem you are encountering is only the first of a whole series.
If this is the case, I wouldn't call it uploading, as you would store it locally.
But the good news is that you have localStorage at your disposal to store temporary data on the client's disk. It isn't a very large space, but it is something...
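A rough sketch of that idea: read a user-selected file with FileReader and stash it in localStorage as a data URL (the element selector is hypothetical, and localStorage is typically capped at around 5-10 MB per origin):

    const input = document.querySelector('input[type="file"]');
    input.addEventListener('change', () => {
      const file = input.files[0];
      const reader = new FileReader();
      // Store the file's contents under its name for later sessions
      reader.onload = () => localStorage.setItem(file.name, reader.result);
      reader.readAsDataURL(file); // base64-encodes, so it grows ~33%
    });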
The assets folder is one of the statically served folders of the Angular app. It is located on the server so you can't add files to it without hitting the server (HTTP server, API, or whatever else...).
Even when running your app on localhost, there's a web server under the hood, so it behaves exactly the same as a deployed application, and you can't add files to the assets folder via the Angular app.
I don't know what exactly you want to do with your uploaded files, but:
If you want to use them on the client side only, and within one user session, then you can just store the file in a JavaScript variable and do what you want with it.
If you want to share them across users, or across user sessions, then you need to store them on the server, and you can't bypass an API or some HTTP server configuration.
Based on your clarification in one of your comments:
I'm trying to develop a small speed test application in which user can upload any file from his system to check upload and download speed.
The only way to avoid having your own backend is to use a 3rd-party API.
There are some dedicated speed test websites, which also provide API access. E.g.:
https://myspeed.today
http://www.speedtest.net
https://speedof.me/api.html
Some more: https://duckduckgo.com/?q=free+speedtest+api
Note that many of these APIs are paid services.
Also, I've been able to find this library https://github.com/ddsol/speedtest.net, which might indicate that speedtest.net has some kind of free API tier. But this is up to you to investigate.
This question might also be of help, as it shows using speedtest.net in React Native: Using speedtest.net api with React Native
You can use a third-party library such as ng-speed-test. For instance, here is an Angular library which has an image hosted on a third-party server (i.e. GitHub) to test internet speed.
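If you'd rather not pull in a library, the underlying technique is simple enough to hand-roll: time the download of a static file and derive the throughput. A sketch, assuming the URL points at a reasonably large file on some third-party host:

    async function measureDownloadMbps(url) {
      const start = performance.now();
      // Cache-busting query + no-store so we measure the network, not the cache
      const response = await fetch(url + '?t=' + Date.now(), { cache: 'no-store' });
      const blob = await response.blob(); // wait for the full body to arrive
      const seconds = (performance.now() - start) / 1000;
      return (blob.size * 8) / (seconds * 1e6); // bytes -> bits -> Mbps
    }

    measureDownloadMbps('https://example.com/test-image.jpg').then(console.log);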

Image upload to Heroku App

I am currently investigating a way to upload images to a Heroku app where I have a Python application that takes in the images, has them classified, and saves the results in a .csv file.
The images can be selected for upload via a website that uses JavaScript and HTML.
My question now is: how would I best enable the upload from the website to the Heroku app?
Bear in mind that the frontend is currently running on my local machine and that I want to use Heroku as a backend to take in either images or strings.
Will I need an SSH connection to a separate web server? Will I need to use Amazon S3?
I'm not looking for a complete solution to my problem per se, but if someone could point me in the right direction as to what I will need to solve my problem, that would be great.
You could upload an image to Heroku, however there are two problems with that:
Heroku's router times out requests after 30 seconds, which means that if your users have a spotty connection and/or huge files, the upload will fail.
Heroku's ephemeral filesystem means that you must process the file in your web process, because workers run on different dynos and don't have access to your web dyno's filesystem. So that's another strike against the 30-second timeout.
Your best bet is to have your users upload their files directly to S3 from their browsers. We had a good experience with the filestack.com JS widget, but there are other ways.
Your page will then ping your backend with the newly uploaded file's S3 URL, and the backend will launch an asynchronous job using a Heroku worker to process it.
This neatly solves all issues with timeouts and blocking your web dynos.
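The "ping your backend" step can be as small as this browser-side sketch (the endpoint name, the response shape, and the uploadedFileUrl variable are all hypothetical):

    // After the direct-to-S3 upload finishes, hand the resulting URL to
    // your own API so it can enqueue the classification job.
    fetch('/api/uploads', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ s3Url: uploadedFileUrl }) // from the S3 response
    })
      .then((res) => res.json())
      .then((job) => console.log('queued job', job.id));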

How to Communicate Between Web App and C Sharp

I'm attempting to make a web app that needs to communicate with a program written in C Sharp, but I can't find a good form of communication. What I need is: if a user clicks something on the web app, it notifies the C Sharp program; and if an event happens in the C Sharp program, it alerts the web app. Both of these will be running on the same machine.
Right now I'm mainly focusing on the C Sharp program just periodically "asking" what the status of the Web app is.
I've tried using POST requests to the web app and I've had a bit of success with this, but I don't know how to essentially store and update a "status" on the web app. For example, the C Sharp program sends a POST/GET request asking for the status, and the web app responds with "nothing has changed" or some sort of status code. I don't know how to keep track of that status.
I've attempted using Web Sockets but I don't think it is going to be possible on the C Sharp side. However, I'm definitely open to suggestions on how this might work.
I've looked into using the REST architectural style, but I'm having a hard time understanding how I would implement it. I'm mainly using AJAX on an Apache server, and most of the REST examples I saw used IIS.
One way I've been successful with this is a horrible workaround. I use 3 files to store contents, status of Web app, and status of C Sharp program. But this requires me constantly fetching files, reading them, writing a new one, and uploading it.
Sorry if this question is poorly formed, I'm obviously new to a lot of this and this is my first SO post. Also, I'd include example code but I'm posting this from my tablet so it's not accessible right now.
If they are on the same machine, you can use 'pipes' (Unix), local sockets, or file handles.
These are all types of IO objects both applications can 'listen' to without exposing themselves to the network and without blocking while they are 'waiting' for data...
...but this will limit your scalability.
Another option is to use a Pub/Sub service such as Redis. This is a better option than WebSockets, because you can have multiple C# apps listening to multiple web apps through a centralized data source.
It uses the same IO concept (sockets) behind an abstraction layer coupled with a database - it's very effective.
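From the web-app side, publishing an update is only a few lines; here is a sketch assuming the node-redis client (v4), with a hypothetical "status" channel that the C# program would subscribe to:

    const { createClient } = require('redis');

    async function publishStatus(status) {
      const publisher = createClient(); // defaults to localhost:6379
      await publisher.connect();
      await publisher.publish('status', status); // pushed to all subscribers
      await publisher.quit();
    }

    publishStatus('web-app: user clicked the button');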
Good luck!
I implemented something similar to this. I needed a desktop application to listen for API calls from the browser. I ultimately decided to implement a "web connector" which can either be created as part of the application or installed as a service.
Here is a good example: https://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener(v=vs.110).aspx
