What does the speed of scraping tweets on a remote server depend on?

I am working on my first web app project, which I plan to publish on a remote server. I have a question about the architecture.
My web app scrapes tweets using the twitterscraper Python package. A user who visits the website enters some keywords and clicks a "Scrape" button. A Python backend scrapes the tweets containing the keywords, runs them through some Natural Language Processing analysis, and visualises the results in charts. The twitterscraper package scrapes tweets with Beautiful Soup, so you don't need to create API credentials. The scraping speed depends on the bandwidth of the internet connection you are using.
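For context, the backend call is roughly this shape. This is a minimal sketch assuming twitterscraper's query_tweets helper; attribute names vary between package versions, so treat it as illustrative:

    from twitterscraper import query_tweets

    def scrape(keywords, limit=100):
        # Scrape tweets matching the keywords; no API credentials needed,
        # since the package parses Twitter's HTML with Beautiful Soup.
        # query_tweets and the .text attribute are assumptions based on
        # common twitterscraper versions -- check your installed version.
        tweets = query_tweets(keywords, limit=limit)
        return [t.text for t in tweets]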
I made a Python script, a JavaScript file, an HTML file and a CSS file. In my local environment the web app works perfectly.
So the question is: after I put these files on the hosting server and publish the web app, what does the scraping speed depend on when a user clicks the "Scrape" button? The bandwidth of the user's internet connection? Or is there some "bandwidth" that the server relies on?
As I said, I am very new to this kind of architecture, so suggestions for an alternative way to structure this kind of web app would also be welcome. Thank you!

Where the bottleneck is depends on a bunch of different variables.
If you're doing a lot of data manipulation, but you don't have a lot of CPU time allocated to the program (i.e. there are too many users for your processor to handle), it could slow down there.
If you don't have sufficient memory, and you're trying to parse and return a lot of data, it could slow down there.
Because you're also talking to Twitter, whatever the bandwidth restrictions are between your server and Twitter's servers will affect the speed at which you can retrieve results, and so the time it takes your program to respond to a user.
There's also the connection between your server and the user. If that's slow, it could affect your program.
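If you want to see which of these dominates in practice, time each phase separately on the server. A generic sketch, where scrape_tweets and run_nlp are placeholders for your own functions:

    import time

    def timed(label, fn, *args):
        # Measure the wall-clock time of one phase so you can compare
        # the scrape (network-bound) with the NLP step (CPU-bound).
        start = time.perf_counter()
        result = fn(*args)
        print(f"{label}: {time.perf_counter() - start:.2f}s")
        return result

    # scrape_tweets and run_nlp stand in for your own functions.
    tweets = timed("scrape", scrape_tweets, "keywords")
    report = timed("nlp", run_nlp, tweets)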

Related

Progressive Web Apps and Private SQL Credentials

I am tasked with converting a PHP application into a progressive web app. This entails converting the existing PHP logic into JavaScript that runs client-side.
However, the PHP application contains sensitive information, including SQL credentials, which must never be leaked. This complicates the conversion, because one of the biggest requirements of a progressive web app is Offline First: the ability to operate without an Internet connection, and to not slow down even when an Internet connection is available.
Encrypting the JavaScript code is not an option because, no matter how strong the encryption, the decryption code must be shipped alongside it, and thus, determined hackers will always be able to crack the encryption. HTTPS cannot prevent hackers from jailbreaking their phones.
On the other hand, sending an Ajax request to a proxy server that holds the sensitive credentials will slow down the application, defeating the whole point of progressive web applications.
I have spent hours looking up solutions online, yet nothing I found is relevant enough. So how should developers go about ensuring that SQL credentials and other sensitive information are never exposed in the progressive web app?
EDIT: I should clarify that, while I understand that synchronizing local data with server data is the preferred behavior of progressive web apps, I am explicitly forbidden from doing so in this particular case. The data must be kept confidential.
To answer your original question on how to store your DB passwords safely on the client side: you can't. The client side is no place for sensitive information like a server-side DB password.
A PWA is, at the end of the day, a web application with some new features. Those features don't give you any added security for performing server-side-style operations hidden from the user. Even if you use HTTPS, it only encrypts data over the network.
What happens if you do: if you store a DB password in a PWA, or in any web app for that matter, a user can extract the password (using Chrome DevTools, for example) and use it to connect to the DB directly and read all the data in it, not just their own.
Solution: PHP is a server-side scripting language. When you convert it to HTML/JS, the server-side parts should stay on the server and expose the data to the PWA through web services.
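In other words, the PWA never sees the credentials; it only calls an endpoint. A minimal sketch of that shape, using Flask and SQLite purely as illustrations (the endpoint and table names are made up):

    import os
    import sqlite3                     # stand-in for your real database driver
    from flask import Flask, jsonify

    app = Flask(__name__)
    DB_PATH = os.environ["DB_PATH"]    # connection details live only on the server

    @app.route("/api/items")
    def items():
        # The client only ever receives this JSON; the DB credentials
        # never leave the server.
        con = sqlite3.connect(DB_PATH)
        rows = con.execute("SELECT id, name FROM items").fetchall()
        con.close()
        return jsonify(items=[{"id": r[0], "name": r[1]} for r in rows])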
On downloading data: caching is not simply equivalent to downloading. If you still don't want caching, use "Network only" mode, and make use of the other PWA features, like notifications and install-to-home-screen.

AngularJS: how to get remote access to a system?

Let me elaborate on my goal.
How can I open a remote desktop connection from my Angular.js application to a system running a Windows application? My server is Google App Engine.
What I have thought of so far:
The Windows application will take screenshots and send them to the Google App Engine Channel API.
The Google App Engine Channel API will notify the Angular app, send it the screenshots, and the app will show them.
The problem with this method is that it's very costly and slow.
Request
Please suggest a tool, API or approach for building a screen-sharing application.
This will not be the answer you are looking for but read on either way.
tl;dr;
What you are trying to do is not an App Engine use case and you really shouldn't use App Engine to implement this kind of solution.
long version:
As you found out yourself, the Channel API will become costly and slow for what you are trying to do. This is because the Channel API simply isn't made to stream large amounts of data to the client. It's meant to send regular updates to the client, like incoming updates for a real-time chat or a news ticker. The best-case scenario is that you only notify the client of new content and the client requests this new content from the server. So you could send the URL of the new screenshot to the client and the client requests it. When you stream a desktop or a video, though, this is unnecessary overhead, because what you want with that kind of streaming is as many updates as you can get. You might as well just poll every few milliseconds.
Using screenshots to share a desktop is a particular kind of madness because the data "stream" cannot be compressed properly and will thus be way larger than it has to be. Usually remote desktop systems use compression very similar to video compression algorithms, where only the changes / differences from the previous picture / frame are transmitted, with a full key frame once in a while. More data means more bandwidth and more latency in your stream. It's really important that you at least try to minimize the data flow as much as possible.
The goal in most App Engine applications is to allow scaling to thousands of parallel connections. This is accomplished by allowing multiple instances to serve the same content and by enforcing several restrictions (like the 60-second request deadline for frontend requests / 10 minutes for backend requests, maximum bandwidth usage in a single request, etc.) which chop huge tasks into small requests that can then be served by the multitude of App Engine instances. The same restrictions will not allow you to create a long-running continuous data stream for something like video or remote desktop streaming. If you poll every few milliseconds as suggested above, App Engine would spawn new instances on a regular basis, which would cause warm-up requests and further delays.
But enough of what won't work; here is an example of what should work:
Streaming servers are dedicated servers which allow direct streaming to clients
Streaming servers publish their service URL to your app engine application
Your AngularJS application requests a stream from the app engine application
App Engine tells the AngularJS application the streaming server information from above
The client requests the stream directly from the server
This approach leaves out app engine as a proxy for your data - so you don't have to worry about the streaming data. It does however require your server to be directly available on the internet.
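Steps 2-4 amount to App Engine acting as a simple broker: streaming servers register themselves, and clients look one up. A sketch of that broker role, written with Flask-style handlers purely as an illustration (a real App Engine app would use its own framework and the datastore):

    from flask import Flask, jsonify, request

    app = Flask(__name__)
    streaming_servers = []              # in practice: the App Engine datastore

    @app.route("/register", methods=["POST"])
    def register():
        # Step 2: a streaming server publishes its public URL.
        streaming_servers.append(request.json["url"])
        return "", 204

    @app.route("/stream")
    def stream():
        # Steps 3-5: the AngularJS client asks for a stream and is handed
        # a streaming server to contact directly; no video data ever
        # flows through App Engine.
        if not streaming_servers:
            return jsonify(error="no streaming servers available"), 503
        return jsonify(server=streaming_servers[0])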
Alternatively, there are a vast number of applications / services (twitch.tv, to name an example) available which allow desktop streaming without you writing a single line of code. Such streams could simply be embedded in your Angular application. Since this is not Software Recommendations, I don't want to go any deeper into this matter here.

How to Communicate Between Web App and C Sharp

I'm attempting to make a Web app that needs to communicate with a program written in C Sharp, but I can't find a good form of communication. What I need is: if a user clicks something on the Web app, it notifies the C Sharp program; and if an event happens in the C Sharp program, it alerts the Web app. Both of these will be running on the same machine.
Right now I'm mainly focusing on the C Sharp program just periodically "asking" what the status of the Web app is.
I've tried using POST requests to the Web app and I've had a bit of success with this, but I don't know how to essentially store and update a "status" on the Web app. For example, the C Sharp program sends a POST/GET request asking for the status, and the Web app responds with "nothing has changed" or some sort of status code. I don't know how to keep track of that status.
I've attempted using Web Sockets but I don't think it is going to be possible on the C Sharp side. However, I'm definitely open to suggestions on how this might work.
I've looked into using the ReST architectural style but I'm having a hard time understanding how I would implement it. I'm using mainly AJAX on an apache server and most of the ReST examples I saw used IIS.
One way I've been successful with this is a horrible workaround: I use 3 files to store the contents, the status of the Web app, and the status of the C Sharp program. But this requires constantly fetching files, reading them, writing a new one, and uploading it again.
Sorry if this question is poorly formed, I'm obviously new to a lot of this and this is my first SO post. Also, I'd include example code but I'm posting this from my tablet so it's not accessible right now.
If they are on the same machine, you can use pipes (Unix), local sockets or file handles.
These are all types of IO objects both applications can 'listen' to without exposing themselves to the network and without blocking while they are 'waiting' for data...
...but this will limit your scalability.
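For instance, the local-socket option might look like the following. A minimal sketch, in Python for brevity; the C# side would connect with TcpClient in the same way:

    import socket

    # Bind to 127.0.0.1 so the socket is never exposed to the network.
    # Port 9000 is an arbitrary choice.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 9000))
    server.listen(1)

    conn, _ = server.accept()           # the C# program connects here
    while True:
        data = conn.recv(1024)
        if not data:
            break
        # Reply with whatever "status" the web app wants to report.
        conn.sendall(b"nothing has changed")
    conn.close()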
Another option is to use a Pub/Sub service such as Redis. This is a better option than WebSockets, because you can have multiple C# apps listening to multiple web apps through a centralized data source.
It uses the same IO concept (sockets) behind an abstraction layer coupled with a database - it's very effective.
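The pattern looks like this, sketched in Python with the redis-py client for brevity; on the C# side a client such as StackExchange.Redis exposes the same publish/subscribe operations:

    import redis

    r = redis.Redis()                   # assumes a Redis instance on localhost

    # Publisher side (e.g. the web app): announce that something changed.
    r.publish("webapp-status", "user clicked something")

    # Subscriber side (e.g. the C# program, via its own client):
    # block until a message arrives on the channel.
    sub = redis.Redis().pubsub()
    sub.subscribe("webapp-status")
    for message in sub.listen():
        if message["type"] == "message":
            print(message["data"])      # react to the event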
Good luck!
I implemented something similar to this. I needed a desktop application to listen for api calls from the browser. I ultimately decided to implement a "web connector" which can either be created as part of the application OR installed as a service.
Here is a good example: https://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener(v=vs.110).aspx

The anatomy of uploading

I am wondering what the general consensus is for uploading moderately large files. I have a web app, and every time a user uploads a file (typically larger than 5 MB), the web server tends to hang until the file upload is finished.
The above seems normal, because a single upload can take up a single HTTP request handler. Do web devs take this into consideration and either:
a) Pay for more HTTP handlers
b) Use some other method to overcome this, such as AJAX or another approach
I've heard that it is quite normal for web apps to have a few HTTP request handlers to take care of this, which will cost quite a bit more. On the other hand, if cost is an issue, then some have suggested uploading directly to the web server or to a storage service (e.g. Amazon S3) via Flash + AJAX. The latter method takes a bit of scripting and is a bit messy.
My second concern:
If I use AJAX to upload files to a server, does this still take up a whole HTTP request handler? I.e. does the server hang until the upload is finished?
Even with Flash, I would still need to specify a URL to upload to. The URL would be one of the actions on my controller, which would mean that processing still takes place on the server side. Is this right so far?
I was thinking: if I were, on the other hand, to use one of the upload scripts (Plupload, Uploadify, SWFUpload, etc.) to upload directly to Amazon S3, then the processing is handled on the S3 server instead of the local web server, which won't hang the web app at all. Am I understanding this correctly?
Would like to hear your feedback.
For large uploads you should use non-blocking, evented servers like Node.js, Twisted on Python, AnyEvent on Perl or EventMachine on Ruby. Using the thread-per-connection model is just too expensive for long-running connections.
It is not uncommon for Node.js users to have so many simultaneous connections that they actually hit their operating system's limits while still not using all their resources - for example, see this question asked by someone who was concerned by having only 30 thousand simultaneous connections, and who then managed to reach over 60 thousand connections on a single server with 4 GB of RAM.
The point is that if you are concerned about your connections blocking your server from serving new requests, then you shouldn't use a blocking server in the first place.
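To make that concrete, here is the shape of a non-blocking upload handler, sketched with aiohttp as just one example of an evented Python server (the servers named above follow the same principle):

    from aiohttp import web

    async def upload(request):
        # Read the body in chunks without tying up a thread per
        # connection, so thousands of slow uploads can be in flight.
        size = 0
        with open("upload.bin", "wb") as f:
            async for chunk in request.content.iter_chunked(8192):
                f.write(chunk)
                size += len(chunk)
        return web.Response(text=f"received {size} bytes")

    app = web.Application()
    app.add_routes([web.post("/upload", upload)])
    web.run_app(app)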
I'm currently developing a web app which handles multiple image uploads simultaneously. I researched far and wide, and the best option I found was swfupload. It is super easy to implement and highly customizable. Users can select multiple files from the dialogue box, add them to a queue, and get actual progress feedback from the browser. So that lag isn't such a big deal to the user.
Though, bah... it uses Flash to initialize the dialogue box, but everything else is handled with good old JavaScript.
A great working example is carbonmade.com
Thanks for the responses so far.
Unfortunately, our host Heroku does not support non-blocking, evented servers. I've also tried Flash + JavaScript based uploaders like SWFUpload and Uploadify. Some variations of the mentioned plugins worked, and some didn't. I spent countless hours on trial and error, but didn't like how the code was being integrated into my Rails app.
In the end, I went with manually uploading the file directly to S3, following this link. This also enables a response back from the S3 server notifying us that an upload was successful, giving us the path to the uploaded file so that we can then create a background job (via redis + resque) to process the file.
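For comparison, the server-side half of a direct-to-S3 upload is just handing the browser short-lived upload credentials. With today's boto3 that is a few lines (shown as a modern illustration, not what the Flash-era uploaders used; the bucket name is hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # The browser POSTs the file straight to S3 using these fields;
    # the web server never touches the upload itself.
    post = s3.generate_presigned_post(
        Bucket="my-bucket",
        Key="uploads/${filename}",      # S3 substitutes the uploaded filename
        ExpiresIn=3600,
    )
    print(post["url"], post["fields"])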
In the future if you are going to do direct uploading to S3 via Rails, please check out my sample projects below. You will save yourself many, many headaches and it's not very "messy" :)
Sample project using Rails 3, Flash and MooTools-based FancyUploader to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-FancyUploader
Sample project using Rails 3, Flash/Silverlight/GoogleGears/BrowserPlus and jQuery-based Plupload to upload directly to S3: https://github.com/iwasrobbed/Rails3-S3-Uploader-Plupload
By the way, you can do post-processing with Paperclip using something like this blog post describes:
http://www.railstoolkit.com/posts/fancyupload-amazon-s3-uploader-with-paperclip

Implementing Forum Live View with Ajax and JSP

I'm starting a personal project, so at the moment I have complete architectural/design control. I'm just planning out the structure at this point. My goal is some sort of web forum / chat thing. The difference is that it should update live, with new posts appearing in client views soon after they've hit the server.
I'm thinking of using AJAX and jQuery to download a viewed thread's new posts (from a Tomcat server); the posts would be a small XML structure that is compiled into a nice post on the client side. This hopefully reduces my bandwidth costs. Bandwidth is my primary concern. I'm worried that having a few users with a JavaScript thread polling the server every ten seconds will cause a storm of HTTP requests to my server, even if the content is small.
Is there a better way than having each user perform polling? I can write the backend in any structure necessary, and the frontend too for that matter. I want to stay away from Flash and Silverlight. As a public webpage it might end up with a lot of viewers (every web dev's dream). Having everyone polling at 30-second intervals would be an incredible number of hits to support, and 30 seconds is probably too slow for a 'live view' anyway!
My preferred language is JSP.
Client-side polling is not the only option for implementing a "live view". You should also consider the so-called "Reverse AJAX" (Comet) technique.
Furthermore, you could use one of the well-established frameworks that provide this functionality out of the box: DWR, or even JSF (ICEfaces).
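The core of reverse AJAX is long polling: the server holds each request open until a new post exists, so clients see updates almost immediately without hammering the server. The general shape, sketched in Python for brevity (DWR and ICEfaces wrap the same idea for the Java/JSP world):

    import time
    from flask import Flask, jsonify

    app = Flask(__name__)
    posts = []                          # stand-in for the forum's post store

    @app.route("/poll/<int:seen>")
    def poll(seen):
        # Hold the connection open (up to ~25s) until there is something
        # newer than what the client has already seen.
        deadline = time.time() + 25
        while time.time() < deadline:
            if len(posts) > seen:
                return jsonify(posts[seen:])
            time.sleep(0.5)
        return jsonify([])              # timeout: the client simply re-polls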
