I need to develop a Chrome extension to perform scraping on certain web pages belonging to our customers, in each user's private area once they are logged in to those pages (all of this always with the user's approval, informing them at all times of the data that will be collected). Each of our clients has a different website, and the scraping code needed is different for each client. Every month we get new clients, so we have to develop the necessary scraping script for each new client. In addition, the information obtained from scraping will be sent by the extension to our REST web service for storage in a database.
To avoid having to release a new version of the extension every time we develop the scraping script for a new client, I had thought of developing a REST web service that the extension would query on startup and that would return the URLs of all the clients along with the scripts associated with each one. This way we would only release one version of the extension: each scraping script would be stored in a database, returned to the extension through the web service mentioned above, and injected by the extension as a content script using the chrome.tabs.executeScript() method. Once the scraping is done, the extension would send the result to the corresponding web service.
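A minimal sketch of that flow, assuming a Manifest V2 background script (where chrome.tabs.executeScript() is available); the endpoint URLs and the { urlPattern, script } response shape are made up for illustration:

    // Background script sketch (Manifest V2). Requires the "tabs" permission
    // plus host permissions for the client sites and the API domain.
    chrome.tabs.onUpdated.addListener(function (tabId, changeInfo, tab) {
      if (changeInfo.status !== 'complete') return;
      fetch('https://api.example.com/clients')            // assumed config endpoint
        .then(function (res) { return res.json(); })
        .then(function (clients) {
          var client = clients.find(function (c) {
            return tab.url && tab.url.indexOf(c.urlPattern) === 0;
          });
          if (!client) return;
          // Inject the remotely stored scraping code as a content script.
          chrome.tabs.executeScript(tabId, { code: client.script }, function (results) {
            // The injected code's last expression comes back in results[0];
            // send it to an assumed storage endpoint.
            fetch('https://api.example.com/results', {
              method: 'POST',
              headers: { 'Content-Type': 'application/json' },
              body: JSON.stringify({ url: tab.url, data: results && results[0] })
            });
          });
        });
    });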
Would the architecture I have described be feasible in terms of security? If it is not, what options do I have? This project is very important for my company, so I have to find the right approach.
Is there a problem with injecting JavaScript code stored on our external system?
Is there a problem with sending the information obtained by scraping to our external server for storage?
I am very worried about all this. Can someone help me?
Thank you very much in advance.
Related
I am working on my first webapp project which I plan to publish using a remote server. I have a question about the architecture.
My webapp is for scraping tweets using the twitterscraper Python package. A user who visits the website enters some keywords and clicks the "Scrape" button. A Python backend scrapes the tweets containing the keywords, runs them through some Natural Language Processing analysis, and visualises the results in charts. The twitterscraper package lets you scrape tweets using Beautiful Soup, so you don't need to create API credentials. The scraping speed depends on the bandwidth of the internet connection you are using.
I made a Python script, a JavaScript file, an HTML file and a CSS file. In my local environment the webapp works perfectly.
So the question is: after I put these files on the hosting server and publish the webapp, when a user clicks the "Scrape" button, what does the scraping speed depend on? The bandwidth of the user's internet connection? Or is there some "bandwidth" that the server relies on?
As I said, I am very new to this kind of architecture, so it would also be nice to suggest an alternative way of structuring this kind of webapp. Thank you!
Where the bottleneck is depends on a bunch of different variables.
If you're doing a lot of data manipulation, but you don't have a lot of CPU time allocated to the program (i.e. there are too many users for your processor to handle), it could slow down there.
If you don't have sufficient memory, and you're trying to parse and return a lot of data, it could slow down there.
Because you're also talking to Twitter, whatever the bandwidth restrictions are between your server and the twitter server will affect the speed at which you can retrieve results from their API, and so the time it takes your program to respond to a user.
There's also the connection between yourself and the user. If that's slow, it could affect your program.
I'm considering Apps Script for a small project that involves fetching data from a local server that's not on the internet and then populating a Google Sheets spreadsheet with the data... I can't seem to find a clear answer to this question anywhere.
Is it possible to make HTTP requests from a Google spreadsheet using Apps Script to a local server? I'm assuming that, as the JavaScript is client-side code, it should be possible?
It is possible as long as the local server is routable from a public address. Apps Script runs exclusively on Google's infrastructure, so even when you click Run in the IDE it is not your browser running the code but a Google server. There is a way to work around this: Apps Script can serve web pages, called web apps, that can communicate with your script. The served web pages do run in your browser and can access localhost.
https://developers.google.com/apps-script/guides/web
https://developers.google.com/apps-script/guides/html/
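The workaround described above might look roughly like this (the sheet name, port, and endpoint are assumptions, and the local server would need to send CORS headers for the page's fetch to succeed):

    // Code.gs -- runs on Google's servers
    function doGet() {
      return HtmlService.createHtmlOutputFromFile('index');
    }

    function saveToSheet(rows) {
      // Assumed sheet name; adjust to your spreadsheet.
      var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Data');
      rows.forEach(function (row) { sheet.appendRow(row); });
    }

    // index.html -- served to *your* browser, which can reach localhost:
    // <script>
    //   fetch('http://localhost:8080/data')   // assumed local endpoint
    //     .then(function (res) { return res.json(); })
    //     .then(function (rows) { google.script.run.saveToSheet(rows); });
    // </script>

The client page pulls the data from the local server, then hands it back to the script with google.script.run, which runs saveToSheet on Google's side.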
I have come across a few chat applications and websites, and I don't know how these things work. Can someone tell me how these applications work within a website?
I have seen some websites with a chat widget for helping customers with matters related to their business. On signing up for one of these applications, a script file is sent by email; when it is pasted into the website, a widget is created automatically, and this widget and the application are connected externally.
Sorry if my question is not clear. I will give some links to applications and websites which should give you an idea of what I'm trying to ask.
Website
http://www.a1000yoga.com/
http://www.voyzek.com/contact-us/
Application
https://www.zopim.com/
https://my.livechatinc.com/
What you mean is called providing a Web API.
Wikipedia's definition of a Web API is:
A server-side web API is a programmatic interface consisting of one or more publicly exposed endpoints to a defined request-response message system, typically expressed in JSON or XML, which is exposed via the web, most commonly by means of an HTTP-based web server. Mashups are web applications which combine the use of multiple server-side web APIs.
In brief, they implement their service on their own servers and let you access it via HTTP requests.
You sign up on their website and they generate an API token (a random string, perhaps); then, when you want to use their services, you send your requests with your API token as a means of authentication or identification, and they process your request through their application, with your data, on their servers, and send you a response.
For example, when you use these messaging services (or captchas, ad networks, etc.), they provide a piece of JavaScript containing your API token; then, when someone views your web page, the code sends a request with your API token to the API provider's servers and they process the data for you.
You can then access your data through their website, another API, email, etc.
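To make that concrete, the snippet such providers email you usually has roughly this shape (the provider domain and token here are made up); it just loads the provider's script, which then builds the widget on your page and talks to their servers using your token:

    (function () {
      var s = document.createElement('script');
      s.async = true;
      // Hypothetical provider URL; the key identifies your account.
      s.src = 'https://widget.example-chat.com/loader.js?key=YOUR_API_TOKEN';
      document.head.appendChild(s);
    })();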
Basically, I'm developing a .NET API that allows a certain JavaScript to access a database through it. The database contains user information, and the API is the mediator between the clients (running the JavaScript on their websites) and the database. The JavaScript simply gets the data from the DB and displays it.
My problem is: where do I host my API so that the clients' JavaScript can access it? What is this kind of system called? I'm using Microsoft Azure SQL Database to store the user information. How do I access my C# API from the clients' JavaScript? Do I need to host my API on Azure's API hosting service? Very confused.
1) The client adds the JavaScript and an HTML div to their website
2) The JavaScript should access the API
3) The API accesses the database and gets the data (this part is complete and works)
4) The data is sent to the client and the JavaScript populates the div
I just need to figure out how to make the connection between the API and the JavaScript on the client's website.
Do I need to use THIS?
I would use ASP.NET Web.Api. It allows you to build a REST endpoint in C# that you can host on the Azure platform as well; you will be able to host it using the web sites feature of Azure. Even though you want to build an API rather than a web site, hosting it in a web site container will give you what you need:
Easy hosting solutions
Web endpoints for your client JavaScript to consume
C#.NET
Web endpoints close to your database. (Host them in the same data center)
Scalability
Monitoring
Ability to create a web site at the same address if you need to.
I haven't used Azure API Management, so I can't comment on that, but you will be able to get an ASP.NET Web.Api site up very quickly.
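For step 2 of the flow in the question, the connection from the client's page is just an HTTP call to your hosted endpoint. A minimal sketch, with a made-up Azure URL, route, and div id; note the API must also send CORS headers (for example via the Microsoft.AspNet.WebApi.Cors package), since the client's site is on a different origin:

    // Hypothetical endpoint and element id.
    fetch('https://yourapi.azurewebsites.net/api/users/123')
      .then(function (res) { return res.json(); })
      .then(function (user) {
        document.getElementById('user-widget').textContent = user.name;
      });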
How can a web application store a very large amount of data client-side? (Concretely, I'm talking about a capacity of several million records.)
What I want to do here is to allow searching these records offline.
All of the users are using Chrome.
I was opting for IndexedDB until I read that with about 400k records it becomes almost unusable.
Then there is Web SQL, but it has been deprecated.
I was then thinking that my last option would be to install a web server like Apache locally, with a small script that would interact with my web application and store the records in a DB like MySQL. With AJAX I could access the script on localhost, but then there is the cross-domain problem.
I have run out of ideas.
Update (clarification):
The main web application runs on a distant server. It has to be on a server because the application is used by several people at different locations (it is shared) and needs to be accessible from smartphones, etc. My last idea was to install a web application locally (on each user's computer) that would interact with the distant web application, fetching the records from it and storing them locally. Anyway, I guess it wouldn't work because of cross-domain issues.
I see a few alternatives:
1) Don't you actually need a desktop application? I know, I know, it is so 1990s...
2) Installing a local web server and accessing your application via the web browser is an option as well (see the sketch after this list), but this is dangerously close to point 1.
3) You might consider developing a Java applet and permitting it to use the file system.
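A minimal sketch of option 2, using Node.js with Express for brevity rather than the Apache-plus-script setup mentioned in the question (the route and data are made up); sending a CORS header is what gets around the cross-domain issue raised above:

    var express = require('express');
    var app = express();

    app.get('/records', function (req, res) {
      // Allow the distant web application's origin to call this local server.
      res.set('Access-Control-Allow-Origin', '*');
      // Hypothetical lookup against a locally stored copy of the records.
      res.json([{ id: 1, name: 'example record' }]);
    });

    // The distant app's page can now fetch http://localhost:3000/records.
    app.listen(3000);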