I'm a fresh-out-of-college programmer with some experience in Python and Javascript, and I'm trying to develop either a website or just a back-end system that will aggregate information from online market websites which don't have any API (or none that I've found, anyway). Ideally I would also want the system that can write to local storage to track changes to the data over time in some kind of database, but that's down the road a bit.
I've already pounded out some javascript that can grab the data I want, but apparently there doesn't seem to be a way to access or act upon data from other websites due to data security protections or to save the data to local storage in order to be read from other pages. I know that there are ways to aggregate data, as I've seen other websites that do this.
I can load websites in Python using the urllib2 and use regular expressions to parse what I want from some pages, but on a couple of the desired sites I would need to log into the website before I can access the data I want to gather.
Since I am relatively new to programming, is there an ideal tool / programming language that would streamline or simplify what I'm trying to do?
If not, could you please point me in the right direction for how I might go about this? After doing some searching, there seems to be a general lack of cross-domain data gathering and aggregation. Maybe I'm not even using the right terminology to describe what I'm trying to do.
Whichever way you would look at this, please help! :-)
i suggest you use selenium webdriver to login to get cookie,and use requests library to scrap the message.That is what my company do in the scraping system.if you only use selenium webdriver, you will need to many memory and cpu capacity.
if you are good at html and js,it is a good way for you to use requests library to Simulate logging.
for the website you must log in,the most import thing is to get cookie.
Related
I'm developer-beginner and I would like to do simple card-sorting site.
hundreds of cards with plain text
no log-ins, it must be as simple for user as possible
few dozens users per day
final state of sorting should be saved under unique link, like site.com/q56we1cfuz4 or site.com/link.php?unique=q56we1cfuz4
User sorts cards as he/she wishes and the result is saved into browser sessions storage (few kb of data in JSON or so). So far, so good.
Question:
As I know only Javascript:
If I need to share the state of the page (data in session storage) with some unique link, is something like Firebase.com good solution for such kind of back-end? Or some simple DB with help od Node.js?
In order to "catch" the link when someone click at unique URL (site.com/link.php?unique=q56we1cfuz4), I still need some server-side script which will ask the DB, like PHP header redirect, right?
Your questions are a little fuzzy, no problem tho. You are just getting into web dev, so there's a lot to wrap your head around and all of the options can be pretty confusing. Some people will complain about opinionated answers, and I'm not going to claim to be objective here, but here are a few tips that I think will get you pointed in a better direction...
Firstly, yes - firebase is a good solution for you to try working with. Aside from the fact that it will give you the db/storage features you need, it's a realtime db, which will give you a lot more than just storage in the long run, and will better equip you for the future web development trends. The firebase API is (mostly) designed to work asynchronously, so from a javascript perspective, it falls right in line with the kind of code you'll end up learning to write.
Re: the other aspect of your question - server-side - check out nodeJS. It's basically a server-side javascript platform that will allow you to use the same skills you're learning to write client-side code for the server. Also check out expressJS, a nodeJS package that provides you the http-server, and allows you to handle dynamic urls, etc. - the bits you were thinking about when you made a reference to PHP.
Hopefully this will help you focus on a few specific tools to familiarize yourself with as you learn web development, rather than having to struggle with everything from new languages, platforms, and way too many libraries, frameworks and techniques to wrap your head around. Good luck!
I don't know much about databases, I've been asking a few questions about them lately to get a better understanding but I'm still a bit confused about what does and doesn't need one.
I'm making a simple application using HTML/CSS/JavaScript, it has a few quizzes and "tutorials" targeted towards children. I don't want the next tutorial/quiz to be unlocked until the previous one is completed.
So for that would I need a database so that it "saves" when one is completed? I don't need to save scores or anything like that, they just get to move on once they get a passing score.
Any other requirements such as saving to a profile or needing to persist between sessions (e.g. changing of device)?
Browsers have localStorage APIs now which allow you to save a lot of the data (and keep it for a set duration of time). There are also good'ol'fashioned cookies which allow you save pieces of information as well.
Keep in mind that both of the above mandate the user use the same browser and allow these mechanisms. Obviously using "private"/"incognito" browsing would also affect saving status.
It's up to what you feel the requirements are.
EDIT Just saw your mention of a mobile app. If you're planning on allowing the experience to transcend devices, you'll need a database. otherwise, you'll be relying heavily on if they use cross-device sync (like Chrome and Firefox do with bookmarks, passwords, etc.)
If you don't mind that people can do a "view source" on the webpage or use every browsers' developer tools to find out the answers or move on to the next tutorial or quiz, then you can use cookies to store the user's status. Or you can use the preferable Web Storage API.
You might want to look at Firebase. Using just simple JavaScript on the web browser, you can have users with logins (or just allow them to login via Facebook or other services) very easily. And then you can store and retrieve data very easily as well, like quizzes, tutorials and results. This way nobody can see the answers even if they're adept at analyzing the webpage.
When you don't use database, before any check, you have to load all data in your static page.
So My sloution: store students situation in a cookie. On each page check cookie status and then use Jquery remove() to remove (Client-side) those parts of page that he/she can not access.
EDIT
This wont work when JavaScript is disabled.
There seems to be a lot of ideas but no clarifying on the database subject.
TL;DR is: No.
Now for the specifics. A database is nothing more than a way to store information. While traditional "SQL" databases (it is pronounced "Sequel" as in "My Sequel" for MySQL) have concepts of tables, where you define columns with items to store and saves each row with its value, much like an Excel file, some databases like Redis store key-value pairs and others lide MongoDB store JavaScript Objects.
You can store information in the source code (As Variables possibly) or in a file. A database is a way to organize that information.
With that said, in your case, you probably need a backend or an API. An API is basically a means of communication with a server through AJAX (JavaScript in the browser asks for stuff). That would be your way to retrieve information from the server as needed, so that users wouldn't see the answers before they answer.
With that out of the way, there are some options. FireBase (As noted on other answer) and AppBase are easy ways to integrate this concept with little effort. But they tie you and your information to their system, and they are mostly targeting more resource intensive apps.
Since you are using JS and seem to be enjoying your learning experience, I would suggest you consider suing NodeJS and defining the data as either a JSON file or a variable in JS. You keep working on your problem but add options and get to learn some stuff.
If you decide to integrate a database and possibly do some neat stuff, you have most of the groundwork done already.
If NodeJS picks your interest, Mean.IO and KrakenJS are, in my opinion, the best places to start, though they may both seem overkill in your specific case.
Do consider though: A database is just a small possible piece in a puzzle, and it's mostly a horrible way to name some of the software that tries to organize your information. Consider first if you need to organize information, and what and how do you need to organize, then start thinking if databases are the best way to organize it.
I have a web app that relies on html5 offline storage features so that it can be accessed by the user without an internet connection. The app essentially just serves html pages and a little bit of css and javascript.
I am trying to add the ability to search the text served on these pages for key words, but because the app isn't guaranteed access to the server it needs to be able to perform these searches on the client side.
My thought is I can store the searchable text in the browser's web sql database and perform the search either through javascript or through the browser's sql api. I have a few question about the best way to do this:
1) I vaguely remember an article about how to implement something like this, maybe from airbnb? Does anyone remember such an article?
2) The text is 2,000,000+ words so I would assume that indexOf is going to break down at this data size. Is there any chance regex will hold up? What are some options for implementing the actual search? (libraries, algorithms, etc.) Any article suggestions for understanding the tradeoffs of string search algorithms if I need to go down that road?
Well, I just wrote a quick benchmark for you and was surprised to find that you could probably get away with using String.indexOf(). I get about 35ms per search, which is about 30 searches per second.
EDIT: a better benchmark. There appears to be some sort of initialization delay, but it looks like indexOf is pretty fast. You could play around with the benchmark and see if it looks like it will work for you.
I'm looking for any javascript library that i can use to search content on my website, i have came across quiet a few but mostly they require the use of a database to store indexes for optimizing search queries, but i only need a database free search engine built in with javascript. can anyone direct me to the right location(url) where i can download it and install it on my website, which is build on a cakephp framework. Was thinking of a search engine that could index every new page a include in my website maybe once a night and then when i search it should output the search results on the separate page and with links to the actual pages where the keyword was taken from.
Happy new year and have a splendid year ahead..
I bet your mistaken about JavaScript. In order to search, you will be needing records coming from the users which will be stored in the database so basically, you will be dealing with server side languages not JavaScript. JavaScript is only used for client-side which doesn't have to do anything with the database (Not unless your using Node.js).
Both JavaScript and the browsers have come a long way. You could use Lunr or search-index. Both can run in the browser. For search-index you use localStorage as the index. That means your data is stored per domain, in the browser. Nothing to install, nothing to maintain. And low server requirement since it's all happening on the client side.
Lunr is more mature and quicker to get up to speed, but search-index is maybe more feature rich?
I am working on a Google Gadget that will collect some data through Google API's. What I am getting stuck on is how to collect the data and then save it somewhere to be processed later. The final idea being that I would run the gadget on my own computer it would collect the data and then save it to somewhere on my own computer. (I guess I want to emphasize that this is, for now, a small personal project and does not necessarily need fancy server scripts, I want to be able to run this all on my PC running XP).
Is there a pure Javascript way to save a file on a computer?
Can I use other languages besides XML, HTML, and Javascript to add functionality to my Google Gadget?
Edit: The goal of this is to be able to log how many of my contacts are signed into gchat over a period of time. I decided on a Gadget because that was the only way I could figure out how to access that information. Any other ways to approach this idea are welcome!
No, Javascript alone cannot save a file automatically. And be careful, javascript is affected by the no cross domain rule. If you're hosting the project on your own computer, why bother writing a complex Google Gadget?
I suggest a simple PHP script, and MySQL, if you like, to store the data. By itself, PHP should be more than enough to run most tasks. If you would like me to add in more info about this, please tell me what type of task.
In increasing order of flexibility:
The options object is almost certainly the easiest approach - not really designed for that kind of usage but I suspect it would be fine for your use case.
On windows you could use system.filesystem to get hold of the WScript FileSystemObject which you can then use to create files and write to them.
Also see the Google desktop API blog for embedding an SQLite database in your gadget (looks pretty easy).