I have a web app that relies on html5 offline storage features so that it can be accessed by the user without an internet connection. The app essentially just serves html pages and a little bit of css and javascript.
I am trying to add the ability to search the text served on these pages for key words, but because the app isn't guaranteed access to the server it needs to be able to perform these searches on the client side.
My thought is I can store the searchable text in the browser's web sql database and perform the search either through javascript or through the browser's sql api. I have a few questions about the best way to do this:
1) I vaguely remember an article about how to implement something like this, maybe from airbnb? Does anyone remember such an article?
2) The text is 2,000,000+ words so I would assume that indexOf is going to break down at this data size. Is there any chance regex will hold up? What are some options for implementing the actual search? (libraries, algorithms, etc.) Any article suggestions for understanding the tradeoffs of string search algorithms if I need to go down that road?
Well, I just wrote a quick benchmark for you and was surprised to find that you could probably get away with using String.indexOf(). I get about 35ms per search, which is about 30 searches per second.
EDIT: a better benchmark. There appears to be some sort of initialization delay, but it looks like indexOf is pretty fast. You could play around with the benchmark and see if it looks like it will work for you.
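For reference, here's a minimal sketch of the kind of benchmark I'm talking about; the corpus is generated filler text and the word count is an assumption, so treat the timings as ballpark only:

// Build a haystack of roughly 2,000,000 "words" (synthetic filler, for illustration only)
var words = [];
for (var i = 0; i < 2000000; i++) {
  words.push('word' + (i % 50000));
}
var haystack = words.join(' ');

// Time a few indexOf() searches over the whole string
var needles = ['word49999', 'word12345', 'no-such-word'];
var start = Date.now();
needles.forEach(function (needle) {
  haystack.indexOf(needle);
});
console.log('avg ms per search:', (Date.now() - start) / needles.length);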
Related
I'm a fresh-out-of-college programmer with some experience in Python and JavaScript, and I'm trying to develop either a website or just a back-end system that will aggregate information from online market websites which don't have any API (or none that I've found, anyway). Ideally I would also want a system that can write to local storage to track changes to the data over time in some kind of database, but that's down the road a bit.
I've already pounded out some JavaScript that can grab the data I want, but there doesn't seem to be a way to access or act on data from other websites because of cross-origin security restrictions, or to save the data to local storage so it can be read from other pages. I know that there are ways to aggregate data, since I've seen other websites that do this.
I can load websites in Python using urllib2 and use regular expressions to parse what I want from some pages, but on a couple of the sites I'd need to log in before I can access the data I want to gather.
Since I am relatively new to programming, is there an ideal tool / programming language that would streamline or simplify what I'm trying to do?
If not, could you please point me in the right direction for how I might go about this? After doing some searching, there seems to be very little out there about cross-domain data gathering and aggregation. Maybe I'm not even using the right terminology to describe what I'm trying to do.
However you look at this, please help! :-)
I suggest you use Selenium WebDriver to log in and get the cookie, and then use the requests library to scrape the content. That is what my company does in its scraping system. If you only use Selenium WebDriver, you will need too much memory and CPU capacity.
If you are good at HTML and JS, simulating the login with the requests library alone is also a good way to go.
For the websites you must log into, the most important thing is getting the cookie.
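In case it helps, here is a rough sketch of that flow using the Node selenium-webdriver package plus fetch (standing in for Python's Selenium and requests); the form field names, submit selector and post-login URL check are made-up placeholders you would adjust for the real site:

const { Builder, By, until } = require('selenium-webdriver');

// Log in with a real browser once, just to collect the session cookies.
async function getSessionCookies(loginUrl, username, password) {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get(loginUrl);
    await driver.findElement(By.name('username')).sendKeys(username); // placeholder field name
    await driver.findElement(By.name('password')).sendKeys(password); // placeholder field name
    await driver.findElement(By.css('button[type="submit"]')).click();
    await driver.wait(until.urlContains('account'), 10000); // placeholder "logged in" check
    return await driver.manage().getCookies();
  } finally {
    await driver.quit();
  }
}

// Reuse the cookies for lightweight requests without keeping the browser around.
async function fetchWithCookies(url, cookies) {
  const cookieHeader = cookies.map(c => c.name + '=' + c.value).join('; ');
  const res = await fetch(url, { headers: { Cookie: cookieHeader } });
  return res.text();
}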
I'm a beginner developer and I would like to build a simple card-sorting site.
hundreds of cards with plain text
no log-ins, it must be as simple for user as possible
few dozens users per day
final state of sorting should be saved under unique link, like site.com/q56we1cfuz4 or site.com/link.php?unique=q56we1cfuz4
The user sorts cards as he/she wishes and the result is saved into the browser's session storage (a few KB of data as JSON or so). So far, so good.
Question:
As I know only Javascript:
If I need to share the state of the page (the data in session storage) via some unique link, is something like Firebase.com a good solution for this kind of back-end? Or some simple DB with the help of Node.js?
In order to "catch" the link when someone clicks a unique URL (site.com/link.php?unique=q56we1cfuz4), I still need some server-side script which will query the DB, like a PHP header redirect, right?
Your questions are a little fuzzy, no problem tho. You are just getting into web dev, so there's a lot to wrap your head around and all of the options can be pretty confusing. Some people will complain about opinionated answers, and I'm not going to claim to be objective here, but here are a few tips that I think will get you pointed in a better direction...
Firstly, yes - firebase is a good solution for you to try working with. Aside from the fact that it will give you the db/storage features you need, it's a realtime db, which will give you a lot more than just storage in the long run, and will better equip you for the future web development trends. The firebase API is (mostly) designed to work asynchronously, so from a javascript perspective, it falls right in line with the kind of code you'll end up learning to write.
Re: the other aspect of your question - server-side - check out nodeJS. It's basically a server-side JavaScript platform that will allow you to use the same skills you're learning for client-side code on the server. Also check out expressJS, a nodeJS package that provides the HTTP server and allows you to handle dynamic URLs, etc. - the bits you were thinking about when you made a reference to PHP.
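To make that concrete, here is a minimal Express sketch of the save-and-share flow; the in-memory object, route names and key length are placeholders, and in practice you would back it with Firebase or another DB:

const express = require('express');
const crypto = require('crypto');
const app = express();
app.use(express.json());

// In-memory store, for illustration only; swap in Firebase or a real DB.
const results = {};

// Save a sort result and hand back a shareable unique link.
app.post('/share', (req, res) => {
  const id = crypto.randomBytes(6).toString('hex'); // short unique key, "q56we1cfuz4"-style
  results[id] = req.body;
  res.json({ url: '/s/' + id });
});

// Return the saved state when someone visits the unique link.
app.get('/s/:id', (req, res) => {
  const data = results[req.params.id];
  if (!data) return res.status(404).send('Not found');
  res.json(data);
});

app.listen(3000);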
Hopefully this will help you focus on a few specific tools to familiarize yourself with as you learn web development, rather than having to struggle with everything from new languages, platforms, and way too many libraries, frameworks and techniques to wrap your head around. Good luck!
I don't know much about databases. I've been asking a few questions about them lately to get a better understanding, but I'm still a bit confused about what does and doesn't need one.
I'm making a simple application using HTML/CSS/JavaScript, it has a few quizzes and "tutorials" targeted towards children. I don't want the next tutorial/quiz to be unlocked until the previous one is completed.
So for that would I need a database so that it "saves" when one is completed? I don't need to save scores or anything like that, they just get to move on once they get a passing score.
Any other requirements such as saving to a profile or needing to persist between sessions (e.g. changing of device)?
Browsers have localStorage APIs now which allow you to save a lot of data (and keep it for a set duration of time). There are also good ol' fashioned cookies, which allow you to save pieces of information as well.
Keep in mind that both of the above mandate the user use the same browser and allow these mechanisms. Obviously using "private"/"incognito" browsing would also affect saving status.
It's up to what you feel the requirements are.
EDIT Just saw your mention of a mobile app. If you're planning on allowing the experience to transcend devices, you'll need a database. Otherwise, you'll be relying heavily on whether they use cross-device sync (like Chrome and Firefox do with bookmarks, passwords, etc.)
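To make the localStorage suggestion concrete, here is a minimal sketch; the key name and quiz ids are invented for illustration:

// Track which quizzes have been passed, in the browser only.
var KEY = 'completedQuizzes'; // made-up storage key

function getCompleted() {
  return JSON.parse(localStorage.getItem(KEY) || '[]');
}

function markCompleted(quizId) {
  var done = getCompleted();
  if (done.indexOf(quizId) === -1) done.push(quizId);
  localStorage.setItem(KEY, JSON.stringify(done));
}

// A quiz is unlocked when the previous one in the ordering has been completed.
function isUnlocked(quizId, order) {
  var idx = order.indexOf(quizId);
  return idx === 0 || getCompleted().indexOf(order[idx - 1]) !== -1;
}

// e.g. isUnlocked('quiz2', ['quiz1', 'quiz2', 'quiz3'])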
If you don't mind that people can do a "view source" on the webpage or use every browsers' developer tools to find out the answers or move on to the next tutorial or quiz, then you can use cookies to store the user's status. Or you can use the preferable Web Storage API.
You might want to look at Firebase. Using just simple JavaScript on the web browser, you can have users with logins (or just allow them to login via Facebook or other services) very easily. And then you can store and retrieve data very easily as well, like quizzes, tutorials and results. This way nobody can see the answers even if they're adept at analyzing the webpage.
When you don't use a database, you have to load all of the data into your static page before any check can happen.
So my solution: store the student's progress in a cookie. On each page, check the cookie status and then use jQuery's remove() to strip out (client-side) those parts of the page he/she cannot access yet.
EDIT
This won't work when JavaScript is disabled.
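For what it's worth, a rough sketch of the cookie-plus-remove() idea; the cookie name and data attributes are invented, and as noted this is purely cosmetic since anyone can read the source:

// Read the student's progress from a cookie (cookie name is made up).
function getCompletedLevel() {
  var match = document.cookie.match(/(?:^|; )completedLevel=(\d+)/);
  return match ? parseInt(match[1], 10) : 0;
}

$(function () {
  var completed = getCompletedLevel();
  // Each tutorial/quiz section carries a data-level attribute in this sketch.
  $('[data-level]').each(function () {
    if (parseInt($(this).data('level'), 10) > completed + 1) {
      $(this).remove(); // strip out sections the student hasn't reached yet
    }
  });
});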
There seems to be a lot of ideas but no clarifying on the database subject.
TL;DR is: No.
Now for the specifics. A database is nothing more than a way to store information. While traditional "SQL" databases (it is pronounced "Sequel", as in "My Sequel" for MySQL) have the concept of tables, where you define columns with items to store and save each row with its values, much like an Excel file, some databases like Redis store key-value pairs and others like MongoDB store JavaScript objects.
You can store information in the source code (As Variables possibly) or in a file. A database is a way to organize that information.
With that said, in your case, you probably need a backend or an API. An API is basically a means of communication with a server through AJAX (JavaScript in the browser asks for stuff). That would be your way to retrieve information from the server as needed, so that users wouldn't see the answers before they answer.
With that out of the way, there are some options. FireBase (As noted on other answer) and AppBase are easy ways to integrate this concept with little effort. But they tie you and your information to their system, and they are mostly targeting more resource intensive apps.
Since you are using JS and seem to be enjoying your learning experience, I would suggest you consider using NodeJS and defining the data as either a JSON file or a variable in JS. You keep working on your problem but add options and get to learn some stuff.
If you decide to integrate a database and possibly do some neat stuff, you have most of the groundwork done already.
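As a small illustration of that NodeJS-plus-plain-JS-data idea, here is a sketch that keeps the quiz data server-side so the answers never appear in the page source; the route names and data shape are made up:

const express = require('express');
const app = express();
app.use(express.json());

// Quiz data lives on the server, defined as a plain JS variable (or loaded from a JSON file).
const quizzes = [
  { id: 'q1', question: '2 + 2 = ?', answer: '4' },
  { id: 'q2', question: 'Capital of France?', answer: 'Paris' }
];

// Send the question only, never the answer.
app.get('/quiz/:id', (req, res) => {
  const quiz = quizzes.find(q => q.id === req.params.id);
  if (!quiz) return res.status(404).end();
  res.json({ id: quiz.id, question: quiz.question });
});

// Check an answer server-side and report pass/fail.
app.post('/quiz/:id/answer', (req, res) => {
  const quiz = quizzes.find(q => q.id === req.params.id);
  if (!quiz) return res.status(404).end();
  res.json({ passed: req.body.answer === quiz.answer });
});

app.listen(3000);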
If NodeJS picks your interest, Mean.IO and KrakenJS are, in my opinion, the best places to start, though they may both seem overkill in your specific case.
Do consider though: a database is just one small possible piece in a puzzle, and it's mostly a horrible name for some of the software that tries to organize your information. Consider first whether you need to organize information, and what and how you need to organize it; then start thinking about whether a database is the best way to do that.
I came across a site that does something very similar to Google Suggest. When you type in 2 characters in the search box (e.g. "ca" if you are searching for "canon" products), it makes 4 Ajax requests. Each request seems to get done in less than 125ms. I've casually observed Google Suggest taking 500ms or longer.
In either case, both sites are fast. What are the general concepts/strategies that should be followed in order to get super-fast requests/responses? Thanks.
EDIT 1: by the way, I plan to implement an autocomplete feature for an e-commerce site search where it 1.) provides search suggestion based on what is being typed and 2.) a list of potential products matches based on what has been typed so far. I'm trying for something similar to SLI Systems search (see http://www.bedbathstore.com/ for example).
This is a bit of a "how long is a piece of string" question and so I'm making this a community wiki answer — everyone feel free to jump in on it.
I'd say it's a matter of ensuring that:
The server / server farm / cloud you're querying is sized correctly according to the load you're throwing at it and/or can resize itself according to that load
The server / server farm / cloud is attached to a good, quick network backbone
The data structures you're querying server-side (database tables or what-have-you) are tuned to respond to those precise requests as quickly as possible
You're not making unnecessary requests (HTTP requests can be expensive to set up; you want to avoid firing off four of them when one will do); you probably also want to throw in a bit of hysteresis management (delaying the request while people are typing, only sending it a couple of seconds after they stop, and resetting that timeout if they start again; see the debounce sketch after this list)
You're sending as little information across the wire as can reasonably be used to do the job
Your servers are configured to re-use connections (HTTP 1.1) rather than re-establishing them (this will be the default in most cases)
You're using the right kind of server; if a server has a large number of keep-alive requests, it needs to be designed to handle that gracefully (NodeJS is designed for this, as an example; Apache isn't, particularly, although it is of course an extremely capable server)
You can cache results for common queries so as to avoid going to the underlying data store unnecessarily
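On the hysteresis point, here is a small debounce sketch; the 300ms delay, the /suggest endpoint and showSuggestions are all stand-ins for your own choices:

// Fire the request only after the user has stopped typing, resetting the timer on each keystroke.
let timer = null;
const input = document.querySelector('#search'); // assumed input element

input.addEventListener('input', () => {
  clearTimeout(timer);
  timer = setTimeout(() => {
    const q = input.value.trim();
    if (q.length < 2) return; // skip very short inputs
    fetch('/suggest?q=' + encodeURIComponent(q))
      .then(res => res.json())
      .then(showSuggestions); // showSuggestions: your own rendering function
  }, 300);
});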
You will need a web server that is able to respond quickly, but that is usually not the problem. You will also need a database server that is fast and can very quickly find which popular search queries start with 'ca'. Google doesn't use a conventional database for this at all; it uses large clusters of servers and a Cassandra-like database, and most of that data is kept in memory as well for quicker access.
I'm not sure if you will need this, because you can probably get pretty good results using only a single server running PHP and MySQL, but you'll have to make some good choices about the way you store and retrieve the information. You won't get these fast results if you run a query like this:
select
q.search
from
previousqueries q
where
q.search LIKE 'ca%'
group by
q.search
order by
count(*) DESC
limit 1
This will probably work as long as fewer than 20 people have used your search, but will likely fail on you before you reach 100,000.
This link explains how they made instant previews fast. The whole site highscalability.com is very informative.
Furthermore, you should store everything in memory and avoid retrieving data from the disk (slow!). Redis, for example, is lightning fast!
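To illustrate the keep-it-in-memory point, here is a toy sketch that precomputes the top suggestions for each short prefix, so a lookup is a single in-memory access; the data, prefix length and list size are invented:

// Build a prefix -> top-N suggestions map from past queries, most popular first.
function buildPrefixIndex(queries, maxPrefixLen, topN) {
  const counts = {};
  for (const q of queries) counts[q] = (counts[q] || 0) + 1;
  const ranked = Object.keys(counts).sort((a, b) => counts[b] - counts[a]);

  const index = {};
  for (const q of ranked) {
    for (let len = 1; len <= Math.min(maxPrefixLen, q.length); len++) {
      const prefix = q.slice(0, len);
      if (!index[prefix]) index[prefix] = [];
      if (index[prefix].length < topN) index[prefix].push(q);
    }
  }
  return index;
}

// const index = buildPrefixIndex(previousQueries, 3, 10);
// index['ca']  ->  ['canon', 'camera', ...]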
You could start by building a fast search engine for your products. Check out Lucene for full-text searching. It is available for PHP, Java and .NET, amongst others.
I have an idea for a web application where a user can submit Javascript code that can then be served up to other users. I'm wondering what's the best way of going about this. Is it possible to store the Javascript in a database and then serve it up to users as they request it? I would also like to be able to attach metadata to each piece of code: name, user ratings, etc., so a database seems like the natural solution to my somewhat underinformed mind. I'm looking at using Rails on the backend with MongoDB.
Javascript is a string of text. Databases can store strings of text. Hence, databases can store Javascript.
Unless you have some specific idea I'm missing though, I wholly agree with #Aircule's sentiment.
Wow, I don't think I've seen a worse idea in ages.
Yes, it seems like you've got a grasp of what is required. Just be careful not to execute the arbitrary code - you could be entering a world of XSS hurt.
Unless you're going to be getting millions of hits a minute, any database or framework will be fine.
I highly recommend reading up on XSS and CSRF. (Shameless plug: I blogged a high-level overview here.) It is hard enough to prevent this sort of thing when you are actively trying to look out for it; sanitizing JS would be an absolute nightmare.