Application:
I wish to publish a web-application that takes input strings, searches for the string in about 5,000 plain-text files and returns the name of the files with matches. Each text file is about 4MB (uncompressed).
Problem:
In PHP, I could use exec(grep -l pattern dir/* ) and get the job done. However, for cost reasons, I would go for a shared web-hosting plan which normally do not allow for executing programs.
Could you please suggest any alternative to grep for web environment?
I have understood following so far:
A binary program file for any grep-alternative (e.g sift) could work. However, the problem of executing on a shared server would remain.
PHP function preg_match is inappropriate considering a large number of files and their size.
I am open to implementations of grep-like function in other languages (e.g perl or javascript). However, I am not sure if the performance would be comparable to grep and whether the problem of execution would still remain.
I have tried looking for different web-hosting providers and understood that a virtual-private server (VPS) might be the solution. However, the price for a VPS plan by all hosting providers I have come across is unaffordable.
Any solutions or guidance for this problem?
Possible solutions depend on what your hosting provider offers and your budget.
Will you have a RDBMS available? You could then use full texts search which many offers. If not you could use SQLite, which has full texts search support.
If you have to stick to low tech solutions, than the PHP solution linked on the right might work for you.
Perl has a File::Find module, which you could use.
Related
Background:
I am building a node.js-based Web app that needs to make use of various fonts. But it only needs to do so in the backend since the results will be delivered as an image. Consequently, the client/browser does not need access to the fonts at all in my case.
Question:
I will try to formulate the question as little subjective as possible:
What are the typical options to provide a node.js backend with a large collection of fonts?
The options I came up with so far are:
Does one install these hundreds or thousands of fonts in the operating system of the (in my case: Ubuntu) server?
Does one somehow serve the fonts from a cloud storage such as S3 or (online) database such as a Mongo DB server?
Does one use a local file system to store the fonts and retrieve them?
...other options
I am currently leaning towards Option 1 because this is the way a layman like me does it on a local machine.
Without starting a discussion here, where could I find resources discussing the (dis-)advantages of the different options?
EDIT:
Thank you for all the responses.
Thanks to these, I noticed that I need to clarify something. I need the fonts to be used in SVG processing libraries such as p5.js, paper.js, raphael.js. So I need to make the fonts available to these libraries that are run on node.js.
The key to your question is
hundreds or thousands of fonts
Until I took that in there is no real difference between your methods. But if that number is correct (kind of mind-boggling though) I would:
not install them in the OS. What happens if you move servers without an image? Or move OS?
Local File system would be a sane way of doing it, though you would need to keep track manually of all the file names and paths for your code.
MongoDB - store file names+paths in the collection..and store the actual fonts in your system.
In advent of moving servers you would have to pick up the directory where all the actual files are stored and the DB where you hold the file name+paths.
If you want you can place it all in a MongoDB but then that file would also be huge, I assume - that is up to you.
Choice #3 is probably what I would do in such a case.
If you have a decent enough server setup (e.g. a VPS or some other VM solution where you control what's installed) then another option you might want to consider is to do this job "out of node". For instance, in one of my projects where I need to build 175+ as-perfect-as-can-be maths statements, I offload that work to XeLaTeX instead:
I run a node script that takes the input text and builds a small but complete .tex file
I then tell node to call "xelatex theFileIJustMade.tex" which yields a pdf
I then tell node to call "pdfcrop" on that pdf, to remove the margins
I then tell node to call "pdf2svg", which is a free and amazingly effective utility
Then as a final step mostly to conserve space and bandwidth, I use "svgo" which is a nodejs based svg optimizer that can run either as normal script code, or as CLI utility.
(some more details on that here, with concrete code here)
Of course, depending on how responsive a system you need, you can do entirely without steps 3 and 5. There is a limit to how fast we can run, but as a server-side task there should never be the expectation of real-time responsiveness.
This is a good example of remembering that your server runs inside a larger operating system that might also offer tools that can do the job. While you're using Node, and the obvious choice is a Node solution, Node is also a general purpose programming language and can call anything else through spawn and exec, much like python, php, java, C#, etc. As such, it's sometimes worth thinking about whether there might be another tool that is even better suited for your needs, especially when you're thinking about doing a highly specialized job like typesetting a string to SVG.
In this case, LaTeX was specifically created to typeset text from the command line, and XeLaTeX was created to do that with full Unicode awareness and clean, easy access to fonts both from file and from the system, with full OpenType feature control, so would certainly qualify as just as worthwhile a candidate as any node-specific solution might be.
As for the tools used: XeLaTeX and pdfcrop come with TeX Live (installed using whatever package manager your OS uses, or through MiKTeX on Windows, but I suspect your server doesn't run on windows) pdf2svg is freely available on github, and svgo is available from npm)
I have a simple audio sequencer application using javascript and HTML5. The goal of the application is to let users make a beat and save it so other people can open it up, add on to it, then re save. The issue I'm having is the save/open feature. How Do I add this? Or at least where can I start looking fro this solution?
If you are asking for an only-JS solution to this:
Opening Files
You can use the FileAPI to open the files from your users local machine, as long as its your users who are selecting the files manually(you are not digging into the file system yourself).
Here is nice demo for this functionality: http://html5demos.com/file-api
Saving Files
However, saving files was a W3C recommendation that got ditched so your only choice here is to allow your users to download the file. See this relevant question I made some time ago
If I were doing this for a class or prototype I might use PUT or POST requests to save the beat sequence in json form to a web server. When it's time to save your work, take your beat sequence data, convert from a js (I presume) object to a json string and PUT it to an appropriate url on your server. Give it a name and use the name in the url path, like "myserver.com/beats/jpbeat1.json".
When you want to retrieve that beat, issue an HTTP get to the url you saved. You can, of course also maintain and rewrite a directory page in json or html, or you can just navigate the directory offered by your web server. Most developers will think this crude, I imagine, but in a race for time, crude can be good if it meets the requirements.
Choose a web server, the easiest one for you. I might use Amazon S3. Costs money, but available to the internet, has authentication, security features, and supports http GET, PUT of static files. There are many other site hosting services.
For a local install, I might use Node.js. Also very easy to set up to serve static content. For example, the Express framework has static content service and easy examples how to set it up if you are comfortable with Javascript. With Node, you can make the site a bit fancier, by writing server side code to maintain an index, for example. Any real world implementation would require at least some server side code.
There are many other choices. Most developers would want a database, but you don't have to have one if simple files are ok for your class project. It all depends on your requirements.
I hope those ideas are helpful.
To learn node.js, I am writing a web site that allows users to play the online game Mafia. For those unfamiliar with Mafia, it is a game most commonly played on forums, and pits an uninformed majority (the "Town") against an informed minority (the "Mafia"). However, although this is an accurate brief overview, in fact every game session can exhibit widely varying house rules that can dramatically change the game mechanics.
I want my website to be able to handle all of these variations. At first I planned for my website to implement a comprehensive framework that could run all Mafia variants itself. However, after going over a ton of rule sets for finished games archived on several different forums, I realized that the space of reasonable rules and gameplay mechanics is so huge that I would essentially have to create a new domain-specific programming language to allow all possible variants. Inventing a new language for a an otherwise straightforward personal project is rather silly and not something I'm interested in at the moment, especially given I have a perfectly good language at hand, namely JavaScript.
Therefore, I decided to let variant authors to upload a JavaScript file containing the variant code that my website will call into at the appropriate points. Essentially, JavaScript modules implementing Mafia variant game logic (which my website code will require()) will act as a scripting language to my web site's "game engine". Think Lua for C++ games. Unfortunately, this introduces a severe security problem. Unlike in the browser, node-run JavaScript has access to the file system, the network, etc. etc. So it would be trivial for a malicious user to upload a variant file that deletes the contents of my hard drive, or starts Bitcoin mining, or whatever.
My first thought was to do a replace() over each user's uploaded code for dangerous libraries such as 'fs' and 'http' into invalid strings, and catch the consequent exceptions when I try to load the file. However, this ad-hoc blacklisting technique feels like the kind of approach that one of the many people smarter/more knowledgeable that me will be able to overcome in a heartbeat. What I really need is a way to whitelist the libraries that user-uploaded code has access to. Is there a way to do this using JavaScript in node.js? If not, how would you recommend I secure the computer my node server will be running on as much as possible?
My current strategy is to require myself and a small number of trusted users to review and then vote unanimously in favor of user-uploaded JavaScript variant code before it is brought into the system, but I'm hoping there is a more automatic way of doing it.
You need to use vm module for this. Basically it allows to run scripts in customized contexts, so you can put whatever globals you want, define your own require etc.
You should also remember that in node.js it's possible to harm your app without any libraries — a user can simply add something like while (true) {} which will stop the whole process. So you need to run all untrusted code in separate processes and be ready to kill them, when they start to abuse cpu or memory.
Anyone know if there's a combination of packages that can be used together to create a multisite setup with node.js?
A multisite setup would have one code base, possibly one server, serving different tlds from different databases.
Thanks.
Maybe this is what you are looking for?
https://github.com/visionmedia/express/blob/master/examples/vhost/index.js
An alternative is that you could simply run a number of sites, each on their own ports and use https://github.com/substack/bouncy to route incoming requests to the appropriate site.
You can choose how your sites are started as well. You can start them up as separate executables, or you can create something a bit more dynamic using http://nodejs.org/docs/latest/api/child_processes.html#child_process.fork. This would allow you to have a single executable with any number of means of starting up sites, bringing them up or down, gathering statistics from them, whatever.
If you're looking for a straight-up CMS, check out http://calip.so/
There are numerous log files that I have to review daily for my job. Several good parsers already exist for these log files but I have yet to find exactly what I want. Well, who could make something more tailored to you than you, right?
The reason I am using JavaScript (other than the fact that I already know it) is because it's portable (no need to install anything) but at the same time cross-platform accessible. Before I invest too much time in this, is this a terrible method of accomplishing my goal?
The input will be entered into a text file, delimited by [x] and the values will be put into an array to make accessing these values faster than pulling the static content.
Any special formatting (numbers, dates, etc) will be dealt with before putting the value in the array to prevent a function from repeating this step every time it is used.
These logs may contain 100k+ lines which will be a lot for the browser to handle. However, each line doesn't contain a ton of information.
I have written some of it already, but with even 10,000 lines it's starting to run slow and I don't know if it's because I wasn't efficient enough or if this just cannot be effectively done. I'm thinking this is because all the data is in one giant table. I'd probably be better off paginating it, but that is less than desirable.
Question 1: Is there anything I failed to mention that I should consider?
Question 2: Would you recommend a better alternative?
Question 3: (A bit off topic, so feel free to ignore). Instead of copy/pasting the input, I would like to 'open' the log file but as far as I know JavaScript cannot do this (for security reasons). Can this be accomplished with a input="file" without actually having a server to upload to? I don't know how SSJS works, but it appears that I underestimated the limitations of JavaScript.
I understand this is a bit vague, but I'm trying to keep you all from having to read a book to answer my question. Let me know if I should include additional details. Thanks!
I think JavaScript is an "ok" choice for this. Using a scripting language to parse log files for personal use is a perfectly sane decision.
However, I would NOT use a browser for this. Web browsers place limitations on how long a bit of javascript can run, or on how many instructions it is allowed to run, or both. If you exceed these limits, you'll get something like this:
Since you'll be working with a large amount of data, I suspect you're going to hit this sooner or later. This can be avoided by clever use of setTimeout, or potentially with web workers, but that will add complexity to your project. This is probably not what you want.
Be aware that JavaScript can run outside of browsers as well. For instance, Windows comes with the Windows Script Host. This will let you run JavaScript from the command prompt, without needing a browser. You won't get the "Script too long" error. As an added bonus, you will have full access to the file system, and the ability to pass command-line arguments to your code.
Good luck and happy coding!
To answer your top question in bold: No, it is not a terrible idea.
If JS is the only language you know, you want to avoid setting up any dependencies, and you want to stay platform-independent... JavaScript seems like a good fit for your particular case.
As a more general rule, I would never use JS as a language to write a desktop app. Especially not for doing a task like log parsing. There are many other languages which are much better suited to this type of problem, like Python, Scala, VB, etc. I mention Python and Scala because of their script-like behaviour and minimal setup requirements. Python also has very similar syntax to JS so it might be easier to pick up then other languages. VB (or any .NET language) would work too if you have a Visual Studio license because of it's easy to use GUI builder if that suits your needs better.
My suggested approach: use an existing framework. There are hundreds, if not thousands of log parsers out there which handle all sorts of use-cases and different formats of logs that you should be able to find something close to what you need. It may just take a little more effort than Google'ing "Log Parsers" to find one that works. If you can't find one that suits your exact needs and you are willing to spend time making your own, you should use that time instead to contribute to one of the existing ones which are open source. Extending an existing code base should always be considered before trying to re-invent the wheel for the 10th gillion time.
Given your invariants "javascript, cross-platform, browser ui, as fast as possible" I would consider this approach:
Use command line scripts (windows: JScript; linux: ?) to parse log files and store 'clean'/relevant data in a SQLite Database (fall back: any decent scripting language can do this, the ready made/specialized tools may be used too)
Use the SQLite Manager addon to do your data mining with SQL
If (2) gets clumsy - use the SQLite Manager code base to 'make something more tailored'
Considering your comment:
For Windows-only work you can use the VS Express edition to write an app in C#, VB.NET, C++/CLI, F#, or even (kind of) Javascript (Silverlight). If you want to stick to 'classic' Javascript and a browser, write a .HTA application (full access to the local machine) and use ADO data(base) access and try to get the (old) DataGrid/Flexgrid controls (they may be installed already; search the registry).