I am fairly new to NLP in general. My goal is to create some kind of parser that can easily find files on my various hard drives.
I have no idea how to properly parse the input and transform it into a manageable representation that a program can use to produce a given output.
For example, the following sentences should return a list of documents:
documents created 3 months ago
documents modified 2 weeks ago
photos taken in china (this one would then use the GPS data within the image file)
It can probably be easily done using some kind of Regex pattern (<filetype> <action> <time>) but I would love to make it more flexible.
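For reference, the fixed-syntax regex approach mentioned above could look roughly like the sketch below. The vocabulary tables, the parseQuery name and the rough day counts are assumptions made for illustration, not part of any library.

```javascript
// Hypothetical vocabulary; extend these tables to accept more phrasings.
const FILE_TYPES = new Map([
  ["documents", "document"], ["document", "document"],
  ["photos", "photo"], ["photo", "photo"],
]);
const ACTIONS = new Set(["created", "modified", "taken"]);
const UNIT_DAYS = new Map([["day", 1], ["week", 7], ["month", 30], ["year", 365]]); // rough day counts

// Matches: <filetype> <action> <number> <unit>(s) ago
const QUERY_PATTERN = /^(\w+)\s+(\w+)\s+(\d+)\s+(day|week|month|year)s?\s+ago$/i;

// Returns a plain object describing the query, or null if the input
// does not follow the fixed <filetype> <action> <time> syntax.
function parseQuery(input) {
  const match = QUERY_PATTERN.exec(input.trim());
  if (!match) return null;
  const [, rawType, rawAction, amount, unit] = match;
  const fileType = FILE_TYPES.get(rawType.toLowerCase());
  const action = rawAction.toLowerCase();
  if (!fileType || !ACTIONS.has(action)) return null;
  return { fileType, action, daysAgo: Number(amount) * UNIT_DAYS.get(unit.toLowerCase()) };
}

console.log(parseQuery("documents created 3 months ago"));
console.log(parseQuery("photos taken in china")); // not covered by the pattern
```

The second call returns null, which shows the limitation: every new phrasing needs its own pattern.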
I looked into compromise, a JS library with an easy-to-use API for retrieving specific parts of the input. But I doubt that calling methods like calculatedResult.nouns()[0] and calculatedResult.verbs()[0].stem() is the right way to parse the commands, as those require a fixed kind of syntax.
Any tips on how to achieve my goal? I am not sure if using ML and training a custom model is the way to go. I have never used ML and, based on my limited knowledge of it, it seems hard to train a model on constructs like those (I would need a LOT of example sentences, but there is only a finite number of realistically used combinations that make sense).
The NLP technique you need to explore is intent detection. You can either integrate an NLP library like Rasa or spaCy into your program or work with a commercial API that performs intent detection. You will need example sentences in both cases, but probably not as many as you think. Intent detection is a key part of chatbots, so there are quite a lot of tools out there. Low-level, hands-on ML development is not really needed these days with all the high-level intent detection tools available.
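To make the idea concrete, here is a toy sketch of intent detection by nearest example. It is not how Rasa or spaCy work internally (those use trained models); it only shows the shape of the problem: a handful of labelled example sentences and a function that maps new input to the closest intent. The intent names and example sentences are made up for illustration.

```javascript
// Hypothetical training examples: a few sentences per intent.
const EXAMPLES = [
  { intent: "find_by_created", text: "documents created 3 months ago" },
  { intent: "find_by_created", text: "show files I made last year" },
  { intent: "find_by_modified", text: "documents modified 2 weeks ago" },
  { intent: "find_by_modified", text: "files I changed yesterday" },
  { intent: "find_by_location", text: "photos taken in china" },
  { intent: "find_by_location", text: "pictures shot near berlin" },
];

// Lowercase the text and keep only alphabetic words.
function tokenize(text) {
  return new Set(text.toLowerCase().match(/[a-z]+/g) || []);
}

// Jaccard similarity: shared words divided by total distinct words.
function similarity(a, b) {
  let shared = 0;
  for (const token of a) if (b.has(token)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Returns the intent of the most similar example, or null if nothing overlaps.
function detectIntent(input) {
  const tokens = tokenize(input);
  let best = { intent: null, score: 0 };
  for (const example of EXAMPLES) {
    const score = similarity(tokens, tokenize(example.text));
    if (score > best.score) best = { intent: example.intent, score };
  }
  return best;
}

console.log(detectIntent("which photos were taken in japan").intent);
```

Once the intent is known, a separate step would extract the details (file type, time span, place) that the intent needs.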
I've worked with both active record and data mapper implementations of ORMs enough to know the problems that active record ORMs cause in my large projects. Right now I'm thinking of migrating one of my projects to node.js and trying to find tools similar to the ones I'm using now. After some research I didn't find any node.js ORM that follows the data mapper pattern. They are all active record. Maybe I'm missing something, and you can tell me if there is a good, popular ORM for node.js that doesn't follow the active record pattern?
The libraries I've looked at:
http://docs.sequelizejs.com/
https://github.com/dresende/node-orm2
http://bookshelfjs.org/
some others
After a lot of frustration with the existing ORMs for JavaScript, I wrote my own ORM, TypeORM, which supports TypeScript / ES6 / ES5 and follows the data mapper pattern and other best practices.
I wrote an ORM for Node.js called node-data-mapper; it's available here: https://www.npmjs.com/package/node-data-mapper. It uses the data-mapper pattern. The developer uses plain old JavaScript objects when reading from and writing to the database. Relationships between tables are not rigidly defined, which makes joining very flexible (in my opinion, anyway), albeit somewhat verbose. The actual data mapping algorithm is fast and short, and the complexity is linear (the transformation from tabular DB data to a normalized JavaScript object is done in one loop).
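As an illustration of the one-loop idea (not node-data-mapper's actual code or API), a sketch of turning flat JOIN rows into nested objects in linear time might look like this. The column names userId, userName, phoneId and phoneNumber are hypothetical.

```javascript
// Hypothetical flat rows, shaped like the result of a users LEFT JOIN phone_numbers query.
// Each parent is looked up in a Map, so the whole transformation is a single pass.
function normalizeRows(rows) {
  const usersById = new Map();
  for (const row of rows) {
    let user = usersById.get(row.userId);
    if (!user) {
      user = { id: row.userId, name: row.userName, phoneNumbers: [] };
      usersById.set(row.userId, user);
    }
    // LEFT JOIN rows without a child carry null in the child columns.
    if (row.phoneId !== null) {
      user.phoneNumbers.push({ id: row.phoneId, number: row.phoneNumber });
    }
  }
  return [...usersById.values()];
}

const users = normalizeRows([
  { userId: 1, userName: "Ann", phoneId: 10, phoneNumber: "555-0100" },
  { userId: 1, userName: "Ann", phoneId: 11, phoneNumber: "555-0101" },
  { userId: 2, userName: "Bob", phoneId: null, phoneNumber: null },
]);
console.log(JSON.stringify(users, null, 2));
```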
I also did my best to make it fairly fault tolerant. There's 100% code coverage and, while I know that doesn't prove the absence of defects, I did try to test as thoroughly as possible.
I modeled the interface very loosely after Doctrine 1. (I've used LINQ, Doctrine 1 and 2, and Hibernate fairly extensively, and of those ORMs I like the interface for Doctrine 1 the best. node-data-mapper is not a JavaScript port of Doctrine by any means, though, and the interface is significantly different.) The query interface returns promises using the deferred module.
I modeled the conditions (e.g. WHERE and ON clauses) after MongoDB's conditions. Hopefully that makes the conditions somewhat intuitive while providing a way for making reusable queries (specifically, complex SELECT queries that can be filtered securely in many different ways). The conditions are treated as a domain-specific language, and are lexed, parsed, and compiled.
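To show what MongoDB-style conditions compiled to SQL can look like, here is a minimal sketch. It is an assumption about the general technique, not the module's real implementation: the conditions are already JavaScript objects here, so there is no lexing step, and the function names and supported operators are made up.

```javascript
const OPERATORS = new Map([
  ["$eq", "="], ["$ne", "<>"], ["$gt", ">"], ["$gte", ">="], ["$lt", "<"], ["$lte", "<="],
]);
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

// Compiles a condition object into a parameterized WHERE fragment.
// Values never end up in the SQL string, only in the params array.
function compileCondition(condition) {
  const params = [];
  const sql = compileNode(condition, params);
  return { sql, params };
}

function compileNode(node, params) {
  return Object.entries(node).map(([key, value]) => {
    if (key === "$and" || key === "$or") {
      const joiner = key === "$and" ? " AND " : " OR ";
      return "(" + value.map((child) => "(" + compileNode(child, params) + ")").join(joiner) + ")";
    }
    // Column names are validated because they are placed directly in the SQL text.
    if (!IDENTIFIER.test(key)) throw new Error("Invalid column name: " + key);
    // { age: 30 } is shorthand for { age: { $eq: 30 } }. NULL handling is left out.
    const comparisons = value !== null && typeof value === "object" ? value : { $eq: value };
    return Object.entries(comparisons).map(([operator, operand]) => {
      const sqlOperator = OPERATORS.get(operator);
      if (!sqlOperator) throw new Error("Unsupported operator: " + operator);
      params.push(operand);
      return "`" + key + "` " + sqlOperator + " ?";
    }).join(" AND ");
  }).join(" AND ");
}

const compiled = compileCondition({ $or: [{ age: { $gte: 18, $lt: 65 } }, { name: "Ann" }] });
console.log(compiled.sql, compiled.params);
```

Because the condition is plain data, the same base query can be reused with different filters supplied by callers, without string concatenation of user input.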
Anyway, the module is something that I use in my personal projects, but I'd love to get some feedback from other developers in the community! I tried to provide plenty of examples to get people up and running quickly. Currently the module supports MySQL only, but I'm working on adding support for MSSQL.
The distinction between the data-mapper pattern and active record doesn't really make sense in a dynamic language such as JavaScript.
Typically a data mapper is more lightweight in a typed language, but in JS it does not really make a difference.
Off the top of my head, I can mention two very popular projects that you may not have come across:
Waterline.js is a Sails abstraction, which works quite well on top of many database systems.
If you would consider MongoDB for your DB - Mongoose.js.
I'm wondering how much a CRUD-centric web application can benefit from Haskell's type system, particularly when the front end is built with a JavaScript MVC framework like AngularJS, which passes around untyped data objects.
It seems to me that as soon as you transform Haskell datatypes into JSON objects which you pass to a heavy JavaScript MVC framework layer, the benefits of having Haskell's type system as part of the web stack start eroding dramatically, as there is no way to let the type checker ensure the type-integrity of the data flow through the whole web application.
For example, you could change the database schema and the associated Haskell types, but the type checker won't be able to tell you what parts of the JavaScript MVC front-end need updating as well. I see this as a problem.
Am I stating the problem correctly, and if so, what advice could Haskell web application developers give on this point?
We've been wrestling with this exact same question quite a bit since we recently started a project with a substantial javascript front end. My anecdotal observation has been that we have a lot more bugs in the javascript application than we had with previous applications that just used Snap and generated HTML with Heist. We haven't decided on anything yet, but here are some of the possible solutions we've been considering:
Thin Javascript Wrapper
CoffeeScript
TypeScript
Dart
These are pretty unsatisfying solutions to me. They improve slightly on Javascript, but don't come close to giving me the things that I get with Haskell.
Much more functional and type safe front-end language
Fay is a proper subset of Haskell that compiles to Javascript. It definitely has some appeal, but doesn't give you access to all of Haskell. Last I heard it didn't support type classes, which I imagine would become an obstacle pretty quickly.
Elm
Roy
The problem with these solutions (as well as with the previous group) is that if your back end is written in Haskell, you still have the impedance mismatch because your front end language is not Haskell. This makes your code less DRY because you end up having to define the same data structures in both Haskell and the front end language. And when you change them in Haskell, you don't get errors indicating where your front end code needs changing. The app just breaks.
Compile Haskell to Javascript
The new game in town here is ghcjs. This is a very promising project, but I don't consider it to be viable for production at least until GHC 7.8 is released. That will hopefully happen within the next week. Once 7.8 is out the door, you still have to take into consideration that ghcjs is still very new. And even in the hypothetical scenario that it was 100% feature complete and the first release worked perfectly, you still have to remember that a fair amount of infrastructure has to be built before Haskell+ghcjs is as effective as high level javascript frameworks like Angular, Ember, etc.
UPDATE September 2016: Now, almost three years after I originally wrote this answer, GHCJS has improved greatly. There is still room for more improvements, but I have used it for production applications and it worked very well. It's especially powerful when combined with the Reflex FRP library that makes it much easier to build reactive UIs.
Generate Javascript from Haskell with an EDSL
If you have a relatively constrained problem, it might be possible to do all your application work on top of an EDSL that generates javascript. We already have the fantastic jmacro package to take care of the low level concerns of generating Javascript. You could leverage that and generate code that uses whatever other javascript libraries are appropriate for your application. That could be javascript + jquery, D3.js, or even code using a higher level javascript framework like Angular or Ember. I tend to think that Angular would be much easier to generate code for than Ember because of its simplicity and stronger encapsulation.
Greenfield a bytecode VM designed for functional languages in the browser
This is just a pie in the sky idea of mine. I don't think it's really practical because it would take a huge amount of work and be very difficult to gain adoption. But I like to at least mention the idea for completeness. Others have pointed out that asm.js is almost like this already. That may be the case, but it would be nice to have things like tail call optimization designed into the VM level from the start.
In my opinion, an easy solution is to generate TypeScript interfaces from Haskell data declarations (or from another API schema description) and use TypeScript for the front end.
It lets people who do not know Haskell, but who do know JavaScript, work on the project.
For example
data RpcResponse = RpcResponse { number :: Int, string :: Maybe String }
compiles to
interface RpcResponse {
    number: number,
    string?: string
}
and the function parseRpcResponseJson has the TypeScript type
parseRpcResponseJson(response: string): Option<RpcResponse>;
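The answer only gives the signature of parseRpcResponseJson. A sketch of what such a generated function might do at runtime, written here as plain JavaScript, is shown below. The body is an assumption: it returns null where the signature above would return an empty Option, and it checks only the two fields from the example.

```javascript
// Returns an object matching the RpcResponse shape, or null if the JSON
// is malformed or does not match. null stands in for the empty Option case.
function parseRpcResponseJson(response) {
  let value;
  try {
    value = JSON.parse(response);
  } catch {
    return null; // not valid JSON
  }
  if (value === null || typeof value !== "object" || Array.isArray(value)) return null;
  // number :: Int, so require an integer.
  if (!Number.isInteger(value.number)) return null;
  // string :: Maybe String, so the field may be absent or null, otherwise it must be a string.
  if (value.string !== undefined && value.string !== null && typeof value.string !== "string") return null;
  const result = { number: value.number };
  if (typeof value.string === "string") result.string = value.string;
  return result;
}

console.log(parseRpcResponseJson('{"number": 1, "string": "ok"}'));
console.log(parseRpcResponseJson('{"number": "1"}')); // wrong type for number
```

Validating at the boundary like this means a schema change on the Haskell side shows up as a rejected response rather than as a silent type mismatch deeper in the front end.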
I have a web app that relies on html5 offline storage features so that it can be accessed by the user without an internet connection. The app essentially just serves html pages and a little bit of css and javascript.
I am trying to add the ability to search the text served on these pages for key words, but because the app isn't guaranteed access to the server it needs to be able to perform these searches on the client side.
My thought is that I can store the searchable text in the browser's Web SQL database and perform the search either through JavaScript or through the browser's SQL API. I have a few questions about the best way to do this:
1) I vaguely remember an article about how to implement something like this, maybe from airbnb? Does anyone remember such an article?
2) The text is 2,000,000+ words so I would assume that indexOf is going to break down at this data size. Is there any chance regex will hold up? What are some options for implementing the actual search? (libraries, algorithms, etc.) Any article suggestions for understanding the tradeoffs of string search algorithms if I need to go down that road?
Well, I just wrote a quick benchmark for you and was surprised to find that you could probably get away with using String.indexOf(). I get about 35ms per search, which is about 30 searches per second.
EDIT: a better benchmark. There appears to be some sort of initialization delay, but it looks like indexOf is pretty fast. You could play around with the benchmark and see if it looks like it will work for you.
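The benchmark itself did not survive in this copy of the answer, so here is a rough sketch of how one could measure the same thing. The synthetic corpus, the function names and the run count are assumptions; timings will vary by browser and machine, so measure with your real text.

```javascript
// Build a synthetic corpus of `wordCount` words (a stand-in for the real text).
// The needle is placed at the very end, which is the worst case for indexOf.
function buildCorpus(wordCount, needle) {
  const words = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur"];
  const parts = new Array(wordCount);
  for (let i = 0; i < wordCount; i++) parts[i] = words[i % words.length];
  parts[wordCount - 1] = needle;
  return parts.join(" ");
}

// Run the same search several times and report the average time per search.
function timeSearch(haystack, needle, runs) {
  const start = Date.now();
  let index = -1;
  for (let i = 0; i < runs; i++) index = haystack.indexOf(needle);
  return { index, msPerSearch: (Date.now() - start) / runs };
}

const corpus = buildCorpus(2000000, "zebra");
const searchResult = timeSearch(corpus, "zebra", 10);
console.log("found at " + searchResult.index + ", about " + searchResult.msPerSearch.toFixed(2) + " ms per search");
```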
There are numerous log files that I have to review daily for my job. Several good parsers already exist for these log files but I have yet to find exactly what I want. Well, who could make something more tailored to you than you, right?
The reason I am using JavaScript (other than the fact that I already know it) is because it's portable (no need to install anything) but at the same time cross-platform accessible. Before I invest too much time in this, is this a terrible method of accomplishing my goal?
The input will be entered into a text file, delimited by [x] and the values will be put into an array to make accessing these values faster than pulling the static content.
Any special formatting (numbers, dates, etc) will be dealt with before putting the value in the array to prevent a function from repeating this step every time it is used.
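A sketch of that parse-once step is below. The field layout (timestamp, level, duration, message) and the "|" delimiter used in the example are hypothetical; the point is only that dates and numbers are converted a single time, when the array is built.

```javascript
// Hypothetical log layout: timestamp, level, response time in ms, message.
// Dates and numbers are converted once here, so later filtering and sorting
// can work on ready-made values.
function parseLog(text, delimiter) {
  const records = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // skip blank lines
    const [timestamp = "", level = "", duration = "", ...rest] = line.split(delimiter);
    records.push({
      time: Date.parse(timestamp.trim()), // epoch milliseconds
      level: level.trim().toUpperCase(),
      durationMs: Number(duration),
      message: rest.join(delimiter).trim(), // the message may itself contain the delimiter
    });
  }
  return records;
}

const records = parseLog(
  "2013-05-01T10:15:00Z|info|12|started\n\n2013-05-01T10:15:01Z|error|340|disk|full",
  "|"
);
console.log(records);
```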
These logs may contain 100k+ lines which will be a lot for the browser to handle. However, each line doesn't contain a ton of information.
I have written some of it already, but with even 10,000 lines it's starting to run slow and I don't know if it's because I wasn't efficient enough or if this just cannot be effectively done. I'm thinking this is because all the data is in one giant table. I'd probably be better off paginating it, but that is less than desirable.
Question 1: Is there anything I failed to mention that I should consider?
Question 2: Would you recommend a better alternative?
Question 3: (A bit off topic, so feel free to ignore). Instead of copy/pasting the input, I would like to 'open' the log file, but as far as I know JavaScript cannot do this (for security reasons). Can this be accomplished with an <input type="file"> without actually having a server to upload to? I don't know how SSJS works, but it appears that I underestimated the limitations of JavaScript.
I understand this is a bit vague, but I'm trying to keep you all from having to read a book to answer my question. Let me know if I should include additional details. Thanks!
I think JavaScript is an "ok" choice for this. Using a scripting language to parse log files for personal use is a perfectly sane decision.
However, I would NOT use a browser for this. Web browsers place limitations on how long a bit of javascript can run, or on how many instructions it is allowed to run, or both. If you exceed these limits, the browser interrupts your code with a warning that the script is taking too long.
Since you'll be working with a large amount of data, I suspect you're going to hit this sooner or later. This can be avoided by clever use of setTimeout, or potentially with web workers, but that will add complexity to your project. This is probably not what you want.
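If you do stay in the browser, the setTimeout approach usually means processing the lines in slices and yielding between them. A minimal sketch follows; the processInChunks name and the chunk size are arbitrary, and the schedule parameter exists only to make the function easy to test.

```javascript
// Process `items` in slices of `chunkSize`, giving control back to the
// browser between slices so the "script is running too long" limit is not hit.
function processInChunks(items, chunkSize, handleItem, onDone, schedule = (fn) => setTimeout(fn, 0)) {
  let position = 0;
  function runChunk() {
    const end = Math.min(position + chunkSize, items.length);
    for (; position < end; position++) handleItem(items[position], position);
    if (position < items.length) schedule(runChunk); // more work left: queue the next slice
    else onDone();
  }
  runChunk();
}

// Usage: count matching lines in a large (synthetic) log without blocking the page.
const logLines = Array.from({ length: 100000 }, (_, i) => "line " + i);
let matchCount = 0;
processInChunks(
  logLines,
  5000,
  (line) => { if (line.endsWith("7")) matchCount++; },
  () => console.log("done, " + matchCount + " matching lines")
);
```

The extra bookkeeping (position, callbacks, completion handler) is the added complexity the answer refers to.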
Be aware that JavaScript can run outside of browsers as well. For instance, Windows comes with the Windows Script Host. This will let you run JavaScript from the command prompt, without needing a browser. You won't get the "Script too long" error. As an added bonus, you will have full access to the file system, and the ability to pass command-line arguments to your code.
Good luck and happy coding!
To answer your top question in bold: No, it is not a terrible idea.
If JS is the only language you know, you want to avoid setting up any dependencies, and you want to stay platform-independent... JavaScript seems like a good fit for your particular case.
As a more general rule, I would never use JS as a language to write a desktop app. Especially not for doing a task like log parsing. There are many other languages which are much better suited to this type of problem, like Python, Scala, VB, etc. I mention Python and Scala because of their script-like behaviour and minimal setup requirements. Python also has very similar syntax to JS, so it might be easier to pick up than other languages. VB (or any .NET language) would work too if you have a Visual Studio license, because of its easy-to-use GUI builder, if that suits your needs better.
My suggested approach: use an existing framework. There are hundreds, if not thousands, of log parsers out there, handling all sorts of use cases and log formats, so you should be able to find something close to what you need. It may just take a little more effort than Googling "Log Parsers" to find one that works. If you can't find one that suits your exact needs and you are willing to spend time making your own, you should use that time instead to contribute to one of the existing open source ones. Extending an existing code base should always be considered before trying to re-invent the wheel for the ten gazillionth time.
Given your invariants "javascript, cross-platform, browser ui, as fast as possible" I would consider this approach:
1) Use command line scripts (windows: JScript; linux: ?) to parse log files and store 'clean'/relevant data in a SQLite Database (fall back: any decent scripting language can do this, the ready made/specialized tools may be used too)
2) Use the SQLite Manager addon to do your data mining with SQL
3) If (2) gets clumsy - use the SQLite Manager code base to 'make something more tailored'
Considering your comment:
For Windows-only work you can use the VS Express edition to write an app in C#, VB.NET, C++/CLI, F#, or even (kind of) Javascript (Silverlight). If you want to stick to 'classic' Javascript and a browser, write a .HTA application (full access to the local machine) and use ADO data(base) access and try to get the (old) DataGrid/Flexgrid controls (they may be installed already; search the registry).
The previous programmer left the website in a pretty unusable state, and I am having difficulty modifying anything. I am new to web design, so I don't know whether my skills are a mismatch for this kind of job or whether it is normal in the industry to have websites like this.
The Home page includes three frames
Each of these frames has its own javascript functions (between the <head> tags) and also calls other common javascript functions (using <script src=..>).
Excessive usage of document.all - in fact the elements are referred or accessed by document.all only.
Excessive usage of XSLT and Web Services - I know that using Web Services is generally considered a good design choice, but is there any way to consume these services other than through XSLT? For example, the menu is created using the data returned by a web method.
Every <div>, <td> and every other element has an id, and these ids are manipulated by the javascript functions, and then the appropriate web service and XSLT files are loaded based on them.
From the security perspective, he used T-SQL's FOR XML AUTO for most of the data that is returned by the web service - is it a good choice from a security standpoint to expose the table names and column names to the end user?
I am quite confused about the state of the application itself. Should I learn the intricacies of what he has developed and continue working on it, or should I start rewriting everything? What perplexes me most is the lack of alternatives - is this the common way web projects are handled in the real world, or was it an exception?
Any suggestions, any pointers are welcome. Thanks
No, it is not acceptable in this industry that people keep writing un-maintainable code.
My advice to you is to go up the chain and convince everyone that this needs to be rewritten. If they question you, find an external consultant with relevant web development skills to review the application (for 1 day).
Keeping this website as-is because it 'works' is like driving a working Ford Model T on today's highways: very dangerous. Security and maintenance costs are likely the most persuasive arguments against keeping this site 'as-is'.
Next, get yourself trained; it will pay off if you can rewrite this application knowing the basics. Today's technology (ASP.NET MVC) lets you implement core business value faster than trying to maintain this unconventionally written app.
Tough spot for an inexperienced developer (or any) to be left in. I think you have a few hard weeks ahead of you where you really need to read up on the technologies involved to get a better understanding of them and of what is best practice. You will also need to really dig down into the existing code to understand how it all hangs together.
When you've done all that, you really need to think about your options. Usually re-writing something from scratch (especially if it actually works) is a bad idea. This obviously depends on the size of the project; for a smaller project with only a couple of thousand lines of code it might be OK. When looking at someone else's code it is also easy to overlook that all that weird shit going on could actually be fixes for valid requirements. Things often start out looking neat, but then the real world comes visiting.
You will need to present the business with time estimates for re-writing to see if that is an option at all, but I'm guessing you will need to accept the way things are and do your best with what you have. Maybe you could gradually improve things.
I would recommend moving the project to MVC3 and rewriting the XSLT portions as views and/or partial views in MVC. The Razor model binding syntax is very clean, so you should be able to quickly cut out the dirty XSLT code and be left with just the model properties you need.
I would then have those web services invoked server-side from MVC, deserialize the results into real objects (or even just use straight XQuery or JSON traversal to pull out what you need for your model), and bind those to your views.
This could be a rather gargantuan technology leap for your company, though. Some places have an aversion to change.
I'd guess this was written 6-7 years ago, and hacked on since then. Every project accumulates a certain amount of bubble gum and duct tape. Sounds like this one's got it bad. I suggest breaking this up into bite-size chunks. I assume that the site is actually working right now? If so, you don't want to break anything; the "business" often thinks "it was working just fine when the last guy was here."
Get a feel for your biggest pain points in maintaining the project, and for what you'll get the biggest wins from fixing. A rewrite is great, if you have the time and support. But if it's a complex site, there's a lot to be said for a mature application. Mature in the sense that it fulfills the business needs, not that it's good code.
Also, working on small parts will get you better acquainted with the project and the business needs, so when you start the rewrite you'll have a better perspective.