Diffing two texts in Rails and printing human readable HTML output - javascript

I'm using the paper_trail gem to auto-create a versions history of my Page model.
In pages#show, I display the versions like so:
The most important element is the diff which shows the difference between the previous and the currently displayed version of a field, e.g. content. It looks like a diff on GitHub, and it is marked up well using <ins> for insertions and <del> for deletions.
Unfortunately, for the time being I generate this diff using https://code.google.com/p/google-diff-match-patch/, a JavaScript library that runs in the user's browser. I did it this way because I couldn't find a Ruby gem or similar tool that does the same in a similarly elegant way.
Well, I found https://github.com/samg/diffy and https://github.com/pvande/differ, but the diffs produced by both gems aren't nearly as elegant: differ has to be told manually whether to diff by line, word, or character (while the JavaScript library decides this automatically and uses a combination of these options, which feels very intuitive to me), and diffy doesn't offer an option for this at all. I don't know exactly how the JavaScript works, but it states that "Myer's diff algorithm" is used internally:
This library implements Myer's diff algorithm which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surround the diff algorithm, improving both performance and output quality.
You can try it here: https://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
Maybe use the following two strings to see a typical example of two versions of a page:
This is the about page.
Put markdowwn formated content here.
and
This is the about page.
Put markdown formatted here content.
A [link to page 11](11). And another one: [](11).
It results in something like this:
The problem with this approach is that it's run in the browser, so I can't mutate the generated code in my Rails application anymore. So I wonder whether there's an easy way to get similar diff results using e.g. a command line tool like diff? Maybe even git could be of use?
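To make the kind of output described above concrete, here is a minimal sketch of a word-level diff rendered with <ins> and <del>. It is only an illustration: it uses a plain longest-common-subsequence table rather than Myers' algorithm, it has none of the pre-diff speedups or post-diff cleanups the library mentions, and the function names diffWords, escapeHtml and toHtml are made up for this example.

```javascript
// Word-level diff via a longest-common-subsequence (LCS) table.
// Note: this is NOT Myers' algorithm and does no semantic cleanup.
function diffWords(oldText, newText) {
  const a = oldText.split(/\s+/).filter(Boolean);
  const b = newText.split(/\s+/).filter(Boolean);

  // lcs[i][j] = length of the LCS of a[i..] and b[j..]
  const lcs = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0));
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk the table from the start and emit equal/delete/insert operations.
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      ops.push(['equal', a[i]]);
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      ops.push(['delete', a[i]]);
      i++;
    } else {
      ops.push(['insert', b[j]]);
      j++;
    }
  }
  while (i < a.length) ops.push(['delete', a[i++]]);
  while (j < b.length) ops.push(['insert', b[j++]]);
  return ops;
}

// Escape the characters that would otherwise be read as markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Render the operations as HTML, words joined by single spaces.
function toHtml(ops) {
  return ops.map(([op, word]) => {
    const text = escapeHtml(word);
    if (op === 'insert') return `<ins>${text}</ins>`;
    if (op === 'delete') return `<del>${text}</del>`;
    return text;
  }).join(' ');
}

const html = toHtml(diffWords(
  'Put markdowwn formated content here.',
  'Put markdown formatted here content.'));
console.log(html);
```

Because this sketch compares whole words only, it marks "markdowwn" as fully deleted and "markdown" as fully inserted, which is exactly the coarseness the mixed line/word/character approach of the JavaScript library avoids.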

Related

Building Chrome Extensions on Existing Production React Websites

Background: I've been developing a bunch of browser extensions for production sites (Yelp, Zillow, Trulia, Reddit) that use React. I've yet to take a course on React (I'm planning to), but my questions are:
How stable are the class names on production React sites (many of the classes have weird numbers and letters appended)? If they are not stable, how often do they change, and is there any way to get a more stable selector for these kinds of elements?
When classes are completely non-human readable, is there any way to view the class name in a more human readable format? e.g. <div class="_2jJNpBqXMbbyOiGCElTYxZ">
I'd hate to build these extensions and have them break whenever there is a minor release (I know they will break as the site is significantly updated but would prefer if they were stable for minor releases).
Example: Targeting a span like this
<span class="lemon--span__373c0__3997G text__373c0__2Kxyz reviewCount__373c0__2r4xT text-color--black-extra-light__373c0__2OyzO text-align--left__373c0__2XGa-">865</span>
with a query selector like this:
const ratingCountTarget = result
  .closest('.mainAttributes__373c0__1r0QA')
  .querySelector('.reviewCount__373c0__2r4xT');
There's no way to get the original names, and nothing prevents the site developers from changing the random parts any day, or even several times a day. So find a way not to depend on the exact names:
Try finding the non-randomized attributes
Use relations between elements (combinators)
Use partial matching like foo.querySelector('[class*="reviewCount"]')
And be prepared for your extension to break eventually, even if only occasionally.
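A minimal sketch of the partial-matching idea, assuming (and it is only an assumption) that the readable part before the first "__" stays the same between releases while the suffixes change. The helper names stableSelector and findRatingCount are made up for this example.

```javascript
// Turn a hashed class such as "reviewCount__373c0__2r4xT" into an attribute
// selector that depends only on the readable prefix before the first "__".
// Assumption: the site keeps that prefix stable; nothing guarantees this.
function stableSelector(hashedClass) {
  const prefix = hashedClass.split('__')[0];
  return `[class*="${prefix}__"]`;
}

// Browser-side usage (needs a DOM, so it is only defined here, not called).
// `result` is the element from the question's snippet.
function findRatingCount(result) {
  const container = result.closest(stableSelector('mainAttributes__373c0__1r0QA'));
  return container
    ? container.querySelector(stableSelector('reviewCount__373c0__2r4xT'))
    : null;
}

console.log(stableSelector('reviewCount__373c0__2r4xT'));
```

Note that [class*="..."] is a plain substring match on the whole class attribute, so it can also hit unrelated classes that happen to contain the same text; combining it with a parent lookup as above narrows that down a little.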

Best way to scrape a set of pages with mixed content

I’m trying to show a list of lunch venues around the office along with today’s menus. The problem is that the websites offering the lunch menus don’t all provide the same kind of content.
For instance, some of the websites offer nice JSON output. Look at this one: it provides the English and Finnish course names separately, and everything I need is available. There are a couple of others like this.
But others don’t have such nice output. Like this one. The content is laid out in plain HTML, and the English and Finnish food names are not in any consistent order. Also, food properties (L, VL, VS, G, etc.) are just plain text, like the food name.
What, in your opinion, is the best way to scrape all these available data in different formats and turn them into usable data? I tried to make a scraper with Node.js (& phantomjs, etc) but it only works with one website, and it’s not that accurate in case of the food names.
Thanks in advance.
You may use something like kimonolabs.com, they are much easier to use and they give you APIs to update your side.
Remember that they are best for tabular data contents.
There may be simple algorithmic solutions to the problem. If there is a list of all available food names, that can be really helpful: you look for occurrences of a known food name inside a document (for today).
If there is no such food list, you may use TF/IDF. TF/IDF lets you score a word by how often it appears in the current document relative to how common it is across the other documents. But this solution needs enough data to work.
I think the best solution is something like this:
Creating a list of all available websites that should be scraped.
Writing driver classes for each website data.
Each driver has the duty of creating the general domain entity from its standard document.
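The driver idea above could be sketched roughly like this. Both payload shapes (the JSON field names title_en and properties, and the "Name (codes)" text layout) as well as the site keys jsonVenue and textVenue are invented for illustration; real sites will look different.

```javascript
// Every driver turns its site's raw document into the same common model:
// { venue, name, diets }. All field names and formats here are assumptions.
const drivers = {
  // Hypothetical site that exposes JSON.
  jsonVenue: {
    parse(raw) {
      return JSON.parse(raw).courses.map((course) => ({
        venue: 'jsonVenue',
        name: course.title_en,
        diets: course.properties
          ? course.properties.split(',').map((p) => p.trim())
          : [],
      }));
    },
  },
  // Hypothetical site that only offers plain text, one dish per line,
  // optionally followed by diet codes in parentheses.
  textVenue: {
    parse(raw) {
      return raw
        .split('\n')
        .map((line) => line.trim())
        .filter(Boolean)
        .map((line) => {
          const match = line.match(/^(.*?)\s*\(([^)]*)\)\s*$/);
          return {
            venue: 'textVenue',
            name: match ? match[1] : line,
            diets: match ? match[2].split(',').map((p) => p.trim()) : [],
          };
        });
    },
  },
};

// Run each site's raw document through its own driver and merge the results.
function collectMenus(rawBySite) {
  return Object.entries(rawBySite)
    .flatMap(([site, raw]) => drivers[site].parse(raw));
}

const menus = collectMenus({
  jsonVenue: '{"courses":[{"title_en":"Pea soup","properties":"L, G"}]}',
  textVenue: 'Salmon pasta (VL)\nGreen salad',
});
console.log(menus);
```

Fetching the pages is left out on purpose; the point is only that the rest of the application sees one model, and supporting a new website means adding one more driver.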
If you can use PHP, Simple HTML Dom Parser along with Guzzle would be a great choice. These two will provide a jQuery-like path finder and a nice wrapper around HTTP.
You are touching on a really difficult problem. Unfortunately there are no easy solutions.
Actually there are two different parts to solve:
data scraping from different sources
data integration
Let's start with the first problem: data scraping from different sources. In my projects I usually process data in several steps. I have dedicated scrapers for each specific site I want, and process the data in the following order:
fetch raw page (unstructured data)
extract data from page (unstructured data)
extract, convert and map data into page-specific model (fully structured data)
map data from fully structured model to common/normalized model
Steps 1-2 are scraping oriented and steps 3-4 are strictly data-extraction / data-integration oriented.
While you can implement steps 1-2 relatively easily using your own web scrapers or by utilizing existing web services, data integration is the most difficult part in your case. You will probably need some machine-learning techniques (shallow, domain-specific Natural Language Processing) along with custom heuristics.
For messy input like this one, I would process the lines separately, use a dictionary to strip out the known Finnish/English words, and analyse what is left. But even then it will never be 100% accurate, because of possible human input errors.
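A rough sketch of that dictionary step, with two tiny made-up word lists standing in for real Finnish and English dictionaries (the names FINNISH, ENGLISH and analyseLine are hypothetical):

```javascript
// Tiny illustrative dictionaries; a real setup would need far larger lists.
const FINNISH = new Set(['lohikeitto', 'hernekeitto']);
const ENGLISH = new Set(['salmon', 'pea', 'soup']);

// Split one menu line into tokens and sort each token into a bucket:
// known Finnish word, known English word, or "rest".
function analyseLine(line) {
  const result = { fi: [], en: [], rest: [] };
  for (const token of line.split(/[\s,()]+/).filter(Boolean)) {
    const word = token.toLowerCase();
    if (FINNISH.has(word)) {
      result.fi.push(token);
    } else if (ENGLISH.has(word)) {
      result.en.push(token);
    } else {
      // Whatever is left: candidates for diet codes, typos, unknown words.
      result.rest.push(token);
    }
  }
  return result;
}

console.log(analyseLine('Lohikeitto Salmon soup (L, G)'));
```

The "rest" bucket is where the heuristics would have to do their work, and a misspelled dish name ends up there as well, which is one reason this can't be fully accurate.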
I am also worried that your stack is not very well suited to such tasks. For this kind of processing I use Java/Groovy along with integration frameworks (Mule ESB / Spring Integration) to coordinate the data processing.
In summary: it is a really difficult and complex problem. I would rather accept lower input data coverage than aim for 100% accuracy (unless it is really worth it).

Refactoring with emacs while editing javascript

Hi so I am writing a lot of server side javascript and I would like the ability to refactor while editing with emacs. Is this possible? Thanks!
By refactor I mean like how in eclipse while editing Java you can refactor one variable called for example "variableOne" into "variable1" and now all other 15 times you wrote "variableOne" becomes "variable1".
Probably the most sophisticated JavaScript refactoring library for Emacs is Magnar Sveen's js2-refactor. Its list of supported refactorings includes
rv is rename-var: Renames the variable on point and all occurrences in its lexical scope.
which sounds a lot like what you're looking for. It also supports a number of other very useful common refactoring actions.
Assuming you're on Emacs 24, I recommend installing it using the MELPA repository. If you're still on Emacs 23 you'll have to upgrade or manually install package.el before you can use MELPA.
If you are looking for just renaming variables, you might also want to take a look at tern. The advantage it has compared to js2-refactor (which I use too) is that it has a concept of projects, so you can rename a certain variable across multiple files in a project. It also provides other features like jump-to-definition and auto-completion (which are quite accurate).
Here are some general options for renaming a variable
1) Multiple cursors - It has a useful command, mc/mark-all-like-this-dwim, which marks all the occurrences of the selected text in the current context; you can then edit all the occurrences simultaneously.
2) Wgrep - This package lets you apply changes made in a grep buffer to the respective files. This is useful when I have to replace a word across many files: in such situations I use rgrep to search for the word in multiple files, then enable wgrep in the resulting grep buffer, mark the word to be replaced with multiple-cursors (you can also use query-replace), make the changes, and then run wgrep-save-all-buffers, and all my changes are saved!
Your question seems to be more about renaming variables than about refactoring in general. The two places to start for information about using Emacs to rename parts of your code are these:
Emacs Wiki Search and Replace category page. This includes search-and-replace across multiple files (e.g. of your project).
The Emacs manual: use C-h r to enter the manual from Emacs.
Then hit the key i to look something up in the index (with completion):
i search and replace commands takes you to the section about replacement commands.
i search and replace in multiple files takes you to the section about Searching and Replacing with Tags Tables.
For Emacs support for projects, see the Emacs Wiki Projects category page.

Are there any javascript frameworks for parsing/auto-completing a domain specific language?

I have a grammar for a domain specific language, and I need to create a javascript code editor for that language. Are there any tools that would allow me to generate
a) a javascript incremental parser
b) a javascript auto-complete / auto-suggest engine?
Thanks!
An Example of implementing content assist (auto-complete)
using Chevrotain Javascript Parsing DSL:
https://github.com/SAP/chevrotain/tree/master/examples/parser/content_assist
Chevrotain was designed specifically to build parsers used (as part of) language services tools in Editors/IDEs.
Some of the relevant features are:
Automatic Error Recovery / Fault tolerance because editors and IDEs need to be able to handle 'mostly valid' inputs.
Every Grammar rule may be used as the starting rule as an Editor/IDE may only want to implement incremental parsing for performance reasons.
You may want jison, a JS parser generator. In terms of auto-complete / auto-suggest... most of the stuff out there that I know of is based on word completion rather than code completion. But once you have a parser running, I don't think that part is too difficult.
This is difficult. I'm doing the same sort of thing myself.
One approach is:
What you need is a parser which will give you an array of the currently possible ASTs for the text up until the token before the current cursor position.
From there you can see the next token can be of a number of types (usually just one), and do the completion, based on the partial text.
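The last step could be sketched as below. To keep it short, a hand-written lookup table stands in for the parser: NEXT_TOKENS is a made-up toy grammar saying which token texts may follow a given previous token, whereas in a real editor that list would come from the parser state.

```javascript
// Hypothetical toy grammar: which token texts may follow the previous token.
// In a real editor this information would be derived from the parser.
const NEXT_TOKENS = new Map([
  ['START', ['select', 'insert', 'delete']],
  ['select', ['*', 'name', 'id']],
  ['from', ['users', 'orders']],
]);

// Offer every token that is allowed next and starts with the partial text
// the user has typed so far at the cursor position.
function suggest(previousToken, partialText) {
  const candidates = NEXT_TOKENS.get(previousToken) || [];
  return candidates.filter((candidate) => candidate.startsWith(partialText));
}

console.log(suggest('START', 'se'));
console.log(suggest('from', ''));
```

An empty partial text simply returns everything that is allowed at that position, which is what you would show when completion is triggered right after whitespace.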
If I ever get my incremental parser working, I'll send a link.
Good luck, and let me know if you find a package which does this.
Chris.

Parse JavaScript to instrument code

I need to split a JavaScript file into single instructions. For example
a = 2;
foo()
function bar() {
  b = 5;
  print("spam");
}
has to be separated into three instructions. (assignment, function call and function definition).
Basically I need to instrument the code, injecting code between these instructions to perform checks. Splitting by ";" obviously wouldn't work, because instructions can also end with a newline, and I may not want to instrument code inside function and class definitions (I don't know yet). I took a course on grammars with flex/Bison, but in this case the semantic action for the rule would be "print all the descendants in the parse tree and put my code at the end", which I don't think can be done with basic Bison. How do I do this? I also need to split the code because I need to interface with Python through python-spidermonkey.
Or... is there a library out there already which saves me from reinventing the wheel? It doesn't have to be in Python.
Why not use a JavaScript parser? There are lots, including a Python API for ANTLR and a Python wrapper around SpiderMonkey.
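To illustrate what a parser buys you here, a small sketch: given an ESTree-style Program node whose top-level statements carry start/end character offsets (acorn's nodes have these), injecting code after each top-level statement is just string splicing. To stay dependency-free, the tree below is built by hand with a made-up fakeNode helper; in practice it would come from a real parser. The instrument function and the __check hook name are likewise hypothetical.

```javascript
// Insert a hook call after every top-level statement of `source`.
// `program` is an ESTree-style Program node; each body node is assumed to
// carry `start`/`end` character offsets into `source`.
function instrument(source, program, makeHook) {
  let out = source;
  // Walk backwards so earlier offsets stay valid while we splice.
  for (let i = program.body.length - 1; i >= 0; i--) {
    const node = program.body[i];
    out = out.slice(0, node.end) + '\n' + makeHook(node, i) + out.slice(node.end);
  }
  return out;
}

const source = 'a = 2;\nfoo()\nfunction bar() {\n  b = 5;\n}';

// Hand-built stand-in for parser output (an assumption for this sketch).
function fakeNode(type, text) {
  const start = source.indexOf(text);
  return { type, start, end: start + text.length };
}

const program = {
  type: 'Program',
  body: [
    fakeNode('ExpressionStatement', 'a = 2;'),
    fakeNode('ExpressionStatement', 'foo()'),
    fakeNode('FunctionDeclaration', 'function bar() {\n  b = 5;\n}'),
  ],
};

const instrumented = instrument(
  source,
  program,
  (node, i) => `__check(${i}, "${node.type}");`);
console.log(instrumented);
```

Only the three top-level statements get a hook; the assignment inside bar is left alone because the loop never descends into the function body. Instrumenting nested statements as well would mean walking the tree recursively.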
JavaScript is tricky to parse; you need a full JavaScript parser.
The DMS Software Reengineering Toolkit can parse full JavaScript and build a corresponding AST.
AST operators can then be used to walk over the tree to "split it". Even easier, however, is to apply source-to-source transformations that look for one surface syntax (JavaScript) pattern and replace it with another. You can use such transformations to insert the instrumentation into the code, rather than splitting the code to make holes in which to do the insertions. After the transformations are complete, DMS can regenerate valid JavaScript code (complete with the original comments if unaffected).
Why not use an existing JavaScript interpreter like Rhino (Java) or python-spidermonkey (not sure whether this one is still alive)? It will parse the JS and then you can examine the resulting parse tree. I'm not sure how easy it will be to recreate the original code but that mostly depends on how readable the instrumented code must be. If no one ever looks at it, just generate a really compact form.
pyjamas might also be of interest; this is a Python to JavaScript transpiler.
[EDIT] While this doesn't solve your problem at first glance, you might use it for a different approach: Instead of instrumenting JavaScript, write your code in Python instead (which can be easily instrumented; all the tools are already there) and then convert the result to JavaScript.
Lastly, if you want to solve your problem in Python but can't find a parser: Use a Java engine to add comments to the code which you can then search for in Python to instrument the code.
Why not try a javascript beautifier?
For example http://jsbeautifier.org/
Or see Command line JavaScript code beautifier that works on Windows and Linux
Forget my parser. https://bitbucket.org/mvantellingen/pyjsparser is a great and complete parser. I've fixed a couple of its bugs here: https://bitbucket.org/nullie/pyjsparser
