Building Chrome Extensions on Existing Production React Websites - JavaScript

Background: I've been developing a number of browser extensions on production sites (Yelp, Zillow, Trulia, Reddit) that use React. I've yet to take a course on React (I'm planning on it), but my questions are:
How stable are the class names on production React sites (many of the classes have odd numbers and letters appended)? If they are not stable, how often do they change, and is there any way to get a more stable selector for these kinds of elements?
When classes are completely non-human-readable, is there any way to view the class name in a more human-readable format? e.g. <div class="_2jJNpBqXMbbyOiGCElTYxZ">
I'd hate to build these extensions and have them break whenever there is a minor release (I know they will break when the site is significantly updated, but I'd prefer that they stay stable across minor releases).
Example: Targeting a span like this
<span class="lemon--span__373c0__3997G text__373c0__2Kxyz reviewCount__373c0__2r4xT text-color--black-extra-light__373c0__2OyzO text-align--left__373c0__2XGa-">865</span>
with a querySelector call like this:
const ratingCountTarget = result
.closest('.mainAttributes__373c0__1r0QA')
.querySelector('.reviewCount__373c0__2r4xT');

There's no way to get the original names, and nothing precludes the site developers from changing the random parts any day (or several times a day), so find a way not to depend on the exact names:
Try finding the non-randomized attributes
Use relations between elements (combinators)
Use partial matching like foo.querySelector('[class*="reviewCount"]')
And be prepared for your extension to break eventually, even if only occasionally.
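For example, the selector from the question could lean on the stable-looking fragment of each class name instead of the full hashed string (a minimal sketch; the class fragments are the ones from the markup above and may themselves change):

// Partial class matching plus a guard, instead of the full hashed class names.
const container = result.closest('[class*="mainAttributes"]');
const ratingCountTarget = container && container.querySelector('[class*="reviewCount"]');

if (ratingCountTarget) {
  const reviewCount = parseInt(ratingCountTarget.textContent, 10);
  // ... use reviewCount ...
}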

Related

Diffing two texts in Rails and printing human readable HTML output

I'm using the paper_trail gem to auto-create a version history of my Page model.
In pages#show, I display the versions like so:
The most important element is the diff which shows the difference between the previous and the currently displayed version of a field, e.g. content. It looks like a diff on GitHub, and it is marked up well using <ins> for insertions and <del> for deletions.
The sad thing is that, for the time being, I generate this diff using https://code.google.com/p/google-diff-match-patch/, a JavaScript library that runs in the user's browser. I've done it this way because I couldn't find a Ruby gem or similar that does the same thing in a similarly elegant way.
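For reference, that browser-side approach boils down to roughly the following (a minimal sketch of the diff_match_patch JavaScript API; the variable names and target element ID are illustrative):

// Compute the diff between the two versions of the field.
const dmp = new diff_match_patch();
const diffs = dmp.diff_main(previousContent, currentContent);

// Merge trivial edits into human-friendly chunks.
dmp.diff_cleanupSemantic(diffs);

// Render markup with <ins>/<del>-style highlighting.
document.getElementById('diff').innerHTML = dmp.diff_prettyHtml(diffs);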
Well, I found https://github.com/samg/diffy and https://github.com/pvande/differ, but the diffs from both gems aren't nearly as elegant: differ needs to be told explicitly whether to diff by line, word, or character (while the JavaScript library decides this automatically and uses a combination of these options, which feels very intuitive to me), and diffy doesn't offer an option for this at all. I don't know exactly how the JavaScript library works, but it states that "Myer's diff algorithm" is used internally:
This library implements Myer's diff algorithm which is generally considered to be the best general-purpose diff. A layer of pre-diff speedups and post-diff cleanups surround the diff algorithm, improving both performance and output quality.
You can try it here: https://neil.fraser.name/software/diff_match_patch/svn/trunk/demos/demo_diff.html
Maybe use the following two strings to see a typical example of two versions of a page:
This is the about page.
Put markdowwn formated content here.
and
This is the about page.
Put markdown formatted here content.
A [link to page 11](11). And another one: [](11).
It results in something like this:
The problem with this approach is that it runs in the browser, so I can't post-process the generated markup in my Rails application anymore. So I wonder whether there's an easy way to get similar diff results on the server, e.g. using a command-line tool like diff? Maybe even git could be of use?

Best way to scrape a set of pages with mixed content

I'm trying to show a list of lunch venues around the office along with their menus for today. The problem is that the websites offering the lunch menus don't always provide the same kind of content.
For instance, some of the websites offer nice JSON output. Look at this one: it offers the English/Finnish course names separately, and everything I need is available. There are a couple of others like this.
But others don't have such nice output, like this one. The content is laid out in plain HTML, the English and Finnish food names aren't consistently ordered, and food properties (L, VL, VS, G, etc.) are just plain text, like the food names.
What, in your opinion, is the best way to scrape all this data in its different formats and turn it into usable data? I tried to build a scraper with Node.js (plus PhantomJS, etc.), but it only works with one website, and it's not that accurate when it comes to the food names.
Thanks in advance.
You could use something like kimonolabs.com; it's much easier to use and gives you APIs to keep your site up to date.
Keep in mind that it works best for tabular data.
There may be simple algorithmic solutions to the problem. If there is a list of all available food names, that can be really helpful: you just look for occurrences of each food name inside the document (for today).
If there is no such food list, you may use TF-IDF. TF-IDF lets you score how important a word is within the current document relative to the other documents. But this solution needs enough data to work.
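A very rough sketch of the TF-IDF idea in JavaScript (the tokenization is naive, and the documents variable is assumed to hold one token array per scraped menu page):

// documents: array of token arrays, one per scraped menu page (assumed input).
function termFrequency(term, tokens) {
  const count = tokens.filter(t => t === term).length;
  return count / tokens.length;
}

function inverseDocumentFrequency(term, documents) {
  const containing = documents.filter(doc => doc.includes(term)).length;
  return Math.log(documents.length / (1 + containing));
}

function tfidf(term, tokens, documents) {
  return termFrequency(term, tokens) * inverseDocumentFrequency(term, documents);
}

// Score every distinct word of today's menu against the whole collection.
const today = documents[0];
const scores = [...new Set(today)]
  .map(word => ({ word, score: tfidf(word, today, documents) }))
  .sort((a, b) => b.score - a.score);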
I think the best solution is something like this (a rough sketch follows the list):
Create a list of all the websites that should be scraped.
Write a driver class for each website's data format.
Each driver is responsible for producing the common domain entity from its site-specific document.
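A rough sketch of that driver idea in Node.js (the class and field names are illustrative, not taken from the actual sites; assumes a fetch implementation, built into recent Node versions or provided by node-fetch):

// Common domain entity that every driver must produce.
class MenuItem {
  constructor({ nameEn, nameFi, dietCodes }) {
    this.nameEn = nameEn;       // English course name
    this.nameFi = nameFi;       // Finnish course name
    this.dietCodes = dietCodes; // e.g. ['L', 'G']
  }
}

// One driver per website; each knows only its own format.
class JsonMenuDriver {
  async fetchMenu(url) {
    const res = await fetch(url);
    const data = await res.json();
    // Field names below are hypothetical; adapt them to the real JSON.
    return data.courses.map(c => new MenuItem({
      nameEn: c.title_en,
      nameFi: c.title_fi,
      dietCodes: c.properties || [],
    }));
  }
}

class HtmlMenuDriver {
  async fetchMenu(url) {
    const html = await (await fetch(url)).text();
    // Site-specific parsing (cheerio, regexes, ...) would go here.
    return [];
  }
}

// The rest of the app only ever sees MenuItem objects, whatever the source.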
If you can use PHP, Simple HTML DOM Parser along with Guzzle would be a great choice. Together they provide a jQuery-like selector interface and a nice wrapper around HTTP.
You are touching on a really difficult problem. Unfortunately, there are no easy solutions.
Actually there are two different parts to solve:
data scraping from different sources
data integration
Let's start with the first problem: data scraping from different sources. In my projects I usually process data in several steps. I have a dedicated scraper for each specific site I want, and process the data in the following order:
fetch raw page (unstructured data)
extract data from page (unstructured data)
extract, convert and map data into page-specific model (fully structured data)
map data from fully structured model to common/normalized model
Steps 1-2 are scraping-oriented, and steps 3-4 are strictly data-extraction / data-integration oriented.
While you can implement steps 1-2 relatively easily using your own web scrapers or by utilizing existing web services, data integration is the most difficult part in your case. You will probably require some machine-learning techniques (shallow, domain-specific natural language processing) along with custom heuristics.
With messy input like this one, I would process the lines separately and use a dictionary to strip out the common Finnish/English words, then analyse what is left. But in this case it will never be 100% accurate, due to the possibility of human-input errors.
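A minimal sketch of that line-by-line, dictionary-based filtering (the word lists here are tiny placeholders; real ones would be proper Finnish/English dictionaries):

const FINNISH_WORDS = new Set(['ja', 'tai', 'kera']);           // placeholder dictionary
const ENGLISH_WORDS = new Set(['and', 'or', 'with', 'served']); // placeholder dictionary
const DIET_CODES = new Set(['L', 'VL', 'VS', 'G', 'M']);

function analyseLine(line) {
  const tokens = line.trim().split(/\s+/).filter(Boolean);
  const clean = t => t.replace(/[(),.]/g, '');
  const dietCodes = tokens.map(clean).filter(t => DIET_CODES.has(t));
  const rest = tokens.filter(t => {
    const c = clean(t);
    return !DIET_CODES.has(c) &&
           !FINNISH_WORDS.has(c.toLowerCase()) &&
           !ENGLISH_WORDS.has(c.toLowerCase());
  });
  return { dishCandidate: rest.join(' '), dietCodes };
}

// pageText is assumed to hold the extracted plain text of one menu page.
const parsed = pageText.split('\n').filter(l => l.trim()).map(analyseLine);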
I am also worried that your stack is not very well suited to such tasks. For this kind of processing I use Java/Groovy along with integration frameworks (Mule ESB / Spring Integration) to coordinate the data processing.
In summary: it is a really difficult and complex problem. I would rather accept less input-data coverage than aim for 100% accuracy (unless it is really worth it).

Efficient search in MongoDB v2.4

I'm using version 2.4 of MongoDB, which is working fine for my needs except for one thing: searching, since it doesn't support options like the $search operator. So, is there a way to implement that kind of search in v2.4? The reason I'm sticking to the older version is that I don't want to lose any of my data by upgrading, and I also don't want to stop the live mongo server.
The result I want should be similar to this query's result:
db.data.find({$text: { $search: 'query' } }, { score: {$meta: "textScore" }})
This query works fine on the latest versions of MongoDB. Also, if you suggest using the latest version, please provide some references that can help me upgrade MongoDB safely.
This is a little bit of a catch 22, introduced mainly by text search capabilities being considered "experimental" in earlier versions. Aside from being in an earlier development phase, the implementation is entirely different due to the fact that the "whole" query and index API has been re-written for MongoDB 2.6, largely in order to support the new types of indexes available and make the API for working with the data consistent.
So prior versions implement text search via the "command" interface, directly and only that way. Things work a little differently, and the current "deprecation" notice means that working this way will eventually be removed. But the "text" command will presently still operate as shown in the earlier documentation:
db.data.runCommand("text", { "search": "query" })
So there are limitations here, as covered in the existing documentation. Notably, the number of documents returned is capped by the "limit" argument to that command, and there is no concept of "skip". Also, this is a "document" response and not a cursor, so the total results cannot exceed the 16MB BSON limit.
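In practice, that means calling the command and reshaping its response yourself; a rough sketch for the collection from the question (response structure as documented for the 2.4 text command):

// MongoDB 2.4: text search is only available through the command interface.
var response = db.data.runCommand("text", {
    search: "query",
    limit: 100          // results are capped by "limit"; there is no "skip"
});

// The command returns a plain document, not a cursor:
// { results: [ { score: <number>, obj: <matched document> }, ... ], stats: { ... }, ok: 1 }
var scored = response.results.map(function (r) {
    return { score: r.score, doc: r.obj };
});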
That said, and a little off topic, consider your MongoDB 2.6 deployment scenario, mostly with regard to the following.
Future proofing. In the earlier releases this is an experimental feature, so general flaws and problems are generally not going to be "backported" with fixes while you hang on to that version. Some may be, but without a good reason to do so this mostly wanes over time. Remember, this is "experimental", so due warning was given about use in production.
Consistency/Deprecation. The API for "text" and "geospatial" has changed. So implementation in earlier releases is different and "deprecated", and will go away. The right way is to have the same structure as other queries, and consistently use it in all query forms rather than a direct command.
Deployment. You say you don't want to stop the server, but you really should not have only one server anyway. Apart from running counter to the general philosophy of why you use MongoDB in the first place, at the very least a "replica set" is a good idea for data redundancy and the "uptime" of your application. Removing a single point of failure means you can "bring down" individual nodes and "upgrade" them without any application downtime.
So that strays "a little" off the programming topic, but for me the last point is the most important. Better to make sure your application is free of these failure points by building that into your deployment architecture. That then makes "staying ahead of the curve" a simpler decision. It is always worth noting the "experimental" clause on technologies before rolling them out to production. Cover your bases.

Refactoring with Emacs while editing JavaScript

Hi, I'm writing a lot of server-side JavaScript and I would like the ability to refactor while editing in Emacs. Is this possible? Thanks!
By refactor I mean something like how, in Eclipse while editing Java, you can rename a variable called, for example, "variableOne" to "variable1", and all 15 other places you wrote "variableOne" become "variable1".
Probably the most sophisticated JavaScript refactoring library for Emacs is Magnar Sveen's js2-refactor. Its list of supported refactorings includes
rv is rename-var: Renames the variable on point and all occurrences in its lexical scope.
which sounds a lot like what you're looking for. It also supports a number of other very useful common refactoring actions.
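As a concrete illustration (the function below is made up), running rename-var with point on variableOne turns the first version into the second, updating every occurrence in that lexical scope:

// Before:
function totalPrice(items) {
  var variableOne = 0;
  items.forEach(function (item) { variableOne += item.price; });
  return variableOne;
}

// After renaming variableOne to total with js2-refactor's rename-var:
function totalPrice(items) {
  var total = 0;
  items.forEach(function (item) { total += item.price; });
  return total;
}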
Assuming you're on Emacs 24, I recommend installing it using the MELPA repository. If you're still on Emacs 23 you'll have to upgrade or manually install package.el before you can use MELPA.
If you are just looking to rename variables, you might also want to take a look at tern. The advantage it has over js2-refactor (which I use too) is that it has a concept of projects, so you can rename a given variable across multiple files in a project. It also provides other features like jump-to-definition and auto-completion (which is quite accurate).
Here are some general options for renaming a variable:
1) Multiple cursors - It has a useful command, mc/mark-all-like-this-dwim, which marks all occurrences of the selected text in the current context; you can then edit all the occurrences simultaneously.
2) Wgrep - This package lets you apply changes made in a grep buffer back to the respective files. This is useful when I have to replace a word across many files: in such situations I use rgrep to search for the word in multiple files, then enable wgrep in the resulting grep buffer, mark the word to be replaced with multiple-cursors (you can also use query-replace), make the changes, and finally run wgrep-save-all-buffers, and all my changes are saved!
Your question seems to be more about renaming variables than about refactoring in general. The two places to start for information about using Emacs to rename parts of your code are these:
Emacs Wiki Search and Replace category page. This includes search-and-replace across multiple files (e.g. of your project).
The Emacs manual: use C-h r to enter the manual from Emacs.
Then hit the key i to look something up in the index (with completion):
i search and replace commands takes you to the section about replacement commands.
i search and replace in multiple files takes you to the section about Searching and Replacing with Tags Tables.
For Emacs support for projects, see the Emacs Wiki Projects category page.

Equivalent of SPContext.Current.ListItem in Client Object Model (ECMAScript)

I'm integrating an external application to SharePoint 2010 by developing custom ribbon tabs, groups, controls and commands that are made available to editors of a SharePoint 2010 site. The ribbon commands use the dialog framework to open dialogs with custom application pages.
In order to pass a number of query string parameters to the custom application pages, I'm therefore looking for the equivalent of SPContext.Current.ListItem in the Client Object Model (ECMAScript).
Regarding the available tokens (i.e. {ListItemId} or {SelectedItemId}) that can be used in the declarative XML, I'm already emitting all tokens, but unfortunately the desired tokens are either not parsed or simply null when in the context of a Publishing Page (i.e. http://domain/pages/page.aspx). Thus, none of the tokens that do render are of any use in establishing the context of the calling SPListItem in the application page.
Looking at SP.ClientContext.get_current() provides a lot of information about the current SPSite, SPWeb, etc., but nothing about the SPListItem I'm currently positioned at (again, with the page rendered in the context of a Publishing Page).
What I've come up with so far is the idea of passing in the URL of the current page (i.e. document.location.href) and parsing that in the application page; however, it feels like I'm going in the wrong direction, and SharePoint surely should be able to provide this information.
I'm not sure this is a great answer, or even fully on-topic, but it's basically something I originally intended to blog about - anyway:
It is indeed a pain that the Client OM does not seem to provide a method/property with details of the current SPListItem. However, I'd venture to say that this is a simple concept, but actually has quite wide-ranging implications in SharePoint which aren't apparent until you stop to think about it.
Consider:
Although a redirect exists, a discussion post can be surfaced on 2 or 3 different URLs (e.g. Threaded.aspx/Flat.aspx)
Similarly, a blog post can exist on a couple (Post.aspx/EditPost.aspx, maybe one other)
A list item obviously has DispForm.aspx/EditForm.aspx and (sort of) NewForm.aspx
Also, even for items with an associated SPFile (e.g. a document or publishing page), consider that these URLs represent the same item:
http://mydomain/sites/someSite/someLib/Forms/DispForm.aspx?ID=x, http://mydomain/sites/someSite/someLib/Filename.aspx
Also, there could be other content types outside of this set which have a similar deal
In our case, we wanted to 'hang' data off internal and external items (e.g. likes, comments). We thought "well everything in SharePoint has a URL, so that could be a sensible way to identify an item". Big mistake, and I'm still kicking myself for falling into it. It's almost like we need some kind of 'normalizeUrl' method in the API if we wanted to use URLs in this way.
Did you ever notice the PageUrlNormalization class in Microsoft.SharePoint.Utilities? Sounds promising doesn't it? Unfortunately that appears to do something which isn't what I describe above - it doesn't work across the variations of content types etc (but does deal with extended web apps, HTTP/HTTPS etc).
To cut a long story short, we decided the best approach was to make the server emit details which allowed us to identify the current SPListItem when passed back to the server (e.g. in an AJAX request). We hide the 'canonical' list item ID in a JavaScript variable or hidden input field (whatever really), and these are evaluated when back at the server to re-obtain the list item. Not as efficient as obtaining everything from context, but for us it's OK because we only need to resolve when the user clicks something, not on every page load. By canonical, I mean:
SiteID|WebID|ListID|ListItemID
IIRC, one of the key objects has a CanonicalId property (or maybe it's internal), which may help you build such a string.
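As an illustration (not the exact code we used, and the hidden field name is made up), resolving such a canonical string back to the list item with the ECMAScript Client OM could look roughly like this:

// Canonical ID emitted by the server, e.g. "siteId|webId|listId|listItemId".
var parts = document.getElementById('canonicalItemId').value.split('|');
var listId = parts[2];
var itemId = parseInt(parts[3], 10);

var ctx = SP.ClientContext.get_current();
var item = ctx.get_web().get_lists().getById(listId).getItemById(itemId);
ctx.load(item);
ctx.executeQueryAsync(
    function () { /* e.g. item.get_item('Title') is now available */ },
    function (sender, args) { /* handle failure: args.get_message() */ }
);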
So as for using window.location.href, I'd avoid that if you're in vaguely the same situation as us. I'd suggest considering an approach similar to the one we used, but do remember that there are some locations (e.g. certain forms) where even on the server SPContext.Current.ListItem is null, despite SPContext.Current.Web (and possibly SPContext.Current.List) being populated.
In summary - IDs are your friend, URLs are not.
