Most social media sites have a feature where you can type in a link and the site will generate a link preview of it. See example below from Google+
Let's say I'd like to build my own. I'm using Ruby on Rails as a web framework, but that's probably irrelevant, as I imagine I'll have to use JS to fetch this client-side, right?
Where do I look for this data? I know it's usually in the <meta> tags, but is that standard? When I tried it for a few links, only the description was in the <meta> tags; the image and title didn't match anything else in the meta tags.
How do I go about fetching a remote document asynchronously and parsing its tags? If anyone could point me to an example I'd be grateful.
Thanks!
There are three common ways authors might provide this data in HTML documents (from least expressive to most expressive):
Metadata in the head element: This is plain HTML, i.e.,
meta elements (with defined/registered values for the name attribute),
link elements (with defined/registered values for the rel attribute), and
the title element.
Microformats: Still using plain HTML, but together with specific class names. All Microformats are described in their wiki.
Structured data: Using extending/additional syntaxes (JSON-LD, Microdata, RDFa, …) and vocabularies (Schema.org, Open Graph Protocol, Dublin Core …).
You’ll typically find suitable parsers for these formats in your programming language of choice.
You’ll probably find that most sites make use of the Open Graph Protocol (in RDFa), as this is used by Facebook and Twitter, probably followed by Schema.org (in JSON-LD/Microdata/RDFa), as this is sponsored by the major search engines.
Note that 2. and 3. also allow authors to provide data about entities described on (or relevant to) the page, i.e., not all extracted data is suitable for link previews, so you have to take the context into account.
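As a rough illustration of reading this metadata, here's a minimal sketch assuming Node.js with the cheerio package (my choice for the example; the answer above is language-agnostic), preferring Open Graph data and falling back to plain HTML metadata:

const cheerio = require('cheerio');

// Fetch a page and pull out title/description/image for a link preview.
// Node 18+ ships a global fetch; older versions need a package such as node-fetch.
async function linkPreview(url) {
  const html = await (await fetch(url)).text();
  const $ = cheerio.load(html);

  return {
    title: $('meta[property="og:title"]').attr('content') || $('title').text(),
    description: $('meta[property="og:description"]').attr('content') ||
                 $('meta[name="description"]').attr('content'),
    image: $('meta[property="og:image"]').attr('content')   // may be undefined
  };
}

linkPreview('https://example.com').then(console.log);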
Related
My website sells stuff and I would like to customize the page title and meta description on certain pages when certain items are viewed. I want these custom titles and descriptions to be used when the page is shared on other websites, e.g. Twitter, FB, etc.
Basically I want to customize the title and description based on the query string values. How is this possible? I've looked for a js-based plugin or similar on GitHub as well, but had no luck.
The issue with JavaScript-rendered pages is that there's no guarantee a scraper will pick up the meta tags. The head is read before scripts are run, and to keep overhead low most scrapers won't bother executing JavaScript at all. You're better off having the server write the customised meta tags into the HTML it sends.
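As a rough sketch of what that could look like (assuming a Node.js/Express server here; the lookup helper is hypothetical, and any server-side stack works the same way):

var express = require('express');
var app = express();

app.get('/product', function (req, res) {
  var item = findItem(req.query.id);   // hypothetical lookup based on the query string
  // Escape these values properly in real code before interpolating them into HTML.
  res.send(
    '<!DOCTYPE html><html><head>' +
    '<title>' + item.title + '</title>' +
    '<meta name="description" content="' + item.description + '">' +
    '<meta property="og:title" content="' + item.title + '">' +
    '<meta property="og:description" content="' + item.description + '">' +
    '</head><body>...</body></html>'
  );
});

app.listen(3000);

Since the scraper receives the finished HTML, no client-side script is needed for this part.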
I am new to SharePoint and I'd like to execute some custom JavaScript on a form. In this article the author outlines an approach where he adds a "Script Editor" web part to the form that hosts the HTML & JavaScript.
This approach seems odd, because the script is not executed as part of the form itself (the web part contains the entire HTML, including head, body, ...).
So my question is: how do I execute custom JavaScript on a form? And what is the best approach regarding deployment?
If we're talking about server-side development, here's my favorite method: JSLink and Display Templates. With this approach you have a high degree of control over the scope in which your JavaScript code is loaded.
There are a vast number of objects you can attach JSLink references to, but the ones we are really interested in are:
Site Columns
Content Types
List Views
List Forms (e.g. New / Edit / Display forms)
List View Web Parts
List Form Web Parts
Regarding the deployment, it depends on the scope you would like to apply:
When you are constructing your JSLink URL there are a number of tokens you can take advantage of:
~site – reference to the current SharePoint site (or “Web”)
~sitecollection – reference to the current SharePoint site collection (or “Site”)
~layouts – version-specific reference to the web application Layouts folder (so it will automatically swap out /_layouts/14 or /_layouts/15 for you)
~sitecollectionlayouts – reference to the layouts folder in the current site collection (e.g. /sites/team/_layouts/15)
~sitelayouts – reference to the layouts folder in the current site (e.g. /sites/teams/subsite/_layouts/15)
Basically, I'd recommend going through all the parts of the aforementioned tutorial in order to understand the power of JSLink.
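To give a feel for what such a file looks like, here is a rough sketch of a JSLink display template override (the field name and IDs are hypothetical, so adjust them to your list):

(function () {
  var overrideCtx = {
    Templates: {
      Fields: {
        // Render the hypothetical "Status" field as bold text in list views.
        Status: {
          View: function (ctx) {
            return '<strong>' + ctx.CurrentItem.Status + '</strong>';
          }
        }
      }
    },
    BaseViewID: 1,         // the default view
    ListTemplateType: 100  // custom list
  };
  SPClientTemplates.TemplateManager.RegisterTemplateOverrides(overrideCtx);
})();

You would then upload the file somewhere like the Style Library and point the JSLink property of the view or form at it, e.g. ~sitecollection/Style Library/statusField.js (using the tokens listed above).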
Using a ContentEditorWebPart is a bit like loading an entire page into a DIV element. Because of this, you can easily access all of the parent page's elements using document.getElementById and similar DOM lookups.
This practice is pretty widespread throughout the industry as it is the only way of injecting html/javascript without using SharePoint Designer or placing files in the SharePoint hive.
Usually you create an HTML or JS file, add it to a document library with version control enabled, and then reference that file from the ContentEditorWebPart.
I have a webpage that works, and all is swell. It is coded using mostly good practices: external CSS files and minimal inline styles/code.
Now, however, I want to send that page as HTML text only, such as in an email, so there should be no references to external resources at all. That means I now must move my beautiful external styles inline.
I am thinking I can write a JavaScript function that finds every element's classes, removes them, and gives the element inline "style" attributes equal to what the classes provided.
But I was wondering if anyone else has other suggestions.
The end goal is to get a wall of text that, when pasted into a browser with no internet connection, no cache, or anything else, will display exactly what I have on the screen of my "normal operations" page.
There is a perl CPAN module for this:
CSS::Inliner
you can also find the source on github:
https://github.com/kamelkev/CSS-Inliner
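If you'd rather do the inlining in the browser, along the lines the question suggests, a rough client-side sketch (my own take, unrelated to the CPAN module) could be:

// Copy each element's computed style into an inline style attribute,
// then drop the class so the markup no longer depends on external CSS.
// Note: this inlines every computed property, so the output gets verbose.
document.querySelectorAll('body *').forEach(function (el) {
  var computed = window.getComputedStyle(el);
  var inline = '';
  for (var i = 0; i < computed.length; i++) {
    var prop = computed[i];
    inline += prop + ':' + computed.getPropertyValue(prop) + ';';
  }
  el.setAttribute('style', inline);
  el.removeAttribute('class');
});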
Is there any way to get access to Stack Overflow's awesome tagging system? I would like to borrow Stack's awesome auto-suggest and tag mini-explanation boxes for my own site. Obviously, I can use the jQuery UI auto-suggest for tags, but I would really like to also include the cool little tag descriptions. If not, can someone tell me where all these explanations/descriptions came from so that I can implement a similar system?
tageditornew.js
Line 308:
$.get("/filter/tags", {q: a,newstyle: !0}, "json").done(function(c) {
C["t_" + a] = c;
StackExchange.helpers.removeSpinner();
b(c)
})
This might help you out!
It turns out that the API URL is this:
https://stackoverflow.com/filter/tags?q=STRING&newstyle=BOOLEAN
q - Query text.
newstyle - Whether to use the new style. With the new style, results are returned as JSON with additional information such as synonyms and an excerpt.
DEMO: http://jsfiddle.net/DerekL/bXXb7/ (with the Cross Domain Requests jQuery plugin)
For example:
https://stackoverflow.com/filter/tags?q=htm
would give you:
"html|99829\nhtml5|16359\nxhtml|4143\nhtml-parsing|1461\nhtml-lists|1328\nhtml5-video|949"
where 99829 is the number of questions tagged html. It took me 15 minutes of looking at the source code to find this API. -_-"
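If you want to consume that old-style response, here's a small sketch (assuming the cross-domain issue is handled, e.g. via the plugin used in the demo above):

// Hypothetical helper: query the tag filter and parse the "name|count" lines.
function fetchTags(query, done) {
  $.get('https://stackoverflow.com/filter/tags', { q: query }, function (text) {
    var tags = text.split('\n').map(function (line) {
      var parts = line.split('|');
      return { name: parts[0], count: parseInt(parts[1], 10) };
    });
    done(tags);   // e.g. [{ name: "html", count: 99829 }, ...]
  });
}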
Putting in javascript with the new style gives you this:
[{"Name":"javascript","Synonyms":"classic-javascript|javascript-execution","Count":223223,"Excerpt":"JavaScript is a dynamic language commonly used for scripting in web browsers. It is NOT the same as Java. Use this tag for questions regarding ECMAScript and its dialects/implementations (excluding ActionScript and JScript). If a framework or library, such as jQuery, is used, include that tag as well. Questions that don't include a framework/library tag, such as jQuery, implies that the question requires a pure JavaScript answer."},{"Name":"javascript-events","Synonyms":"javascript-event","Count":5707,"Excerpt":"Creating and handling JavaScript events inline in HTML or through a script."},{"Name":"facebook-javascript-sdk","Synonyms":"","Count":992,"Excerpt":"Facebook's JavaScript SDK provides a rich set of client-side functionality for accessing Facebook's server-side API calls. These include all of the features of the REST API, Graph API, and Dialogs."},{"Name":"javascript-library","Synonyms":"","Count":675,"Excerpt":"A JavaScript library is a library of pre-written JavaScript which allows for easier development of JavaScript-based applications, especially for AJAX and other web-centric technologies."},{"Name":"javascript-framework","Synonyms":"","Count":563,"Excerpt":"A JavaScript framework is a library of pre-written JavaScript which allows for easier development of JavaScript-based applications, especially for AJAX and other web-centric technologies."},{"Name":"unobtrusive-javascript","Synonyms":"","Count":340,"Excerpt":"Unobtrusive JavaScript is a general approach to the use of JavaScript in web pages."}]
What you can get from there:
All tags starting with javascript
Synonyms
Tag counts
Nice tag descriptions
If you're looking for high-level logic, in a nutshell it's just a custom auto-complete that's blazing-fast.
Whenever you type a tag (i.e. a new word, or one separated by a space from previous tags), an AJAX request is made to the server, and the JSON response is interpreted by the client-side script and presented in a usable layout.
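As a rough sketch of that flow (the selector, delay and rendering function are assumptions of mine):

var timer;
$('#tag-input').on('keyup', function () {
  clearTimeout(timer);
  var word = this.value.split(' ').pop();   // the tag currently being typed
  if (!word) { return; }
  timer = setTimeout(function () {
    // Same endpoint as in the answer above; newstyle returns the richer JSON.
    $.getJSON('/filter/tags', { q: word, newstyle: true }, function (tags) {
      renderSuggestions(tags);   // hypothetical function that draws the dropdown
    });
  }, 150);
});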
Comparing the autocomplete JSON objects for letter "h" and word "html" should give you enough insight into how this particular implementation works (if prompted, these can be opened with any text editor).
On a somewhat unrelated note: the autocomplete responses have to be fast. Depending on the complexity of the data autocomplete is run against, you may find how IMDb magic search works intriguing.
Update:
Seeing your comment about accessing the content of the tag library, this may in fact be more of a meta question. I struggle to think of a scenario where exposing such an API, or the tag library itself, to external sites would be beneficial to SO; however, content here is provided under Creative Commons, so you may be able to use it with proper attribution. This does not constitute legal advice :)
So what I want to mimic is the link share feature Facebook provides. You simply enter the URL and then FB automatically fetches an image, the title, and a short description from the target website. How would one program this in JavaScript with node.js and whatever other JavaScript libraries may be required? I found an example using PHP's fopen function, but I'd rather not include PHP in this project.
Is what I'm asking an example of web scraping? Do I just need to retrieve the data from inside the meta tags of the target website, and then also get the image tags using CSS selectors?
If someone can point me in the right direction, that'd be greatly appreciated. Thanks!
Look at THIS post. It discusses scraping with node.js.
HERE you have lots of previous info on scraping with javascript and jquery.
That said, Facebook doesn't actually guess what the title, description and preview are; (at least most of the time) it gets that info from meta tags present in sites that want to be more accessible to FB users.
Maybe you could make use of that existing metadata to pull titles, descriptions and image previews. The docs on the available metadata are HERE.
Yes, web scraping is required, and that's the easy part. The hard part is the generic algorithm for finding the headings, relevant text and images.
How to scrape
You can use jsdom to download and create a DOM structure in your server and scrape that using jquery on your server. You can find a good tutorial at blog.nodejitsu.com/jsdom-jquery-in-5-lines-on-nodejs as suggested by #generalhenry above.
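A minimal sketch in the style of that tutorial, using the old jsdom.env API (newer jsdom versions use a different API, so treat the exact signature as an assumption):

var jsdom = require('jsdom');

jsdom.env(
  'https://example.com/some-article',       // hypothetical target URL
  ['http://code.jquery.com/jquery.js'],     // inject jQuery so we can scrape with it
  function (errors, window) {
    if (errors) { return console.error(errors); }
    var $ = window.$;
    console.log($('title').text());         // e.g. the page's <title>
  }
);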
What to scrape
I guess a good way to find the heading would be:

var h;
for (var i = 1; i <= 6; i++) {        // check the most important heading level first
  if ($('h' + i).length) {            // .first() is always truthy, so test .length instead
    h = $('h' + i).first();
    break;
  }
}
Now h will hold the main heading, or stay undefined if none is found. The alternative could be to simply grab the page's title tag. :)
As for the images: list all, or the first few, images on the page that are reasonably large, so as to filter out sprites used for buttons, arrows, etc.
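Continuing the jQuery-on-jsdom idea, a rough filter could be (the 100px threshold is an arbitrary assumption, and width/height are only known when the page declares them, since the images themselves are not downloaded):

var candidateImages = $('img').filter(function () {
  var w = parseInt($(this).attr('width'), 10) || 0;
  var h = parseInt($(this).attr('height'), 10) || 0;
  return w >= 100 && h >= 100;   // skip sprites, icons and tracking pixels
}).slice(0, 3);                  // keep the first few candidates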
And while fetching the remote document, make sure the ProcessExternalResources flag is off. This ensures that script tags for ads do not pollute the fetched page.
And yes, the relevant text will usually be in the tags following h.