Obviously Apostrophe CMS is JavaScript-based, so I'm wondering to what extent Apostrophe pages are 'properly' indexable (i.e. "Ready for JavaScript Crawling & Indexing"?).
The reason I ask is this article, in which Moz tested a number of pages with JavaScript-derived content and found them lacking from an SEO perspective.
So can Apostrophe / Node.js pages be properly indexed by search engines?
I'm the lead developer of Apostrophe at P'unk Avenue.
The article you're referring to is talking about crawling URLs that exist only due to JavaScript in the browser, such as React apps with a JavaScript-based router in them.
The issue there is that Google can't naively crawl such sites just by fetching pages from the server and parsing links in the returned HTML. It has to either run the JavaScript itself, as a browser would, or rely on tricks where the server does extra work just for Google.
Apostrophe is powered by Node.js, which is JavaScript running on the server. The pages it sends out are basically indistinguishable from pages generated by any other CMS or even a static folder full of webpages — the server sends perfectly valid HTML that doesn't require any browser-side JavaScript at all in order to read all of the text and find all of the links. The fact that the server happens to have a JavaScript engine inside it is irrelevant to Google because it is receiving normal HTML from that server. That server could be running PHP or Ruby or Python or CrazyMadeUpLanguage... doesn't matter to Google as long as it's sending HTML to the world.
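To make that concrete, here's a minimal sketch (not Apostrophe's actual code, just a bare Express route) of what "JavaScript on the server, plain HTML to the browser" looks like:

```javascript
// A minimal Express sketch (not Apostrophe's actual code): the server
// runs JavaScript, but what it sends to browsers and crawlers alike
// is finished, perfectly ordinary HTML.
const express = require('express');
const app = express();

app.get('/about', (req, res) => {
  // All text and links are in the response itself; no browser-side
  // JavaScript is needed to read or crawl it.
  res.send(`<!DOCTYPE html>
<html>
  <head><title>About Us</title></head>
  <body>
    <h1>About Us</h1>
    <p>This text is in the HTML the server sends.</p>
    <a href="/contact">Contact</a>
  </body>
</html>`);
});

app.listen(3000);
```

A crawler fetching /about sees exactly what it would see from a PHP or static site: complete HTML with crawlable links.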
I'm writing a Chrome extension that injects a content script into every page the user visits. What I want to do is get the output of a Python function for some use in the content script (I can't write it in JavaScript, since it requires raw sockets to connect to my remote SSL server).
I've read that one might use CGI and Ajax or the like to get output from the Python code into the JavaScript code, but I ran into three problems:
1. I cannot allow hosting the Python code on a local server, since it handles security-sensitive data that only the local computer should be able to know.
2. Chrome demands that HTTP and HTTPS not mix: if the user goes to an HTTPS website, I can't host the Python code on an HTTP server.
3. I don't think Chrome even supports CGI in extensions. When I try to access a local file, all it does is print out the text (the Python code itself) instead of what I defined to be its output (I tried to do so using Flask). As I said in 1, I shouldn't even try this anyway, but this is just a side note.
So my question is: how do I get the output of my Python functions inside a content script, built with JavaScript?
Native Messaging may be your answer here.
You can designate a Native Host application that an extension or app will be able to invoke (probably through a background page, in the case of an extension), and that can be a Python script.
In fact, the sample app for it uses a Python script.
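For illustration, the extension side boils down to something like this (the host name 'com.example.python_host' is a placeholder for whatever name you register in your native messaging host manifest):

```javascript
// In the extension's background page (content scripts can't call this
// API directly). 'com.example.python_host' is a placeholder for the
// name registered in your native messaging host manifest, which
// points at your Python script.
chrome.runtime.sendNativeMessage(
  'com.example.python_host',
  { command: 'get_data' }, // delivered to the Python script's stdin as JSON
  function (response) {
    // response is whatever the script wrote to stdout
    // (length-prefixed JSON, per the native messaging protocol)
    console.log('Native host replied:', response);
  }
);
```

The background page can then forward the result to your content script with the regular extension messaging APIs.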
The only way to get the output of a Python script inside a content script built with JavaScript is to call the file with XMLHttpRequest. As you noted, you will have to use an HTTPS connection if the page is served over HTTPS. A workaround for this is to make a call to your background script, which can then fetch the data over whichever protocol it likes and return it to your content script.
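Roughly, that relay could look like this (Manifest V2 style; the URL is a placeholder for wherever your Python output is served):

```javascript
// background.js (Manifest V2). The background page is not bound by the
// page's protocol, so it can fetch from wherever your Python output
// lives; the URL below is a placeholder.
chrome.runtime.onMessage.addListener(function (msg, sender, sendResponse) {
  if (msg.type === 'fetchPythonOutput') {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', 'https://example.com/python-endpoint', true);
    xhr.onload = function () {
      sendResponse({ output: xhr.responseText });
    };
    xhr.send();
    return true; // keep the channel open for the async sendResponse
  }
});

// content-script.js
chrome.runtime.sendMessage({ type: 'fetchPythonOutput' }, function (reply) {
  console.log('Python output:', reply.output);
});
```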
I want to show a website preview on a link, similar to what Facebook does when a user posts a link. My question has been asked before in the following link, but I am going to ask for specific information about my solutions. I have two solutions for showing a webpage preview, which are as follows: 1. server-side HTML processing; 2. client-side HTML processing.
1. Server-side HTML processing
I used System.Net.WebClient().DownloadString(url) to retrieve the web page data on the server side, and I tried to extract the most important information from the page, but in most cases the main part of the page loads using JavaScript, so I do not have access to that information.
Another server-side solution is to work with WebBrowser and WebDocument objects. Because I haven't worked with these libraries and don't know how much applying these objects affects web server performance, I only present this solution for discussion. So: is there any server-side HTML grabber which fetches all HTML data, including the JavaScript-loaded HTML source?
2. Client-side HTML processing
The simplest approach on the client side is to use the iframe tag, but it has the two following problems:
a. I cannot access the innerHTML of the frame for links on other domains.
b. I cannot load HTTPS webpages such as Dropbox and Facebook in the iframe because of the "X-Frame-Options" error.
My question is: is there any other client-side solution to retrieve the dynamic HTML source (loaded by JavaScript) from third-party webpages (usually HTTPS)? Or can I solve the above problems with some tricks?
I guess the server-side approach would be the most viable option. On the client side you can use proxy services which work around the cross-domain limitation, for example, crossorigin.
To generate a preview similar to the one Facebook provides, you need to get the Open Graph information for the target page. Libraries to process Open Graph data are available for multiple platforms; OpenGraph-Net could be used on the .NET platform.
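Whatever the platform, the idea is just "fetch the page, read its og: meta tags". A minimal Node.js sketch of that (naive regex extraction, assuming Node 18+ with built-in fetch; a real implementation should use a proper HTML parser or a library like OpenGraph-Net):

```javascript
// Minimal Node.js sketch (Node 18+ built-in fetch). Naively extracts
// og: meta tags with a regex; use a real HTML parser in production.
async function getOpenGraph(url) {
  const html = await (await fetch(url)).text();
  const og = {};
  const re = /<meta[^>]+property=["']og:([^"']+)["'][^>]+content=["']([^"']*)["']/gi;
  let match;
  while ((match = re.exec(html)) !== null) {
    og[match[1]] = match[2]; // e.g. og.title, og.image, og.description
  }
  return og;
}

// Usage: getOpenGraph('https://example.com').then(console.log);
```

Note this won't see JavaScript-injected tags either, but well-behaved sites put Open Graph tags in the static HTML precisely so preview generators can find them.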
I'm wondering how I could pentest a website made completely in JavaScript, for example using the qooxdoo framework.
Those websites do not contain any requests to the server which respond with HTML content. Only one JavaScript file gets transmitted when loading the page (which is an almost empty HTML page with just the link to the JavaScript file), and the page is then set up by the loaded JS file, without any line of HTML written by the developer.
Typically, there would be some spidering/crawling in most web app scanners (like Nexpose), which check a website for links and forms, crawl any link they find that points to the same domain, and test any parameter found on those links. I assume those scanners would not be effective on a pure JS page.
Then there's the other possibility: a proxy server (like Burp Suite) which captures any traffic being sent to a server and is able to check any parameters found in those requests. This would probably work to test the API server behind the website (for example, to find SQL injections).
But: Is there any way to test the client, for example for XSS (self or stored)?
Or more in general: What types of attacks would you typically need to check in such a pure JS web application? What tools could help with that?
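For example, the kind of client-side flaw I mean would be a DOM-based XSS, where untrusted input flows into a sink entirely inside the browser, so neither a naive crawler nor the API server ever sees the payload. A minimal sketch of such a vulnerability:

```javascript
// A minimal DOM-based XSS sketch: untrusted input from the URL
// fragment flows into innerHTML entirely on the client, so the
// server never sees the payload.
// e.g. https://app.example/#<img src=x onerror=alert(1)>
var userInput = decodeURIComponent(location.hash.slice(1));
document.getElementById('content').innerHTML = userInput; // vulnerable sink
```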
I am relatively new to the world of HTML snapshots and JavaScript, so I apologize if this is not that hard.
The app we make at our company uses JavaScript to dynamically load image and text content onto a webpage. As you know, JS-rendered content doesn't get indexed by search engines. However, I have learned of the option called HTML snapshots, where you can feed Google and other search engines all the rendered HTML of the page, and they will consume it as long as you follow their guidelines.
My question is: since my script is a third-party script that can be embedded on any number of pages, can I still somehow leverage HTML snapshots, or will my clients need to do that?
Although I have not worked with this technology yet, I believe it depends on your application and on who creates the data (your client's server or your library).
If most of the content is generated on the server side, the server should create the snapshot.
If most of the content is generated/manipulated on the client, the client could create an HTML snapshot, for example using HtmlUnit. A sketch of how a server might serve such a snapshot follows the link below.
More info on this page:
https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
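For context, under that (now-deprecated) scheme the crawler rewrites yoursite.com/#!key=value to yoursite.com/?_escaped_fragment_=key=value and expects pre-rendered HTML back. A minimal sketch of the server side, assuming Express and a hypothetical renderSnapshot helper that would drive a headless browser such as HtmlUnit:

```javascript
// Sketch of serving HTML snapshots under the (now-deprecated) AJAX
// crawling scheme, assuming an Express server. renderSnapshot is a
// hypothetical helper that would drive a headless browser (such as
// HtmlUnit) and resolve with the fully rendered HTML.
const express = require('express');
const app = express();

function renderSnapshot(fragment) {
  // Stub for illustration; a real helper would render the page here.
  return Promise.resolve('<html><body>Rendered content for ' + fragment + '</body></html>');
}

app.get('/', function (req, res) {
  if (req.query._escaped_fragment_ !== undefined) {
    // The crawler rewrote /#!key=value to /?_escaped_fragment_=key=value
    renderSnapshot(req.query._escaped_fragment_).then(function (html) {
      res.send(html);
    });
  } else {
    res.sendFile(__dirname + '/index.html'); // normal JS-driven page
  }
});

app.listen(3000);
```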
I've run into Same-Origin Policy issues before while doing standard web development. I usually end up writing a VB.NET web service as a proxy. However, now that I'm trying to dabble in Windows 8 development using JavaScript (due to familiarity), I'm wondering what my options are to avoid the issue.
All I need to do is fetch a remote XML file and display information from it.
You can make a WinJS.xhr call to the XML file directly without problem as long as you have the Internet (Client) capability enabled (which it is by default). I do it all the time in several applications.
I'm assuming all you want to do is download an xml doc and work with the data.
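A minimal sketch of what that looks like (the URL is a placeholder):

```javascript
// Minimal WinJS sketch: fetch a remote XML file and read it. The URL
// is a placeholder; the Internet (Client) capability must be declared
// in the app manifest (it is by default).
WinJS.xhr({ url: "https://example.com/feed.xml" }).then(
  function completed(request) {
    var xmlDoc = request.responseXML; // parsed XML document
    var items = xmlDoc.getElementsByTagName("item");
    for (var i = 0; i < items.length; i++) {
      console.log(items[i].textContent);
    }
  },
  function error(request) {
    console.log("Request failed: " + request.statusText);
  }
);
```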
You should check out http://msdn.microsoft.com/en-us/library/windows/apps/hh441295.aspx if you're looking at sending cross-document messages. If you want the deep discussion on dynamic web content, security contexts, etc., this is a good place to start, though a bit dated: http://channel9.msdn.com/Events/BUILD/BUILD2011/APP-476T