Meteor - How to exclude certain paths from being crawled? - javascript

In Meteor, I have installed the spiderable package, which allows the application to be crawled by search engines. However, I want to exclude certain paths from being crawled.
For example, example.com/abc/[path] should not be crawled, whereas example.com/[path] should be.
I am unsure how to do this. One guess is to include a robots.txt in the /public directory and use regex as described here. However, the URL doesn't contain the #! as it did in this question. Is that relevant?
My current implementation is a bit more complicated, and it's based on the following quote from the package's README.md:
In order to have links between multiple pages on a site visible to spiders, apps must use real links (e.g. <a> tags) rather than simply re-rendering portions of the page when an element is clicked.
At the moment, when the page is rendered, I test whether there's a /abc at the root of the path and, if so, set a persistent session variable. This lets me omit the /abc prefix from all of the paths in my pages' links. When a link is clicked, an onBeforeAction() function checks whether the session variable is set and, if so, appends the prefix to the path so that the right template is rendered. In doing so, I am hoping those links won't be visible to the spider, but I am unsure how reliable such a method is.
tl;dr - How to exclude certain paths from being crawled in Meteor?

It kind of depends on what you're doing with the folders you don't want crawled. If they're just going to be used on the server side, you can use the /private/ folder. If you want them accessible but uncrawlable, you can build in access to folders whose names start with a period (e.g. /.hidden/), which makes them invisible to Meteor, but you can still access them via WebApp.connectHandlers, similar to my answer here.
If you want them to be processed by Meteor as normal (e.g. javascript files) but then be inaccessible to the spiderable package, I'd suggest asking in meteor-core.
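If the goal is simply to keep well-behaved crawlers away from example.com/abc/..., one possibility (a sketch, not from the answer above; it assumes the standard webapp package and that there is no robots.txt already in /public) is to serve robots.txt dynamically from the server:

// Server-side code (e.g. in a file under /server). WebApp comes from Meteor's
// standard webapp package; the /abc/ path is the asker's example.
WebApp.connectHandlers.use('/robots.txt', function (req, res, next) {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  // Well-behaved crawlers will skip everything under /abc/.
  res.end('User-agent: *\nDisallow: /abc/\n');
});

Note this only asks crawlers not to fetch those URLs; it does not hide the links themselves from the rendered pages.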

Related

Absolute links within own domain in Gatsby

I would like to create an absolute link to files (images, downloads, ...) in my static directory, so during development the link would be http://localhost/myimage.jpg and once deployed it should be https://www.example.com/myimage.jpg.
Is there an easy way (e.g. an environment variable etc.) to reference the current domain name in Gatsby?
I would like to achieve this behavior without changing any config files.
A relative link is not an option in my case. The reason is that I am using the plugin gatsby-plugin-react-intl, which modifies all relative links by adding the locale in front of them, meaning the above-mentioned link would automatically turn into https://www.example.com/en/myimage.jpg instead of https://www.example.com/myimage.jpg.
The default behavior is what you described. Using the static folder, a structure like /static/myimage.jpg will be referenced as https://somedomain.com/myimage.jpg.
The same approach applies to PDF or any static asset you wish to make available.
Following that, a structure like: /static/images/myimage.jpg will become https://somedomain.com/images/myimage.jpg.
Keep in mind that the static folder structure is "cloned" inside the public one once the site is built so everything inside will become public. /static becomes the root path once cloned.
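For example (a minimal sketch; the component and file names are hypothetical), an asset placed at /static/images/myimage.jpg can be referenced with a root-relative path:

import React from 'react';

// Gatsby copies /static into the site root at build time, so this path
// resolves to https://somedomain.com/images/myimage.jpg once deployed
// (and to http://localhost:8000/images/myimage.jpg in development).
const MyImage = () => <img src="/images/myimage.jpg" alt="My image" />;

export default MyImage;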
I know that this is the default behavior. The problem is that I'm using react-intl (or more specifically, gatsby-plugin-react-intl), which modifies all these paths by adding the locale in front of them, e.g. <img src='/myimage.jpg' /> becomes <img src='/en/myimage.jpg' />. Using gatsby-image for all purposes is not possible, because it doesn't work with SVGs etc. I haven't figured out how to stop the plugin from doing this, and I thought knowing how to refer to your own domain in JS (or React) would be something worth knowing.
Well, there are multiple ways to point to your current domain, from window.location.origin (check the caveats of using window or other global objects during SSR) to the default location prop exposed by Gatsby (because it builds on @reach/router, from React). Note that the location prop is only available in top-level components (pages), but you can pass it down and use it as you wish, like any other prop.
In addition, the gatsby-plugin-react-intl plugin exposes some methods that provide the current language information, useful to build your own static path to the asset. Using:
const { locale } = useIntl();
will give you the current locale. You can use it to build your own asset path, like:
<img src={`/${locale}/images/myimage.jpg`} />
This path will work whether you are in local development or on your hosted project domain, because it is relative. If you still want the full domain, you can use the previously explained location prop (React-based approach) or window.location (plain JavaScript approach).
Regarding the automatic modification of static asset paths, I'm afraid it's the plugin's default behavior, and I haven't found any option or exposed method to change it.
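If you do need the absolute URL, here is a minimal sketch (the component and asset path are hypothetical) that guards against window being undefined during Gatsby's server-side rendering:

import React from 'react';

const MyImage = () => {
  // window does not exist during Gatsby's SSR/build step, so fall back to an
  // empty origin there; in the browser this yields the full absolute URL.
  const origin = typeof window !== 'undefined' ? window.location.origin : '';
  return <img src={`${origin}/images/myimage.jpg`} alt="My image" />;
};

export default MyImage;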

How to exclude a subfolder of compiled resources from a Sonar analysis?

I am trying to integrate Sonarqube analysis for the JavaScript sources of my project. It is a project using Spring components for the back-end, and as a first step we integrated the Java sources without any problems.
We are using Sonarqube v5.6.3
The problem I am finding comes with the sonar.exclusions property. Apparently, that property can't exclude a folder that has already been added as sources (see question and answer explaining that exact issue).
I have the following lines in my pom.xml, which are not working properly; and that's understandable according to the aforelinked question:
<sonar.sources>src/main/java,src/main/docker,js-sources</sonar.sources>
<sonar.tests>src/test</sonar.tests>
<sonar.exclusions>**/target/*</sonar.exclusions>
The problem is: the front-end is made of several modules which are compiled one by one under their own /target sub-folder before being deployed all together into src/main/webapp. (They work as regular target folders: when a new compilation is launched, those folders get deleted/recreated.)
Those js-sources/moduleA/target, js-sources/moduleB/target, js-sources/moduleC/target folders are being automatically included as sources, so the exclusions directive has no effect on them. Those target folders still contain a /src subfolder, which makes it hard to use the limited Sonar patterns (full XPath-like selectors are not allowed) to include or exclude only certain paths.
As I don't think that the Sonarqube team expects everyone to add each little subfolder one by one (that's why they made patterns in the first place), I am looking for help: how do I exclude those per-module target folders living down the folder tree inside my sources?
Another possibility would be that it is kind of a bug forcing us to store this config at a Jenkinsfile or even directly in the Jenkins config (at a job level), but I remain unsure and still think that something can be fixed in the way I am declaring the sources and exclusions.
Try
<sonar.exclusions>**/target/**/*</sonar.exclusions>
EDIT: while inclusions are useful in other cases, the accepted answer above is the correct one. I'm leaving mine, which follows, for the record and just as an example of using inclusions.
Try using inclusions rather than exclusions. I've set up a project as close to yours as I could guess from your description, and I was able to ignore the target folders of the js-sources modules:
<properties>
<sonar.sources>src/main/java,js-sources</sonar.sources>
<sonar.inclusions>**/*.java, **/src/**/*.js</sonar.inclusions>
</properties>
You can read this as: 'scan all Java files no matter where they are; scan only the JavaScript files that are found within the src folder of a subfolder of the root'.

What is the correct way to specify relative paths in streamed CSS?

I'm working in Firefox and relative paths are not working.
One caveat is that I stream my .css file using AJAX and add it to the DOM dynamically.
Another caveat is that my site is entered in one of two ways:
www.host.com (use this for production)
or
www.host.com/dev/ (use this for dev)
Images are either here:
www.host.com/host/images
or
www.host.com/dev/host/images
depending upon how you enter the site.
I can post any information needed and test out a solution.
I was using
../images/name.jpg
but the browser somehow resolved this to:
host.com/images/name.jpg
which does not exist.
This is a question about relative paths and how to implement them correctly.
Absolute Path URLs
Absolute paths are called that because they refer to a very specific location, including the domain name. The absolute path to a Web element is also often referred to as its URL.
You typically use the absolute path with the domain to point to Web elements that are on a different domain than your own. For example, if I want to link to Google it would be ...
If you're referring to a Web element that is on the same domain you're on, you don't need to use the domain name in the path of your link. Simply leave off the domain, but be sure to include the first slash (/) after the domain name.
It is a good idea to use absolute paths, without the domain name, on most Web sites. This format ensures that the link or image will be usable no matter where you place the page. This may seem like a silly reason to use longer links, but if you share code across multiple pages and directories on your site, using absolute paths will speed up your maintenance.
Relative Path URLs
Relative paths change depending upon what page the links are located on. There are several rules to creating a link using the relative path:
links in the same directory as the page need no path information, just the filename: filename
sub-directories are listed without any preceding slash: weekly/filename
links up one directory are listed as: ../filename
How to determine the relative path:
Determine the location of the page you are editing. This article is located in the /library/weekly folder on my site.
Determine the location of the page or image you want to link to. The Beginner's Resource Center is located here: /library/beginning/
Compare the locations to decide how to point to it. From this article, I would need to step up one directory (to /library) and then go back down to the beginning directory.
Write the link using the rules listed above: ...
The relative paths are always relative to the CSS location, not the web page location that references the CSS file. So the question is, what is the location of the CSS file to start with? If you make all paths relative to it, it should work for both your production and development URLs.
I need to test this out, but for dynamically inserted CSS all paths appear to be relative to the root directory (www.host.com, or wherever that resolves to), which essentially means all paths behave as absolute. This is the behavior I am seeing in Firefox.
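One way to sidestep that (a sketch, not from the answers above; the file path is hypothetical): instead of fetching the CSS text over AJAX and injecting it into a <style> element, where url() values resolve against the page, append a <link> element, so relative url() values resolve against the stylesheet's own location:

// Load the stylesheet as a real <link> so that relative url() paths inside it
// resolve against the stylesheet's location (e.g. /dev/host/css/ vs /host/css/),
// not against the page that happens to include it.
function loadStylesheet(href) {
  var link = document.createElement('link');
  link.rel = 'stylesheet';
  link.href = href; // e.g. 'host/css/site.css' (hypothetical), relative to the entry URL
  document.head.appendChild(link);
}

loadStylesheet('host/css/site.css');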

ASP.NET: How to (programmatically) attach a <script> tag, containing a link to .js, to <head>?

This is the scenario:
I'm working on a new ASP.NET application that uses master pages for virtually all of the web pages in it, a few of them nested to 4 levels. Due to the size of the project, I was forced to organize the web pages into folders, at various depth levels. The global, Master (in uppercase) page, located at the root directory, contains some useful Javascript functions that are used all along the web app, and I wanted to place these functions together in a single .js file, in order to keep things, well, in order :D . (They're currently embedded into the Master page).
I discovered recently, however, that <script> tags placed in the <head> block can't have their paths specified as, for example, "~/Scripts/Utils.js", since ASP.NET does not seem to recognize these paths in <script> tags (yet, strangely enough, it does recognize them in <link> tags). I'm trying to avoid inserting a <script> tag on every single web page on the site specifying a relative path to the .js file (that's sort of why I wanted to use master pages in the first place).
So, in essence, given this scenario, I want to ask you, what's the recommended way of inserting, via code, <script> tags into the <head> block of a web page so that I can use, when specifying the link to the .js file, something like Something.Something(Something, Page.ResolveURL("~/Scripts/Utils.js")); in the global Master page, so it will resolve to the right path on all the web pages of the application, no matter what directory are they inside?
Or is this not the right approach, and I should be using something else entirely?
You can use the ClientScriptManager:
Page.ClientScript.RegisterClientScriptInclude("MyScript", ResolveUrl("~/Scripts/MyScript.js"));
The first argument is a unique key representing the script file; this stops subsequent scripts from being registered with the same key. E.g. you may have some shared code that does the same thing and could be executed multiple times, so by specifying a key you ensure the script is only registered once.

Put javascript in one .js file or break it out into multiple .js files?

My web application uses jQuery and some jQuery plugins (e.g. validation, autocomplete). I was wondering if I should stick them into one .js file so that it could be cached more easily, or break them out into separate files and only include the ones I need for a given page.
I should also mention that my concern is not only the time it takes to download the .js files but also how much the page slows down based on the contents of the .js file loaded. For example, adding the autocomplete plugin tends to slow down the response time by 100ms or so from my basic testing even when cached. My guess is that it has to scan through the elements in the DOM which causes this delay.
I think it depends how often they change. Let's take this example:
JQuery: change once a year
3rd party plugins: change every 6 months
your custom code: change every week
If your custom code represents only 10% of the total code, you don't want the users to download the other 90% every week. You would split it into at least two JS files: the jQuery + plugins, and your custom code. Now, if your custom code represents 90% of the full size, it makes more sense to put everything in one file.
When choosing how to combine JS files (and same for CSS), I balance:
relative size of the file
number of updates expected
Common but relevant answer:
It depends on the project.
If you have a fairly limited website where most of the functionality is re-used across multiple sections of the site, it makes sense to put all your script into one file.
In several large web projects I've worked on, however, it has made more sense to put the common site-wide functionality into a single file and put the more section-specific functionality into their own files. (We're talking large script files here, for the behavior of several distinct web apps, all served under the same domain.)
The benefit to splitting up the script into separate files, is that you don't have to serve users unnecessary content and bandwidth that they aren't using. (For example, if they never visit "App A" on the website, they will never need the 100K of script for the "App A" section. But they would need the common site-wide functionality.)
The benefit to keeping the script under one file is simplicity. Fewer hits on the server. Fewer downloads for the user.
As usual, though, YMMV. There's no hard-and-fast rule. Do what makes most sense for your users based on their usage, and based on your project's structure.
If people are going to visit more than one page in your site, it's probably best to put them all in one file so they can be cached. They'll take one hit up front, but that'll be it for the whole time they spend on your site.
At the end of the day it's up to you.
However, the less information each web page contains, the quicker it will be downloaded by the end viewer.
If you only include the JS files required for each page, your web site is likely to be more efficient and streamlined.
If the files are needed on every page, put them in a single file. This will reduce the number of HTTP requests and will improve the response time (for lots of visits).
See the Yahoo best practices for other tips.
I would pretty much concur with what bigmattyh said: it does depend.
As a general rule, I try to aggregate the script files as much as possible, but if you have some scripts that are only used on a few areas of the site, especially ones that perform large DOM traversals on load, it would make sense to leave those in separate file(s).
e.g. if you only use validation on your contact page, why load it on your home page?
As an aside, you can sometimes sneak these files into interstitial pages, where not much else is going on, so when a user lands on an otherwise quite heavy page that needs it, it should already be cached - use with caution - but can be a handy trick when you have someone benchmarking you.
So, as few script files as possible, within reason.
If you are sending a 100K monolith, but only using 20K of it for 80% of the pages, consider splitting it up.
It depends pretty heavily on the way that users interact with your site.
Some questions for you to consider:
How important is it that your first page load be very fast?
Do users typically spend most of their time in distinct sections of the site with subsets of functionality?
Do you need all of the scripts ready the moment the page is ready, or can you load some of them in after the page has loaded, by inserting <script> elements into the page (see the sketch after this list)?
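A minimal sketch of that lazy-loading approach (the file name is hypothetical):

// Inject a <script> element after the page has loaded, so the extra code
// doesn't block the initial render. The URL here is just an example.
function loadScript(src, onLoad) {
  var script = document.createElement('script');
  script.src = src;
  script.async = true;
  if (onLoad) script.onload = onLoad;
  document.body.appendChild(script);
}

window.addEventListener('load', function () {
  loadScript('/js/autocomplete-extras.js', function () {
    // Initialize whatever the deferred script provides here.
  });
});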
Having a good idea of how users use your site, and what you want to optimize for is a good idea if you're really looking to push for performance.
However, my default method is to just concatenate and minify all of my javascript into one file. jQuery and jQuery.ui are small and have very low overhead. If the plugins you're using are having a 100ms effect on page load time, then something might be wrong.
A few things to check:
Is gzipping enabled on your HTTP server?
Are you generating static files with unique names as part of your deployment?
Are you serving static files with never ending cache expirations?
Are you including your CSS at the top of your page, and your scripts at the bottom?
Is there a better (smaller, faster) jQuery plugin that does the same thing?
I've basically gotten to the point where I reduce an entire web application to 3 files.
vendor.js
app.js
app.css
Vendor is neat because it has all the vendor styles in it too. I.e. I convert all my vendor CSS into minified CSS, then I convert that to JavaScript and include it in the vendor.js file (that's after it has been run through Sass, too).
Because my vendor stuff does not update often, once it's in production updates are pretty rare. When it does update I just rename it to something like vendor_1.0.0.js.
Also, there are minified versions of those files. In dev I load the unminified versions and in production I load the minified versions.
I use gulp to handle all of this. The main plugins that make this possible are (a rough sketch of how a few of them fit together follows the list):
gulp-include
gulp-css2js
gulp-concat
gulp-csso
gulp-html-to-js
gulp-mode
gulp-rename
gulp-uglify
node-sass-tilde-importer
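A rough sketch of a bundling task along these lines (not the author's actual gulpfile; the globs, output names, and gulp 4 task syntax are assumptions):

// gulpfile.js (sketch) - concatenate and minify the app scripts into dst/
var gulp = require('gulp');
var concat = require('gulp-concat');
var uglify = require('gulp-uglify');
var rename = require('gulp-rename');

function appScripts() {
  return gulp.src('src/js/**/*.js')   // hypothetical source glob
    .pipe(concat('app.js'))           // one unminified bundle for dev
    .pipe(gulp.dest('dst'))
    .pipe(uglify())                   // minified copy for production
    .pipe(rename('app.min.js'))
    .pipe(gulp.dest('dst'));
}

exports.scripts = appScripts;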
Now, this also includes my images, because I use Sass and I have a Sass function that will compile images into data URLs in my CSS sheet.
function sassFunctions(options) {
  options = options || {};
  options.base = options.base || process.cwd();
  var fs = require('fs');
  var path = require('path');
  var types = require('node-sass').types;
  var funcs = {};

  // Custom Sass function: inline-image($file) reads the image from disk and
  // returns it as a base64 data URL, so it can be embedded directly in the CSS.
  funcs['inline-image($file)'] = function (file, done) {
    var filePath = path.resolve(options.base, file.getValue());
    var ext = filePath.split('.').pop();
    fs.readFile(filePath, function (err, data) {
      if (err) return done(err);
      data = new Buffer(data);
      data = data.toString('base64');
      data = 'url(data:image/' + ext + ';base64,' + data + ')';
      data = types.String(data);
      done(data);
    });
  };

  return funcs;
}
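For context, a sketch of how a helper like this could be wired into node-sass (the entry file and output path are hypothetical, not from the answer):

var fs = require('fs');
var sass = require('node-sass');

sass.render({
  file: 'app.scss',             // hypothetical entry stylesheet
  functions: sassFunctions()    // registers inline-image($file) from above
}, function (err, result) {
  if (err) throw err;
  fs.writeFileSync('dst/app.css', result.css);
});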
So my app.css will have all of my application's images in the CSS, and I can add the images to any chunk of styles I want. Typically I create unique classes for the images, and I'll just tag stuff with that class if I want it to have that image. I avoid using image tags completely.
Additionally, using the html-to-js plugin I compile all of my HTML into the JS file as a template object keyed by the path to the HTML files, i.e. 'html\templates\header.html', and then, using something like Knockout, I can data-bind that HTML to an element, or to multiple elements.
The end result is I can end up with an entire web application that spins up off one "index.html" that doesn't have anything in it but this:
<html>
  <head>
    <script src="dst/vendor.js"></script>
    <link rel="stylesheet" href="dst/app.css">
    <script src="dst/app.js"></script>
  </head>
  <body id="body">
    <xyz-app params="//xyz.com/api/v1"></xyz-app>
    <script>
      ko.applyBindings(null, document.getElementById("body"));
    </script>
  </body>
</html>
This will kick off my component "xyz-app", which is the entire application, and it doesn't have any server-side events. It's not running on PHP, .NET Core MVC, MVC in general, or any of that stuff. It's just basic HTML managed with a build system like gulp, and everything it needs data-wise comes from REST APIs.
Authentication -> Rest Api
Products -> Rest Api
Search -> Google Compute Engine (python apis built to index content coming back from rest apis).
So I never have any HTML coming back from a server (just static files, which are crazy fast), and there are only 3 files to cache other than index.html itself. Web servers support default documents (index.html), so you'll just see "blah.com" in the URL, plus any query strings or hash fragments used to maintain state (routing etc. for bookmarking URLs).
Crazy quick, all depending on the JS engine running it.
Search optimization is trickier. It's just a different way of thinking about things, i.e. you have Google crawl your APIs, not your physical website, and you tell Google how to get to your website in each result.
So say you have a product page for ABC Thing with a product ID of 129. Google will crawl your products API to walk through all of your products and index them. In there, your API returns a URL in the result that tells Google how to get to that product on the website, i.e. "http://blah#products/129".
So when users search for "ABC thing" they see the listing, and clicking on it takes them to "http://blah#products/129".
I think search engines need to start getting smart like this; it's the future, IMO.
I love building websites like this because it gets rid of all the back-end complexity. You don't need Razor, or PHP, or Java, or ASPX web forms, or w/e; you get rid of those entire stacks. All you need is a way to write REST APIs (WebApi2, Java Spring, or w/e, etc.).
This separates web design into UI Engineering, Backend Engineering, and Design, and creates a clean separation between them. You can have a UX team building the entire application and an architecture team doing all the REST API work; no need for full-stack devs this way.
Security isn't a concern either, because you can pass credentials on AJAX requests, and if your stuff is all on the same domain you can just set your authentication cookie on the root domain and presto: automatic, seamless SSO with all your REST APIs.
Not to mention how much simpler server-farm setup is. Load-balancing needs are a lot lower and traffic capacity is a lot higher. It's way easier to cluster REST API servers behind a load balancer than entire websites.
Just set up one nginx reverse proxy server to serve your index.html and also direct API requests to one of 4 REST API servers:
Api Server 1
Api Server 2
Api Server 3
Api Server 4
And your SQL boxes (replicated) just get load-balanced by the 4 REST API servers (all using SSDs if possible):
Sql Box 1
Sql Box 2
All of your servers can be on an internal network with no public IPs; just make the reverse proxy server public, with all requests coming in to it.
You can load-balance the reverse proxy servers with round-robin DNS.
This means you only need one SSL cert, since it's one public domain.
If you're using Google Compute Engine for search and seo, that's out in the cloud so nothing to worry about there, just $.
If you like the code in separate files for development you can always write a quick script to concatenate them into a single file before minification.
One big file is better for reducing HTTP requests as other posters have indicated.
I also think you should go the one-file route, as the others have suggested. However, to your point on plugins eating up cycles by merely being included in your large js file:
Before you execute an expensive operation, use some checks to make sure you're even on a page that needs the operations. Perhaps you can detect the presence (or absence) of a dom node before you run the autocomplete plugin, and only initialize the plugin when necessary. There's no need to waste the overhead of dom traversal on pages or sections that will never need certain functionality.
A simple conditional before an expensive code chunk will give you the benefits of both the approaches you are deciding on.
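For example, a minimal guard along those lines (the selector, options, and jQuery UI autocomplete call are assumptions, not from the question):

// Only pay the initialization / DOM-traversal cost on pages that actually
// contain the search field.
$(function () {
  var $search = $('#search-box');
  if ($search.length) {
    $search.autocomplete({ source: '/api/suggestions' });
  }
});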
I tried breaking my JS into multiple files and ran into a problem. I had a login form, the code for which (AJAX submission, etc.) I put in its own file. When the login was successful, the AJAX callback then called functions to display other page elements. Since these elements were not part of the login process, I put their JS code in a separate file. The problem is that JS in one file can't call functions in a second file unless the second file is loaded first (see Stack Overflow Q. 25962958), and so, in my case, the called functions couldn't display the other page elements. There are ways around this loading-sequence problem (see Stack Overflow Q. 8996852), but I found it simpler to put all the code in one larger file and clearly separate and comment sections of code that fall into the same functional group, e.g. keep the login code separate and clearly commented as the login code.
