What is Tree Shaking and Why Would I Need It? - javascript

I've started learning about Angular 2 and have come across this term "tree shaking", but I haven't been able to find any good explanation of it from a beginner's perspective.
I have two questions here:
What is tree shaking and why would I need it?
How do I use it?

I see you have three questions here:
1. What is tree shaking?
2. Why do you need it?
3. How do you use it?
1. What's tree shaking?
Tree shaking refers to dead code elimination. It means that unused modules will not be included in the bundle during the build process.
When we import and export modules in JavaScript, most of the time there is unused code floating around. Excluding that unused code (also referred to as dead code) is called tree shaking.
Utilizing tree shaking and dead code elimination can significantly reduce the code size of our application. The less code we send over the wire, the more performant the application will be.
2. What's the need of tree shaking?
Tree shaking helps us reduce the weight of the application. For example, a bare-bones "Hello World" application in Angular 2 weighs in at around 2.5 MB, but with tree shaking we can bring the size down to just a few hundred KB.
3. How to use / implement tree shaking?
Tools like webpack will detect dead code and mark it as an "unused module", but they won't remove the code themselves. Webpack relies on minifiers to clean up dead code; one of them is the UglifyJS plugin, which eliminates the dead code from the bundle.
// modules.js
export function drive(props) {
  return props.gas;
}

export function fly(props) {
  return props.miles;
}

// main.js
import { drive } from './modules';

// ... some code
const eventHandler = (event) => {
  event.preventDefault();
  drive({ gas: event.target.value });
};
// ... some code

// fly() was never imported and won't be included in our bundle
It only works with import and export. It won’t work with CommonJS
require syntax.
The same applies to npm dependencies. A great example is lodash: just import pick from 'lodash/pick' and your bundle will only include that one small module instead of the entire lodash library.
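A tiny self-contained sketch of the lodash/pick form (the object literal is just illustrative):

// main.js - only the pick module ends up in the bundle, not all of lodash
import pick from 'lodash/pick';

const subset = pick({ id: 1, name: 'Ada', email: 'ada@example.com' }, ['id', 'name']);
console.log(subset); // { id: 1, name: 'Ada' }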

It just means that code that is in your project but not used/referenced anywhere will be dropped, like when you import a full library just to use one function from it. It reduces the compiled code size.

🌲 The tree shaking process reduces the download size of an application.
🌲 Tree shaking works by not exporting the modules that are not needed by our application into the bundle file; it does not by itself remove the unused code from the bundle.
🌲 Webpack removes the links and the UglifyJS plugin removes the code (see the config sketch below).
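As a rough illustration, a webpack 2/3-era config might wire those two steps together like this (the entry and output names are assumptions; webpack 4+ does this automatically with mode: 'production'):

// webpack.config.js (sketch)
const path = require('path');
const webpack = require('webpack');

module.exports = {
  entry: './src/main.js',
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js'
  },
  plugins: [
    // webpack marks the unused exports; UglifyJS then strips the dead code
    new webpack.optimize.UglifyJsPlugin()
  ]
};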

Related

Load a package dynamically according to whether or not it already exists on window

Does any build tool support this loading strategy, such as webpack or rollup.js?
Build every dependency into its own bundle, and when loading these dependencies, first look for them on window['package']; if present, use that copy, otherwise dynamically load the dependency bundle.
Say the app's dependencies are React, ReactDOM and UiLib.
The built result is:
React -> a.js
ReactDOM -> b.js
UiLib -> c.js
my code -> d.js
If window.React exists but window.ReactDOM and window.UiLib do not, d.js should dynamically load b.js and c.js and use window.React.
I know I can configure React as an external, but this is a micro-app used in many different host apps, and I'm not sure which packages exist in every global scope.
Nope, it is not possible directly. For a bundler, it is a binary choice: bundle or don't bundle.
Why? When a bundler encounters a library via an import statement like import React from 'react', it needs to know what global object it should substitute whenever it encounters the react package across the entire application dependency graph. This must happen at compile time. Additionally, loading a library based on a decision made at runtime means introducing asynchronous behavior into your code, which your components or application cannot readily handle.
There are two form factors: a library and an application. As far as the library is concerned, this is the only way to teach the bundler (either bundle it or leave it out via externals).
At the application level, you can write your own code to partially achieve what you seek with the help of a CDN. For this, you use externals and tell webpack, for example, that react will be available as a global React object on the window namespace.
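A minimal externals sketch along those lines (the react-dom entry is an extra assumption):

// webpack.config.js (sketch)
module.exports = {
  // ...
  externals: {
    // import React from 'react' resolves to the global window.React at runtime
    react: 'React',
    'react-dom': 'ReactDOM'
  }
};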
Now, before your library is consumed, you have to add dynamic code that checks for the presence of the React object.
async function initialize() {
  if (!window.React) {
    const React = await import(/* webpackIgnore: true */ 'https://unpkg.com/react@18/umd/react.development.js');
    window.React = React;
    initializeMicroapp();
  } else {
    initializeMicroapp();
  }
}
Your initialize function for the microapp is async and returns a promise. This is usually the pattern to go with for shell + micro-frontends.
On a side note, you can use the module federation approach, which is actually meant to solve exactly this kind of use case. With module federation, you can teach webpack that if the host/shell provides a library, then webpack should simply ignore its bundled copy of that library and serve only the other necessary code. However, I advise caution, as it is a very specific pattern and neither a de facto nor de jure standard at this point. It is recommended when you have sufficient scale and many independent teams working in the same product space.
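For reference, a minimal module federation sketch for webpack 5 (the remote name and the shared entries are illustrative):

// webpack.config.js (webpack 5 sketch)
const { ModuleFederationPlugin } = require('webpack').container;

module.exports = {
  // ...
  plugins: [
    new ModuleFederationPlugin({
      name: 'microapp',
      shared: {
        // reuse the host's copy of these libraries when one is already loaded
        react: { singleton: true },
        'react-dom': { singleton: true }
      }
    })
  ]
};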

Why is a function I expect to be shaken from a tree still there on a create-react-app build?

Similar to Tree shaking create-react-app? but more of how to validate and fix.
So I have created a library of react hooks. In there I added an example to help me understand how tree shaking would work.
import { useClock, useDeepState } from '@trajano/react-hooks';

export function App(): JSX.Element {
  useClock();
  useDeepState("foo");
  return <div>Hello world</div>;
}
However, there's a function called useAsyncSetEffect that I added in my library code base; tracing through the code for useClock and useDeepState I never hit that function, yet when I look at the generated files I see a reference to useAsyncSetEffect.
I'm not really sure what's causing it. The library isn't large, so the size is only a kilobyte or so gzipped, but I am curious as to why it is being included.
TL;DR
In package.json "module": "dist/index.modern.js",
How I got to the answer
After trying a few things, I found out it's an addition needed in the package.json of my library. Since I am building both modern and cjs using
"build": "microbundle-crl --format modern,cjs",
it generates a version of the code that makes use of the ES module system. Since it makes use of the module system, the bundler is able to tree shake correctly, and the build output shrank afterwards.
44.62 KB (-11.02 KB) build\static\js\2.7a693cc3.chunk.js
800 B build\static\js\runtime-main.21ea8395.js
619 B (-601 B) build\static\js\main.21098aa9.chunk.js
Incidentally, it also reduced the code in the 2.* chunk, which houses more of the React code, and it's nice that it significantly reduced its size by 11 KB.
The note about Keeping babel from transpiling ES6 modules to CommonJS modules provided me with the necessary hint. In addition, since I know my code is side-effect free, I also added "sideEffects": false.
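Putting it together, the relevant package.json fields for the library end up looking roughly like this (the name comes from the question's import; the main and version values are assumptions, while module, sideEffects and the build script are from the answer):

{
  "name": "@trajano/react-hooks",
  "version": "1.0.0",
  "main": "dist/index.js",
  "module": "dist/index.modern.js",
  "sideEffects": false,
  "scripts": {
    "build": "microbundle-crl --format modern,cjs"
  }
}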

How to use js modules from non-module files

I'm a beginner at using js modules.
I'm working on a fairly simple web application. It uses typescript and angular 2, which heavily relies on modules.
Most of my app ts files 'import' one or many js modules (usually mostly angular 2 modules).
As I understand, because my app ts files have a top level 'import', they are automatically considered a js module by typescript.
However, I want any of my app ts files to be accessible by any other of my app ts files, without having to 'import' each other. But because they are now modules themselves, ts requires me to do that...
Is it possible?
It seems crazy to me that, for each of my app ts files, I should have to declare every other app ts file that is used in there (I like to have tiny files with a single class/interface). In addition, this relies on relative paths, which break as soon as I restructure my folder structure.
Am I thinking about this the wrong way?
You must have a js file which is the entry point to your application, right? In that file, just import all the modules which you want to access without importing elsewhere, and attach them to the window object. Since the window object is available globally, you can access your module from anywhere without importing the corresponding module. For example,
Consider this scenario:
You have a module in a file called module1.ts
The entry point of your application is a file called index.ts
And you have a module2 where you require something from module1
// module1.ts
function add(first: number, second: number): number {
  return first + second;
}

export { add };
in your index.ts
// index.ts
import { add } from '<path to module1>/module1';

window.add = add;
Now in your module2
// module2.ts
window.add(1, 2)
Since the window object is available globally you can attach as many properties to it as you like.
As far as type resolution is concerned, you can augment the Window interface with the add function you require in a .d.ts file as follows:
interface Window {
  add: (first: number, second: number) => number;
}
Declaring dependencies (e.g. modules) for each file is a double-edged sword.
The advantage is that there is no 'magic': you know exactly where each function, variable, class etc. is coming from. This makes it much easier to know what libraries / frameworks are being used and where to look to troubleshoot issues. Compare it to the opposite approach that Ruby on Rails uses with Ruby Gems, where nothing is declared and everything is auto-loaded. From personal experience I know it becomes an absolute pain to work out where some_random_method is coming from and what methods / classes I have access to.
You're right that the disadvantage is that it can become quite verbose with multiple imports, and relative paths break when files move. Modern editors and IDEs like WebStorm and Visual Studio Code have tools to automatically update the relative paths when you move a file and to automatically add the imports when you reference code in another module.
One practical solution for multiple imports is to make your own 'group' import file. Say you have a whole bunch of utility functions that you use in all your files - you can import them all into a single file and then just reference that file everywhere else:
// File: helpers/string-helpers.ts
export { toUppercase } from "./uppercase-helper";
export { truncate } from "./truncate-helper";
Then in any other file:
import * as StringHelpers from "../path-to/helpers/string-helpers";
...
let shoutingMessage = StringHelpers.toUppercase(message);
The disadvantage of this is that it may break tree shaking, where tools such as webpack remove unused code.
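If tree shaking matters, a sketch of the alternative (assuming an ES-module-aware bundler and the same helper paths as above) is to import only the names you need from the barrel file:

// Only toUppercase is referenced, so the truncate re-export stays eligible for elimination
import { toUppercase } from "../path-to/helpers/string-helpers";

let shoutingMessage = toUppercase(message);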
Is it possible?
Not in any easy way. The ts file is a module and uses e.g. module.exports (if targeting CommonJS), which would need to be shimmed out. And that is just the runtime story; the TypeScript story is harder, and one way would be to make a .d.ts file for the module declaring its contents as globals.
Like I said, it's not worth doing. Modules are the way forward instead of making something hacky.
It's not crazy at all, but you are definitely thinking about this the wrong way.
Actually, what you don't like is a common feature in all modern programming languages, and it makes the code and structure of the app a lot clearer and simpler to understand.
Going without imports, the old-school way, looks very crazy to me :)
With so many global variables you can only have chaos.

Webpack 2 - Code splitting top-level dependencies

Final Edit
The tl;dr resolution of this is that it's impossible. Though the top answer below does have some good information.
Consider the code below, from contacts.js. This is a dynamically loaded module, loaded on demand with System.import elsewhere in the code.
If SharedUtil1 is also used in other modules which are also dynamically loaded with System.import, how would I go about having SharedUtility1 excluded from all of these modules, and only loaded on demand the first time it's needed?
A top-level System.import of SharedUtil1 won't work, since my export depends on it: exports can only be placed in the top level of a module's code, not in any sort of callback.
Is this possible with Webpack? I'm on version 2.0.7 beta.
import SharedUtil1 from '../../SharedUtilities/SharedUtility1';

class Contacts {
  constructor(data) {
    this.data = data;
    this.sharedUtil1 = new SharedUtil1();
  }
}

export default Contacts;
UPDATE 1
I thought the bundle loader was what I wanted, but no: it turns your imported module into a different function that you call with a callback to get the actual module once it's done loading asynchronously. This means you can't transparently make module X load asynchronously without making breaking changes to your code, to say nothing of the fact that you're back to the problem originally described: if your top-level module depends on the now-asynchronously-loaded dependency, there's no way to export it, since exports must be at the top level.
Is there no way in Webpack to denote that dependency X is to be loaded on-demand, if needed, and have any imported modules which import it to transparently wait out the importation process? I would think this use case would be a sine qua non for any remotely large application, so I have to think I'm just missing something.
UPDATE 2
Per Peter's answer, I attempted to get deduplication working, since the CommonsChunkPlugin relates to sharing code between entry points, as he mentioned, and since require.ensure places the loaded code into a callback, thereby preventing you from ES6-exporting any code that depends on it.
As far as deduplication, contacts.js and tasks.js both load the same sharedUtil like so
import SharedUtil1 from '../../sharedUtilities/sharedUtility1';
I tried running webpack as
webpack --optimize-dedupe
and also by adding
plugins: [
  new webpack.optimize.DedupePlugin()
]
to webpack.config. In both cases though the sharedUtil code is still placed in both the contacts and tasks bundles.
After reading your blog post I finally understand what you intended. I got a bit confused by the word "Top-level dependencies".
You have two modules (async-a and async-b) which are loaded on-demand from anywhere (here a module main) and both have a reference on a shared module (shared).
- - -> on-demand-loading (i. e. System.import)
---> sync loading (i. e. import)
main - - -> async-a ---> shared
main - - -> async-b ---> shared
By default webpack creates a chunk tree like this:
---> chunk uses other chunk (child-parent-relationship)
entry chunk [main] ---> on-demand chunk 1 [async-a, shared]
entry chunk [main] ---> on-demand chunk 2 [async-b, shared]
This is fine when shared < async-a/b or the probability that async-a and async-b are both used by the same user is low. It's the default because it's the simplest behavior and probably what you would expect: one System.import => one chunk. In my opinion it's also the most common case.
But if shared >= async-a/b and the probability that async-a and async-b are both loaded by the user is high, there is a more efficient chunking option (a bit difficult to visualize):
entry chunk [main] ---> on-demand chunk 1 [async-a]
entry chunk [main] ---> on-demand chunk 2 [async-b]
entry chunk [main] ---> on-demand chunk 3 [shared]
When main requests async-a: chunk 1 and 3 is loaded in parallel
When main requests async-b: chunk 2 and 3 is loaded in parallel
(chunks are only loaded if not already loaded)
This is not the default behavior, but there is a plugin to achieve it: the CommonsChunkPlugin in async mode. It finds the common/shared modules in a bunch of chunks and creates a new chunk which includes the shared modules. In async mode it loads the new chunk in parallel to the original (but now smaller) chunks.
new CommonsChunkPlugin({
  async: true
})

// This does: (pseudo code)
foreach chunk in application.chunks
  var shared = getSharedModules(chunks: chunk.children, options)
  if shared.length > 0
    var commonsChunk = new Chunk(modules: shared, parent: chunk)
    foreach child in chunk.children where child.containsAny(shared)
      child.removeAll(shared)
      foreach dependency in chunk.getAsyncDependenciesTo(child)
        dependency.addChunk(commonsChunk)
Keep in mind that the CommonsChunkPlugin has a minChunks option to define when a module is treated as shared (feel free to provide a custom function to select the modules).
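A rough configuration sketch (the chunk name and the threshold are illustrative, not from the question):

// webpack.config.js excerpt (sketch)
new webpack.optimize.CommonsChunkPlugin({
  // name for the generated on-demand commons chunk
  async: 'shared',
  // treat a module as shared once at least two child chunks use it
  minChunks: function (module, count) {
    return count >= 2;
  }
})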
Here is an example which explains the setup and output in detail: https://github.com/webpack/webpack/tree/master/examples/extra-async-chunk
And another one with more configuration: https://github.com/webpack/webpack/tree/master/examples/extra-async-chunk-advanced
If I've understood you correctly, you want to prevent the same dependency being loaded multiple times when different code chunks declare it as a dependency.
Yes, this is possible; how to do it depends on both the context of your application and whether it is written in ES6 or ES5.
ECMA Script 5
Webpack 1 was built in ECMA Script 5 and typically uses either CommonJS or RequireJS syntax for module exporting and importing. When using this syntax, the following features can be used to prevent duplicate code:
Deduplication prevents duplicate files being included in the compiled code by creating copies of the duplicate functions instead of redefining them.
Named Chunks allows chunks to be declared as dependencies but not immediately evaluated; all occurrences of the same chunk will use the same instance.
CommonsChunkPlugin allows a chunk to be shared across multiple entry points (only applies to multiple page websites).
Deduplication
From the webpack documentation:
If you use some libraries with cool dependency trees, it may occur
that some files are identical. Webpack can find these files and
deduplicate them. This prevents the inclusion of duplicate code into
your bundle and instead applies a copy of the function at runtime. It
doesn’t affect semantics.
emphasis is mine, not from source
As described by the documentation, the code splitting remains unchanged; each module that needs sharedUtil1 should declare the require as normal. To prevent the same dependency being loaded multiple times, a webpack setting is enabled that causes webpack to explicitly check files for duplication before including them at runtime.
This option is enabled with
--optimize-dedupe resp. new webpack.optimize.DedupePlugin()
Named Chunks
From the webpack documentation:
The require.ensure function accepts an additional 3rd parameter. This must be a string. If two split points pass the same string, they use the same chunk...
require.include can be useful if a module is in multiple child chunks.
A require.include in the parent would include the module and the
instances of the modules in the child chunks would disappear.
In short, the loading of the modules is delayed until later in the compiling. This allows duplicate definitions to be stripped before they are included. The documentation provides examples.
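For illustration, a sketch following the webpack 1-style signature quoted above (the paths and the chunk name are assumptions):

// contacts.js
require.ensure(['../../sharedUtilities/sharedUtility1'], function (require) {
  var SharedUtil1 = require('../../sharedUtilities/sharedUtility1');
  // ...use SharedUtil1 here
}, 'shared-utils');

// tasks.js passes the same chunk name, so both split points reuse one chunk
require.ensure(['../../sharedUtilities/sharedUtility1'], function (require) {
  var SharedUtil1 = require('../../sharedUtilities/sharedUtility1');
  // ...use SharedUtil1 here
}, 'shared-utils');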
Common Chunk Plugin
From the webpack documentation:
The CommonsChunkPlugin can move modules that occur in multiple entry
chunks to a new entry chunk (the commons chunk). The runtime is moved
to the commons chunk too. This means the old entry chunks are initial
chunks now.
This is very specific to sharing chunks between multiple pages, it is not relevant in other circumstances.
ECMA Script 6
Support for advanced module import features is... a work in progress. To get a feel for where things are, see the following links:
Ongoing What's new in Webpack 2
2015/12/20 Tree-Shaking
Ongoing Webpack 2 Roadmap
Here's a good summary of ES6 modules and webpack: ES6 Modules with TypeScript and Webpack
The above information is likely to become out-of-date fast.
Suggestion
For your own sanity, I suggest:
If optimisation matters:
Revert to CommonJS / RequireJS syntax and upgrade to ECMA Script 6 when Webpack 2 is stable and released.
If ECMA Script 6 syntax matters:
Use the standard ECMA Script 6 import export format and add optimisation features as they become available.
There is simply too much flux to try to use advanced module loading features in the still unstable webpack 2. Wait for things to settle down and for some really good plugins to become available before even attempting it.
Per the Webpack creator, this is impossible. Plain and simple. See Peter's answer for plenty of other good info regarding Webpack and ES6.
The pasted image was the result of a misunderstanding. See the same user's answer above.
System.import has been deprecated in Webpack. Webpack now favors import() which requires a polyfill for promises.
Code Splitting - Using import()
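A small sketch of the import() form (the module path mirrors the question; a Promise polyfill is assumed where promises aren't native):

// Loads the chunk on demand; import() returns a promise for the module namespace
import('../../SharedUtilities/SharedUtility1').then(function (module) {
  var SharedUtil1 = module.default;
  var util = new SharedUtil1();
  // ...use util here
});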

If I optimize my RequireJS project using r.js, do I have to change the path and dependency configuration?

I'm new to RequireJS. I understand it for the most part. However, the r.js optimization process confuses me. Two questions:
Doesn't concatenating all source into a single file defeat the purpose of RequireJS's lazy-loading abilities?
If I do optimize using r.js and have everything in a single file, do I then have to manually update the path info in the config to point to that single file? And do the dependencies I've defined as individual modules have to now be changed throughout the entire application to point to this single file? Here's just a pretend source to illustrate how I'm currently setup:
requirejs.config({
  paths : {
    mod1 : 'app/common/module1',
    mod2 : 'app/common/module2',
    mod3 : 'app/common/module3'
  }
});
-- MOD 1
define(["mod2", "mod3"], function(mod2, mod3) {
  // do something
});
Does that now have to be manually updated after optimization to look like this?
requirejs.config({
  paths : {
    optimizedMod : 'build-dir/optimizedModule'
  }
});
-- MOD 1
define(["optimizedMod"], function(optimizedMod) {
  // do something
});
Re. 1. No, it doesn't. r.js analyzes your dependency tree and (by default) only includes modules you'd need to load on application startup anyway. The dependencies that are required dynamically won't be included, they'll be lazy-loaded at runtime (unless you set findNestedDependencies to true).
However, lazy-loading is arguably not the main benefit of using RequireJS; a bigger one is modularisation itself. Being forced to manage dependencies makes it harder to write code that isn't testable or refactorable: bad architecture can be immediately spotted (lengthy dependency lists, "god" modules, etc.)
Re. 2. This is precisely the reason why you shouldn't be naming your own modules or mapping them in the paths configuration element. paths should be used for third-party libraries and not your own code; centralising name->path mappings reduces flexibility. Once you refer to dependencies via their paths (relative to baseUrl), r.js can rewrite them at build time:
define(["app/common/module2", "app/common/module3"], function(mod2, mod3) {
// do something
}
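For completeness, a minimal r.js build profile sketch (the file and directory names are illustrative):

// build.js - run with: node r.js -o build.js
({
  baseUrl: 'app',
  mainConfigFile: 'app/main.js',
  name: 'main',
  out: 'build-dir/main-built.js'
})

Your page's data-main (or script tag) then points at the optimized file, while the define() calls keep referencing modules by their original paths.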
