How do I identify whether two files are the same in JavaScript (Node.js), when one is just a renamed copy of the other?
Use case: I am trying to write a script for syncing a HDD (hdd1) and its clone (hdd2). 95% of the content is video files (size: ~1 GB, count: ~4000). Sometimes I rename the files in hdd1 and move them to different folders. So while syncing, instead of deleting and copying fresh from hdd1 to hdd2, I just want to rename and move the (identified) files in hdd2 to match their locations in hdd1.
As mentioned by mscdex, there's probably already a tool out there that does what you're looking for (like rsync).
If you're more interested in doing it from scratch as a learning experience, then what you're looking for is called a checksum or hash of a file. Generating a checksum for each file gives you a sort of fingerprint for it. You can then compare it against the checksums of other files; if the files are the same, their checksums will match as well.
Node.js's Crypto library gives you methods for generating checksums. This blog entry walks through some of this.
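For example, a minimal sketch using Node's built-in crypto module (the file path and the choice of SHA-256 are just illustrative):

const crypto = require('crypto');
const fs = require('fs');

// Stream the file through the hash so large videos don't have to fit in memory
function fileChecksum(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Files with the same checksum are (almost certainly) identical,
// no matter their names or locations
fileChecksum('hdd1/video.mp4').then(console.log);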
Related
Is there an idiomatic way to preprocess an asset before it is uploaded to S3 with @aws-cdk/aws-s3-assets?
What I am currently doing is loading my local file (for example, a text file), modifying it, then writing it into a directory that I have in my .gitignore. I then pass the new file to new Asset(...)
The main reason I would like an alternative approach is that I have to come up with a convention for creating the temporary file names that are eventually passed into the constructor of new Asset(...), since it is possible that my construct is used in multiple places and I don't want there to be contention over the file.
I did see that there is a bundling property on the AssetOptions, but it appears to be geared towards using Docker to bundle the asset. This seems like overkill for my use case, since the preprocessing that I am talking about is basically a string replace.
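For reference, a sketch of the flow described above (the file names, construct id, and temp-dir convention are all illustrative, not a recommendation):

const fs = require('fs');
const path = require('path');
const { Asset } = require('@aws-cdk/aws-s3-assets');

// 1. Load and modify the local file (the preprocessing is a string replace)
const contents = fs.readFileSync('assets/template.txt', 'utf8')
  .replace('PLACEHOLDER', 'actual-value');

// 2. Write it to a gitignored temp directory under a (hopefully) unique name
const tmpPath = path.join('.tmp-assets', `template-${Date.now()}.txt`);
fs.writeFileSync(tmpPath, contents);

// 3. Hand the new file to the Asset construct
//    (inside a construct, where `this` is the scope)
new Asset(this, 'TemplateAsset', { path: tmpPath });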
Thanks in advance!
Let's say I have a folder of about 100 images on my website called "IMG"
Now let's say I have a div element: <div id="templateDiv"></div>
Using javascript, how would I add all images from "IMG/" into that div without adding <img src="IMG/IMGNAME.jpg"> for every image?
Sorry, I'm not very good at explaining.
Just ignore the fact that would take ages to load.
EDIT
Ok my bad explanation skills have made me change my question.
How do I automatically make an array of all files in a website directory?
Okay, your question is quite unclear, but from reading all your other comments it seems you simply want an array containing the filename of every file in a directory? If that's what you want, then I don't believe it's possible, since only the server knows which files are where, and you can't request the contents of a directory from a server using JavaScript alone.
However, if you were using Node.js on a local directory, it could be done. But I don't believe that's your case.
That being said, you have three alternative options:
Name every image file 1.png, 2.png, 3.png, etc. Then use a for loop and build each filename as (i + 1) + ".png" (see the sketch after this list)
If you can't rename the files, but the files are named via user input, you could collect the user's input at the time of file creation and record the newly created file's name in another file/an array/localStorage so that it can be retrieved later.
If you can't rename the files and the filenames are never known to the program that needs them, then you could create an array of all the filenames (manually) and iterate over it to find the files you want.
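A minimal sketch of option 1 (the image count and naming scheme are assumptions):

const container = document.getElementById("templateDiv");
const imageCount = 100;

for (let i = 0; i < imageCount; i++) {
  const img = document.createElement("img");
  img.src = "IMG/" + (i + 1) + ".png"; // 1.png, 2.png, ...
  container.appendChild(img);
}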
Please, somebody let me know if I'm wrong and if there actually is a way to make a request to a server that tells the client all the files in a directory. That seems incredibly unlikely though.
Another potential solution just came to mind: you could write a PHP script (or Node.js, or any server-side language, really) that scans a directory, builds a list of all the filenames there, and sends that back over HTTP. Then you could simply make an XMLHttpRequest to that script and have it do the work. How does that sound?
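A minimal sketch of that idea in Node.js (the port and directory name are assumptions):

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  // List the directory contents and return them as JSON
  fs.readdir('./IMG', (err, files) => {
    if (err) {
      res.writeHead(500);
      return res.end('Could not read directory');
    }
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify(files)); // e.g. ["a.jpg", "b.jpg", ...]
  });
}).listen(3000);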
I have an offline Electron + React desktop app which uses no server to store data. Here is the situation:
A user can create some text data and add any number of images to it. When they save the data, a ZIP with the following structure is created:
myFile.zip
├ main.json // Text data
└ images
├ 00.jpg
├ 01.jpg
├ 02.jpg
└ ...
If they want to edit the data, they can open the saved ZIP with the app.
I use ADM-ZIP to read the ZIP content. Once it's open, I send the JSON data to Redux since I need it for the interface display. The images, however, are not displayed in the component that reads the archive and can appear on multiple "pages" of the app (so in different components). I display them with an Image component that takes a name prop and returns an img tag. My problem is how to get their src from the archive.
So here is my question: what could be the most efficient way to get the data of the images contained in the ZIP given the above situation?
What I am currently doing:
When the user opens their file, I extract the ZIP into a temp folder and get the image paths from there. The temp folder is only deleted when the user closes the file, not when they close the app (I don't want to change this behaviour).
// When opening the archive
const AdmZip = require('adm-zip');

const archive = new AdmZip('myFile.zip');
archive.extractAllTo(tempFolderPath, true); // true = overwrite existing files

// Image component
const imageSrc = `${tempFolderPath}/images/${this.props.name}`;
Problem: if the ZIP contains a lot of images, the user must have enough disk space for them to be extracted properly. Since the temp folder is not necessarily deleted when the app is closed, that disk space stays occupied until they close the file.
What I tried:
Storing the ZIP data in memory
I tried to save the opened ZIP in a prop and then get my image data from it when needed:
// When opening the archive
const archive = new AdmZip('myFile.zip');
this.props.saveArchive(archive); // Save in Redux

// Image component
const imageData = this.props.archive.readFile(this.props.name); // Buffer
const imageSrc = URL.createObjectURL(new Blob([imageData], { type: 'image/jpeg' }));
With this process, the image loading time is acceptable.
Problem: with a large archive, keeping the archive data in memory might be bad for performance (I think?) and I guess there is a limit on how much I can store there.
Opening the ZIP again with ADM-ZIP every time I have to display an image
// Image component
const archive = new AdmZip('myFile.zip');
const imageData = archive.readFile(this.props.name); // Buffer
const imageSrc = URL.createObjectURL(new Blob([imageData], { type: 'image/jpeg' }));
Problem: bad performance; extremely slow when I have multiple images on the same page.
Storing the image Buffers/Blobs in IndexedDB
Problem: it's still stored on the disk and the size is much bigger than the extracted files.
Using a ZIP that doesn't compress the data
Problem: I didn't see any difference compared to a compressed ZIP.
What I considered:
Instead of a ZIP, I tried to find a file type that could act as a non-compressed archive and be read like a directory, but I couldn't find anything like that. I also tried to create a custom file format with vanilla Node.js, but I'm afraid I don't know the File System API well enough to do what I want.
I'm out of ideas, so any suggestions are welcome. Maybe I missed the most obvious way to do it... Or maybe what I'm currently doing is not that bad and I'm overthinking the problem.
What you mean by "most efficient" is not entirely clear, so I'll make some assumptions and answer according to them.
Cache Solution
Basically what you have already done. Loading everything at once (extracting to a temporary folder) is pretty efficient because whenever you need to reuse something, the most expensive task doesn't have to be done again. It's common practice for heavy applications to load assets/modules/etc. at startup.
Note: since you consider the lack of disk space a problem, if you want to stick with this approach it's desirable to handle that case programmatically and alert the user, since running low on free storage is critical.
TL;DR - Expensive at first, but faster later thanks to asset reuse.
Lazy Loading
Another very common concept, which basically consists of "load as you need, only what you need". This is efficient because it ensures your application loads only the minimum needed to run, then loads things on demand as the user requests them.
Note: this is pretty much what you did in your attempt number 2.
TL;DR - Faster at startup, slower during runtime.
"Smart" Loading
This is not a real name, but it describes what I mean well enough. Basically, focus on understanding the goal of your project and mix both previous solutions according to your context, so you can achieve the best overall performance in your application while reducing the trade-offs of each approach.
Ex:
Lazy-load images per view/page and keep an in-memory cache with a limited size
Load images in the background while the user navigates
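For instance, a minimal sketch of that mix, assuming ADM-ZIP and an in-memory cache with a simple size cap (the cache limit and entry paths are illustrative):

const AdmZip = require('adm-zip');

const archive = new AdmZip('myFile.zip');
const cache = new Map();
const CACHE_LIMIT = 50; // keep at most 50 decompressed images in memory

function getImage(name) {
  if (cache.has(name)) return cache.get(name); // already decompressed
  const data = archive.readFile(`images/${name}`); // Buffer, extracted on demand
  if (cache.size >= CACHE_LIMIT) {
    cache.delete(cache.keys().next().value); // evict the oldest entry
  }
  cache.set(name, data);
  return data;
}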
Now, regardless of your final decision, the following considerations should not be ignored:
Memory will always be faster to write/read than disk
Partially unzipping (extracting only specific files) is possible in many packages (including ADM-ZIP) and is always faster than unzipping everything, especially if the ZIP file is huge.
Using IndexedDB or a file-based database like SQLite gives good overall results for a huge number of files and suits a "smart" approach, since querying and organizing data is easier through them.
Always keep in mind the reason for every application design choice; the better you understand why you are doing something, the better your final result will be.
With point 4 in mind: in my opinion you did overthink this a little, but that's not a bad thing. It's admirable to care about doing things the best possible way; I do this often even when it's not necessary, and it's good for self-improvement.
Well, I wrote a lot :P
I would be very pleased if any of this helps you somehow.
Good luck!
TL;DR - There is no single definitive answer to your question; it depends a lot on the context of your application. Some of your attempts are already pretty good, but putting some effort into understanding the usability context so you can handle image loading "smartly" would surely pay off.
The issue of using ZIP
Technically, streaming the content would be the most efficient way, but streaming compressed data from inside a ZIP is hard and inefficient by design, since the data is fragmented and the whole file has to be parsed to find the central directory.
A much better format for your specific use case is 7zip: it allows you to stream out a single file and does not require reading the whole archive to get the information about that single file.
The cost of using 7zip
You will have to ship a 7zip binary with your program; luckily it's very small (~2.5 MB).
How to use 7zip
const { spawn } = require('child_process')

// x means extract while keeping the full path
// (optionally use e instead of x to extract flat)
// Any additional argument after the archive acts as a filename filter
const sevenZip = spawn('path/to/7zipExecutable', ['x', 'path/to/7zArchive', '*.png'])

// spawn() returns a ChildProcess, not the file data; wait for it to finish
sevenZip.on('close', (code) => {
  // code 0 means the matching images were extracted successfully
})
Additional consideration
Either you use the Buffer data and display the images inline in the HTML via Base64, or you place them in a cache folder that gets cleared as part of the extraction process, depending on how granular you want this to be.
Keep in mind that you should use an appropriate folder for that: under Windows this would be %localappdata%/yourappname/cache; under Linux and macOS I would place it in ~/.yourappname/cache.
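A sketch of resolving that folder per platform (the app name is illustrative):

const path = require('path');
const os = require('os');

// %LOCALAPPDATA%\yourappname\cache on Windows, ~/.yourappname/cache elsewhere
const cacheDir = process.platform === 'win32'
  ? path.join(process.env.LOCALAPPDATA, 'yourappname', 'cache')
  : path.join(os.homedir(), '.yourappname', 'cache');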
Conclusion
Streaming data is the most efficient (lowest memory/data footprint)
By using a native executable you can compress/decompress much faster than via pure JS
The compression/decompression happens in another process, which keeps your application's execution and rendering from being blocked
ASAR as an Alternative
Since you mentioned you considered another, non-compressed format, you should give the asar file format a try: it is a built-in format in Electron that can be accessed via regular file paths through fs, with no extraction needed.
For example, if you have a ~/.myapp/save0001.asar which contains an img folder with multiple images in it, you would access an image simply via the path ~/.myapp/save0001.asar/img/x.png
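A minimal sketch of reading such a file (paths and names are illustrative; inside Electron, fs treats a .asar archive like a read-only directory):

const fs = require('fs');
const path = require('path');
const os = require('os');

const imgPath = path.join(os.homedir(), '.myapp', 'save0001.asar', 'img', 'x.png');
const imageData = fs.readFileSync(imgPath); // read straight out of the archive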
I have a Webpack 4.1 configuration that uses code splitting and outputs chunk names using a pattern like myproj-[name]-[contenthash].chunk.js.
I copy all of the production bundle files, for every version, into the same directory on the server, confident (until now) that chunk names are unique and there is no clashing.
Today I found an issue while releasing a new version of the application: a file named myproj-modulex-0bb2f31cc0ca424a07d8.chunk.js had also been generated by the old version (that's the point of contenthash, isn't it?). I expected the content of the file to be identical, but it isn't.
Only one character changed (the array index). The chunk starts with...
(window.webpackJsonp_XXXX=window.webpackJsonp_XXXX||[]).push([[7],{"2d0274e27fde9220edd9"...
...while the old version used ...push([[6],....
One difference between the new version and the old one is that I added new code-splitting points.
So: it seems the new split points changed the chunk order, but webpack still uses the same generated filename (probably because contenthash refers only to the actual module content?).
The issue is critical: when the new file is copied to the server it overwrites the old file, so clients using the old version stop working because the chunk is loaded at the wrong position in the push array (I guess).
Error is:
"Error: Loading chunk 6 failed.
(missing: https://.../myproj-xxx-0bb2f31cc0ca424a07d8.chunk.js)"
Is there a way to fix this issue, maybe by naming pushed chunks, specifying the order, or generating different hashes? chunkhash?
Webpack uses ids as chunk references, and those ids are not guaranteed to remain the same for the same chunks across different builds. contenthash is used for files extracted by ExtractTextWebpackPlugin: the same source content will get the same contenthash, but the generated file may differ due to id changes.
Try using myproj-[name]-[chunkhash].chunk.js instead.
Also take a look at optimization.moduleIds and optimization.chunkIds settings.
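A minimal sketch of the relevant configuration (the option values are assumptions, and optimization.moduleIds/chunkIds may require a newer webpack 4 minor version than 4.1):

// webpack.config.js
module.exports = {
  output: {
    // [chunkhash] changes whenever the chunk's content changes, ids included
    chunkFilename: 'myproj-[name]-[chunkhash].chunk.js',
  },
  optimization: {
    moduleIds: 'hashed', // stable, content-based module ids across builds
    chunkIds: 'named',   // order-independent chunk ids
  },
};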
What would I use to find which resources are required by a NodeJS file?
For example, if I had a file called "file.js" containing this:
import x from './x';
const y = require('./y');
// Some more code
How do I parse that file and extract './x' and './y'?
Why would you do this?
I'm playing with the idea of an architectural tool. To do this, I want to know which files are being required by the targeted source code.
I know that Webpack follows this information when it creates bundles, so that it can stack the required files in an appropriate order in a single concatenated (well, minified) file.
I don't need to do the concatenation, but I want to find which files would be used.
When I find out which files are being used by which files, I plan to assist a user in organising them in an orderly manner (e.g. by pointing out circular dependencies).
For trivial cases, you could try feeding the source to some JS parser and searching the AST for calls to require(); as long as require() is called with a string constant as its parameter, it shouldn't be hard to determine the dependencies. More complex situations could cause problems, though.
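For instance, a minimal sketch using the acorn parser (the package choice is an assumption; @babel/parser or esprima would work similarly, and this assumes npm install acorn acorn-walk):

const fs = require('fs');
const acorn = require('acorn');
const walk = require('acorn-walk');

const code = fs.readFileSync('file.js', 'utf8');
const ast = acorn.parse(code, { sourceType: 'module', ecmaVersion: 2020 });

const deps = [];
walk.simple(ast, {
  ImportDeclaration(node) {
    deps.push(node.source.value);          // import x from './x'
  },
  CallExpression(node) {
    if (node.callee.name === 'require' &&
        node.arguments.length === 1 &&
        node.arguments[0].type === 'Literal') {
      deps.push(node.arguments[0].value);  // require('./y')
    }
  },
});

console.log(deps); // ['./x', './y']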