Read a bunch of JSON files, transform them, and save them - javascript

I'm trying to achieve the following with Gulp:
Read every .json file in a given directory, including subdirectories.
Transform them in some way, for example by adding a new root level.
Save them into a new directory, keeping the original structure.
The point where I'm lost is how to pipe JSON reading/writing into the src stream.
I have the following skeleton now:
gulp.task("migratefiles", function () {
return gulp.src("files/**/*.json")
.pipe(/* WHAT HERE? */)
.pipe(gulp.dest("processed"));
});

There are a number of ways you can do this:
(1) Use the gulp-json-transform plugin:
var jsonTransform = require('gulp-json-transform');

gulp.task("migratefiles", function () {
    return gulp.src("files/**/*.json")
        .pipe(jsonTransform(function (json, file) {
            var transformedJson = {
                "newRootLevel": json
            };
            return transformedJson;
        }))
        .pipe(gulp.dest("processed"));
});
Pros:
Easy to use
Supports asynchronous processing (if you return a Promise; see the sketch after the cons list)
Gives access to path of each file
Cons:
Only rudimentary output formatting
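As a sketch of that Promise support (the async work here is a stand-in delay, purely illustrative):

var jsonTransform = require('gulp-json-transform');

gulp.task("migratefiles-async", function () {
    return gulp.src("files/**/*.json")
        .pipe(jsonTransform(function (json, file) {
            // Returning a Promise makes the plugin wait before writing
            return new Promise(function (resolve) {
                setTimeout(function () {
                    resolve({ "newRootLevel": json });
                }, 10);
            });
        }))
        .pipe(gulp.dest("processed"));
});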
(2) Use the gulp-json-editor plugin:
var jeditor = require('gulp-json-editor');

gulp.task("migratefiles", function () {
    return gulp.src("files/**/*.json")
        .pipe(jeditor(function (json) {
            var transformedJson = {
                "newRootLevel": json
            };
            return transformedJson;
        }))
        .pipe(gulp.dest("processed"));
});
Pros:
Easy to use
Automatically recognizes the indentation your input files use (two spaces, four spaces, tabs etc.) and formats your output files accordingly
Supports various js-beautify options (see the sketch after the cons list)
Cons:
Doesn't seem to support asynchronous processing
Doesn't seem to have a way to access path of each file
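If I'm reading the plugin's docs right, formatting can be tuned by passing js-beautify options as a second argument; a minimal sketch (the option names are js-beautify's, e.g. indent_size):

var jeditor = require('gulp-json-editor');

gulp.task("migratefiles-formatted", function () {
    return gulp.src("files/**/*.json")
        .pipe(jeditor(function (json) {
            return { "newRootLevel": json };
        }, {
            // js-beautify options, passed through by the plugin
            indent_size: 4
        }))
        .pipe(gulp.dest("processed"));
});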
(3) Do it manually (by directly accessing the vinyl file object using map-stream):
var map = require('map-stream');

gulp.task("migratefiles", function () {
    return gulp.src("files/**/*.json")
        .pipe(map(function (file, done) {
            var json = JSON.parse(file.contents.toString());
            var transformedJson = {
                "newRootLevel": json
            };
            file.contents = Buffer.from(JSON.stringify(transformedJson));
            done(null, file);
        }))
        .pipe(gulp.dest("processed"));
});
Pros:
Full control/access over everything
Supports asynchronous processing (through a done callback)
Cons:
Harder to use

Related

cordova-plugin-file clean overwrite of file

I'm looking for a clean and safe way to fully overwrite existing files when using the cordova-plugin-file Cordova plugin. What I've found is that even when using the exclusive: false option, if the new file contents are shorter than the existing file, the remainder of the existing file persists at the end of the new file.
Example: I have an existing file with the contents 0123456789 and want to replace it with abcd. When using exclusive: false, the file I end up with afterwards has the contents abcd456789.
Obviously this causes complications when reading back, especially when I expect these files to be correct JSON.
I've been unable to find other answers that don't just say use exclusive: false.
So far, I can work around this by manually deleting the file first, then writing to it, but this leaves me at a point where I'm at risk of losing the entire file data if the app closes at the wrong moment.
Another option may be to write to a temp file, then remove the existing one, then copy the temp, then remove the temp. And when reading, check for the file I want; if it's not there, check for a temp file for it, then copy and clean up if it exists. This feels like a very long-winded workaround for something that should be an option.
Am I missing something here?
This is my existing workaround, though it does not yet handle the potential of the app closing. Is there a nicer way before I have to go down that rabbit hole?
private replaceFileAtPath<T>(path: string, data: T): void {
  FileService.map.Settings.getFile(path, { create: true }, fileEntry => {
    fileEntry.remove(() => {})
    FileService.map.Settings.getFile(path, { create: true }, fe =>
      this.writeFile(fe, data)
    )
  })
}

private writeFile<T>(file: FileEntry, data: T, cb?: () => void): void {
  file.createWriter(writer => {
    const blob = new Blob([JSON.stringify(data)], { type: 'application/json' })
    writer.write(blob)
  })
}
I guess I found a way to fix this.
You can use the HTML5 FileWriter truncate() method:
file.createWriter(writer => {
  const blob = new Blob([JSON.stringify(data)], { type: 'application/json' })
  let truncated = false // must be `let`, not `const`, since it gets reassigned
  writer.onwriteend = function () {
    // truncate() fires another writeend, so guard against looping forever
    if (!truncated) {
      truncated = true
      this.truncate(this.position) // cut off any leftover old contents
    }
  }
  writer.write(blob)
})

JSON.stringify is very slow for large objects

I have a very big object in JavaScript (about 10 MB).
Stringifying it takes a long time, and so does sending it to the backend and parsing it back into an object (actually nested objects with arrays), but that part is not the problem in this question.
The problem:
How can I make JSON.stringify faster? Any ideas or alternatives would help; I need a JavaScript solution, libraries I can use, or ideas.
What I've tried
I googled a lot and it looks like there is nothing with better performance than JSON.stringify, or my googling skills have gotten rusty!
Result
I'll accept any suggestion that may solve the long save (sending to the backend) in the request (I know it's a big request).
Code sample of the problem (details about the problem):
Request URL: http://localhost:8081/systemName/controllerA/update.html;jsessionid=FB3848B6C0F4AD9873EA12DBE61E6008
Request Method: POST
Status Code: 200 OK
I'm sending a POST to the backend, and then in Java I read the parameter with
request.getParameter("BigPostParameter")
and convert it to an object using
public boolean fromJSON(String string) {
    if (string != null && !string.isEmpty()) {
        ObjectMapper json = new ObjectMapper();
        DateFormat dateFormat = new SimpleDateFormat(YYYY_MM_DD_T_HH_MM_SS_SSS_Z);
        dateFormat.setTimeZone(TimeZone.getDefault());
        json.setDateFormat(dateFormat);
        json.configure(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY, true);
        WebObject object;
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "Start");
        try {
            object = json.readValue(string, this.getClass());
        } catch (IOException ex) {
            Logger.getLogger(JSON_ERROR).log(Level.SEVERE, "JSON Error: {0}", ex.getMessage());
            return false;
        }
        // Logger.getLogger("JSON Tracker").log(Level.SEVERE, "END");
        return this.setThis(object);
    }
    return false;
}
Like this:
BigObject someObj = new BigObject();
someObj.fromJSON(request.getParameter("BigPostParameter"))
P.S.: FYI, this line, object = json.readValue(string, this.getClass());, is also very, very slow.
Again, to summarize:
The problem is the posting time: JSON.stringify is the JavaScript bottleneck.
Another problem is parsing that stringified string into an object (using Jackson); the stringified object mainly contains SVG tag content in a style column, while the other columns are mostly strings and ints.
As commenters said, there is no way to make the parsing itself faster.
If the concern is that the app is blocked while it's stringifying/parsing, then try to split the data into separate objects, stringify them, and assemble them back into one object before saving on the server.
If the loading time of the app is not a problem, you could try an ad-hoc incremental-change approach on top of the existing app:
... App loading
    Load map data
    Make full copy of the data
... End loading
... App working without changes
... When saving changes
    diff copy with changed data to get JSON diff
    send changes (much smaller than full data)
... On server
    apply JSON diff changes on the server to the full data stored on server
    save changed data
I used json-diff (https://github.com/andreyvit/json-diff) to calculate the changes, and there are a few analogs.
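A minimal sketch of computing such a diff (this assumes json-diff's exported diff function; applying the patch on the server needs a matching library there):

var diff = require('json-diff').diff;

var original = { name: 'map', layers: [1, 2, 3] };
var changed  = { name: 'map', layers: [1, 2, 3, 4] };

// Only this (usually much smaller) structure goes over the wire
var changes = diff(original, changed);
console.log(JSON.stringify(changes));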
Parsing is a slow process. If what you want is to POST a 10 MB object, turn it into a file, a blob, or a buffer. Send that file/blob/buffer using FormData instead of application/json or application/x-www-form-urlencoded.
Reference
An example using express/multer
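A hedged sketch of the client side (bigObject, the /upload endpoint, and the payload field name are placeholders; the server needs multipart handling such as multer):

const blob = new Blob([JSON.stringify(bigObject)], { type: 'application/json' });

const formData = new FormData();
formData.append('payload', blob, 'payload.json'); // field name is arbitrary

// Sent as multipart/form-data, so the body is not re-encoded as a form string
fetch('/upload', { method: 'POST', body: formData });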
Solution
Well, just as with most big "repeatable" problems, you can use async!
But wait, isn't JS still single-threaded even when it does async... yes... but you can use workers (the code below uses dedicated Web Workers) to get true parallelism and serialize an object much faster by parallelizing the process.
General Approach
mainPage.js
//= Functions / Classes =============================================================|
// To tell JSON.stringify that this is already processed, don't touch
class SerializedChunk {
  constructor(data) { this.data = data }
  toJSON() { return this.data }
}

// Attach all events and props we need on workers to handle this use case
const mapCommonBindings = w => {
  w.addEventListener('message', e => w._res(e.data), false)
  w.addEventListener('error', e => w._rej(e.data), false)
  w.solve = async obj => { // must be async, since it awaits
    if (w._state) await w._state.catch(_ => _) // Wait for any older task to complete if one is queued
    w._state = new Promise((_res, _rej) => {
      // Give this object promise bindings that can be handled by the event bindings
      // (just make sure not to fire 2 errors or 2 messages at the same time)
      Object.assign(w, { _res, _rej })
    })
    w.postMessage(obj)
    return await w._state // Return the final output, when we get the `message` event
  }
}

//= Initialization ===================================================================|
// Let's make our 10 workers
const workers = Array(10).fill(0).map(_ => new Worker('worker.js'))
workers.forEach(mapCommonBindings)

// A helper function that schedules workers in a round-robin
workers.schedule = async task => {
  workers._c = ((workers._c || -1) + 1) % workers.length
  const worker = workers[workers._c]
  return await worker.solve(task)
}

// A helper used below that takes an object key/value pair and uses a worker to solve it
const _asyncHandleValuePair = async ([key, value]) => [key, new SerializedChunk(
  await workers.schedule(value)
)]

//= Final Function ===================================================================|
// The new function (You could improve the runtime by changing how this function schedules tasks)
// Note! This is async now, obviously
const jsonStringifyThreaded = async o => {
  const f_pairs = await Promise.all(Object.entries(o).map(_asyncHandleValuePair))
  // Take all final processed pairs, create a new object, JSON stringify top level
  const final = f_pairs.reduce((acc, [key, chunk]) => (
    acc[key] = chunk, // Add current key / chunk to object
    acc               // Return the object to the next reduce step
  ), {})              // Seed empty object that will contain all the data
  return JSON.stringify(final)
}

/* lots of other code, down to the function that actually uses this */
async function submitter() {
  // other stuff
  const payload = await jsonStringifyThreaded(input.value)
  await server.send(payload)
  console.log('Done!')
}
worker.js
self.addEventListener('message', function (e) {
  const obj = e.data
  self.postMessage(JSON.stringify(obj))
}, false)
Notes:
This works the following way:
Creates a list of 10 workers, and adds a few methods and props to them
We care about async .solve(Object): String which solves our tasks using promises while masking away callback hell
Use a new method: async jsonStringifyThreaded(Object): String which does the JSON.stringify asynchronously
We break the object into entries and solve each one in parallel (this can be optimized to be recursive to a certain depth, use best judgement :))
Processed chunks are cast into SerializedChunk which the JSON.stringify will use as is, and not try to process (since it has .toJSON())
Internally if the number of keys exceeds the workers, we round-robin back to the first worker and overschedule them (remember, they can handle queued tasks)
Optimizations
You may want to consider a few more things to improve performance:
Use of Transferable Objects, which will significantly decrease the overhead of passing objects to the workers (see the sketch after this list)
Redesign jsonStringifyThreaded() to schedule more objects at deeper levels.
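A minimal sketch of the transfer side (chunk and worker are stand-ins; the encoding step is illustrative):

// Encode the chunk into a binary buffer, then hand ownership of that
// buffer to the worker instead of structured-cloning it
const buffer = new TextEncoder().encode(JSON.stringify(chunk)).buffer;
worker.postMessage(buffer, [buffer]); // the second argument lists transferables
// Note: `buffer` is detached (unusable) on this thread after the transfer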
You can explore libraries like fast-json-stringify, which use a template schema while converting the JSON object in order to boost performance. Check the article below:
https://developpaper.com/how-to-improve-the-performance-of-json-stringify/
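A minimal sketch of that schema-based approach (the schema fields here are made up; in practice it has to describe your real object):

const fastJson = require('fast-json-stringify');

// The schema is compiled once; stringify then skips per-call shape discovery
const stringify = fastJson({
    type: 'object',
    properties: {
        name:  { type: 'string' },
        style: { type: 'string' }, // e.g. the SVG content column
        count: { type: 'integer' }
    }
});

console.log(stringify({ name: 'row1', style: '<svg>...</svg>', count: 3 }));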

Convert HTML tags to WordML with JavaScript

Do you know any way to convert HTML tags to WordML only using JavaScript. I need to get the content of a DOM element and convert what is inside to WordML.
Looking on npm there doesn't seem to be a library for this already.
So I think you're going to have to make your own. That being said, WordML is just a particular flavor of XML, right? This is the WordML you are referring to?
Getting the content of a DOM element is pretty easy; you can do that with jQuery.
var ele = $('#wordml-element');
From there you will now want to convert it into WordML compatible XML. You could try using the xml library on npm for this.
So you will be transforming tree structured DOM elements into tree structured XML elements. The recommended pattern for doing this is known as the Visitor Pattern.
From there you will be left with an XML structure which you can further manipulate using the same pattern. At the end you will convert the XML structure into a string and that is what you will save to a file.
Now since I don't really know the structure of the HTML you are trying to convert into WordML I can only give you a very general code solution to the problem, which may look something like this:
var xml = require('xml')

function onTransformButtonClick() {
    var options = {} // see documentation
    var ele = $('#wordml-element')[0]
    var wordml = transformElement(ele)
    var text = xml(wordml, options);
    fileSave(text);
}

function transformElement(ele) {
    switch (ele.tagName) { // You could use attributes or whatever
        case 'word-document':
            return transformDocument(ele);
        case 'word-body':
            return transformBody(ele);
        case 'word-p':
            return transformParagraph(ele);
        case 'word-r':
            return transformRun(ele);
        case 'word-text':
            return transformText(ele);
    }
}

function transformDocument(ele) {
    var wordDocument = xml.element({...})
    ele.childNodes.forEach(function (child) {
        wordDocument.push(transformElement(child))
    })
    return [wordDocument]
}

function transformBody(ele) {
    // create new element via xml library...
}

function transformParagraph(ele) {
    // create new element via xml library...
}

function transformRun(ele) {
    // create new element via xml library...
}

function transformText(ele) {
    // create new element via xml library...
}
The specific implementations of which I will leave up to you since I don't know enough details to give you a more detailed answer.
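To make that concrete, here is one hedged illustration of what a single transform might return, assuming the plain-object input format of the xml package ({ tagName: [children] }) and WordML's <w:p> paragraph element:

// Illustrative only: maps a <word-p> element to a WordML <w:p> node
function transformParagraph(ele) {
    var children = [];
    ele.childNodes.forEach(function (child) {
        var transformed = transformElement(child);
        if (transformed) children.push(transformed); // skip nodes we don't handle
    });
    return { 'w:p': children };
}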

Adding keys to an object inside readFile

I'm learning my way through Node and gulp and trying to do something for which there may already be a solution, but I'm doing it as a learning exercise. The idea is that I want to scan all the files in a directory, read each file, look for the gulp.task line, and read the task name and the comment above it. This information will be used to generate an object, which is then written to a file in order to make something readable by gulp-list.
I'm stuck trying to add items into the object while reading the file. Nothing I have tried so far enables me to add a key and value to the object.
Any help you can give would be great. Also, if you know of another (potentially easier) way, I would be really interested to hear it. I've had a look at gulp-task-list, but this does not seem to support the multiple-file approach I want to use.
var gulp = require('gulp'),
    fs = require('fs');

var path = './gulp/tasks';
var taskList = {};

// DESC: Rebuilds the list
gulp.task('build:list', function () {
    fs.readdir(path, function (err, files) {
        if (err) throw err;
        files.forEach(function (file) {
            fs.readFile(path + '/' + file, function (err, data) {
                if (err) throw err;
                var lines = data.toString().split("\n");
                lines.forEach(function (item, index, array) {
                    if (match = item.match(/^gulp\.task\(\'(.*)\'/)) {
                        console.log(match[1]);
                        taskList[match[1]] = true;
                    }
                })
            });
        })
    })
    console.log(taskList);
});
So I found a solution. It turns out the problem wasn't that you can't alter an out-of-scope variable from inside an async callback (you can); it's that my console.log(taskList) ran before the asynchronous readdir/readFile callbacks had fired, so the object was still empty at that point. I'm sure I will get more comfortable with this over time, unless anyone wants to point me in the right direction sooner.
My solution, in full including writing out the JSON file
var gulp = require('gulp'),
    fs = require('fs'),
    gutil = require('gulp-util');

var path = './gulp/tasks';
var taskList = {};

// Rebuilds the task list
gulp.task('build:list', function () {
    var files = fs.readdirSync(path);
    files.forEach(function (file) {
        var contents = fs.readFileSync(path + '/' + file);
        var lines = contents.toString().split("\n");
        lines.forEach(function (item, index, array) {
            if (match = item.match(/^gulp\.task\(\'(.*?)\'/)) {
                // Store the comment line above the task definition as its description
                taskList[match[1]] = array[index - 1];
            }
        })
    });
    fs.writeFileSync('./tasks.json', JSON.stringify(taskList));
    gutil.log("Task list built");
});
The second solution I thought of might be a lot easier: read each file and concat all of the files into a single file, which could then stand in for gulpfile.js and be handed to another tool to get all the task names and descriptions. Or, having concatenated the files, I could process one file instead of many. If I come up with one of these solutions I will update this.
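A rough sketch of that concat idea, under the same sync-API assumptions as the solution above (all-tasks.js is a made-up output name):

var fs = require('fs');
var path = './gulp/tasks';

// Join every task file into one blob so a single pass can scan all tasks
var combined = fs.readdirSync(path)
    .map(function (file) { return fs.readFileSync(path + '/' + file, 'utf8'); })
    .join('\n');

fs.writeFileSync('./all-tasks.js', combined);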

Javascript OOP events

I want to create an object that can parse a certain filetype. I've looked at some parts of the File API and I want my object to work in about the same way. So basically, what I want is this:
A function called CustomFileParser, which I want to be able to use as follows:

var customFileParser = new CustomFileParser();
customFileParser.parsed = parsed;
customFileParser.progress = progress;
customFileParser.parse(file);

function parsed(event) {
    // The file is loaded, you can do stuff with it here.
}

function progress(event) {
    // The file load has progressed, you can do stuff with it here.
}
So I was thinking about how to define this object, and I have the following so far:

function customFileParser() {
    this.parse = function () {
        // Do stuff here and trigger event when it's done...
    }
}

However, I'm not sure how to define these events or how to trigger them. Can anyone give me a hand?
JavaScript is a prototype-based OOP language, not class-based like most other popular languages. Therefore, the OOP constructs are a bit different from what you might be used to. You should ignore most websites that try to implement class-based inheritance in JS, since that's not how the language is meant to be used.
The reason people do it is that they are used to the class-based system and are usually not even aware that there are alternatives, so instead of trying to learn the correct way, they try to implement the way they are more familiar with, which usually results in loads of hacks or external libraries that are essentially unnecessary.
Just use the prototype.
function CustomFileParser(onParsed, onProgress) {
    // constructor
    this.onParsed = onParsed;
    this.onProgress = onProgress;
}

CustomFileParser.prototype.parse = function (file) {
    // parse the file here
    var event = { foo: 'bar' };
    this.onProgress(event);
    // finish parsing
    this.onParsed(event);
};
And you can use it like so:

function parsed(event) {
    alert(event);
}

function progress(event) {
    alert(event);
}

var customFileParser = new CustomFileParser(parsed, progress);
var file = ''; // pseudo-file
customFileParser.parse(file);
From what it sounds like to me, I think you need your program to look something like this:

function customFileParser(onparse, progress) {
    this.onparse = onparse;
    this.progressStatus = 0;
    this.progress = progress;

    this.parser = function (chunk) {
        // Parse a bit of the data on every run
    }

    this.parse = function () {
        // Do the parsing:
        // Determine how much data there is,
        // then keep calling the function that parses a bit of data per run
        // until all of it has been parsed.
        // That function should also increase the percentage; this can be done via setTimeout.
        // After every run of the semi-parser function, report progress like so:
        this.parser();
        if (this.progressStatus < 100) {
            this.progress(this.progressStatus);
        } else {
            this.onparse();
        }
    }
}
And you can create an instance of that object like:

var dark = new customFileParser(
    function () { // this tells what to do when parsing is complete
    },
    function (status) { // this tells what to do with the progress status
    }
);

Using the method I suggested, you can actually define different methods for all the instances of the object you have!
