So I have a text file test.txt with lines similar to:
08/12/2021
test1
test2
test3
... (some entries)
12/12/2021
test21
test22
test23
... (some entries)
24/12/2021
What should I write next in order to filter the text file and get the lines between the two newest dates?
const fs = require('fs');

// fs.watchFile passes the current and previous fs.Stats objects to its listener
fs.watchFile('test.txt', (curr, prev) => {
  fs.readFile('test.txt', 'utf-8', (err, data) => {
    if (err) throw err;
    const arr = data.replace(/\r\n/g, '\n').split('\n');
    ...
The output should be something like:
test21
test22
test23
... (some entries)
Which are the entries between the two newest dates.
Update:
The text file is actually constantly having entries written to it, and the current date is appended at the end of the day. I am now trying to extract the entries between the previous and newest dates for further processing.
I don't know much about JavaScript, but it seems you are looking for something like this one:
Is there any way to find data between two dates which are present in the same string and store it in a JSON object?
One thing you can do is find the indexOf the start date and the end date, then slice the contents. You can do something like this:
const fs = require('fs');

try {
  // 'start date' and 'end date' are placeholders for the actual date strings
  const data = fs.readFileSync('test.txt', { encoding: 'utf8', flag: 'r' }),
    start = data.indexOf('start date'),
    end = data.lastIndexOf('end date');
  const trimText = data.slice(start, end);
  console.log(trimText);
} catch (e) {
  console.log(e);
}
This method will work well for a small file. If the file is large, we need to read it as a stream and check for the start and end dates while reading it.
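A rough sketch of that streaming approach using readline (startDate and endDate are placeholders for the actual date strings you are looking for, e.g. '12/12/2021' and '24/12/2021'):

const fs = require('fs');
const readline = require('readline');

async function extractBetween(file, startDate, endDate) {
  const rl = readline.createInterface({
    input: fs.createReadStream(file, 'utf8'),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const collected = [];
  let inside = false;
  for await (const line of rl) {
    if (line === startDate) { inside = true; continue; } // start collecting after the start date
    if (line === endDate) break;                         // stop once the end date is reached
    if (inside) collected.push(line);
  }
  return collected;
}

// extractBetween('test.txt', '12/12/2021', '24/12/2021').then(console.log);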
const test_data = `08/12/2021
test0
test2
test3
12/12/2021`;
console.log(
test_data.split(/\d+\/\d+\/\d+\n?/g).slice(1, -1)
);
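If the file contains more than two dates and ends with the newest date (as described in the update), the same split can pick out just the block between the two newest dates. A sketch, assuming the file ends with a date line so the last element produced by the split is an empty string:

const fs = require('fs');

const data = fs.readFileSync('test.txt', 'utf-8').replace(/\r\n/g, '\n');
const blocks = data.split(/\d+\/\d+\/\d+\n?/g);

// blocks[0] is everything before the first date (usually empty),
// blocks[blocks.length - 1] is everything after the last date (empty here),
// so the second-to-last block is the entries between the two newest dates.
const newest = blocks.slice(-2, -1)[0] || '';
console.log(newest.split('\n').filter(Boolean));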
With Node.js I need to fill an Excel file with data fetched from a CSV file.
I am using the ExcelJS npm package.
I successfully read the data from the CSV file and write it to console.log(), but the problem is that it is in a very strange format.
Code:
var Excel = require("exceljs");
exports.generateExcel = async () => {
let workbookNew = new Excel.Workbook();
let data = await workbookNew.csv.readFile("./utilities/file.csv");
const worksheet = workbookNew.worksheets[0];
worksheet.eachRow(function (row: any, rowNumber: number) {
console.log(JSON.stringify(row.values));
});
};
Data looks like this:
[null,"Users;1;"]
[null,"name1;2;"]
[null,"name2;3;"]
[null,"name3;4;"]
[null,"Classes;5;"]
[null,"class1;6;"]
[null,"class2;7;"]
[null,"class3;8;"]
[null,"Teachers;9;"]
[null,"teacher1;10;"]
[null,"teacher2;11;"]
[null,"Grades;12;"]
[null,"grade1;13;"]
[null,"grade2;14;"]
[null,"grade3;15;"]
The Excel file which I need to fill with this data is quite complex. In specific cells I need to insert the users, in another sheet I need some images with grades, etc.
The main question for me is:
How can I parse the data displayed in my console.log() and store it in separate variables, e.g. Users in one variable, Grades in another, and Teachers in another?
Example for users:
users = {
title: "Users",
names: ["name1", "name2", "name3"],
};
It does not need to be exactly like the example, but something that can be reused when I read different CSV files with the same structure, so that I can easily access the specific data and put it into a specific cell in the Excel file.
Thank you very much.
I prepared an example of how you could parse your file. As proposed in another answer, we use fast-csv. The parsing is quite simple: you split each row by the separator and then take line[0], which is the first element.
const fs = require('fs');
const csv = require('@fast-csv/parse');

fs.createReadStream('Test_12345.csv')
  .pipe(csv.parse())
  .on('error', error => console.error(error))
  .on('data', function (row) {
    const line = String(row).split(';'); // split the row on the ';' separator
    console.log(`${line[0]}`);           // print the first field
  })
  .on('end', rowCount => console.log(`Parsed ${rowCount} rows`));
If we use an input like this:
Value1;1101;
Value2;2202;
Value3;3303;
Value4;4404;
the output in this case looks like this:
Value1
Value2
Value3
Value4
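To go one step further toward the grouping asked for in the question, here is a rough sketch that collects the rows into one object per section. It assumes the section headers are exactly 'Users', 'Classes', 'Teachers' and 'Grades', as in the sample output:

const fs = require('fs');
const csv = require('@fast-csv/parse');

const sectionNames = ['Users', 'Classes', 'Teachers', 'Grades'];
const sections = {}; // e.g. { Users: { title: 'Users', names: [...] }, ... }
let current = null;

fs.createReadStream('Test_12345.csv')
  .pipe(csv.parse())
  .on('error', error => console.error(error))
  .on('data', row => {
    const value = String(row).split(';')[0]; // first field of the row
    if (sectionNames.includes(value)) {
      current = { title: value, names: [] }; // start a new section
      sections[value] = current;
    } else if (current) {
      current.names.push(value);             // add the entry to the current section
    }
  })
  .on('end', () => {
    console.log(sections.Users); // -> { title: 'Users', names: ['name1', 'name2', 'name3'] }
  });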
I have a requirement to save data to a text file.
Then, each time the script runs, perform a comparison for each line in the text file.
If a line with the same words already exists in the file, report it.
So far I am able to read through the content and display it on the console,
but I don't know how to go about making the comparison.
Example: my script ran the first time and saved the data {a,b,c} in fileLoc.
When the script runs the next time, if the same data {a,b,c} already exists in fileLoc,
I want to report those line(s).
I want to be able to capture these matches.
Note that fileLoc will never change; my script just saves data every time to the same file ("rawDeal.txt") with a timestamp, but my ask is to find a way to perform some sort of string comparison on each line.
Here is the code I have to read what is already in the file.
Any direction is greatly appreciated.
NB: I am using fs.appendFile to add to the file every time I save new data,
so ideally the newly added data will be at the bottom, but I want to check whether any of the data already exists on any line above.
// load the fs module
const fs = require('fs');

// fileLoc: the file used to save the info
const fileLoc = 'C:/rawDeal.txt';

// the split below treats '\r\n' as a single line break
fs.readFileSync(fileLoc, 'utf-8').split(/\r?\n/).forEach(function (line) {
  console.log(line);
});
You just want to compare two files, line by line. Check the edit at the bottom.
// file1.txt
apple
mangoes
orange
//file2.txt
apple
mangoes
orange
//noMatchFile.txt
apples
bananas
oranges
// index.js
const fs = require("fs")
const { assert } = require("console") // for testing
const file1Location = "./file1.txt"
const file2Location = "./file2.txt"
const file3Location = "./noMatchFile.txt"
function CheckFiles(file1Location, file2Location) {
  const file1Lines = fs.readFileSync(file1Location, 'utf-8').split(/\r?\n/)
  const file2Lines = fs.readFileSync(file2Location, 'utf-8').split(/\r?\n/)
  // after these two readFileSync() calls, we have all the data we need
  if (file1Lines.length !== file2Lines.length) return false // different number of lines
  for (let lineIndex in file1Lines) {
    if (file1Lines[lineIndex] !== file2Lines[lineIndex]) return false
  }
  return true
}
// Let's test ourselves
function main() {
assert(CheckFiles(file1Location, file2Location) === true, 'expected file1 and file2 to match')
assert(CheckFiles(file1Location, file3Location) === false, 'expected file1 and file3 not to match')
}
main()
Copy the solution to your machine and run it.
The assert()s will complain if a condition doesn't hold.
Edit:
After you clarified what you're trying to do in the comments, the solution is a bit different.
You have a file that records data, and each change is recorded as a new line.
Then you should split the file by lines, and compare line-by-line.
If there's any change, the function returns false:
function CheckLines(file) {
const fileLines = fs.readFileSync(file, 'utf-8').split(/\r?\n/)
for(let index = 0; index < fileLines.length - 1; index++){
if(fileLines[index] !== fileLines[index+1]) return false
}
return true
}
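If the actual goal is closer to the original question -- flag any new entry that already appears somewhere in rawDeal.txt before appending it -- a minimal sketch of that check could look like this (the entry text and timestamp format are just placeholders):

const fs = require('fs');

const fileLoc = 'C:/rawDeal.txt';

function appendIfSeen(entry) {
  let existingLines = [];
  try {
    existingLines = fs.readFileSync(fileLoc, 'utf-8').split(/\r?\n/);
  } catch (e) {
    // file does not exist yet, so there is nothing to compare against
  }

  // report every earlier line that already contains this entry
  const matches = existingLines.filter(line => line.includes(entry));
  if (matches.length > 0) {
    console.log(`"${entry}" already exists on ${matches.length} line(s):`, matches);
  }

  fs.appendFileSync(fileLoc, `${entry} ${new Date().toISOString()}\n`);
}

appendIfSeen('{a,b,c}');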
How can I convert this text to JSON with Node.js?
Input :
---
title:Hello World
tags:java,C#,python
---
## Hello World
```C#
Console.WriteLine(""Hello World"");
```
Expected output :
{
title:"Hello World",
tags:["java","C#","python"],
content:"## Hello World\n```C#\nConsole.WriteLine(\"Hello World\"");\n```"
}
What I've thought about trying:
Use a regex to get the key:value pairs, like below:
---
{key}:{value}
---
Then check whether the key equals tags; if so, use String.split on , to get the array of tag values, otherwise return the value as-is.
The other part is the content value.
But I have no idea how to implement this in Node.js.
If the input is in a known format, then you should use a battle-tested library to convert the input into JSON, especially if the input is extremely dynamic in nature. Otherwise, depending on how dynamic the input is, you might be able to build a parser easily.
Assuming the input has a static structure like the one you posted, the following should do the job:
function convertToJson(str) {
const arr = str.split('---').filter(str => str !== '')
const tagsAndTitle = arr[0]
const tagsAndTitleArr = tagsAndTitle.split('\n').filter(str => str !== '')
const titleWithTitleLabel = tagsAndTitleArr[0]
const tagsWithTagsLabel = tagsAndTitleArr[1]
const tagsWithoutTagsLabel = tagsWithTagsLabel.slice(tagsWithTagsLabel.indexOf(':') + 1)
const titleWithoutTitleLabel = titleWithTitleLabel.slice(titleWithTitleLabel.indexOf(':') + 1)
const tags = tagsWithoutTagsLabel.split(',')
const result = {
title: titleWithoutTitleLabel,
tags,
content: arr[1].slice(0, arr[1].length - 1).slice(1) // get rid of the first new line, and last new line
}
return JSON.stringify(result)
}
const x = `---
title:Hello World
tags:java,C#,python
---
## Hello World
\`\`\`C#
Console.WriteLine(""Hello World"");
\`\`\`
`
console.log(convertToJson(x))
Looks like you're trying to convert markdown to JSON. Take a look at markdown-to-json.
You can also use a markdown parser (like markdown-it) to get tokens out of the text which you'd have to parse further.
In this specific case, if your data is precisely structured like that, you can try this:
const fs = require("fs");
fs.readFile("input.txt", "utf8", function (err, data) {
if (err) {
return console.log(err);
}
const obj = {
title: "",
tags: [],
content: "",
};
const content = [];
data.split("\n").map((line) => {
if (!line.startsWith("---")) {
if (line.startsWith("title:")) {
obj.title = line.substring(6);
} else if (line.startsWith("tags")) {
obj.tags = line.substring(5).split(","); // skip "tags:" (5 characters)
} else {
content.push(line);
}
}
});
obj.content = content.join("\n");
fs.writeFileSync("output.json", JSON.stringify(obj));
});
Then you just wrap the whole fs.readFile in a loop to process multiple inputs.
Note that you need each input to be in a separate file and structured EXACTLY the way you mentioned in your question for this to work. For more general usage, probably try some existing npm packages like others suggest so you do not reinvent the wheel.
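If the front matter can be written in standard YAML (i.e. with a space after the colon, as in title: Hello World), another option worth considering is the gray-matter package, which handles the ----delimited block for you. A small sketch under that assumption:

const matter = require('gray-matter'); // npm install gray-matter

const input = `---
title: Hello World
tags: java,C#,python
---
## Hello World
\`\`\`C#
Console.WriteLine("Hello World");
\`\`\`
`;

const parsed = matter(input);
const result = {
  title: parsed.data.title,                  // "Hello World"
  tags: String(parsed.data.tags).split(','), // ["java", "C#", "python"]
  content: parsed.content.trim(),            // the markdown body after the front matter
};

console.log(JSON.stringify(result, null, 2));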
I have an input file which may potentially contain up to 1M records, and each record would look like this:
field 1 field 2 field3 \n
I want to read this input file and sort it based on field3 before writing it to another file.
Here is what I have so far:
var fs = require('fs'),
readline = require('readline'),
stream = require('stream');
var start = Date.now();
var outstream = new stream;
outstream.readable = true;
outstream.writable = true;
var rl = readline.createInterface({
input: fs.createReadStream('cross.txt'),
output: outstream,
terminal: false
});
rl.on('line', function(line) {
//var tmp = line.split("\t").reverse().join('\t') + '\n';
//fs.appendFileSync("op_rev.txt", tmp );
// this logic to reverse and then sort is too slow
});
rl.on('close', function() {
var closetime = Date.now();
console.log('Read entirefile. ', (closetime - start)/1000, ' secs');
});
I am basically stuck at this point. All I have is the ability to read from one file and write to another. Is there a way to efficiently sort this data before writing it?
DB and sort-stream are fine solutions, but a DB might be overkill, and I think sort-stream eventually just sorts the entire file in an in-memory array (in the through stream's end callback), so I think performance will be roughly the same compared to the original solution
(but I haven't run any benchmarks, so I might be wrong).
So, just for the hack of it, I'll throw in another solution :)
EDIT:
I was curious to see how big a difference this will be, so I ran some benchmarks.
Results were surprising even to me: it turns out the sort -k3,3 solution is better by far, about x10 times faster than the original solution (a simple array sort), while the nedb and sort-stream solutions are at least x18 times slower than the original solution (i.e. at least x180 times slower than sort -k3,3).
(See benchmark results below)
If you're on a *nix machine (Unix, Linux, macOS, ...) you can simply use
sort -k 3,3 yourInputFile > op_rev.txt and let the OS do the sorting for you.
You'll probably get better performance, since sorting is done natively.
Or, if you want to process the sorted output in Node:
var util = require('util'),
spawn = require('child_process').spawn,
sort = spawn('sort', ['-k3,3', './test.tsv']);
sort.stdout.on('data', function (data) {
// process data
data.toString()
.split('\n')
.map(line => line.split("\t"))
.forEach(record => console.info(`Record: ${record}`));
});
sort.on('exit', function (code) {
if (code) {
// handle error
}
console.log('Done');
});
// optional
sort.stderr.on('data', function (data) {
// handle error...
console.log('stderr: ' + data);
});
Hope this helps :)
EDIT: Adding some benchmark details.
Here are the results (running on a MacBook Pro):
sort1 uses a straightforward approach, sorting the records in an in-memory array.
Avg time: 35.6s (baseline)
sort2 uses sort-stream, as suggested by Joe Krill.
Avg time: 11.1m (about x18.7 times slower)
(I wonder why. I didn't dig in.)
sort3 uses nedb, as suggested by Tamas Hegedus.
Time: about 16m (about x27 times slower)
sort4 only sorts by executing sort -k 3,3 input.txt > out4.txt in a terminal
Avg time: 1.2s (about x30 times faster)
sort5 uses sort -k3,3 and processes the response sent to stdout
Avg time: 3.65s (about x9.7 times faster)
You can take advantage of streams for something like this. There are a few npm modules that will be helpful -- first install them by running
npm install sort-stream csv-parse stream-transform
from the command line.
Then:
var fs = require('fs');
var sort = require('sort-stream');
var parse = require('csv-parse');
var transform = require('stream-transform');
// Create a readble stream from the input file.
fs.createReadStream('./cross.txt')
// Use `csv-parse` to parse the input using a tab character (\t) as the
// delimiter. This produces a record for each row which is an array of
// field values.
.pipe(parse({
delimiter: '\t'
}))
// Use `sort-stream` to sort the parsed records on the third field.
.pipe(sort(function (a, b) {
return a[2].localeCompare(b[2]);
}))
// Use `stream-transform` to transform each record (an array of fields) into
// a single tab-delimited string to be output to our destination text file.
.pipe(transform(function(row) {
return row.join('\t') + '\r';
}))
// And finally, output those strings to our destination file.
.pipe(fs.createWriteStream('./cross_sorted.txt'));
You have two options, depending on how much data is being processed. (1M record count with 3 columns doesn't say much about the amount of actual data)
Load the data in memory, sort in place
var lines = [];
rl.on('line', function(line) {
lines.push(line.split("\t").reverse());
});
rl.on('close', function() {
lines.sort(function(a, b) { return compare(a[0], b[0]); });
// write however you want
fs.writeFileSync(
fileName,
lines.map(function(x) { return x.join("\t"); }).join("\n")
);
function compare(a, b) {
if (a < b) return -1;
if (a > b) return 1;
return 0;
}
});
Load the data in a persistent database, read ordered
Using a database engine of your choice (for example nedb, a pure javascript db for nodejs)
EDIT: It seems that NeDB keeps the whole database in memory, the file is only a persistent copy of the data. We'll have to search for another implementation. TingoDB looks promising.
// This code is only to give an idea, not tested in any way
var Datastore = require('nedb');
var db = new Datastore({
filename: 'path/to/temp/datafile',
autoload: true
});
rl.on('line', function(line) {
var tmp = line.split("\t").reverse();
db.insert({
field0: tmp[0],
field1: tmp[1],
field2: tmp[2]
});
});
rl.on('close', function() {
var cursor = db.find({})
.sort({ field0: 1 }); // sort by field0, ascending
var PAGE_SIZE = 1000;
paginate(0);
function paginate(i) {
cursor.skip(i).limit(PAGE_SIZE).exec(function(err, docs) {
// handle errors
var tmp = docs.map(function(o) {
return o.field0 + "\t" + o.field1 + "\t" + o.field2 + "\n";
});
fs.appendFileSync("op_rev.txt", tmp.join(""));
if (docs.length >= PAGE_SIZE) {
paginate(i + PAGE_SIZE);
} else {
// cleanup temp database
}
});
}
});
I had quite a similar issue and needed to perform an external sort.
I figured out, after wasting some time on it, that I could load the data into a database and then query the desired data out of it.
It doesn't even matter if the inserts aren't ordered, as long as the query result can be.
Hope it can work for you too.
To insert your data into a database, there are plenty of tools in Node to perform such a task. I have a pet project which does a similar job.
I'm also sure that if you search the subject, you'll find much more info.
Good luck.
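As a rough illustration of that database approach (not the answerer's exact setup), here is a sketch using better-sqlite3 as one possible engine; the rows are inserted unordered and read back sorted by the third field:

const fs = require('fs');
const readline = require('readline');
const Database = require('better-sqlite3'); // npm install better-sqlite3

const db = new Database('sort-temp.db');
db.exec('CREATE TABLE IF NOT EXISTS records (f1 TEXT, f2 TEXT, f3 TEXT)');
const insert = db.prepare('INSERT INTO records VALUES (?, ?, ?)');

const rl = readline.createInterface({ input: fs.createReadStream('cross.txt') });

// Note: for ~1M records you would want to wrap the inserts in a transaction
// (db.transaction) to keep the load fast; omitted here for brevity.
rl.on('line', line => {
  const [f1, f2, f3] = line.split('\t');
  insert.run(f1, f2, f3);
});

rl.on('close', () => {
  const out = fs.createWriteStream('op_rev.txt');
  // SQLite does the ordering, spilling to disk if the data doesn't fit in memory
  for (const row of db.prepare('SELECT f1, f2, f3 FROM records ORDER BY f3').iterate()) {
    out.write(`${row.f1}\t${row.f2}\t${row.f3}\n`);
  }
  out.end();
  db.close();
});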
I've searched all over stackoverflow / google for this, but can't seem to figure it out.
I'm scraping social media links of a given URL page, and the function returns an object with a list of URLs.
When I try to write this data into a different file, it outputs to the file as [object Object] instead of the expected:
[ 'https://twitter.com/#!/101Cookbooks',
'http://www.facebook.com/101cookbooks']
as it does when I console.log() the results.
This is my sad attempt to read and write a file in Node, trying to read each line (the URL) and pass it to a function call, request(line, gotHTML):
fs.readFileSync('./urls.txt').toString().split('\n').forEach(function (line){
console.log(line);
var obj = request(line, gotHTML);
console.log(obj);
fs.writeFileSync('./data.json', obj , 'utf-8');
});
for reference -- the gotHTML function:
function gotHTML(err, resp, html){
var social_ids = [];
if(err){
return console.log(err);
} else if (resp.statusCode === 200){
var parsedHTML = $.load(html);
parsedHTML('a').map(function(i, link){
var href = $(link).attr('href');
for(var i=0; i<socialurls.length; i++){
if(socialurls[i].test(href) && social_ids.indexOf(href) < 0 ) {
social_ids.push(href);
};
};
})
};
return social_ids;
};
Building on what deb2fast said, I would also pass in a couple of extra parameters to JSON.stringify() to get it to pretty-print:
fs.writeFileSync('./data.json', JSON.stringify(obj, null, 2) , 'utf-8');
The second param is an optional replacer function which you don't need in this case so null works.
The third param is the number of spaces to use for indentation. 2 and 4 seem to be popular choices.
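For example, with the URLs from the question, the indented output looks like this:

const obj = [
  'https://twitter.com/#!/101Cookbooks',
  'http://www.facebook.com/101cookbooks'
];

console.log(JSON.stringify(obj, null, 2));
// [
//   "https://twitter.com/#!/101Cookbooks",
//   "http://www.facebook.com/101cookbooks"
// ]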
obj is an array in your example.
fs.writeFileSync(filename, data, [options]) requires either String or Buffer in the data parameter. see docs.
Try to write the array in a string format:
// writes https://twitter.com/#!/101Cookbooks,http://www.facebook.com/101cookbooks
fs.writeFileSync('./data.json', obj.join(','), 'utf-8');
Or:
// writes ['https://twitter.com/#!/101Cookbooks', 'http://www.facebook.com/101cookbooks']
var util = require('util');
fs.writeFileSync('./data.json', util.inspect(obj) , 'utf-8');
Edit: the reason you see the array in your example is that Node's implementation of console.log doesn't just call toString, it calls util.format; see the console.js source.
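A quick way to see that difference (just an illustration):

const util = require('util');

const obj = ['https://twitter.com/#!/101Cookbooks', 'http://www.facebook.com/101cookbooks'];

console.log(String(obj));       // toString: a plain comma-joined string, no quotes or brackets
console.log(util.inspect(obj)); // the bracketed, quoted form that console.log displays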
If you're getting [object Object], then use JSON.stringify:
fs.writeFile('./data.json', JSON.stringify(obj), 'utf-8', err => { if (err) throw err; });
It worked for me.
In my experience JSON.stringify is slightly faster than util.inspect.
I had to save the result object of a DB2 query as a JSON file. The query returned an object of 92k rows, and the conversion took very long to complete with util.inspect, so I did the following test, writing the same 1000-record object to a file with both methods.
JSON.stringify
fs.writeFile('./data.json', JSON.stringify(obj, null, 2));
Time: 3:57 (3 min 57 sec)
Result's format:
[
{
"PROB": "00001",
"BO": "AXZ",
"CNTRY": "649"
},
...
]
util.inspect
var util = require('util');
fs.writeFile('./data.json', util.inspect(obj, false, 2, false));
Time: 4:12 (4 min 12 sec)
Result's format:
[ { PROB: '00001',
BO: 'AXZ',
CNTRY: '649' },
...
]
Could you try doing JSON.stringify(obj);
Like this:
var stringify = JSON.stringify(obj);
fs.writeFileSync('./data.json', stringify, 'utf-8');
Just in case anyone else stumbles across this: I use the fs-extra library in Node and write JavaScript objects to a file like this:
const fse = require('fs-extra');
fse.outputJsonSync('path/to/output/file.json', objectToWriteToFile);
Further to @Jim Schubert's and @deb2fast's answers:
To be able to write out large objects on the order of ~100 MB or more, you'll need to use for...of as shown below and adapt it to your requirements.
const fsPromises = require('fs').promises;
const sampleData = {firstName:"John", lastName:"Doe", age:50, eyeColor:"blue"};
const writeToFile = async () => {
for (const dataObject of Object.keys(sampleData)) {
console.log(sampleData[dataObject]);
await fsPromises.appendFile( "out.json" , dataObject +": "+ JSON.stringify(sampleData[dataObject]));
}
}
writeToFile();
Refer to https://stackoverflow.com/a/67699911/3152654 for a full reference on Node.js limits.