Split a text file into multiple text files nodejs - javascript

I am fairly new to Node.js. I have a large text file of data like the one below, and I need to split each chunk into its own text file dynamically. How can I achieve this using Node.js?
LargeFile.txt
Name: John
Age: 18
Address: Washington
Name: Doe
Age: 23
Name: Randy
Address: Tennessee
The expected output should look like this:
John.txt
Name: John
Age: 18
Address: Washington
Doe.txt
Name: Doe
Age: 23
Randy.txt
Name: Randy
Address: Tennessee

Are the chunks in your file separated by blank lines? If so, you can use the split() method and split your text by \n\n or \r\n\r\n (the characters for two newlines).
// required module
const fs = require('fs');
// read the file content as a string (without an encoding, readFileSync returns a Buffer, which has no split())
const str = fs.readFileSync('/path/to/alldata.txt', 'utf8');
// detect newline character
let newline = '\n';
let twonewlines = '\n\n';
if (str.indexOf('\r\n\r\n') > -1) {
newline = '\r\n';
twonewlines = '\r\n\r\n';
}
// split
let arr = str.split(twonewlines);
// save items as new files
arr.forEach((data, idx)=> {
/* format of data will be:
* Name: XX
* Age: YY
* Address: ZZ
*/
// get name
let firstRow = data.slice(0, data.indexOf(newline)); // get "Name: XX"
let name = firstRow.split(': ')[1]; // get "XX"
// write to file
fs.writeFileSync(`/path/to/${name}.txt`, data);
});
You can use the promise or callback versions of the File System (fs) functions so that reading and writing do not block the event loop.
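The sample in the question has no blank lines between chunks, so here is a minimal sketch of a promise-based variant that instead starts a new chunk at every line beginning with "Name:". The names splitRecords and splitFile are hypothetical, and the sketch assumes every chunk starts with a "Name:" line and that the name is safe to use as a file name.

```javascript
const fs = require('fs').promises;
const path = require('path');

// Group the lines into records; a line starting with "Name:" opens a new record.
function splitRecords(text) {
  const records = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // ignore blank lines
    if (line.startsWith('Name:')) {
      records.push({ name: line.slice('Name:'.length).trim(), lines: [] });
    }
    // lines that appear before the first "Name:" line are dropped
    if (records.length > 0) records[records.length - 1].lines.push(line);
  }
  return records;
}

// Read the input without blocking, then write one file per record.
async function splitFile(inputPath, outputDir) {
  const text = await fs.readFile(inputPath, 'utf8');
  const records = splitRecords(text);
  await Promise.all(records.map((record) =>
    fs.writeFile(path.join(outputDir, `${record.name}.txt`), record.lines.join('\n') + '\n')
  ));
  return records.map((record) => record.name);
}
```

It could be called as splitFile('/path/to/LargeFile.txt', '/path/to/output'), which resolves with the list of names once all files are written.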

Related

Splitting a string list while keeping the string elements properly aligned

Here is an example of the string: Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew. I want to split it so that Company 1 counts as one company and company, Inc. also counts as one. The problem is that splitting on commas treats company, Inc. as two companies. How can I resolve this? Strings such as company, Inc. should stay a single element.
const companies = company.split(",");
The string here is only an example; it can contain any names. So I am looking for generic logic that works for any string with the same structure.
Note: in the objects below, $ stands in for the separating comma (,), to make clear where the string needs to be split.
Object:
Example 1
{
_id: 5de4debcccea611e4d14d4d5
companies: One Bros. Inc. & Might Bros. Dist. Corp.$Pages, Inc.$Google Inc. Search$Aphabet Inc. tech.
}
Example 2
{
_id: 5de4debccc333611e4d14d4f5
companies: Google Comp. Inc.$Google Comp. Inc. Estd.$Tree, Ltd.$Tree, Ltd.
}
First I split on 'ompany' rather than 'company', because you have one instance of 'Company' with a capital C. See the output of the first console.log in the comment below.
Then I put things back together using reduce. map is not the right choice here, because the result needs one fewer element than the number of fragments. For the same reason, the first thing I do inside the reduce callback is make sure I do not look beyond the end of the array.
Then I split each fragment on ',' and pop off the last element, which is the "C" or "c" that goes back together with "ompany". Then I strip any trailing ',c' from the next fragment and append the result to the company. Finally I push the company onto the array I'm building with reduce. See the commented results at the bottom. It is also on repl.it: https://repl.it/#dexygen/splitOnCompanyStringLiteral
This is a fairly concise way to do this, but again, if you can do anything to improve your data, you won't need such unnecessarily complicated code.
const companiesStr = "Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew";
const companySuffixFragments = companiesStr.split("ompany");
console.log(companySuffixFragments);
/*
[ 'C', ' 1,c', ' ltd 2,c', ', Inc.,c', ' Nine nine, ltd,c', ' ew' ]
*/
const companiesArr = companySuffixFragments.reduce((companies, fragment, index, origArr) => {
if (index < companySuffixFragments.length - 1) {
let company = fragment.split(',').pop() + 'ompany'
company = company + origArr[index + 1].replace(/,c$/, '');
companies.push(company);
}
return companies
}, []);
console.log(companiesArr);
/*
[ 'Company 1',
'company ltd 2',
'company, Inc.',
'company Nine nine, ltd',
'company ew' ]
*/
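First replace the commas that belong inside a name with another symbol (I am using & here), then split the string on , and put the commas back.
var str = 'Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew';
// protect the commas that are part of a name
str = str.replace(', Inc.', '& Inc.');
str = str.replace(', ltd', '& ltd');
// split on the remaining commas, then restore the protected ones
console.log(str.split(',').map((e) => { return e.replace('&', ',').trim(); }));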
Try the solution below.
var str = ["company 1","company ltd 2","company", "Inc.","company Nine nine", "ltd","company ews"];
var str2 = str.toString();
var str3 = str2.split("company");
function myFunction(item, index, arr) {
  if (item != "") {
    // turn the commas into spaces and put the "Company" prefix back
    let var2 = item.replace(/,/g, " ");
    arr[index] = "Company" + var2;
  }
}
str3.forEach(myFunction);
Output:
str3
(6) ["", "Company 1 ", "Company ltd 2 ", "Company Inc. ", "Company Nine nine ltd ", "Company ews"]
Then remove the first element of the array.
As has been commented, I'd try to get a cleaner string so that you don't have to write "strange" code to get what you need.
If you can't do that right now, this code should solve your problem:
let string = 'Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew';
let array = string.split(',');
const filterFnc = (array) => {
  let newArr = [];
  for (let i = 0; i < array.length; i++) {
    if (newArr.length === 0 || array[i].toLowerCase().indexOf('company') !== -1) {
      // a fragment that contains "company" starts a new entry
      newArr.push(array[i]);
    } else {
      // otherwise glue it back onto the previous entry, restoring the comma
      newArr[newArr.length - 1] += ',' + array[i];
    }
  }
  return newArr;
};
let filteredArray = filterFnc(array);
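None of the answers above depend on the one regularity visible in every example string: a separating comma is followed directly by the next name, while a comma inside a name is followed by a space. A minimal sketch that relies only on that observation, assuming it holds for all of your data (splitCompanies is a hypothetical name):

```javascript
// Split only on commas that are immediately followed by a non-space character.
// Assumption: commas inside a name are always followed by a space.
function splitCompanies(str) {
  return str.split(/,(?=\S)/).map((name) => name.trim());
}

console.log(splitCompanies('Company 1,company ltd 2,company, Inc.,company Nine nine, ltd,company ew'));
console.log(splitCompanies('Google Comp. Inc.,Google Comp. Inc. Estd.,Tree, Ltd.,Tree, Ltd.'));
```

This does not depend on the word "company" appearing in each name, but it breaks as soon as a separator is written with a space after it.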

How do I add <b> (for bolding font) to part of text inside string interpolation for sns message?

I want to take a string in this format: I am a level ${level} coder, where ${level} will be some value passed in. But I want only specific words in this sentence bolded. So let's say in this example I want "level" and "coder" bolded. How do I achieve this?
Current Behavior:
Even if I put <b> or <strong> inside the backticks, the tags are just treated as part of the string. The text is not actually bolded.
Update: This is exactly what I am doing with aws sns. But I want to achieve this with string interpolation.
let snsData = {
Message: < strong > "This is an automated message" < /strong> + '\n' +
"You have successfully uploaded the following:" + '\n'
`File name: ${snsFileName}\n
Number of lines: ${numberOfLines}\n
If there are any issues, please contact XXX for assistance.`,
Subject: 'Successfully Uploaded to XX',
TopicArn: 'XXXXX'
};
Addendum: this is entirely specific to your case, and I suggest getting more familiar with how string interpolation and objects work. For the sake of learning, though, here is an example:
const $ = function(id) { return document.getElementById(id) },
level = 'expert',
str = `I am a level <strong>${level} coder</strong>`,
snsFileName = 'testFileNameBlah',
numberOfLines = 99,
snsData = {
Message: '<strong>This is an automated message</strong><br/>' +
'You have successfully uploaded the following:<br/>' +
`File name: <strong>${snsFileName}</strong><br/>
Number of lines: <strong>${numberOfLines}</strong><br/>` +
'If there are any issues, please contact XXX for assistance.<br/>',
Subject: 'Successfully Uploaded to XX',
TopicArn: 'XXXXX'
};
$('blah').innerHTML = str + '<hr>';
$('fixme').innerHTML = snsData.Message + snsData.Subject;
<span id="blah"></span>
<h2>Addendum</h2>
<p id="fixme"></p>
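For the exact sentence in the question, where only "level" and "coder" should be bold, a small helper keeps the template literal readable. This is only a sketch: bold is a hypothetical helper, and the tags take effect only where the resulting string is rendered as HTML, which this snippet cannot guarantee for any particular message recipient.

```javascript
// Hypothetical helper: wrap a piece of text in <strong> tags.
const bold = (text) => `<strong>${text}</strong>`;

const level = 'expert';
// Only the words "level" and "coder" are wrapped; the interpolated value is left plain.
const message = `I am a ${bold('level')} ${level} ${bold('coder')}`;

console.log(message);
```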

Text Scraping in JavaScript

I have a dynamic text which looks something like this
my_text = "address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127
email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com"
The text starts with 'address'. As soon as we see 'address' we need to scrape everything from there until either 'landline'/'mobile'/'cell' appears. From there on, we want to scrape all the phone text (without altering the spaces in between). We start from the first occurrence of either 'landline'/'mobile'/'cell' and stop as soon as 'email' appears.
Finally we scrape the email part (without altering spaces in between)
'landline'/'mobile'/'cell' can appear in any order and sometimes some may not appear.
For example, the text could have looked like this as well.
my_text = "address ae fae daq ad, 1231 asdas
cell (132) -142-3127 landline 213121233 -123
email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com"
There's a little more engineering that needs to be done to form arrays of subtext contained in address, phones and email text.
Subtexts of addresses are always separated with commas (,).
Subtexts of emails can be separated with commas (,) or hyphens (-).
My output should be a JSON dictionary which looks something like this:
resultant_dict = {
addresses: [{
address: "ae fae daq ad"
}, {
address: "1231 asdas"
}],
phones: [{
number: "213121233 -123",
kind: "landline"
}, {
number: "513121233",
kind: "mobile"
}, {
number: "(132) -142-3127",
kind: "cell"
}],
emails: [{
email: "sdasdas#gmail.com",
connector: ""
}, {
email: "sdasd as#yahoo.com",
connector: ","
}, {
email: "ewqas#gmail.com",
connector: "-"
}]
}
I am trying to achieve this thing using regular expressions or any other way in JavaScript. I can't figure out how to write this as I am a novice programmer.
Your requirements are a bit twisted... Plural map keys, the section name repeated as a key in each item... Moreover, what about a dedicated array for each "kind" of phone? We can get the expected result for sure, but it seems pretty useless at first glance. Anyway, here is a starting point:
var str = 'address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127 email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com';
// find sections
var s = 'address|landline|mobile|cell|email';
var reSections = new RegExp('(' + s + ').*?(?=' + s + '|$)', 'g');
var slices = str.match(reSections);
document.body.innerHTML += (
'<b>Step 1 - Find sections</b>' +
'<pre>' + JSON.stringify(slices, 0, 2) + '</pre>\n'
);
// make a map
var map = {
address: [],
phone: [],
email: []
};
var reTrim = /^\s+|\s+$/g;
var reSanitize = /\s+(-|,)\s+/g;
var reSection = /^(\w+)(.*)$/;
slices.forEach(function (section) {
var m = section.match(reSection);
var category = 'email address'.indexOf(m[1]) !== -1 ? m[1] : 'phone';
var values = m[2].replace(reSanitize, ',').split(',');
map[category] = map[category].concat(values.map(function (value) {
return { kind: m[1], value: value.replace(reTrim, '') };
}));
});
document.body.innerHTML += (
'<b>Step 2 - Make a map</b>' +
'<pre>' + JSON.stringify(map, 0, 2) + '</pre>\n'
);
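A slightly hackish solution, but it works.
Try this:
var mymap = {};
var a = str;
var keys = ["address", "landline", "mobile", "cell", "email"];
// mark the start of each section with "##"
keys.forEach(function (key) { a = a.replace(key, "##" + key); });
console.log(a);
// the first word of each piece is the key, the rest is the value
a.split("##").forEach(function (piece) {
  var x = piece.trim().split(" ");
  if (x[0]) { mymap[x[0]] = x.slice(1).join(" "); }
});
console.log(mymap);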
mymap will contain all the fields which you are looking for. You can parse it to create JSON in your format.
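To get from those sections to the exact structure asked for in the question, something like the following sketch could be used. parseContact is a hypothetical name; the sketch assumes the five keywords appear only as section labels (never inside an address, number, or email) and that emails are separated by a comma or hyphen with whitespace on both sides.

```javascript
function parseContact(text) {
  const labels = 'address|landline|mobile|cell|email';
  // A label, then everything (lazily) up to the next label or the end of the text.
  const sectionRe = new RegExp(
    '\\b(' + labels + ')\\b\\s*([\\s\\S]*?)(?=\\b(?:' + labels + ')\\b|$)', 'g');
  const result = { addresses: [], phones: [], emails: [] };
  let m;
  while ((m = sectionRe.exec(text)) !== null) {
    const kind = m[1];
    const body = m[2].trim();
    if (kind === 'address') {
      // address parts are separated by commas
      body.split(',').forEach((part) => {
        if (part.trim()) result.addresses.push({ address: part.trim() });
      });
    } else if (kind === 'email') {
      // the capture group keeps each connector in the split result:
      // [email, connector, email, connector, email]
      const parts = body.split(/\s+([,-])\s+/);
      for (let i = 0; i < parts.length; i += 2) {
        result.emails.push({ email: parts[i].trim(), connector: i === 0 ? '' : parts[i - 1] });
      }
    } else {
      // landline, mobile, cell: keep the number text as it is
      result.phones.push({ number: body, kind: kind });
    }
  }
  return result;
}

const sample = 'address ae fae daq ad, 1231 asdas landline 213121233 -123 mobile 513121233 cell (132) -142-3127\n' +
  'email sdasdas#gmail.com , sdasd as#yahoo.com - ewqas#gmail.com';
console.log(JSON.stringify(parseContact(sample), null, 2));
```

Because each section is matched independently, the phone labels can appear in any order or be missing altogether.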

how to parse & format content of text into object

As the title says, I need to extract the content of certain fields from a long text.
I have this text as below
Name: David Jones
Office Address: 148 Hulala Street Date: 24/11/2013
Agent No: 1234,
Address: 259 Yolo Road Start Date: 22/11/2013 Due Date: 29/11/2013
Type: Human Properties: None Ago: 29
And I have these labels for specific fields in the text
Name, Office Address, Date, Agent No, Address, Type, Properties, Age
And the result I want to get is
Name: 'David Jones',
Office Address: '148 Hulala Street',
Date: '24/11/2013',
Agent No: '1234',
Address: '259 Yolo Road',
Type: 'Human',
Properties: 'None',
Age: ''
with the content completely parsed into each field. An important thing to note here is that the original text may contain typos (e.g., Ago instead of Age) and extra fields that are not in the list of labels (e.g., Start Date and Due Date). So the code should ignore any non-matching text and return only the matching fields.
I tried to resolve this by going through loops for each line, check if a line contains the field, and see if the line also contains more fields.
Currently I have the following code.
structure = ['Name','Office Address','Date','Agent No','Address','Type','Properties','Age'];
obj = {};
for (i = 0; i < textLines.length; i++) {
matchingFields = [];
for (j = 0; j < structure.length; j++) {
if (textLines[i].indexOf(structure[j] + ':') !== -1) {
if (matchingFields.length === 0 && textLines[i].indexOf(structure[j] + ':') === 0) {
matchingFields.push(structure[j]);
structure.splice(structure.indexOf(structure[j--]), 1);
} else if (textLines[i].indexOf(structure[j] + ':') > textLines[i].indexOf(matchingFields[matchingFields.length-1])) {
matchingFields.push(structure[j]);
structure.splice(structure.indexOf(structure[j--]), 1);
}
}
}
for (j = 0; j < matchingFields.length; j++) {
if (j !== matchingFields.length-1) {
obj[matchingFields[j]] = textLines[i].slice(textLines[i].indexOf(matchingFields[j]) + matchingFields[j].length, textLines[i].indexOf(matchingFields[j+1]));
} else {
obj[matchingFields[j]] = textLines[i].slice(textLines[i].indexOf(matchingFields[j]) + matchingFields[j].length);
}
obj[matchingFields[j]] = obj[matchingFields[j]].replace(':', '');
if (obj[matchingFields[j]].indexOf(' ') === 0) {
obj[matchingFields[j]] = obj[matchingFields[j]].replace(' ', '');
}
if (obj[matchingFields[j]].charAt(obj[matchingFields[j]].length-1) === ' ') {
obj[matchingFields[j]] = obj[matchingFields[j]].slice(0, obj[matchingFields[j]].length-1);
}
}
}
In some cases it works fine, but when both 'Office Address: ' and 'Address: ' are present, the value for 'Office Address:' ends up in 'Address:'. Besides, the code looks messy and ugly, and it feels like brute force.
I guess there should be a better way, for example using a regular expression or something similar, but without an external library.
If you have any ideas, I would appreciate you sharing them.
Assuming the properties are separated by newline characters, you can create an object mapping each attribute to its value using:
var str = "Name: David Jones\nOffice Address: 148 Hulala Street\nDate: 24/11/2013\nAgent No: 1234,\nAddress: 259 Yolo Road\nType: Human\nProperties: None\nAge: 29";
var output = {};
str.split(/\n/).forEach(function (item) {
  // capture "label: value"; the value stops at the first character outside the class (e.g. a trailing comma)
  var match = item.match(/([A-Za-z\s]*):\s([A-Za-z0-9\s\/]*)/);
  if (match) {
    output[match[1]] = match[2];
  }
});
console.log(output);
This may help:
> a.substr(a.indexOf("Name"), a.indexOf("Office Address")).split(":")
["Name", " David Jones "]
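For the specific problem reported in the question, where the 'Office Address:' value ends up under 'Address:', one option is to build a single regular expression from the label list and let each match consume its whole label. The sketch below is only a partial solution under stated assumptions: parseFields is a hypothetical name, the first value found for a label is kept, and labels that are not in the list (such as Start Date) or misspelled (such as Ago) are not recognised as boundaries, so they would leak into the preceding value.

```javascript
function parseFields(text, labels) {
  // Longest labels first, so a longer label wins if two labels start at the same position.
  const sorted = labels.slice().sort((a, b) => b.length - a.length);
  const escaped = sorted.map((label) => label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const labelRe = new RegExp('\\b(' + escaped.join('|') + '):', 'g');
  const result = {};
  labels.forEach((label) => { result[label] = ''; });
  text.split(/\r?\n/).forEach((line) => {
    const hits = Array.from(line.matchAll(labelRe));
    hits.forEach((hit, i) => {
      // the value runs from the end of this label to the start of the next one
      const start = hit.index + hit[0].length;
      const end = i + 1 < hits.length ? hits[i + 1].index : line.length;
      const value = line.slice(start, end).trim().replace(/,$/, '');
      if (result[hit[1]] === '') result[hit[1]] = value; // keep the first value found
    });
  });
  return result;
}

const labels = ['Name', 'Office Address', 'Date', 'Agent No', 'Address', 'Type', 'Properties', 'Age'];
const text = 'Name: David Jones\nOffice Address: 148 Hulala Street Date: 24/11/2013\nAgent No: 1234,\nType: Human Properties: None';
console.log(parseFields(text, labels));
```

Since "Office Address:" is consumed as one match, the "Address" inside it is never matched again, and fields missing from the text stay as empty strings.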

Splitting Name into last name, first name, middle initial

I have a user input from a textbox that contains a user's name
input can look like this:
var input = "Doe, John M";
However, input can be a whole lot more complex.
like:
var input = "Doe Sr, John M"
or "Doe, John M"
or "Doe, John, M"
or even "Doe Sr, John,M"
What I'd like to do is separate the last name (with the sr or jr) the first name, and then the middle initial.
So, these strings become :
var input = "Doe#John#M" or "Doe Sr#John#M" or "Doe#John#M"
I've tried this regular expression,
input = input.replace(/\s*,\s*/g, '#');
but this doesn't take into account the last middle initial.
I'm sure this can probably be done via RegEx but splitting the string into arrays is often faster and a little less complex (IMO). Try this function:
var parseName = function(s) {
var last = s.split(',')[0];
var first = s.split(',')[1].split(' ')[1];
var mi = s[s.length-1];
return {
first: first,
mi: mi,
last: last
};
};
You call it just passing in the name e.g. parseName('Doe, John M') and it returns an object with first, mi, last. I created a jsbin you can try that tests the formats of names you show in your question.
Check out humanparser on npm.
https://www.npmjs.org/package/humanparser
Parse a human name string into salutation, first name, middle name, last name, suffix.
Install
npm install humanparser
Usage
var human = require('humanparser');
var fullName = 'Mr. William R. Jenkins, III'
, attrs = human.parseName(fullName);
console.log(attrs);
//produces the following output
{ saluation: 'Mr.',
firstName: 'William',
suffix: 'III',
lastName: 'Jenkins',
middleName: 'R.',
fullName: 'Mr. William R. Jenkins, III' }
The following seems to match your requirements
input = input.replace(/\s*,\s*|\s+(?=\S+$)/g, '#');
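The expression has two alternatives: a comma with any surrounding whitespace, or a run of whitespace that is followed only by one final word. A quick check against the inputs from the question (toHashSeparated is a hypothetical wrapper around the replace call above):

```javascript
// Replace each comma (plus surrounding spaces), or the whitespace before the last word, with '#'.
const toHashSeparated = (input) => input.replace(/\s*,\s*|\s+(?=\S+$)/g, '#');

const samples = ['Doe, John M', 'Doe Sr, John M', 'Doe, John, M', 'Doe Sr, John,M'];
for (const sample of samples) {
  console.log(sample, '->', toHashSeparated(sample));
}
```

One caveat: the second alternative fires on any whitespace before the last word, so an input with no comma at all, such as "Doe Sr", would also be split.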
