Parse CSV into SQL Database - javascript

I've been tasked with parsing a CSV file and storing its data in an SQL database using Node.js. I'm a complete beginner with Node.js but have done similar tasks before in Rails. The CSV file I've been given isn't like the ones I've worked with before, however, and is in a different format.
Format of csv: http://imgur.com/a/gsQkl
I'm looking for any pointers on how to handle this task. Thanks

This question has two aspects:
How to do the task
How to do it with Node.js
Regarding the first aspect - if you know how to do it with Rails, then you should already recognize that the CSV example you've provided is not just a flat table - it includes a hierarchy, which can be handled in several ways: either add category and date fields to every row in order to flatten the table, or create separate tables and connect them with foreign keys. Either way, this has nothing to do with Node.js, and you'll most likely have to "massage" your data before you insert it into the SQL database.
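For illustration only - since the actual layout is only visible in the linked image, the header-row conventions below are purely hypothetical - a flattening pass might look something like this:

```javascript
// Hypothetical sketch: flatten a CSV whose data rows are grouped under
// "category" and "date" header rows into one record per row.
// The row layout is assumed, not taken from the linked image.
function flattenRows(rows) {
  const flat = [];
  let currentCategory = null;
  let currentDate = null;

  for (const row of rows) {
    if (row.length === 1) {
      // Assumed: a single-cell row marks a category header
      currentCategory = row[0];
    } else if (row.length === 2 && row[0] === 'Date') {
      // Assumed: a "Date,<value>" row marks a date header
      currentDate = row[1];
    } else {
      // A normal data row: copy the hierarchy onto it
      flat.push({ category: currentCategory, date: currentDate, values: row });
    }
  }
  return flat;
}
```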
Regarding the second aspect - in Node.js you'll find modules to handle almost every task you can imagine (some things can be done natively with the core modules; Google is a good starting point in most cases).
In your case you'd need modules to handle CSV parsing and the SQL server connection.
For CSV parsing you can use: https://github.com/wdavidw/node-csv
For SQL - you didn't mention which server you are using (SQL is a language used by many different database servers). Assuming you use one of the popular ones, these are the relevant modules:
MySQL - https://github.com/mysqljs/mysql
Microsoft SQL Server - https://github.com/patriksimek/node-mssql
PostgreSQL - https://github.com/brianc/node-postgres
Each one has its own interface - read the docs for further information
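As a rough sketch of how the pieces fit together (the file name, connection details, table and column names are all assumptions), using csv-parse from the node-csv project together with the mysql module:

```javascript
const fs = require('fs');
const { parse } = require('csv-parse');   // from the node-csv project (v5-style import)
const mysql = require('mysql');           // swap for mssql/pg as needed

// Connection details are placeholders
const connection = mysql.createConnection({
  host: 'localhost',
  user: 'me',
  password: 'secret',
  database: 'mydb'
});

fs.readFile('data.csv', 'utf8', (err, content) => {
  if (err) throw err;
  // columns: true turns each row into an object keyed by the header row
  parse(content, { columns: true, trim: true }, (err, records) => {
    if (err) throw err;
    connection.connect();
    records.forEach((record) => {
      // "employees" and the record's keys are assumed to match your schema
      connection.query('INSERT INTO employees SET ?', record, (err) => {
        if (err) console.error(err);
      });
    });
    connection.end(); // waits for queued queries to finish
  });
});
```

This is only a starting point - for a hierarchical file you would flatten or split the records first, as discussed above.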

Related

How to do full-text search in a MySQL table using Fuse.js and Redis?

I have a table with a thousand records in it, and I want to do a Google-like full-text/fuzzy search.
I read about MySQL v8's full-text search, but let's say we don't have that functionality available yet.
There is a JavaScript library called Fuse.js that does fuzzy search, which is what I need.
I can combine them by creating an API that returns the table data in JSON format and then passing it to Fuse.js to do a fuzzy search.
Now, I think it's not recommended to load all the data from the table every time someone wants to search.
I read about Redis, and the first thing that came to mind is to save all the table data in Redis using JSON.stringify and read it from there every time instead of querying the database. Then, whenever data is added to the table, I would also update the contents in Redis.
Is there a better way to do this?
That is a very common caching pattern.
If you need a more efficient way to store and retrieve your JSON to/from Redis you might want to consider one of the available Redis Modules.
For example:
RedisJSON allows you to efficiently store, retrieve, project (JSONPath) and update JSON documents in place.
RediSearch gives you full-text search over Redis Hashes and lets you efficiently retrieve data according to the user's query.
Lastly, RedisJSON2 (aka RedisDoc) combines both modules above, meaning efficient JSON storage and retrieval with full-text search support.
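As a rough sketch of the plain caching pattern described in the question, using node-redis (v4) together with Fuse.js - the cache key, SQL query, and searchable fields are assumptions:

```javascript
const { createClient } = require('redis');
const Fuse = require('fuse.js');
// "db" is assumed to be an existing MySQL connection with a promise API
// that returns an array of row objects.
const db = require('./db');

const CACHE_KEY = 'records:all'; // arbitrary key name

async function searchRecords(term) {
  const redis = createClient();
  await redis.connect();

  // Try the cache first; fall back to the database and populate the cache
  let rows = JSON.parse(await redis.get(CACHE_KEY) || 'null');
  if (!rows) {
    rows = await db.query('SELECT id, title, body FROM records'); // assumed schema
    await redis.set(CACHE_KEY, JSON.stringify(rows), { EX: 300 }); // 5 minute TTL
  }
  await redis.quit();

  // Fuzzy search in memory with Fuse.js over the cached rows
  const fuse = new Fuse(rows, { keys: ['title', 'body'] });
  return fuse.search(term).map((result) => result.item);
}
```

With RedisJSON or RediSearch you could push more of this work into Redis itself instead of doing the search in the Node process.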

How to automatically search and filter all engineers on LinkedIn and store the results in Excel?

Does anyone know how I can parse LinkedIn accounts? Or any tool (not paid)?
For example:
I will look for "Software Engineer" in Dallas, TX.
The tool would automatically pick all candidates from LinkedIn, or, for example, the first 100 candidates, and store their first name, last name, LinkedIn link, and experience in an Excel document (or from a specific company).
Should it be done through an API, or is there a specific type of account that allows this? Or does anyone know of tools or scripts that will help with this?
I need to parse a large number of candidates, 100+ maybe 1000+, and store them.
I have multiple thoughts about the implementation, but I feel this has almost certainly been implemented already.
https://developer.linkedin.com/docs/rest-api
Use the LinkedIn APIs to fetch the data and process it however you would like. I don't know how many of the 'private' fields you can get access to, but names seem to be there.
I use Node.js to process Excel data - xlsx is a very good option, but it only allows synchronous execution, so you would have to spawn another process. It also has a filter function, so you can do whatever you want with it.
The problem I faced when parsing large amounts of data into Excel is that an Excel file is a compressed XML format, so it takes a long time to parse, for both reading and writing. A faster option would be to create and read CSV, which Excel can naturally handle as well.
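As a small illustration of the xlsx module mentioned above, writing a set of candidate objects to a spreadsheet - the field names and values here are made-up placeholders, not something the LinkedIn API is guaranteed to return:

```javascript
const XLSX = require('xlsx');

// Placeholder data - in practice this would come from whatever API or scraper you use
const candidates = [
  { firstName: 'Jane', lastName: 'Doe', linkedinLink: 'https://www.linkedin.com/in/example1', experience: '5 years' },
  { firstName: 'John', lastName: 'Smith', linkedinLink: 'https://www.linkedin.com/in/example2', experience: '3 years' }
];

// json_to_sheet maps an array of objects to rows, using the keys as the header row
const worksheet = XLSX.utils.json_to_sheet(candidates);
const workbook = XLSX.utils.book_new();
XLSX.utils.book_append_sheet(workbook, worksheet, 'Candidates');

// writeFile is synchronous, as noted above - consider a worker process for large files
XLSX.writeFile(workbook, 'candidates.xlsx');
```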

API design for CSV file import, best practice approach?

I need to design a REST API for importing an employee CSV file with 30 columns. The number of records in the file may vary based on the size of the business: it could be 10, or it could be 5000.
Here's my approach to the design:
POST /Employees - will add one employee record (with 30 attributes)
POST /Employees?bulk - will accept JSON with multiple employee records. In this case the user may also add a single record by passing a JSON object.
POST /Employees?file - will accept a CSV file (under a certain size); the parsing and processing will be done on the server.
In the case of the first two options, the user is expected to read the CSV and convert it to JSON before sending.
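For reference, a rough Express sketch of those three routes - multer for the upload, the status codes, and the handler bodies are assumptions, not part of the design itself:

```javascript
const express = require('express');
const multer = require('multer');

const app = express();
const upload = multer({ dest: 'uploads/' }); // temp storage for uploaded CSVs (assumed)

app.use(express.json({ limit: '5mb' }));

// All three variants share the path /Employees, so one route branches on the
// query string; multer only processes multipart (file upload) requests.
app.post('/Employees', upload.single('file'), (req, res) => {
  if ('file' in req.query) {
    // req.file.path points at the uploaded CSV; parse and insert on the server
    return res.status(202).json({ status: 'processing' });
  }
  if ('bulk' in req.query) {
    // req.body is expected to be a JSON array of employee records
    return res.status(201).json({ created: req.body.length });
  }
  // Single employee record with ~30 attributes in req.body
  return res.status(201).json({ created: 1 });
});

app.listen(3000);
```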
Questions
Is this a best practice design?
Should I provide a JavaScript library for reading the CSV and converting it to the accepted JSON format? When does one provide a JavaScript library?
Are there any examples of such APIs that I can use to model the design?
Because I am not familiar with JavaScript, my answer will focus on question 1: how to design an API for importing large amounts of data.
There are generally two ways, synchronous and asynchronous.
Synchronous Way
In order to avoid a long wait when importing data into the database, we should limit the number of data rows per request. If the data imported by the user exceeds the limit, the frontend needs to split the data into multiple requests. For a better user experience, we can display the current import progress.
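A frontend-side sketch of that idea - splitting the parsed records into fixed-size chunks and posting them one request at a time; the endpoint, chunk size, and progress callback are assumptions:

```javascript
// Uploads records in sequential chunks and reports progress after each request.
async function importInChunks(records, chunkSize = 500, onProgress = () => {}) {
  for (let start = 0; start < records.length; start += chunkSize) {
    const chunk = records.slice(start, start + chunkSize);
    const response = await fetch('/Employees?bulk', {   // assumed endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(chunk)
    });
    if (!response.ok) throw new Error(`Import failed at row ${start}`);
    onProgress(Math.min(start + chunkSize, records.length), records.length);
  }
}
```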
Asynchronous Way
Compared to the synchronous way, the asynchronous way is a little more complicated to implement.
1. We can upload the CSV or JSON file to Amazon S3, then send the file address to the server via the API.
2. An asynchronous worker starts importing the data into the database after downloading the file from S3. To avoid blocking the database, we also have to import in batches.
3. To report import progress, the frontend either polls via the API, or the server notifies the frontend after the import is complete.
Both ways have pros and cons; which way you choose depends on the trade-off between the amount of data and the complexity of the implementation.
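A minimal Express sketch of the asynchronous flow - accept the file address, return a job id immediately, and let the frontend poll for progress. The in-memory job map and the processImport worker are stand-ins for a real queue and S3 download, not a complete implementation:

```javascript
const express = require('express');
const crypto = require('crypto');

const app = express();
app.use(express.json());

const jobs = new Map(); // stand-in for a real job queue / persistent store

app.post('/imports', (req, res) => {
  const jobId = crypto.randomUUID();
  jobs.set(jobId, { status: 'pending', progress: 0 });

  // Kick off the worker without awaiting it; the response returns immediately
  processImport(jobId, req.body.fileUrl).catch(() => {
    jobs.set(jobId, { status: 'failed', progress: 0 });
  });

  res.status(202).json({ jobId });
});

// The frontend polls this endpoint for progress
app.get('/imports/:jobId', (req, res) => {
  const job = jobs.get(req.params.jobId);
  if (!job) return res.status(404).end();
  res.json(job);
});

async function processImport(jobId, fileUrl) {
  // Placeholder: download the file from S3, parse it, and insert in batches,
  // updating the job's progress as each batch completes.
  jobs.set(jobId, { status: 'done', progress: 100 });
}

app.listen(3000);
```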

postgresql stored procedures vs server-side javascript functions

In my application I receive JSON data in a POST request and store it as raw JSON data in a table. I use PostgreSQL (9.5) and Node.js.
In this example, the data is an array of about 10 quiz questions experienced by a user, which looks like this:
[{"QuestionId":1, "score":1, "answerList":["1"], "startTime":"2015-12-14T11:26:54.505Z", "clickNb":1, "endTime":"2015-12-14T11:26:57.226Z"},
{"QuestionId":2, "score":1, "answerList":["3", "2"], "startTime":"2015-12-14T11:27:54.505Z", "clickNb":1, "endTime":"2015-12-14T11:27:57.226Z"}]
I need to store (temporarily or permanently) several indicators computed by aggregating data from this JSON at quiz level, as I need these indicators to perform other procedures in my database.
Until now I was computing the indicators with JavaScript functions while handling the POST request and inserting the values into my table alongside the raw JSON data. I'm wondering whether it wouldn't be more performant to have the calculation performed by a stored trigger function in my PostgreSQL DB (knowing that the SQL function would need to extract the data from inside the raw JSON).
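For illustration, the JavaScript-side aggregation could look something like this - the specific indicators shown (total score, total clicks, total answer time) are just examples, not necessarily the ones in question:

```javascript
// Hypothetical quiz-level indicators computed from the raw JSON above:
// total score, total clicks, and total time spent answering (in ms).
function computeIndicators(questions) {
  return questions.reduce(
    (acc, q) => ({
      totalScore: acc.totalScore + q.score,
      totalClicks: acc.totalClicks + q.clickNb,
      totalTimeMs:
        acc.totalTimeMs + (new Date(q.endTime) - new Date(q.startTime))
    }),
    { totalScore: 0, totalClicks: 0, totalTimeMs: 0 }
  );
}

// With the two questions shown above this returns
// { totalScore: 2, totalClicks: 2, totalTimeMs: 5442 }
```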
I have read other posts on this topic, but they were asked many years ago and not with Node.js, so I thought people might have some new insight on the pros and cons of using SQL stored procedures vs server-side JavaScript functions.
Edit: I should probably have mentioned that most of my application's logic already lies in PostgreSQL stored procedures and views.
Generally, I would not use that approach due to the risk of getting the triggers out of sync with the code. In general, the single responsibility principle should be the guide: DB to store data and code to manipulate it. Unless you have a really pressing business need to break this pattern, I'd advise against it.
Do you have a migration that will recreate the triggers if you wipe the DB and start from scratch? Will you or a coworker not realise they are there at a later point when reading the app code and wonder what is going on? If there is a standardised way to manage the triggers where the configuration will be stored as code with the rest of your app, then maybe not a problem. If not, be wary. A small performance gain may well not be worth the potential for lost developer time and shipping bugs.
I'm currently working somewhere that has gone all-in on SQL functions. We have over a thousand. I'd strongly advise against it.
Having logic split between JavaScript and SQL is a real pain when debugging issues, especially if, like me, you are much more familiar with JS.
The functions are at least all tracked in source control and get updated/created in the DB as part of the deployment process, but this means you have two places to look when trying to follow the code.
I fully agree with the other answer: single responsibility principle - DB for storage, server/app for logic.

Object inside Array as External File for Express

I searched around and found pieces of what I want to do and how to do it, but I have a feeling that combining them all is something I shouldn't do and that I should write it a different way.
Currently I have a small application that uses an MSSQL library in Node to query a SQL Server with a SQL command, get the results, and store them as an object. I then use Express and some JavaScript to decipher or modify the object before fetching it with an AJAX call and responding with it as a proper JSON object.
SQLDB -> NodeJs -> API -> localhost
My problem now is that I want to repurpose this and expand it. Currently, storing the SQL DB responses as objects inside an array is becoming a huge memory problem. Considering some of these requests can return hundreds of thousands of rows with hundreds of columns, the Node process starts eating up outrageous amounts of RAM.
I then thought maybe I could just take that object when it comes through in the result and write it to a file. Then, when AJAX calls come to Express, it can read from the file and respond with res.json from there.
Will this work if, say, 50-200 people request data at the same time? Or should I look for another method?
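For illustration, the write-to-file idea described above might look something like the sketch below - the file path, route, and synchronous write are simplifications, not a complete answer to the concurrency question:

```javascript
const fs = require('fs');
const path = require('path');
const express = require('express');

const app = express();
const CACHE_FILE = path.join(__dirname, 'cache', 'report.json'); // hypothetical path

// When the SQL result comes back, persist it instead of keeping it in an array
function cacheResult(recordset) {
  fs.writeFileSync(CACHE_FILE, JSON.stringify(recordset));
}

// Each AJAX request streams the file to the response; concurrent readers each
// get their own read stream, so the full payload is never held in process memory
app.get('/api/report', (req, res) => {
  res.type('application/json');
  fs.createReadStream(CACHE_FILE)
    .on('error', () => res.status(500).end())
    .pipe(res);
});

app.listen(3000);
```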
