API design for CSV file import, best practice approach? - javascript

I need to design a REST API for importing an employee CSV file with 30 columns. The number of records in the file may vary with the size of the business: it could be 10, it could be 5,000.
Here's my approach to the design:
POST /Employees - adds one employee record (with 30 attributes).
POST /Employees?bulk - accepts JSON with multiple employee records. In this case the user may also add a single record by passing it as a JSON object.
POST /Employees?file - accepts a CSV file (under a certain size); the parsing and processing are done on the server.
For the first two options, the user is expected to read the CSV and convert it to JSON before sending.
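To make the three variants concrete, here is roughly what the client-side calls could look like. The endpoint paths follow the proposal above; the employee fields and the wrapper function are just placeholders:

```javascript
// Sketch only: the endpoint paths follow the proposal above; the employee
// fields are hypothetical, and csvFileInput stands for an <input type="file"> element.
async function examples(csvFileInput) {
  // 1) Single record
  await fetch('/Employees', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ firstName: 'Jane', lastName: 'Doe' /* ...28 more attributes */ })
  });

  // 2) Bulk JSON: an array of records (a one-element array also covers the single-record case)
  await fetch('/Employees?bulk', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify([{ firstName: 'Jane' }, { firstName: 'John' }])
  });

  // 3) File upload: multipart form data, parsing happens on the server
  const form = new FormData();
  form.append('file', csvFileInput.files[0]);
  await fetch('/Employees?file', { method: 'POST', body: form });
}
```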
Questions
Is this a best practice design?
Should I provide a JavaScript library for reading the CSV and converting it to an acceptable JSON format? When does one provide a JavaScript library?
Are there any examples of such APIs that I can use to model the design?

Because I am not familiar with JavaScript, my answer will focus on question 1: how to design an API for importing large amounts of data.
There are generally two ways: synchronous and asynchronous.
Synchronous Way
To avoid a long wait while importing data into the database, we should limit the number of data rows per request. If the data imported by the user exceeds that limit, the frontend needs to split the data into multiple requests. For a better user experience, we can display the current import progress.
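A minimal client-side sketch of that batching idea, assuming a bulk endpoint like the one proposed in the question (the endpoint path, the batch size of 500, and the progress callback are assumptions):

```javascript
// Split the parsed rows into fixed-size chunks and POST them sequentially,
// reporting progress after each chunk. The endpoint path and the batch size
// of 500 are assumptions.
async function importInBatches(rows, batchSize = 500, onProgress = () => {}) {
  for (let i = 0; i < rows.length; i += batchSize) {
    const batch = rows.slice(i, i + batchSize);
    const res = await fetch('/Employees?bulk', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(batch)
    });
    if (!res.ok) throw new Error(`Batch starting at row ${i} failed: ${res.status}`);
    onProgress(Math.min(i + batchSize, rows.length), rows.length);
  }
}

// usage: importInBatches(rows, 500, (done, total) => console.log(`${done}/${total} rows imported`));
```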
Asynchronous Way
Compared to the synchronous way, the asynchronous one is a little more complicated to implement.
1. We can upload the CSV or JSON file to Amazon S3, then send the file address to the server via the API.
2. An asynchronous worker starts importing the data into the database after downloading the file from S3. To avoid blocking the database, we again have to import in batches.
3. To report import progress, the frontend polls via the API, or the server notifies the frontend after the import is complete. (A client-side sketch of this flow follows the list.)
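Here is a rough client-side sketch of that flow. All endpoint paths, the response fields, and the 2-second polling interval are assumptions; the server side would have to issue the presigned S3 URL and run the worker:

```javascript
// Rough client-side view of the asynchronous flow. The endpoints
// (/imports/presign, /imports, /imports/:id), the response fields, and the
// polling interval are all hypothetical.
async function importViaS3(file) {
  // 1) Ask the server for a presigned S3 upload URL, then upload the file straight to S3.
  const { uploadUrl, fileKey } = await (await fetch('/imports/presign', { method: 'POST' })).json();
  await fetch(uploadUrl, { method: 'PUT', body: file });

  // 2) Tell the server where the file is; the server enqueues an async worker
  //    that downloads the file from S3 and inserts the rows in batches.
  const { importId } = await (await fetch('/imports', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ fileKey })
  })).json();

  // 3) Poll until the worker reports completion (a push notification would also work).
  let status;
  do {
    await new Promise(resolve => setTimeout(resolve, 2000));
    status = await (await fetch(`/imports/${importId}`)).json();
    console.log(`Imported ${status.processed}/${status.total} rows`);
  } while (status.state !== 'done' && status.state !== 'failed');
  return status;
}
```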
Both ways have pros and cons; which one you choose depends on the trade-off between the amount of data and the complexity of the implementation.

Related

How to handle the AWS DynamoDB 400 KB record limit without changing my codebase

In AWS DynamoDB we cannot store more than 400 KB of data in a single record [Reference].
Based on suggestions online, I can either compress the data before storing it or upload part of it to an AWS S3 bucket, which I am fine with.
But my application (a JavaScript/Express server plus many JS Lambdas/microservices) is large, and adding the above logic would require a heavy rewrite and extensive testing. There is an immediate requirement from a big client that demands >400 KB of storage per record, so is there any alternative way to solve the problem that doesn't make me change my existing code that fetches records from the DB?
I was thinking more in these lines:
My backend makes a DynamoDB call to fetch the record as it does now (we use a mix of vogels and aws-sdk to make DB calls) -> the call is intercepted by a Lambda (or something else) which handles the necessary compression/decompression/S3 handling with DynamoDB and returns the data to the backend.
Is the above approach possible, and if so, how can I go about implementing it? If you have a better way, please do tell.
PS: Going forward I will definitely rewrite my codebase to take care of this; what I am asking for is an immediate stopgap solution.
Split the data into multiple items. You’ll have to change a little client code but hopefully you have a data access layer so it’s just a small change in one place. If you don’t have a DAL, from now on always have a DAL. :)
For the payload of a big item, use the regular item as a manifest that points at the segmented items. Then use BatchGetItem to fetch those segmented items.
This assumes compression alone isn’t always sufficient. If it is, do that.
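A rough sketch of what that read path could look like inside the DAL, using the AWS SDK v2 DocumentClient; the table name and the attributes segmentIds, segmentNo, and payloadChunk are invented for illustration:

```javascript
// Sketch of the manifest + segments read path inside a data access layer.
// Uses the AWS SDK v2 DocumentClient; the table name and the attributes
// segmentIds / segmentNo / payloadChunk are invented for this example.
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB.DocumentClient();
const TABLE = 'Employees'; // hypothetical table name

async function getRecord(id) {
  // The "regular" item either holds the payload itself or acts as a manifest of segment keys.
  const { Item } = await ddb.get({ TableName: TABLE, Key: { id } }).promise();
  if (!Item || !Item.segmentIds) return Item; // small item: nothing to stitch together

  // Manifest case: batch-get the segment items and reassemble the payload.
  // (A production version would also handle UnprocessedKeys and the 100-key batch limit.)
  const { Responses } = await ddb.batchGet({
    RequestItems: {
      [TABLE]: { Keys: Item.segmentIds.map(segId => ({ id: segId })) }
    }
  }).promise();
  const payload = Responses[TABLE]
    .sort((a, b) => a.segmentNo - b.segmentNo) // restore original order
    .map(seg => seg.payloadChunk)
    .join('');
  return { ...Item, payload };
}
```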

Best practice for database data rearrangement/transformation?

I have a MySQL database and retrieve data using PHP on a website where I would like to visualize the data in different ways. For this I also need to transform the data (e.g. creating sums, filtering, etc.).
My question is at which step of the data flow this kind of transformation makes the most sense, especially regarding performance and flexibility. I am not a very experienced programmer, but I think these are my options:
1.) Prepare views in the database which already provide the desired transformed data.
2.) Use a PHP script that SELECTs the data in the transformed way.
3.) Just have a SELECT * FROM table statement in PHP, load everything into JSON, read it in JS, and transform the data into the desired form.
What is the best practice for transforming the data?
It all depends on the architectural design of your application.
At the moment, SOA (Service-Oriented Architecture) is a widely used approach. If you use it, the logic tends to live in the services. The database is used as a data repository, and the UI handles the final data in a lightweight format, containing only the information it really needs.
So in that case, your option number 2 is the most appropriate.
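To illustrate the idea behind option 2 (let the database do the summing and filtering and return only what the UI needs): the asker is on PHP, but the same principle sketched in Node.js with the mysql module, using made-up table and column names, looks like this:

```javascript
// Option 2 in practice: the query itself returns the aggregated, filtered rows,
// so the application code just passes them on. Table and column names are made up.
const mysql = require('mysql');

const conn = mysql.createConnection({
  host: 'localhost', user: 'app', password: 'secret', database: 'reports'
});

conn.query(
  'SELECT region, SUM(amount) AS total FROM sales WHERE sale_date >= ? GROUP BY region',
  ['2023-01-01'],
  (err, rows) => {
    if (err) throw err;
    console.log(rows); // already summed and filtered: no client-side looping needed
    conn.end();
  }
);
```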

How to automatically search and filter all engineers from LinkedIn and store results in Excel?

Does anyone know how I can parse LinkedIn accounts? Or any tool (not paid)?
For example:
I will look for "Software Engineer" from Dallas, TX.
The tool would automatically pick all candidates from LinkedIn (or, for example, the first 100 candidates, or candidates from a specific company) and store their first name, last name, LinkedIn link, and experience in an Excel document.
Should this be done through an API, or is there a specific account type that allows it? Does anyone know of tools, or a script, that would help?
I need to parse a large number of candidates (100+, maybe 1000+) and store them.
I have multiple thoughts about the implementation, but I feel this has almost certainly been implemented already.
https://developer.linkedin.com/docs/rest-api
Use the LinkedIn APIs to fetch the data and process it however you would like. I don't know how many of the 'private' fields you can get access to, but names seem to be there.
I use Node.js to process Excel data - xlsx is a very good option, but it only allows synchronous execution, so you would have to spawn another process. It also has a filter function, so you can do whatever you want with it.
The problem I have faced with parsing large data into Excel is that an Excel file is a compressed XML format, so it takes a long time to parse for both reading and writing. A faster option is to create and read CSV, which Excel can naturally open as well.
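As a small illustration of the CSV route, here is a sketch that writes candidate rows to a CSV file Excel can open; the candidates array and its field names are hypothetical placeholders for whatever data source you end up using:

```javascript
// Write candidate rows to a CSV file that Excel can open directly.
// The candidates array and its field names are hypothetical placeholders
// for whatever the LinkedIn API (or another source) actually returns.
const fs = require('fs');

function toCsv(rows, columns) {
  const escape = value => `"${String(value == null ? '' : value).replace(/"/g, '""')}"`;
  const header = columns.map(escape).join(',');
  const body = rows.map(row => columns.map(col => escape(row[col])).join(',')).join('\n');
  return `${header}\n${body}\n`;
}

const candidates = [
  { firstName: 'Ada', lastName: 'Lovelace', profileUrl: 'https://www.linkedin.com/in/example', experience: '10 years' }
];
fs.writeFileSync('candidates.csv', toCsv(candidates, ['firstName', 'lastName', 'profileUrl', 'experience']));
```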

Parse CSV into SQL Database

I've been tasked with parsing a CSV file and storing its data in an SQL database using Node.js. I'm a complete beginner with Node.js but have done similar tasks before in Rails. The CSV file I've been given isn't like the previous ones I've used, however, and is in a different format.
Format of csv: http://imgur.com/a/gsQkl
I'm looking for any pointers on how to handle this task. Thanks
This question has two aspects:
How to do the task
How to do it with Node.js
Regarding the first aspect: if you know how to do it with Rails, then you should already see that the CSV example you've provided is not just a table - it includes a hierarchy, which can be treated in multiple ways: either add category indicator and date fields to every row in order to flatten the table, or create separate tables and connect them with foreign keys. Either way, this has nothing to do with Node.js, and you'll most probably have to "massage" your data before you enter it into the SQL database.
Regarding the second aspect: in Node.js you'll find modules to handle almost every task you can imagine (some things can be done natively with the core modules; Google is a good starting point in most cases).
In your case you'd need modules to handle CSV parsing and the SQL server connection.
For CSV parsing you can use: https://github.com/wdavidw/node-csv
For SQL - you didn't mention which server you are using (SQL is a language used by many different database servers); assuming you use one of the popular ones, these are the relevant modules:
MySQL - https://github.com/mysqljs/mysql
Microsoft SQL Server - https://github.com/patriksimek/node-mssql
PostgreSQL - https://github.com/brianc/node-postgres
Each one has its own interface - read the docs for further information
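As a rough sketch of how two of those modules could fit together (assuming MySQL, with made-up file, table, and column names; the csv-parse v5 callback API is used here):

```javascript
// Parse a CSV file and insert each row into MySQL.
// The file name, table name, and column mapping are placeholders.
const fs = require('fs');
const { parse } = require('csv-parse'); // from the node-csv project linked above (v5 API)
const mysql = require('mysql');

const conn = mysql.createConnection({
  host: 'localhost', user: 'app', password: 'secret', database: 'imports'
});

fs.readFile('data.csv', 'utf8', (err, input) => {
  if (err) throw err;
  parse(input, { columns: true, skip_empty_lines: true }, (err, records) => {
    if (err) throw err;
    records.forEach(record => {
      // "INSERT ... SET ?" lets the mysql module map an object onto columns with the same names
      conn.query('INSERT INTO employees SET ?', record, err => { if (err) throw err; });
    });
    conn.end(); // end() waits for the queued queries to finish before closing
  });
});
```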

Reduce requested file size or reduce number of browser calculations?

I have some data that I want to display on a web page. There's quite a lot of data so I really need to figure out the most optimized way of loading and parsing it. In CSV format, the file size is 244K, and in JSON it's 819K. As I see it, I have three different options:
Load the web page and fetch the data in CSV format as an Ajax request. Then transform the data into a JS object in the browser (I'm using a built-in method of the D3.js library to accomplish this).
Load the web page and fetch the data in JSON format as an Ajax request. Data is ready to go as is.
Hard code the data in the main JS file as a JS object. No need for any async requests.
Method number one has the advantage of reduced file size, but the disadvantage of having to loop through all (2700) rows of data in the browser. Method number two gives us the data in the end-format so there's no need for heavy client-side operations. However, the size of the JSON file is huge. Method number three has the advantage of skipping additional requests to the server, with the disadvantage of a longer initial page load time.
What method is the best one in terms of optimization?
In my experience, data processing times in JavaScript are usually dwarfed by transfer times and the time it takes to render the display. Based on this, I would recommend going with option 1.
However, what's best in your particular case really does depend on your particular case - you'll have to try. It sounds like you have all the code and data you need to do that anyway, so why not run a simple experiment to see which one works best for you?
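For example, a quick timing experiment along those lines could look like this; the file names are placeholders, and it assumes D3 v5+ where d3.csv and d3.json return promises:

```javascript
// Quick timing experiment for options 1 and 2. File names are placeholders;
// assumes D3 v5+ where d3.csv and d3.json return promises.
async function compareLoadTimes() {
  console.time('csv: fetch + parse');
  const fromCsv = await d3.csv('data.csv'); // fetches the smaller file, parses ~2700 rows in the browser
  console.timeEnd('csv: fetch + parse');

  console.time('json: fetch');
  const fromJson = await d3.json('data.json'); // larger payload, but no parsing step of your own
  console.timeEnd('json: fetch');

  console.log(fromCsv.length, fromJson.length);
}
compareLoadTimes();
```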
