I have an Excel model that is used for running scenario simulations. There is a push to move the analysis to Python/JavaScript for efficiency reasons and eventually move to the web. Below is a snapshot of how the Excel model is set up.
The columns represent days, which determine working and non-working days in a calendar. The rows hold variables that quantify how the day is progressing. In other words, the day-based variable (row) is populated on input, and the individual columns then calculate formulas based on the day variable. I have oversimplified to illustrate; in reality there are at least 100 rows with different variables and 365 days to simulate. Finally, there is an optimization row: a variable that will be altered to find the best solution.
Now I need to move this data structure to either JavaScript or Python. I understand I need to use 2D arrays to accomplish this task. Any packages and/or methods that I can use to execute this model would be helpful.
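To make this concrete, here is a heavily simplified sketch in plain JavaScript of the kind of structure I mean - the variable names, formulas, and numbers below are made up for illustration and are not the real model:

    // Rows are variables, columns are days; derived rows are computed per column.
    const DAYS = 365;

    // Hypothetical input row: 1 = working day, 0 = non-working day.
    const isWorkingDay = Array.from({ length: DAYS }, (_, d) => (d % 7 < 5 ? 1 : 0));

    function runModel(dailyCapacity) {
      const output = new Array(DAYS);   // hypothetical "units produced" row
      const backlog = new Array(DAYS);  // hypothetical "remaining work" row
      let remaining = 10000;            // made-up starting backlog

      for (let d = 0; d < DAYS; d++) {
        output[d] = isWorkingDay[d] ? Math.min(dailyCapacity, remaining) : 0;
        remaining -= output[d];
        backlog[d] = remaining;
      }
      return { output, backlog };
    }

    // The "optimization row" becomes a parameter we search over:
    // here, the smallest capacity that clears the backlog within the year.
    let best = null;
    for (let capacity = 10; capacity <= 100; capacity++) {
      if (runModel(capacity).backlog[DAYS - 1] <= 0) { best = capacity; break; }
    }
    console.log('smallest sufficient daily capacity:', best);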
I want to build a Highcharts Stock line chart like this example: https://www.highcharts.com/demo/stock/lazy-loading
In the given example, when you load the chart for the first time, it calls https://demo-live-data.highcharts.com/aapl-historical.json and fetches some points - to be precise, records 0-165 (you can see this in the network tab and the AJAX call). At the same time, the All option is selected in the range selector.
If you drill down further or pick any specific time range, it always fetches more data from the server.
Question: If you have millions of data points, say spanning the years 2000 to 2022, what are you going to display for the All option? What should the initial data set, result, or filter be?
NOTE: I will have millions of data points spanning 2000 to 2022, and more going forward. When I load the chart for the first time, what should come back from the server out of these millions of points?
Just for your reference, you can check an example of the time series data I'm going to have in the mock-data=>i.js folder/file, which is NOT being used anywhere in the example below as of now.
Highcharts 1.7 million points example : https://stackblitz.com/edit/js-wng4y6?file=index.js
P.S.: I'm new to Highcharts Stock and I can't seem to find a proper explanation anywhere, so I'm reaching out to the community for further help.
Server-side data grouping should be done based on the range for which you are trying to group the data, so All by itself means nothing - in your case, however, it will be the full 22 years (2000-2022).
For the data grouping you might also consider the chart size (this is what the dataGrouping feature does by default when running client-side in Highcharts Stock). When the relevant information is passed to the server, it should return a set of grouped data points.
You can find more about the grouping logic in the API options, where the method of approximation inside a group is documented:
https://api.highcharts.com/highstock/series.area.dataGrouping.approximation
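Roughly, the lazy-loading pattern from the linked demo looks like the sketch below: on every zoom or range change the chart asks the server for points that are already grouped for that range. The https://example.com/points URL and its parameters are placeholders for your own endpoint:

    // Sketch: the server returns points grouped appropriately for the requested range.
    const initialGroupedPoints = []; // e.g. one point per week covering 2000-2022, returned for "All"

    function afterSetExtremes(e) {
      const chart = this.chart;
      chart.showLoading('Loading data from server...');
      fetch(`https://example.com/points?start=${Math.round(e.min)}&end=${Math.round(e.max)}`)
        .then((res) => res.json())
        .then((points) => {
          // The server decides how coarsely to group, e.g. from (e.max - e.min)
          // and the pixel width of the plot area.
          chart.series[0].setData(points);
          chart.hideLoading();
        });
    }

    Highcharts.stockChart('container', {
      navigator: { adaptToUpdatedData: false },
      xAxis: { events: { afterSetExtremes } },
      series: [{ data: initialGroupedPoints, dataGrouping: { enabled: false } }]
    });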
Sending this much data to Highcharts to be processed is asking for issues. I highly recommend setting up a local Highcharts export server (something they support) and having this done within your own system. See it here
This is very important when it comes to security as well (if your data is sensitive): having it race across the internet to Highcharts and then back to you leaves it open to the world.
From here, you can also specify the start and end time of each render and have that change based on user input. Personally, I would display something like the last 5 days by default, and then, if someone wanted to, they could pull the slider all the way back to cover a longer period.
But, to answer your question: when you send a data object to Highcharts - either the local server or the Highcharts server - you will get a base64 image back that you can embed directly in your UI.
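For illustration, here is a minimal sketch of posting chart options to a self-hosted export server. The port and the field names (infile, type, b64) are my assumptions based on the highcharts-export-server README, so verify them against the version you deploy:

    // Sketch: render a chart server-side and embed the returned base64 PNG.
    async function renderChartImage(chartOptions) {
      const res = await fetch('http://localhost:7801/', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        // Field names are assumptions -- check the export-server docs.
        body: JSON.stringify({ infile: chartOptions, type: 'png', b64: true })
      });
      const base64 = await res.text();
      document.querySelector('#chart-img').src = 'data:image/png;base64,' + base64;
    }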
I have multiple datasets here that I took from Kaggle. There are multiple CSV files, and each CSV file is made specifically for one activity: sitting, standing, walking, running, etc. The data is taken from sensors like accelerometers and gyroscopes. The values in the datasets are per axis: x, y, and z.
Sample Data
Here is a sample dataset of jogging. Now I need to build classifiers in my program so that it can detect by itself whether the data is from jogging, sitting, standing, etc. I want to merge all the datasets into a single CSV file, upload it to my webpage, and then have the JavaScript code start detecting whether a particular row is sitting, standing, jogging, etc. I don't want any code help; I just need a little explanation or a way to start coding it. How can I get started making such a classifier? I know it is a broad question, but I think I have tried to explain myself in the best way possible. Once my program has labelled every row with a specific activity, it will count all the activities separately and then show them in a table on the webpage.
In order to answer your question properly, it would be very helpful to know your level of understanding and experience with machine learning.
If you are a beginner, I would suggest trying to run and understand a couple of tutorials that can easily be found on the web.
If you need an idea of the "standard" approach to machine learning development, I will try to give you a general idea of the process.
You can summarize the process in these main steps:
Data pre-processing -> Data splitting -> Feature selection -> Model training -> Validation -> Deployment
Data pre-processing is meant to clean and format the data: removing NA values, decisions about categorical variables, outlier analysis, and so on. This is a complex step that depends on the application. In your case I would start by checking that the data in the different datasets are homogeneous, i.e. the features have the same meaning across CSV files and corresponding features follow the same distribution. While the meaning of each feature should be explained in the description of your CSV, the check of the distributions can easily be done by plotting box plots for each feature and CSV. If the distributions of the same feature across different CSV files don't overlap, you should investigate the issue further.
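As a small illustration of that distribution check in JavaScript (assuming each CSV has already been parsed into rows with x, y, z fields - adjust the column names to your data):

    // Sketch: per-column summary statistics to compare distributions across files.
    function summarize(rows, column) {
      const values = rows.map((r) => Number(r[column])).sort((a, b) => a - b);
      const q = (p) => values[Math.floor(p * (values.length - 1))];
      return { min: values[0], q1: q(0.25), median: q(0.5), q3: q(0.75), max: values[values.length - 1] };
    }

    // Example: compare the x-axis readings of two activity files. If the boxes
    // (q1..q3) of a feature that should be comparable barely overlap across
    // files, investigate before merging everything into one CSV.
    // console.log('jogging x:', summarize(joggingRows, 'x'));
    // console.log('sitting x:', summarize(sittingRows, 'x'));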
An important step in the design of a good model is the splitting of the data. You should split your data into a training and a validation set (training/validation/test for a more comprehensive approach). This step allows you to train your model on the training set and test it on the validation set, giving you an unbiased estimate of your model's performance. I suggest becoming familiar with concepts such as cross-validation, stratified cross-validation, nested cross-validation for hyper-parameter tuning, overfitting, and bias. The validation of the model will give you an idea of the performance to expect on unseen data. If you are considering more than one model, you can use the validation results to choose the "best" one; I suggest a comparison using confidence intervals or, if possible, a significance test (e.g. t-test, ANOVA, ...). Before deployment, the model is trained on all the available data.
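A minimal sketch of a hold-out split in JavaScript, as a stand-in for the real procedure (in practice you would use a library and proper cross-validation; the activity label field is an assumption):

    // Sketch: shuffle labelled rows (Fisher-Yates), split them into training
    // and validation sets, then measure the accuracy of some predictor.
    function shuffle(rows) {
      const a = [...rows];
      for (let i = a.length - 1; i > 0; i--) {
        const j = Math.floor(Math.random() * (i + 1));
        [a[i], a[j]] = [a[j], a[i]];
      }
      return a;
    }

    function trainValidationSplit(rows, validationFraction = 0.2) {
      const shuffled = shuffle(rows);
      const cut = Math.floor(shuffled.length * (1 - validationFraction));
      return { train: shuffled.slice(0, cut), validation: shuffled.slice(cut) };
    }

    // predictFn is whatever model was trained on the training set; rows are
    // assumed to carry their true label in row.activity.
    function accuracy(validationRows, predictFn) {
      const correct = validationRows.filter((row) => predictFn(row) === row.activity).length;
      return correct / validationRows.length;
    }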
The choice of model depends on the data that you are using: number of samples, number of features, type of variables (numerical, categorical), and so on.
I'm not an expert in JavaScript, but I believe (just a feeling) that Python and R are more common choices for developing machine learning applications. Both have libraries developed specifically for the task, and you can find a lot of material and tutorials around.
With a bit more context I think I could be more specific.
I hope this helps.
I need to move my local project to a web server, and it is time to start saving things persistently (user progress and history).
The main idea is that every 50 ms or so the web app will calculate 8 values related to the user who is using it.
My questions are:
Should I use MySQL to store the data? At the moment I'm using a plain text file with a predefined format like:
Option1,Option2,Option3
Iteration 1
value1,value2,value3,value4,value5
Iteration 2
value1,value2,value3,value4,value5
Iteration 3
value1,value2,value3,value4,value5
...
If so, should I use 5 (or more in the future) columns (one for each value), with the iteration number as the ID? Keep in mind I will have 5000+ iterations per session (roughly 4 minutes).
Each user can have 10-20 sessions a day.
Will the DB become too big to be efficient?
Due to the sampling rate, a call to the DB every 50 ms seems like a problem to me (especially since I have to animate the webpage heavily). I was wondering if it would be better to implement a Save button which populates the DB with all 5000+ values in one go. If so, what would be the best way to do it?
Would it be better to save the *.txt directly to a folder on the web server? Something like DB/usernameXYZ/dateZXY/filename_XZA.txt. To me yes, way less effort. If so, which function allows me to do that (possibly JS/HTML)?
The rules are simple, and are discussed in many Q&A here.
With rare exceptions...
Do not have multiple tables with the same schema. (Eg, one table per User)
Do not splay an array across columns. Use another table.
Do not put an array into a single column as a comma-separated list. Exception: if you never use SQL to look at the individual items in the list, then it is OK for it to be an opaque text field.
Be wary of EAV schema.
Do batch INSERTs or use LOAD DATA. (10x speedup over one row per INSERT; see the sketch after this list.)
Properly indexed, a billion-row table performs just fine. (Problem: It may not be possible to provide an adequate index.)
Images (a la your .txt files) could be stored in the filesystem or in a TEXT column in the database -- there is no universal answer as to which to do. (That is, more details are needed to answer your question.)
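As an illustration of the batch-INSERT point above, using the Node.js mysql package's bulk VALUES form (the table and column names are made up to match the 5-value iterations in the question):

    // Sketch: buffer the 50 ms samples in memory and insert the whole session
    // in one statement instead of one INSERT per sample.
    const mysql = require('mysql');
    const connection = mysql.createConnection({
      host: 'localhost', user: 'app', password: 'secret', database: 'tracking'
    });

    // rows: [[sessionId, iteration, v1, v2, v3, v4, v5], ...]
    function saveSession(rows, done) {
      connection.query(
        'INSERT INTO iterations (session_id, iteration, v1, v2, v3, v4, v5) VALUES ?',
        [rows],
        done
      );
    }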
"calculate 8 values that are related to the user" -- to vague. Some possibilities:
Dynamically computing a 'rank' is costly and time-consuming.
Summary data is best pre-computed
Huge numbers (eg, search hits) are best approximated
Calculating age from birth date - trivial
Data sitting in the table for that user is, of course, trivial to get
Counting number of 'friends' - it depends
etc.
My application has 3 main models: companies, posts, and postdata. I'm providing an in-depth analytics dashboard and am having trouble figuring out the best way to structure the models in MongoDB for the best performance.
The postdata contains the fields: date, number of posts (for that date), average post length (for that date), and company id
The post contains the fields: date, post text, post length
In the dashboard view I want to display two graphs and two pieces of data.
graphs: one of the number of posts by date, and the other the average post length by date.
data: total number of posts for a date range, average post length for a date range
Currently, in the views, I loop through the postdata collection to compute a total post count for that date range and an average post length for that date range. I know I probably shouldn't be doing that much work in the views, but how else can I get the data I'm looking for? Should I get rid of the postdata collection and just use Underscore's countBy to create the data for the charts? What will give me the best performance / what is the preferred method?
I would take a look at Marionette; it adds a few nice features to Backbone, one of which is collection views. This is a nice way to separate out the views for your graphs.
If you think you'd like to grow the functionality of your dashboard to become more and more complex, then I would ditch the postdata model and do the analysis on the client side; you can use libraries like d3, crossfilter, and rickshaw. This will give you a lot of flexibility to quickly add features. The advantages of keeping the postdata model would be simplicity on the front end and the best front-end performance.
I think looping through the collection shouldn't have that much of an impact on performance.
Although you could try something like this: in addition to the Backbone Collection, create a plain object with dates as keys and data objects as values, used for calculations only. To query it you'll have to build an array of the dates in the requested range. It may work pretty fast if the range is relatively small, but as the range increases, performance will approach that of just looping through the whole thing. I admit it sounds a little crazy even to me; some experimenting will definitely be needed.
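A rough sketch of that lookup object (the attribute names numberOfPosts and averagePostLength are placeholders for whatever your postdata models actually use):

    // Sketch: index postdata by date once, then sum over the requested range.
    const byDate = {};
    postdataCollection.each(function (model) {
      byDate[model.get('date')] = {
        posts: model.get('numberOfPosts'),
        avgLength: model.get('averagePostLength')
      };
    });

    // Build the array of dates in the requested range, then look each one up.
    function totalsForRange(dates) {
      let totalPosts = 0, lengthSum = 0, days = 0;
      dates.forEach(function (d) {
        const entry = byDate[d];
        if (!entry) return;
        totalPosts += entry.posts;
        lengthSum += entry.avgLength;
        days += 1;
      });
      // A plain average of daily averages; weight by post count if you need
      // the exact overall average post length.
      return { totalPosts: totalPosts, avgPostLength: days ? lengthSum / days : 0 };
    }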
I've been pondering moving our current admin system over to a JS framework for a while, and I tested out AngularJS today. I really like how powerful it is. I created a demo application (source: https://github.com/andyhmltn/portfolio-viewer) that has a list of 'items' and displays them in a paginated list that you can order/search in real time.
The problem I'm having is figuring out how I would replicate this kind of behaviour with a larger data set. Ideally, I want a table of items that's sortable, searchable, and paginated, all in real time.
The part that concerns me is that this table will have 10,000+ records at least. At the moment that's no problem, as a PHP file limits the query to the current page and appends any search options to the end. The demo above only has about 15-20 records in it. I'm wondering how hard it would be to do the same thing with such a large number of records without pulling all of them into one JSON request at once, as that would be incredibly slow.
Does anyone have any ideas?
I'm used to handling large datasets in JavaScript, and I would suggest that you:
use pagination (either server-side or client-side, depending on the actual volume of your data; see below)
use Crossfilter.js to group your records and adopt a multi-level architecture in your GUI (records per month; on double click, records per day for the clicked month; etc.) - a rough sketch follows this list
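A minimal Crossfilter sketch of that month/day drill-down (the record fields and the clicked month are placeholders, and crossfilter and d3 are assumed to be loaded):

    // Sketch: group records per month, then drill into one month per day.
    const records = [
      { date: new Date(2021, 0, 5), value: 3 },
      { date: new Date(2021, 0, 9), value: 7 }
    ];
    const cf = crossfilter(records);

    const byMonth = cf.dimension((d) => d3.timeMonth(d.date));
    console.log(byMonth.group().reduceCount().all()); // [{ key: monthStart, value: count }, ...]

    // On double click of a month, narrow the filter and regroup per day.
    const byDay = cf.dimension((d) => d3.timeDay(d.date));
    byMonth.filterRange([new Date(2021, 0, 1), new Date(2021, 1, 1)]); // the clicked month
    console.log(byDay.group().reduceCount().all());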
An indicator I often use to decide between the two is the following product:
rowsAmount x columnsAmount x dataManipulationsPerRow
Also, consider the fact that handling large datasets and displaying them are two very different things.
Indeed, pulling so many rows in one request would be a killer. Fortunately, Angular has the ng-grid component, which can do server-side paging (among many other things). Instructions are provided at the given link.
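A rough sketch of wiring that up, loosely following the server-side paging example from the ng-grid documentation (the option names and the /items.php endpoint are assumptions to check against your own setup):

    // Sketch: only the current page travels over the wire. The existing PHP
    // script is assumed to accept page/perPage/search parameters and to
    // return { items: [...], total: n }.
    var app = angular.module('adminApp', ['ngGrid']);

    app.controller('ItemsCtrl', function ($scope, $http) {
      $scope.myData = [];
      $scope.totalServerItems = 0;
      $scope.pagingOptions = { pageSizes: [25, 50, 100], pageSize: 25, currentPage: 1 };

      function getPage() {
        $http.get('/items.php', {
          params: {
            page: $scope.pagingOptions.currentPage,
            perPage: $scope.pagingOptions.pageSize,
            search: $scope.searchText
          }
        }).then(function (response) {
          $scope.myData = response.data.items;
          $scope.totalServerItems = response.data.total; // lets the grid render the pager
        });
      }

      $scope.$watch('pagingOptions', getPage, true);
      getPage();

      $scope.gridOptions = {
        data: 'myData',
        enablePaging: true,
        showFooter: true,
        totalServerItems: 'totalServerItems',
        pagingOptions: $scope.pagingOptions
      };
    });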