I have a JSON file that I used to create my database in MongoDB, which I am using to build a website. Here is a snippet, before I explain what I'm trying to do.
[
{
"name": "Field 1",
"description": "",
"completed": false,
"category": "Field",
"resources": [],
"items": [
{
"name": "Field 2",
"description": "",
"completed": false,
"category": "Field",
"resources": [],
"items": [
{
"name": "Field 3",
"description": "",
"completed": false,
"category": "Field",
"resources": [],
"items": [
{
"name": "Topic 1",
"description": "",
"completed": false,
"category": "Topic",
"resources": [],
"items": []
},
{
"name": "Topic 2",
"description": "",
"completed": false,
"category": "Topic",
"resources": [],
"items": []
}
My data (and the way I'd like to use it) is heavily dependent on the parent and children of any individual object. My database has 8 documents, but there are many more embedded objects in total, all embedded at different depths in the 'items' array of each object.
I just created the first draft of the site, which loads the original 8 documents; clicking any of them lists the immediate children in its 'items' array, and so on until the end of that specific path. To do this, I rely on the index of each object in its array and keep track of a path.
But I would also like my users to be able to navigate directly to the page for a certain object without starting at the top and navigating through. If I wanted to directly access the object for Field 3 in the above example, what's the best way to do this with a single function or piece of code that could work with any of them?
The JSON file that I've used to create my database is 22,000+ lines long, and I'd love to not have to go back and change it more than is absolutely necessary. I was thinking I could add an ID somehow, and if that's unique, I could use that. The names of some objects will be the same, depending on where they are in the data.
EDIT: Bonus question - Would this sort of data be best stored in a relational database? I thought non-relational would be best because of the nested functionality, but I suppose I could make it work with either.
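One possible way to get direct access without hand-editing the 22,000-line file is to walk the tree once, stamp every object with a generated ID, and keep an ID-to-object lookup. The sketch below is only an illustration in plain Python: the `id` and `path` fields and the function names `assign_ids` and `find_by_path` are assumptions, not part of the original data or of any MongoDB API.

```python
from itertools import count


def assign_ids(docs, key="items"):
    """Stamp every nested object with a unique 'id' and its index path.

    Returns a dict mapping id -> object so any node can be fetched directly.
    The 'id' and 'path' field names are hypothetical additions to the data.
    """
    next_id = count(1)
    index = {}

    def walk(node, path):
        node["id"] = next(next_id)
        node["path"] = path
        index[node["id"]] = node
        for i, child in enumerate(node.get(key, [])):
            walk(child, path + [i])

    for i, doc in enumerate(docs):
        walk(doc, [i])
    return index


def find_by_path(docs, path, key="items"):
    """Follow a list of array indexes from the top-level documents down to one object."""
    node = docs[path[0]]
    for i in path[1:]:
        node = node[key][i]
    return node


# Trimmed-down version of the snippet in the question.
docs = [
    {"name": "Field 1", "items": [
        {"name": "Field 2", "items": [
            {"name": "Field 3", "items": [
                {"name": "Topic 1", "items": []},
                {"name": "Topic 2", "items": []},
            ]},
        ]},
    ]},
]

index = assign_ids(docs)
field3 = index[3]
print(field3["name"], field3["path"])  # Field 3 [0, 0, 0]
```

Because the IDs are generated rather than taken from the names, two objects with the same name in different branches still get distinct IDs, and the stored path matches the index-based navigation the site already uses.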
I've been asked to look at flattening API responses on a Node.js project I'm working on, but I'm not entirely sure why they need to be flattened. No responses are nested beyond 2 levels, so to me it's not too bad. Can anybody tell me why an API response would be flattened? I'd be keen to know the pros and cons as well, and to hear any suggestions on how to do it. I'm currently looking at the npm package flat.
Here's an example response:
{
"users": [
{
"id": 1,
"name": "John Doe",
"email": "john#doe.com",
"suppliers": [
{
"id": 1,
"name": "Supplier1"
}
]
},
{
"id": 2,
"name": "Jane Doe",
"email": "jane#doe.com",
"suppliers": [
{
"id": 1,
"name": "Supplier1"
},{
"id": 2,
"name": "Supplier2"
}
]
}
]
}
Can anybody tell me why an API response would be flattened? I'd be keen to know the pros and cons as well, and to hear any suggestions on how to do it.
Pros:
An API needs to be flattened if the consumer of that API requires it to be flattened. For example, if the purpose of the API is to provide data that gets loaded into a table view of some kind, it would be much more convenient to the consumer if the API returned the data flattened to match the shape of that table.
Cons:
Flattening hierarchical data often requires either limiting the number of child elements that can be returned or creating more rows than makes sense.
Consider your data flattened in these two approaches:
Flattened approach #1
{
"users": [
{
"id": 1,
"name": "John Doe",
"email": "john#doe.com",
"suppliers_1_id": 1,
"suppliers_1_name": "Supplier1",
"suppliers_2_id": null,
"suppliers_2_name": null
}, {
"id": 2,
"name": "Jane Doe",
"email": "jane#doe.com",
"suppliers_1_id": 1,
"suppliers_1_name": "Supplier1",
"suppliers_2_id": 2,
"suppliers_2_name": "Supplier2"
}
]
}
In this example, the maximum number of suppliers that can be returned has to be decided ahead of time. I really dislike this kind of design for storing data, but it is often the requested way for data to be displayed in a table. If, for example, a user has 3 suppliers, you have no way to return that data without adding more columns. It very quickly becomes unmanageable unless you have a very small and finite maximum number of child rows.
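A minimal sketch of approach #1, written in Python for illustration rather than for the Node.js project in the question. The function name `flatten_fixed` and the `max_suppliers` parameter are hypothetical; the column names follow the example above.

```python
def flatten_fixed(users, max_suppliers=2):
    """Flatten each user into one row with numbered supplier columns.

    Suppliers beyond max_suppliers are silently dropped, which is the
    limitation described above.
    """
    rows = []
    for user in users:
        row = {k: v for k, v in user.items() if k != "suppliers"}
        suppliers = user.get("suppliers", [])
        for n in range(1, max_suppliers + 1):
            supplier = suppliers[n - 1] if n <= len(suppliers) else {}
            row[f"suppliers_{n}_id"] = supplier.get("id")
            row[f"suppliers_{n}_name"] = supplier.get("name")
        rows.append(row)
    return rows


users = [
    {"id": 1, "name": "John Doe",
     "suppliers": [{"id": 1, "name": "Supplier1"}]},
    {"id": 2, "name": "Jane Doe",
     "suppliers": [{"id": 1, "name": "Supplier1"},
                   {"id": 2, "name": "Supplier2"},
                   {"id": 3, "name": "Supplier3"}]},
]

rows = flatten_fixed(users)
print(rows[0]["suppliers_2_id"])       # None: padded with nulls
print("suppliers_3_id" in rows[1])     # False: the third supplier is lost
```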
Flattened approach #2
{
"users": [
{
"id": 1,
"name": "John Doe",
"email": "john#doe.com",
"supplier_id": 1,
"supplier_name": "Supplier1"
}, {
"id": 2,
"name": "Jane Doe",
"email": "jane#doe.com",
"supplier_id": 1,
"supplier_name": "Supplier1"
}, {
"id": 2,
"name": "Jane Doe",
"email": "jane#doe.com",
"supplier_id": 2,
"supplier_name": "Supplier2"
}
]
}
This approach simply repeats each user once per supplier. The format makes it possible to convert the flattened data back to hierarchical data, and it is a common approach when retrieving hierarchical data from a relational database in a single rowset. A client application can easily reassemble the original hierarchy by grouping by user id. The downside is that it returns more than one row per top-level object, which can make data payloads larger. If you have a single, internal consumer of the API, this approach may work. If it is for a public API, you'll have to spend more time documenting and supporting it, because the shape may not make sense to your API consumers.
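As a rough illustration of approach #2 and of the "group by user id" step, here is a Python sketch. The function names `flatten_rows` and `regroup` are hypothetical, and the sketch assumes every user has at least one supplier, since a user with none would produce no rows at all.

```python
def flatten_rows(users):
    """Emit one row per (user, supplier) pair, repeating the user fields."""
    rows = []
    for user in users:
        base = {k: v for k, v in user.items() if k != "suppliers"}
        for supplier in user.get("suppliers", []):
            rows.append({**base,
                         "supplier_id": supplier["id"],
                         "supplier_name": supplier["name"]})
    return rows


def regroup(rows):
    """Rebuild the hierarchy on the client by grouping rows on the user id."""
    users = {}
    for row in rows:
        user = users.setdefault(row["id"], {
            "id": row["id"],
            "name": row["name"],
            "email": row["email"],
            "suppliers": [],
        })
        user["suppliers"].append({"id": row["supplier_id"],
                                  "name": row["supplier_name"]})
    return list(users.values())


users = [
    {"id": 1, "name": "John Doe", "email": "john#doe.com",
     "suppliers": [{"id": 1, "name": "Supplier1"}]},
    {"id": 2, "name": "Jane Doe", "email": "jane#doe.com",
     "suppliers": [{"id": 1, "name": "Supplier1"},
                   {"id": 2, "name": "Supplier2"}]},
]

rows = flatten_rows(users)
print(len(rows))               # 3
print(regroup(rows) == users)  # True
```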
I'm not entirely sure why it needs to be flattened.
Whoever is asking you to flatten it should specify the actual shape they are looking for, and that may provide clarity. I've described two possible ways to flatten your data, but there are plenty more possibilities.
I have scraped the following website: https://www.eex-transparency.com/homepage/power/czech-republic/production/availability/non-usability/non-usability using Selenium. I am scraping all the table data. It works well, but the script takes rather a long time to run. So I started searching for an alternative and came across several topics here on StackOverflow that use an API to send requests to the server, but after hours of trying and searching for examples I gave up, because there are several things I don't get:
How to reverse engineer API to send the right request?
Which url link should I use?
This is what I came up with:
import json
import requests
url = "https://www.eex-transparency.com/ajax/en/navigation/ajaxGetNavi/12"
data = {
"id": "16",
"title": "Czech Republic",
"url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic",
"class": "country",
"description": "",
"children": [
{
"id": "649",
"title": "Production",
"url": False,
"class": "",
"description": "",
"children": [
{
"id": "650",
"title": "Capacity",
"url": False,
"class": "",
"description": "",
"children": [
{
"id": "651",
"title": "Installed Capacity",
"url": "https:\\/\\/www.eex-transparency.com\\/homepage\\/power\\/czech-republic\\/production\\/capacity\\/installed-capacity",
"class": "",
"description": ""
}
]
}
]
}
]
}
response = requests.get(url, data=data)
file = response.json()
In general, could someone explain what steps I should take in order to scrape the webpage above? I am particularly interested in how to find the correct information in Chrome (-> Inspect -> Network -> XHR) and how to build the data variable (which I pass to requests) from that information.
You can use Scrapy, a fast, high-level web crawling and scraping framework for Python:
https://github.com/scrapy/scrapy/
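Independently of the tool used, the nested structure in the question looks more like something a navigation endpoint would return than like a payload to send. That is only a guess; the source does not confirm it. Under that assumption, a small recursive walk could list the pages that carry a real URL. The function name `collect_pages` is hypothetical, and the sample below is a trimmed copy of the data from the question with the URLs already unescaped.

```python
def collect_pages(node, trail=()):
    """Yield (breadcrumb, url) for every node that carries a real URL."""
    trail = trail + (node["title"],)
    if node.get("url"):  # intermediate nodes use url=False in the sample
        yield " > ".join(trail), node["url"]
    for child in node.get("children", []):
        yield from collect_pages(child, trail)


base = "https://www.eex-transparency.com/homepage/power/czech-republic"
nav = {
    "id": "16", "title": "Czech Republic", "url": base,
    "children": [
        {"id": "649", "title": "Production", "url": False, "children": [
            {"id": "650", "title": "Capacity", "url": False, "children": [
                {"id": "651", "title": "Installed Capacity",
                 "url": base + "/production/capacity/installed-capacity"},
            ]},
        ]},
    ],
}

pages = list(collect_pages(nav))
for breadcrumb, url in pages:
    print(breadcrumb, "->", url)
```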
The Project
I'm currently using FullCalendar to display a constantly changing calendar containing various information. When the user clicks on a day, a modal appears displaying a title, description, and links to files.
My current JSON object looks like:
{
"title": "myTitle",
"start": "2015-12-18T09:00:00",
"end": "2015-12-18T10:00:00",
"item_number": "1",
"description": "Test Document",
"items":[
{
"docName":"Document 1",
"docUrl":"docName.pdf"
},
{
"docName":"Document 2",
"docUrl":"docName-2.pdf"
},
{
"docName":"Document 3",
"docUrl":"docName-3.pdf"
}
],
"id": 0
}
The setup:
There are three teams:
Schedulers - This team modifies the schedule, then notifies me to edit the JSON file, updating the calendar.
Editors - This team creates then sends the documents to me to upload to the server and modify the JSON file.
Developers - This team puts everything together. Some days, you might have 60-90 edits throughout the day.
Currently, the JSON document is manually modified by the developers while we test.
My Plan:
Since the schedulers are not very tech-savvy, what I'm doing is having them modify a Google Spreadsheet that is published as a CSV and converted to JSON through PHP. This creates the following JSON object:
{
"title": "myTitle",
"start": "2015-12-18T09:00:00",
"end": "2015-12-18T10:00:00",
"item_number": "1",
"description": "Test Document"
}
The editors will create their documents and upload using Dropzone. A JSON object is created referencing the file(s):
"items":[
{
"docName":"Document 1",
"docUrl":"docName.pdf"
},
{
"docName":"Document 2",
"docUrl":"docName-2.pdf"
},
{
"docName":"Document 3",
"docUrl":"docName-3.pdf"
}
]
The two JSON objects are combined and an ID is assigned:
{
"title": "myTitle",
"start": "2015-12-18T09:00:00",
"end": "2015-12-18T10:00:00",
"item_number": "1",
"description": "Test Document",
"items":[
{
"docName":"Document 1",
"docUrl":"docName.pdf"
},
{
"docName":"Document 2",
"docUrl":"docName-2.pdf"
},
{
"docName":"Document 3",
"docUrl":"docName-3.pdf"
}
],
"id": 0
}
When changes occur – maybe there's a change to a document's name, documents are added or removed, or dates change – the individual JSON objects are re-created and re-combined.
The JSON file has hundreds of objects, and what I'm having trouble with is inserting the "items" key into the correct object. For instance, say objects with IDs 0-5 have been created and the "items" of ID=0 are modified: I need to update the "items" of ID=0 and not those of 1-5.
The Question(s)
Using PHP or JavaScript, how can I link these two JSON objects correctly?
Would it be better to feed all of this information into a database (MySQL) and then construct the JSON file?
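Whichever storage is chosen, the linking step needs a key that both objects share. The sketch below shows the idea in Python for brevity; the same logic carries over to PHP or JavaScript. It assumes, hypothetically, that each upload record also carries the `item_number` of the schedule row it belongs to, which the Dropzone output shown above does not yet include. The function name `merge_events` is likewise made up for illustration.

```python
def merge_events(schedule, uploads, key="item_number"):
    """Attach each upload's 'items' list to the schedule entry sharing the same key.

    Assumes (hypothetically) that every upload record carries the same
    'item_number' as the schedule row it belongs to.
    """
    items_by_key = {upload[key]: upload["items"] for upload in uploads}
    merged = []
    for event_id, event in enumerate(schedule):
        combined = dict(event)  # copy so the schedule rows stay untouched
        combined["items"] = items_by_key.get(event[key], [])
        combined["id"] = event_id
        merged.append(combined)
    return merged


schedule = [
    {"title": "myTitle", "start": "2015-12-18T09:00:00",
     "end": "2015-12-18T10:00:00", "item_number": "1",
     "description": "Test Document"},
    {"title": "otherTitle", "start": "2015-12-19T09:00:00",
     "end": "2015-12-19T10:00:00", "item_number": "2",
     "description": "Another Document"},
]
uploads = [
    {"item_number": "1", "items": [
        {"docName": "Document 1", "docUrl": "docName.pdf"},
        {"docName": "Document 2", "docUrl": "docName-2.pdf"},
    ]},
]

events = merge_events(schedule, uploads)
print(events[0]["id"], len(events[0]["items"]))  # 0 2
print(events[1]["id"], events[1]["items"])       # 1 []
```

Re-running the merge after any change rebuilds every object from the two sources, so only the entry whose `item_number` matches an upload receives that upload's documents.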
I come from a MySQL workflow and am now trying to do something in Firebase. I have some dilemmas about structuring my data, because I don't know how I would query it.
Following the example below, where I have some users and some comments, what is a good way to find:
how many post likes a user had
how many post comments a user had
what were the posts that a user liked
...
I was thinking of adding that information to a user, like:
"user:2": {
"name": "Doe",
"email": "doe#bob.com",
"post_likes": {
"post:1": 1,
"post:2": 1
},
"post_comments": {
"post:1": 15,
"post:2": 5
}
}
But this seems redundant and duplicates data.
I need to find a way to search in posts for everything that has to do with user:1. Maybe I need to create another object that connects the users and posts?
Any ideas?
{
"array": {
"users": {
"user:1": {
"name": "John",
"email": "john#bob.com"
},
"user:2": {
"name": "Doe",
"email": "doe#bob.com"
}
},
"posts": {
"post:1": {
"name": "First Post",
"content": "some post content",
"comments": {}
},
"post:2": {
"name": "Second Post",
"content": "some other post content",
"likes": 2,
"comments": {
"comment:1": {
"uid": "user:1",
"email": "some comment"
},
"comment:2": {
"uid": "user:2",
"email": "some other comment"
}
}
}
}
}
}
How to structure data is not a simple question to answer and is highly dependent on how the data will be read. The general rule is to work hard on writes to make reads easy. Firebase offers two guides on understanding data and on structuring data.
They won't fit nicely into this text box, so you'll find better answers there. Here's a bird's-eye view.
It's JSON data.
Flatten your data. Don't nest just because you can. Treat each logical data component like you would in SQL and keep it in its own path.
Avoid arrays in distributed systems. Sequential numeric ids are error prone.
Utilize indices to create sub-lists of master data rather than trying to nest lists in lists in lists.
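To make the last two points concrete, here is a rough sketch that uses a plain Python dict as a stand-in for the Firebase tree; it makes no Firebase SDK calls. The path names `post_likes` and `user_likes` and the helper functions are assumptions chosen for illustration, not names from the Firebase guides.

```python
# Each logical component lives in its own top-level path; the two
# *_likes paths are indices that only hold keys, not copies of the data.
db = {
    "users": {
        "user:1": {"name": "John", "email": "john#bob.com"},
        "user:2": {"name": "Doe", "email": "doe#bob.com"},
    },
    "posts": {
        "post:1": {"name": "First Post", "content": "some post content"},
        "post:2": {"name": "Second Post", "content": "some other post content"},
    },
    "post_likes": {},   # post id -> {user id: True}
    "user_likes": {},   # user id -> {post id: True}
}


def like(db, uid, post_id):
    """Write both sides of the index so either direction is a direct lookup."""
    db["post_likes"].setdefault(post_id, {})[uid] = True
    db["user_likes"].setdefault(uid, {})[post_id] = True


def posts_liked_by(db, uid):
    """Return the ids of the posts a user liked, read straight from the index."""
    return sorted(db["user_likes"].get(uid, {}))


def like_count(db, post_id):
    """Return how many users liked a post."""
    return len(db["post_likes"].get(post_id, {}))


like(db, "user:1", "post:2")
like(db, "user:2", "post:2")
like(db, "user:1", "post:1")

print(posts_liked_by(db, "user:1"))  # ['post:1', 'post:2']
print(like_count(db, "post:2"))      # 2
```

The write does a little more work (two index entries per like), which is the "work hard on writes to make reads easy" trade-off; comments per user could be indexed the same way.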
I am tackling frontend development (AngularJS) and rather than pull data from the backend (which isn't complete but renders everything to JSON), I am looking to just use hardcoded JSON.
However, I am new to this and can't seem to find anything about complex JSON structure. In a basic sense, my web app has users and the content they create. So, in my mind, there will be two databases, but I'm not sure if I'm approaching it correctly.
Users - username, location, created content, comments, etc.
"user": [
{
"userID": "12",
"userUserName": "My Username",
"userRealName": "My Real Name",
"mainInterests": [
{
"interest": "Guitar"
},
{
"interest": "Web Design"
},
{
"interest": "Hiking"
}
],
"userAbout": "All about me...",
"userComments": [
{
"comment": "this is a comment", "contentID" : "12"
},
{
"comment": "this is another comment", "contentID" : "123"
}
]
}
]
Created Content - title, description, username, comments, etc.
"mainItem": [
{
"mainID": "1",
"mainTitle": "Guitar Lessons",
"mainCreatorUserName": "My Username",
"mainCreatorRealName": "My Real Name",
"mainTags": [
{
"tag": "Intermediate"
},
{
"tag": "Acoustic"
},
{
"tag": "Guitar"
}
],
"mainAbout": "Learn guitar!",
"mainSeries": [
{
"videoFile": "file.mov",
"thumbnail": "url.jpg",
"time": "9:37",
"seriesNumber": "1",
"title": "Learn Scales"
},
{
"videoFile": "file.mov",
"thumbnail": "url.jpg",
"time": "8:12",
"seriesNumber": "2",
"title": "Learn Chords"
}
],
"userComments": [
{
"comment": "this is a comment", "userID" : "12"
},
{
"comment": "this is another comment", "userID" : "123"
}
]
}
]
And there is more complexity than that, but I would just like to understand whether I'm approaching this right. Maybe I'm approaching this entirely incorrectly (for instance, CRUD vs. REST? Does it matter here? As I understand it, REST implies that each of the objects above is a resource with its own unique URI? So would the rendered JSON be impacted?). I really am not sure. But ultimately, I need to use the JSON structure to properly pull data into my frontend. Presumably, whatever that structure is will be mirrored and rendered in the backend.
Edit: Thank you for the replies. I think part of my question, where I clarify "complex", is missing, so I'd like to explain. More than the JSON itself, I really mean the structure of the data. For instance, in my example, I am structuring my data so that it all sits beneath two unique objects (user and content). Is this correct? Or should I think of my data as more diverse? For instance, technically I could have a comments database (where each comment is the main object). Or is that still implied in my dataset? Perhaps my question isn't about JSON so much as about the data structure that will happen to get rendered in JSON. Hopefully this clarifies what I mean by complex.
Any and all help is appreciated.
I'm not sure why you're making what seems to be objects into single-item arrays (as evidenced by the opening square brackets). Other than that, it looks fine to me. Generally speaking single items (like "User") are structured as an object and multiples are arrays of objects.
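To illustrate the object-versus-array point, and the idea from the question's edit of treating comments as their own collection, here is a small Python sketch of the data shape only. The `commentID` and `creatorID` fields and the two helper functions are hypothetical; the other field names are taken from the question.

```python
# Single things are objects keyed by id; "many of something" is a list of objects.
users = {
    "12": {"userID": "12", "userUserName": "My Username"},
}
content = {
    "1": {"mainID": "1", "mainTitle": "Guitar Lessons", "creatorID": "12"},
}
# Each comment is stored once and points at its author and its content item,
# instead of being duplicated under both the user and the content.
comments = [
    {"commentID": "c1", "userID": "12", "contentID": "1",
     "comment": "this is a comment"},
    {"commentID": "c2", "userID": "123", "contentID": "1",
     "comment": "this is another comment"},
]


def comments_for_content(content_id):
    """Return all comments left on one content item."""
    return [c for c in comments if c["contentID"] == content_id]


def comments_by_user(user_id):
    """Return all comments written by one user."""
    return [c for c in comments if c["userID"] == user_id]


print(len(comments_for_content("1")))         # 2
print(comments_by_user("12")[0]["comment"])   # this is a comment
```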
As for the Angular stuff, if you want to pull direct from a JSON file as a test, take a look here:
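// Module that holds the app's data services.
var services = angular.module('app.services', []);

services.factory('User', function($http) {
  // Returning an object from a constructor makes `new User(data)` evaluate to that object.
  var User = function(data) {
    return data;
  };

  // Loads the static test file; `id` is ignored while the data is hardcoded.
  User.findOne = function(id) {
    return $http.get('/test_user.json').then(function(response) {
      return new User(response.data);
    });
  };

  return User;
});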
I also recommend looking into Deployed for doing development without access to live data services.