I am trying to figure out the best database schema for the following. I will be using Postgres along with Nodejs unless there is something better suited to this task.
I apologize if the answer is obvious. I am new to all of this.
I have a list of train-ids and a list of intermodal containers. I need to be able to query by intermodal container to check which train-id it is on, and also to query by train-id to get a list of all intermodal containers on that train. I would also like to be able to query historical information.
The issue I have is that the train-ids repeat once per month, so I can't use them as a primary key or query on train-id alone, as that would also return containers that were on the same train-id in previous months.
The best that I have come up with so far is to create a composite key consisting of the date of departure and the train-id and query based on this to get the list of containers. The day of departure is already included in the train-id, however the month of departure is not. I'm not sure how to make this user friendly though as preferably the user would not have to specify the month of departure.
For querying by container-id I believe I could just limit the result to 1 to only get the most recent train it is on, or is there a better way of doing this?
There will be other details stored in this database as well, such as ETAs, car numbers, etc. However, the above is what I'm having difficulty with currently.
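For illustration only, here is a minimal in-memory sketch of the composite-key idea described above. The row shape, the ids, the dates, and the function names are all hypothetical. In Postgres the rows would presumably live in a table keyed on (train-id, departure date), and "latest" would correspond to something like ORDER BY departure_date DESC LIMIT 1. Defaulting to the most recent departure is one possible way to spare the user from typing the month.

```javascript
// Hypothetical rows: one per container loaded on a given departure.
// The pair (trainId, departureDate) plays the role of the composite key.
const loads = [
  { trainId: 'T-07', departureDate: '2024-03-07', containerId: 'CONT-A' },
  { trainId: 'T-07', departureDate: '2024-03-07', containerId: 'CONT-B' },
  { trainId: 'T-15', departureDate: '2024-03-15', containerId: 'CONT-A' },
  { trainId: 'T-07', departureDate: '2024-04-07', containerId: 'CONT-B' },
  { trainId: 'T-07', departureDate: '2024-04-07', containerId: 'CONT-C' },
];

// Most recent departure date recorded for a train-id, or null if none.
// ISO "YYYY-MM-DD" strings compare correctly as plain strings.
function latestDeparture(rows, trainId) {
  let latest = null;
  for (const row of rows) {
    if (row.trainId === trainId && (latest === null || row.departureDate > latest)) {
      latest = row.departureDate;
    }
  }
  return latest;
}

// Containers on a train. If no date is given, fall back to the most
// recent departure, so the user only has to supply the train-id.
function containersOnTrain(rows, trainId, departureDate = latestDeparture(rows, trainId)) {
  return rows
    .filter((r) => r.trainId === trainId && r.departureDate === departureDate)
    .map((r) => r.containerId);
}

// Most recent row for a container (the "limit 1" idea), or null.
function currentTrainOf(rows, containerId) {
  const matches = rows
    .filter((r) => r.containerId === containerId)
    .sort((a, b) => b.departureDate.localeCompare(a.departureDate));
  return matches.length > 0 ? matches[0] : null;
}
```

Passing an explicit date, e.g. containersOnTrain(loads, 'T-07', '2024-03-07'), would cover the historical queries.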
Related
I am building a job board using Node.JS / MongoDB. After a job listing is purchased by the user, it is added to the database and, using a TTL index, it deletes after 30 days. I’m wondering if there’s a way to change a field vs. deleting the entire document? I ask because I would want to give the user the option to “renew” their listing after the expiration period. What would be the best way to approach this?
You can add a field called "renewalDate", initialize it to the creation date, and then, to decide which items to display, query for items whose renewalDate is no more than 30 days in the past. To "renew" a listing, you just set its renewalDate to a more current date so it will appear in the query results again.
You could then run a periodic task (once a night or once a week) to permanently delete any documents that are old enough that they aren't even eligible for renewal any more. Or you could use the TTL feature to manage this.
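As a rough sketch of what that could look like, the helpers below only build the filter and update documents, so they run without a database. The helper names and the 90-day purge window are assumptions for illustration; the operators ($gte, $lt, $set) are standard MongoDB query and update operators.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Filter for listings to display: renewalDate within the last `days` days.
function activeListingFilter(now = new Date(), days = 30) {
  return { renewalDate: { $gte: new Date(now.getTime() - days * DAY_MS) } };
}

// Update document that "renews" a listing by moving renewalDate forward.
function renewUpdate(now = new Date()) {
  return { $set: { renewalDate: now } };
}

// Filter for the periodic cleanup: listings whose renewalDate is so old
// that they are no longer eligible for renewal. The 90-day window is an
// assumed value, not something stated above.
function purgeFilter(now = new Date(), maxAgeDays = 90) {
  return { renewalDate: { $lt: new Date(now.getTime() - maxAgeDays * DAY_MS) } };
}
```

With the Node.js driver these would be passed to calls such as collection.find(activeListingFilter()), collection.updateOne({ _id: listingId }, renewUpdate()), and collection.deleteMany(purgeFilter()), where collection and listingId are placeholders.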
I have a use case where I need to do complicated string matching on about 5.1 million records. By complicated string matching, I mean using a library to do fuzzy string matching (http://blog.bripkens.de/fuzzy.js/demo/).
The database we use at work is SAP HANA, which is excellent for retrieving and querying because it's in-memory, so I would like to avoid pulling data out of there and repopulating it in memory on the application layer. At the same time, if the data stays in the database I cannot take advantage of the libraries (there is an API for fuzzy matching in the DB, but it's not comprehensive enough for us).
What is the middle ground here? If I do pre-processing and associate words in the DB with certain keywords the user might search for, I can cut down the overhead, but are there any best practices for this?
If it matters: the list is a list of billing descriptors (the ones that show up on CC statements), so the user will search these descriptors to find out which company a descriptor belongs to.
Assuming your "billing descriptor" is a single column, probably of type (N)VARCHAR, I would start with a very simple SAP HANA fuzzy search, e.g.:
SELECT top 100 SCORE() AS score, <more fields>
FROM <billing_documents>
WHERE CONTAINS(<bill_descr_col>, <user_input>, FUZZY(0.7))
ORDER BY score DESC;
Maybe this is already good enough if you want to apply your JS library to the result set. If not, I would start experimenting with the similarCalculationMode option, e.g. 'similarcalculationmode=substringsearch'. And I would always keep an eye on the response times, since they can be higher with some of the options.
Only if response times are too high, or if many concurrent users are running your query, would I try to create a fuzzy search index on your search column. If you need more search options, you can also create a fulltext index.
But that all really depends on your use case, the values you want to compare, etc.
There is a very comprehensive set of features and options for different use cases, check help.sap.com/hana/SAP_HANA_Search_Developer_Guide_en.pdf.
In one project we did a free-style search on several address columns (name, surname, company name, post code, street), and we got response times of 100-200 ms on about 6 million records WITHOUT using any special indexes.
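The two-stage idea mentioned above (let HANA narrow things down to the top 100 rows, then apply your own matching in the application) could be sketched roughly as follows. The scorer here is a plain Levenshtein similarity used as a stand-in for whatever fuzzy library you prefer, and the row shape with a descriptor field is an assumption.

```javascript
// Classic Levenshtein edit distance, keeping only two rows of the table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Case-insensitive similarity in [0, 1]; 1 means identical strings.
function similarity(a, b) {
  const x = a.toLowerCase();
  const y = b.toLowerCase();
  const maxLen = Math.max(x.length, y.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(x, y) / maxLen;
}

// Re-rank the (already small) result set that came back from the database.
// `rows` is assumed to be an array of objects with a `descriptor` string.
function rerank(rows, userInput, limit = 10) {
  return rows
    .map((row) => ({ ...row, appScore: similarity(row.descriptor, userInput) }))
    .sort((p, q) => q.appScore - p.appScore)
    .slice(0, limit);
}
```

Since only about 100 rows ever reach the application, the cost of the second pass stays small no matter how large the table is.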
Hi Stackoverflow Family,
So this is a pretty big question; be prepared to read quite a bit.
Basically, my team and I are working on an employee scheduling application (we're already well into it, so please nothing about changing our stack or anything like that). Anyway, we're building this with PHP, Mongo, JavaScript, jQuery, and Bootstrap, and I'm working more on the db side of things.
What I really want to find out after asking this question is to see if my current approach to the db is right; I've basically created several collections and documents that reference each other in order to access or reference specific data that I'm looking for. If that doesn't really make sense check out my schema below:
Employee Collection - Contains indexes such as name, employeeNumber, address, etc, employee availability, position & department.
(My issue here is I want it to reference other collections which contain their Shift information, but I can only really do that when I insert a document).
Shift Collection - Contains indexes such as shiftNumber, shiftStartTime, endTime (This collection, I basically want to reference to employee, such that for creating each employee I have their shift time connected to it).
**Schedule Collection** - Now for the Schedule collection, this is the one that confuses me the most; I basically want our Calendar UI to be able to look through our schedule collection and be able to pull all the shifts in a certain day, or in a specific week. But I have no idea as to how I can approach this from the backend.
So far what I've done with the Schedule Collection is that I've mathematically created a Calendar year and placed that within the Schedule; basically it contains a document called Year, and in that Year it contains every day of the week with information such as day number, week number, leapYear, etc.
Anyways, I hope this is enough information; my main confusion is with the main schedule. I think I nearly have the Employee collection functioning properly, since it references the Department class with no issues. I just mainly can't figure out how to implement a full schedule in Mongo!
Thanks guys!
I am working with a database that was handed down to me. It has approximately 25 tables, and a very buggy query system that hasn't worked correctly for a while. I figured, instead of trying to bug test the existing code, I'd just start over from scratch. I want to say before I get into it, "I'm not asking anyone to build the code for me". I'm not that lazy, all I want to know is, what would be the best way to lay out the code? The existing query uses "JOIN" to combine the results of all the tables in one variable, and spits it into the query. I have been told in other questions displaying this code, that it's just too much, and far too many bugs to try to single out what is causing the break.
What would be the most efficient way to query these tables that reference each other?
Example: Person chooses car year, make, model. PHP then gathers that information, and queries the SQL database to find what parts have matching year, vehicle id's, and parts compatible. It then uses those results to pull parts that have matching car model id's OR vehicle id's (because the database was built very sloppily), and compares all the different tables to produce: parts, descriptions, prices, part number, SKU number, any retailer notes, wheelbase, drive-train compatibility, etc.
I've been working on this for two weeks, and I'm approaching my deadline with little to no progress. I'm about to scrap their database, and just do data entry for a week, and rebuild their mess if it would be easier, but if I can use the existing pile of crap they've given me, and save some time, I would prefer it.
Would it be easier to do a couple queries and compare the results, then use those results to query for more results, and do it step by step like that, or is one huge query comparing everything at once more efficient?
Should I use JOIN and pull all the tables at once and compare, or pass the input into individual variables, and pass the PHP into javascript on the client side to save server load? Would it be simpler to break the code up so I can identify the breaking points, or would using one long string decrease query time, and server loads? This is a very complex question, but I just want to make sure there aren't too many responses asking for clarification on trivial areas. I'm mainly seeking the best advice possible on how to handle this complicated situation.
Rebuild the database, then write a PHP import to bring over the data.
I have recently started to use Firebase. However, since I am not familiar with NoSQL databases, I am having a little trouble structuring it.
I am developing a timesheet application, several users can input their starting and ending hours each day they go to work and it will be saved into a database.
At the moment the structure of my firebase looks like this :
However, I am having some trouble accessing this data in my application. On top of that this just doesn't feel right. First I wanted to just add a new entry under 'timesheet' every time a user inputs something, but obviously, I do not want a user to be able to add 2 entries for one day either.
I know that there's probably some complex way to stop a user from doing this, but I feel that this could all be solved in an easier way if I just saw how I should best structure this database.
Later I want to loop through all the days in the current month for a specific user to show him in a table all his starting/ending hours for each day of the month.
Update: I was thinking about denormalizing my database, but would that really help anything?
You should read this blog post, written by the Firebase team.
TL;DR is that you should denormalize your data because Firebase is optimized for certain kinds of operations (you don't need to fully understand the nitty gritty of that optimization to use Firebase properly).
"I am having some trouble accessing this data in my application"
What kind of trouble?
"I do not want a user to be able to add 2 entries for one day either."
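That isn't really a denormalization problem; with your structure now, you could just check whether the value at a path is null.
new Firebase('path/to/timesheets/April/' + queryDate).once('value', function (snapshot) { var isEmpty = snapshot.val() === null; }), where queryDate is the date you want to check for. (Comparing the reference itself to null does not work, because a reference is never null; you have to read the value first.)
If snapshot.val() is null, your user hasn't submitted a timesheet for that date yet. If it is not null, you shouldn't allow them to submit another one.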
"I want to loop through all the days in the current month for a specific user to show him in a table all his starting/ending hours for each day of the month."
You can! Read the value at new Firebase("path/to/April") and iterate over the children nested under it, for example with snapshot.forEach() or with a for loop over the object returned by snapshot.val().
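To make both points concrete without a live Firebase connection, here is a small sketch that works on a plain object shaped like what snapshot.val() might return for one month. The shape (entries keyed by day, each with start and end) and the function names are assumptions, since the original structure isn't shown. Keying entries by day is what makes a second entry for the same day easy to detect.

```javascript
// Hypothetical value of one month's node, keyed by day of month.
const april = {
  '01': { start: '09:00', end: '17:00' },
  '02': { start: '08:30', end: '16:45' },
};

// True if the month already holds an entry for that day.
// `month` may be null, which is what val() yields for an empty path.
function hasEntry(month, day) {
  return month !== null && Object.prototype.hasOwnProperty.call(month, day);
}

// Returns a new month object with the entry added, refusing duplicates.
function addEntry(month, day, entry) {
  if (hasEntry(month, day)) {
    throw new Error('An entry for day ' + day + ' already exists');
  }
  return { ...(month || {}), [day]: entry };
}

// Flattens the month into rows sorted by day, ready to render as a table.
function toRows(month) {
  return Object.keys(month || {})
    .sort()
    .map((day) => ({ day, ...month[day] }));
}
```

In the real app the duplicate check would run inside the once('value', ...) callback, and toRows(snapshot.val()) would feed the monthly table.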