I've got a table which manages user scores, e.g.:
id      scoreA   scoreB   ...   scoreX
------  -------  -------  ---   -------
1       ...      ...      ...   ...
2       ...      ...      ...   ...
Now I want to create a scoreboard that can be sorted by each of the scores (descending only).
However, I can't just query the entries and send them all to the client (which renders them with JavaScript): the table contains thousands of entries, and sending all of them would create unreasonable traffic.
I came to the conclusion that all non-relevant entries (entries which may not show up in the scoreboard because the score is too low) should be discarded on the server side, with the following rule of thumb:
If any of the scores is within the top ten for that specific score, keep the entry.
If none of the scores is within the top ten for that specific score, discard it.
This raises the question of whether that can be done efficiently in (My)SQL, or whether the processing should take place in the PHP code that queries the database, to keep the whole thing performant.
Any help is greatly appreciated!
Go with rows, not columns, for storing scores, and have a composite index covering the lookup. A datetime column could also be useful. Consider not having a separate top-10 snapshot table at all; just use the lookup you suggest, i.e. an ORDER BY score DESC with LIMIT 10 in the query. A rough sketch follows.
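A minimal sketch of that row-oriented layout, assuming hypothetical names (user_scores, score_type) since the question only shows columns scoreA..scoreX:

-- One row per (user, score type) instead of one column per score:
CREATE TABLE user_scores (
    user_id    INT NOT NULL,
    score_type CHAR(1) NOT NULL,            -- 'A', 'B', ..., 'X'
    score      INT NOT NULL,
    updated_at DATETIME NULL,               -- optional, per the note above
    PRIMARY KEY (user_id, score_type),
    KEY idx_type_score (score_type, score)  -- covers the top-10 lookup below
);

-- Top ten for one score type, descending:
SELECT user_id, score
FROM user_scores
WHERE score_type = 'A'
ORDER BY score DESC
LIMIT 10;

With the (score_type, score) index, the ten rows come straight off the index rather than from a table scan.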
Not that the reference below is the authority on covering indexes, but it throws the term out there for your investigation. Good luck.
You can try using an INDEX for this specific case; it enhances performance.
This will speed up queries for your kind of problem.
Read about it here.
Good luck, buddy.
I would first fire a query to obtain the top 10, then fire the query to get the results, using the top 10 in your SQL.
I can't formulate the query until I know what you mean by top 10 - give an example.
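Assuming "top 10" means the ten highest values of each score column, here is a rough sketch against the wide table from the question (the table name scores is assumed) that keeps exactly the rows the question's rule describes:

-- A row survives if it is in the top ten of at least one score column;
-- UNION removes duplicates that rank highly in several columns.
(SELECT * FROM scores ORDER BY scoreA DESC LIMIT 10)
UNION
(SELECT * FROM scores ORDER BY scoreB DESC LIMIT 10)
UNION
(SELECT * FROM scores ORDER BY scoreX DESC LIMIT 10);

Each parenthesized SELECT can use an index on its own column, and the server hands PHP at most 10 rows per score column instead of thousands.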
I have a question about server-side pagination with MySQL as the database. I have figured most of it out and it works perfectly fine, but I was wondering whether the count query can be eliminated. Let's say the user searches for the word "jack", there are 25 matching rows, and the results are paginated 10 per page. My paginator needs to know the total number of rows to render the right number of pages. So the way I am doing it now is to first run a SQL query like this to count all rows matching the criteria:
"SELECT COUNT(id) AS count FROM " + table + query
Then I run a second query against the database that uses the LIMIT and OFFSET options to fetch that exact page:
"SELECT * FROM " + table + query + " LIMIT ? OFFSET ?"
Then I return two objects to the client: first the count of all rows, and second the rows the user needs to see now. My question is: is this the most efficient way to do this, or is there a better way?
You can achieve this with one query, but it adds some weight to the output: if you LIMIT to 1000 records, for example, total_records will be repeated 1000 times, once in every row of the result set. At the same time, it saves one query:
SELECT
    column1,
    column2,
    column3,
    (SELECT COUNT(id) FROM `table`) AS total_records
FROM
    `table`
LIMIT 0, 10
I don't see anything wrong with your approach (although you could send both statements to the database in one trip). With the traditional way of paginating in the database, you must know the total record count, so it's just a question of how to get it.
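One MySQL-specific way to get the page and the total count in a single trip is the SQL_CALC_FOUND_ROWS modifier; this is only a sketch (the people table and its columns are made up here), and note the modifier is deprecated in recent MySQL 8.0 releases:

-- Fetch one page, asking MySQL to also remember the row count without the LIMIT:
SELECT SQL_CALC_FOUND_ROWS id, name
FROM people
WHERE name LIKE '%jack%'
LIMIT 10 OFFSET 0;

-- Then, on the same connection:
SELECT FOUND_ROWS() AS total_records;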
Improvements mostly come from doing it in a different way.
Improvement 1: infinite scroll, i.e. getting rid of pagination. This may not be what you wanted, but we are seeing more and more websites adopt it. Does the user really need to know how many pages a free-text search returns?
Improvement 2: use ElasticSearch instead of the database. It's built for free-text search and will typically perform better; you can also get the count (hits) and the pages in one search request.
I have an application where I need to measure timestamp-based parameter values from each device. The information is heavily structured, and the reason I haven't looked into databases is that I have to fetch all the data for 100 x 1000 = 100k rows every few minutes. I want to delete the data corresponding to the oldest timestamp in each group. I am using Python, but even JavaScript would do. I could not find a limit parameter in Python's official csv module. Help is super appreciated.
Item 1
Timestamp, parameter1, parameter2...parameterN
...
100 rows
Item 2
Timestamp, parameter1, parameter2...parameterN
...
100 rows
...1000 items
Note: there are no headers separating the rows; Item 1, Item 2, etc. are shown for representational purposes only.
I need to be able to add a new row under each group every few minutes and get rid of the oldest one, effectively keeping the count at 100 per group.
There's no limit parameter, because a reader is just an iterator, and Python has generic ways to do anything you might want to do with any iterator.
import csv, itertools, collections, heapq, operator

with open(path) as f:
    r = csv.reader(f)
First 100:
itertools.islice(r, 100)
Last 100:
collections.deque(r, maxlen=100)
Max 100 by 3rd column:
heapq.nlargest(100, r, key=operator.itemgetter(2))
… and so on.
Store your data internally like this, as a dict of dicts:
data[key][timestamp] = list of values
data = {}

# for each incoming (key, timestamp, values) record:
if 'bob' not in data:
    data['bob'] = {}                    # first record for this key
data['bob'][timestamp] = list(values)
After 2 iterations your data structure will look like:
data['bob'][15000021] = [1, 2, 3, 4, 5]
data['bob'][15003621] = [5, 6, 7, 8, 9, 0]
If you want only the latest entries, just get the unique keys for bob and delete
- either anything beyond n items (bob's values sorted by timestamp),
- or anything whose timestamp is less than now() minus 2 days (or whatever your rule is).
I use both mechanisms in similar datasets. I strongly suggest you then save this data, in case your process exits.
Should your data contain an OrderedDict (which would make the removal easier), please note that pickle will fail; however, the module dill (I am not kidding) is excellent, handles all datatypes, and closes much more nicely IMHO.
** Moving from Comments **
I'm assuming that reading the file from the bottom up would help you... This can be done by prepending entries to the beginning of the file.
With that assumption, you just need to rewrite the file on each entry: read the file into an array, push() the new entry, shift() the list, and write it out to a new file.
Alternatively, you can keep push()ing to the file and only read the first 100 entries. After reading, you can delete the file and start a new one if you expect to consistently get more than 100 entries between reads, or you can trim the file down to just 100 entries.
As we know, Firebase won't let you order by multiple children. I'm looking for a way to filter my data so that in the end I can limit it to a single result. So, to get the lowest price, it would be something like this:
ref.orderByChild("price").limitToFirst(1).on...
The problem is that I also need to filter by date (timestamp), so for that alone I would do:
.orderByChild("timestamp").startAt(startValue).endAt(endValue).on...
So that's my query for now; I then loop over all the results looking for the one row with the lowest price. My data is pretty big, around 100,000 rows, but I can change its structure however I want.
The first query gets the lowest price across all timestamps, so the returned row might have the lowest price yet fall outside my date range. However, that query takes ONLY 2 seconds, compared to 20 for the second one, including my code that finds the lowest price.
So, what are your suggestions on how best to do this? I know I could create another index combining the timestamp and the price, but those are different data values, which makes that impossible.
Full data structure:
country
  store
    item
      price,
      timestamp
Just to make it even clearer, I have two nested loops that run over all countries and then over all stores, so the real query is something like this:
ref.child(country[i]).child(store[j]).orderByChild("timestamp").startAt(startValue).endAt(endValue).on...
Thanks!
I've recently started using Interactive Reports in my Oracle APEX application. Previously, all pages in the application used Classic Reports. The Interactive Report in my new page works great, but, now, I'd like to add a summary box/table above the Interactive Report on the same page that displays the summed values of some of the columns in the Interactive Report. In other words, if my Interactive Report displays 3 distinct manager names, 2 distinct office locations, and 5 different employees, my summary box would contain one row and three columns with the numbers, 3, 2, and 5, respectively.
So far, I have made this work by creating the summary box as a Classic Report that counts distinct values for each column in the same table that my Interactive Report pulls from. The problem arises when I try to filter my interactive report. Obviously, the classic report doesn't refresh based on the interactive report filters, but I don't know how I could link the two so that the classic report responds to the filters from the interactive report. Based on my research, there are ways to reference the value in the Interactive Report's search box using javascript/jquery. If possible, I'd like to reference the value from the interactive table's filter with javascript or jquery in order to refresh the summary box each time a new filter is applied. Does anyone know how to do this?
Don't do JavaScript parsing on the filters. It's a bad idea - just think about how you would implement it: there's a massive amount of coding to be done and plenty of ajax. And with APEX 5 literally around the corner, where does that leave you when the APIs and markup are about to change drastically?
Don't just give in to a requirement either. First make sure how technically feasible it is, and if it's not, make it abundantly clear what the implications are in regard to time consumption. What is the real value to be had from these distinct value counts? Maybe there is another way to achieve what they want? Maybe this is nothing more than an attempted solution, and not the core of the real problem. Stuff to think about...
Having said that, here are 2 options:
First method: Count Distinct Aggregates on Interactive reports
You can add these to the IR through the Actions button.
Note, though, that this aggregate will be THE LAST ROW! In the example I've posted here, reducing the rows per page to 5 would push the aggregate row to pagination set 3!
Second Method: APEX_IR and DBMS_SQL
You could use the apex_ir API to retrieve the IR's query and then use that to do a count.
(Apex 4.2) APEX_IR.GET_REPORT
(Apex 5.0) APEX_IR.GET_REPORT
Some pointers:
Retrieve the region ID by querying apex_application_page_regions
Make sure your source query DOES NOT contain #...# substitution strings. (such as #OWNER#.)
Then get the report SQL, rewrite it, and execute it. Eg:
DECLARE
  l_report    apex_ir.t_report;
  l_query     varchar2(32767);
  l_statement varchar2(32000);
  l_cursor    integer;
  l_rows      number;
  l_deptno    number;
  l_mgr       number;
BEGIN
  -- fetch the IR's current query and bind values
  l_report := APEX_IR.GET_REPORT (
                p_page_id   => 30,
                p_region_id => 63612660707108658284,
                p_report_id => null);
  l_query := l_report.sql_query;
  sys.htp.prn('Statement = '||l_report.sql_query);
  for i in 1..l_report.binds.count
  loop
    sys.htp.prn(i||'. '||l_report.binds(i).name||' = '||l_report.binds(i).value);
  end loop;

  -- wrap the report query in the distinct counts
  l_statement := 'select count(distinct deptno), count(distinct mgr) from ('||l_report.sql_query||')';
  sys.htp.prn('statement rewrite: '||l_statement);

  -- parse, bind, and execute the rewritten statement
  l_cursor := dbms_sql.open_cursor;
  dbms_sql.parse(l_cursor, l_statement, dbms_sql.native);
  for i in 1..l_report.binds.count
  loop
    dbms_sql.bind_variable(l_cursor, l_report.binds(i).name, l_report.binds(i).value);
  end loop;
  dbms_sql.define_column(l_cursor, 1, l_deptno);
  dbms_sql.define_column(l_cursor, 2, l_mgr);
  l_rows := dbms_sql.execute_and_fetch(l_cursor);
  dbms_sql.column_value(l_cursor, 1, l_deptno);
  dbms_sql.column_value(l_cursor, 2, l_mgr);
  dbms_sql.close_cursor(l_cursor);

  sys.htp.prn('Distinct deptno: '||l_deptno);
  sys.htp.prn('Distinct mgr: '||l_mgr);
EXCEPTION WHEN OTHERS THEN
  IF DBMS_SQL.IS_OPEN(l_cursor) THEN
    DBMS_SQL.CLOSE_CURSOR(l_cursor);
  END IF;
  RAISE;
END;
I threw together the sample code from apex_ir.get_report and dbms_sql.
Oracle 11gR2 DBMS_SQL reference
Some serious caveats, though: the column list is tricky. If the user has control over the columns and can remove some, those columns disappear from the select list. E.g. in my sample, letting the user hide the DEPTNO column would crash the entire block, because I'd still be doing a count on that column even though it is gone from the inner query. You could prevent this by not letting the user control that column, or by first parsing the statement, etc.
Good luck.
On our web application, the search results are displayed in sortable tables. The user can click on any column and sort the results. The problem is that sometimes the user does a broad search and gets a lot of data returned. To make the sortable part work, you probably need all the results, which takes a long time. Alternatively, I can retrieve a few results at a time, but then sorting won't really work well. What's the best practice for displaying sortable tables that might contain lots of data?
Thanks for all the advice; I will certainly be going over these suggestions.
We are using an existing JavaScript framework that has the sortable table; "lots" of results means hundreds. The problem is that our users are at a remote site, and much of the delay is the network time to send/receive data from the data center. Sorting the data on the database side and sending only one page of results at a time is nice, but when the user clicks a column header, another round trip is made, which always adds 3-4 seconds.
Well, I guess that might be the network team's problem :)
Using sorting and paging at the database level is the correct answer. If your query returns 1000 rows but you're only going to show the user 10 of them, there is no need for the other 990 to be sent across the network.
Here is a MySQL example. Say you need 10 rows, numbers 21-30, from the 'people' table:
SELECT * FROM people LIMIT 20, 10
(The first number is the offset, i.e. the rows to skip: skipping 20 rows returns rows 21-30.)
You should be doing the paging back on the database server. E.g. on SQL 2005 and SQL 2008 there are paging techniques. I'd suggest looking at the paging options for whatever system you're using.
What database are you using? There are some good paging options in SQL 2005 and upwards using ROW_NUMBER to let you do the paging on the server. I found this good one on Christian Darie's blog.
E.g. this procedure, which pages the products in a category: you just pass in the page number you want, the number of products per page, etc.
CREATE PROCEDURE GetProductsInCategory
    (@CategoryID INT,
     @DescriptionLength INT,
     @PageNumber INT,
     @ProductsPerPage INT,
     @HowManyProducts INT OUTPUT)
AS
-- declare a new TABLE variable
DECLARE @Products TABLE
    (RowNumber INT,
     ProductID INT,
     Name VARCHAR(50),
     Description VARCHAR(5000),
     Price MONEY,
     Image1FileName VARCHAR(50),
     Image2FileName VARCHAR(50),
     OnDepartmentPromotion BIT,
     OnCatalogPromotion BIT)

-- populate the table variable with the complete list of products
INSERT INTO @Products
SELECT ROW_NUMBER() OVER (ORDER BY Product.ProductID),
       Product.ProductID, Name,
       SUBSTRING(Description, 1, @DescriptionLength) + '...' AS Description,
       Price, Image1FileName, Image2FileName, OnDepartmentPromotion, OnCatalogPromotion
FROM Product INNER JOIN ProductCategory
     ON Product.ProductID = ProductCategory.ProductID
WHERE ProductCategory.CategoryID = @CategoryID

-- return the total number of products using an OUTPUT variable
SELECT @HowManyProducts = COUNT(ProductID) FROM @Products

-- extract the requested page of products
SELECT ProductID, Name, Description, Price, Image1FileName,
       Image2FileName, OnDepartmentPromotion, OnCatalogPromotion
FROM @Products
WHERE RowNumber > (@PageNumber - 1) * @ProductsPerPage
  AND RowNumber <= @PageNumber * @ProductsPerPage
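A hypothetical call, with made-up parameter values, to show how the procedure is used:

-- Request page 2 of category 12, 10 products per page, descriptions cut at 150 chars:
DECLARE @Total INT
EXEC GetProductsInCategory
    @CategoryID = 12,
    @DescriptionLength = 150,
    @PageNumber = 2,
    @ProductsPerPage = 10,
    @HowManyProducts = @Total OUTPUT
SELECT @Total AS HowManyProducts  -- total row count, for rendering the pager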
You could do the sorting on the server. AJAX would eliminate the necessity of a full refresh, but there'd still be a delay. Besides, databases are generally very fast at sorting.
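As a minimal sketch of that idea in MySQL, reusing the hypothetical people table from the example above (the sort column is made up):

-- Sort by whichever column the user clicked, then send back a single page:
SELECT *
FROM people
ORDER BY last_name DESC   -- whitelist the column name in application code
LIMIT 20, 10              -- skip 20 rows, return rows 21-30

Only one page of rows ever crosses the network, whichever column the user sorts on.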
For these situations I employ techniques on the SQL Server side that not only leverage the database for the sorting, but also use custom paging to ONLY return the specific records needed.
It is a bit of a pain to implement at first, but the performance is amazing afterwards!
How large is "a lot" of data? Hundreds of rows? Thousands?
Sorting can be done via JavaScript painlessly with Mochikit Sortable Tables. However, if the data takes a long time to sort (most likely a second or two [or three!]), then you may want to give the user some visual cue that something is happening and the page didn't just freeze. For example, tint the screen (a la Lightbox) and display a "sorting" animation or text.