I'm trying to figure out how to query a resource to see how many rows it has before I request the entire resource at once, or whether I use paging to bring back rows in batches.
For example, this resource here:
https://data.cityofnewyork.us/Transportation/Bicycle-Routes/7vsa-caz7
1) In cases where I know the number of rows, I can use the $limit parameter to ensure I get back everything. For example, this dataset has about 17,000 rows, so giving it a $limit of 20000 gets all of them.
For example:
https://data.cityofnewyork.us/resource/cc5c-sm6z.geojson?$limit=20000
also...
2) I thought maybe to make a metadata call, but while this request here returns metadata, number of rows is not part of it:
https://data.cityofnewyork.us/api/views/metadata/v1/cc5c-sm6z
However, I would like to know how many rows are in the dataset before I decide how to request them: all at once with the $limit parameter, or paging with the $limit and $offset parameters.
Ideas?
One method is to count the rows using the COUNT function in the API call itself, then decide based on the result.
Note that YMMV with this approach. Generally the ceiling is around 50,000 rows per request before you need to switch to paging, so I'll always start with a 50k limit and have paging ready in case the dataset is larger.
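For instance, here's a rough Python sketch of that count-then-decide flow. It assumes the SoQL `$select=count(*)` aggregate (supported on most Socrata datasets) and reuses the ~50k per-request ceiling mentioned above; the dataset URL is the one from the question.

```python
# Sketch: ask the API how many rows there are, then build either a
# single-request URL or a list of paged URLs. The count fetch uses
# only the standard library; the paging decision is pure logic.
import json
import urllib.request

BASE = "https://data.cityofnewyork.us/resource/cc5c-sm6z.json"
MAX_SINGLE_REQUEST = 50_000  # rough per-request ceiling noted above

def row_count(base=BASE):
    """Ask the API for the row count via a SoQL count(*) query."""
    with urllib.request.urlopen(base + "?$select=count(*)") as resp:
        return int(json.load(resp)[0]["count"])

def page_urls(total, base=BASE, limit=MAX_SINGLE_REQUEST):
    """Return the URL(s) needed to fetch `total` rows."""
    if total <= limit:
        return [f"{base}?$limit={limit}"]
    return [f"{base}?$limit={limit}&$offset={off}"
            for off in range(0, total, limit)]
```

With the ~17,000-row dataset from the question, `page_urls(row_count())` comes back as a single URL; a larger dataset gets a `$limit`/`$offset` URL per page.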
Related
I have a question about server-side pagination with MySQL as the database. I have most of it figured out and it works fine, but I was wondering whether the count query can be eliminated. Say the user searches for the word "jack" and there are 25 results, paginated 10 per page. My paginator needs to know the total row count to render the correct number of pages. So the way I'm doing it now is: first I run a query like this to get the count of all rows matching the criteria:
"SELECT COUNT(id) AS count FROM " + table + query
Then I run a second query, using LIMIT and OFFSET, to fetch that exact page:
"SELECT * FROM " + table + query + " LIMIT ? OFFSET ?"
Then I return two objects to the client: first the count of all rows, and second the rows the user needs to see now. My question is: is this the most efficient way to do it, or is there a better one?
You can achieve this with one query, but it puts a burden on the output: if you limit to 1000 records, for example, total_records will be repeated 1000 times, once on every row of the result set. On the other hand, it saves one query:
SELECT
column1,
column2,
column3,
(SELECT COUNT(id) FROM `table`) AS total_records
FROM
`table`
LIMIT 0, 10
I don't see anything wrong with your approach (although you could send both statements to the database in one round trip). With traditional database pagination you must know the total record count, so it's just a question of how to get it.
Improvements mostly come from doing it a different way.
Improvement 1: infinite scroll. This gets rid of pagination entirely. It may not be what you want, but more and more websites are adopting it. Does the user really need to know how many pages a free-text search has?
Improvement 2: use ElasticSearch instead of the database. It's built for free-text search and will definitely perform better. You can also get the count (hits) and the pages in one search request.
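A variant of the one-query approach above uses a window function instead of a correlated subquery, which avoids re-running the count per row. Here's a hedged sketch: it's demonstrated on SQLite (3.25+) for runnability, but the same `COUNT(*) OVER ()` works on MySQL 8.0+; the table and data are invented to match the 25-results-for-"jack" scenario in the question.

```python
# One-trip pagination: fetch a page AND the total matching-row count
# in a single query via COUNT(*) OVER (). The window function is
# evaluated after WHERE but before LIMIT, so every returned row
# carries the full match count.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (name) VALUES (?)",
                 [(f"jack {i}",) for i in range(25)])  # 25 matching rows

page_size, page = 10, 1
rows = conn.execute(
    """SELECT id, name, COUNT(*) OVER () AS total_records
       FROM people WHERE name LIKE ? LIMIT ? OFFSET ?""",
    ("jack%", page_size, (page - 1) * page_size),
).fetchall()

total = rows[0][-1] if rows else 0
print(len(rows), total)  # 10 rows on this page, 25 matching in total
```

Note the same caveat applies as above: the count is duplicated onto every row of the page, so you trade a little bandwidth for one fewer round trip.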
I have a collection with thousands of documents. Is there a way I can query the collection and return the first 500 documents? Then I want to load the next 500 (501-1000), and so on.
docDbClient.queryDocuments(collection._self, 'SELECT * FROM d ORDER BY d._ts DESC').toArray(function(error, arr) {});
Since skip and take are not part of the query language today (though marked as "planned" on UserVoice), you'd need to come up with an alternative approach. Cosmos DB has built-in paging (with continuation tokens), which allows you to read a chunk of data at a time. You can specify the maximum item count per page, and then, as you're ready for the next page, perform the next read with the continuation token received from the previous read.
Or you can come up with your own scheme, perhaps based on some specific property you have.
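To make the continuation-token idea concrete, here's the pattern in miniature. This is deliberately NOT the Cosmos DB SDK (its exact calls vary by SDK version); the collection and the read function are stand-ins that just show the loop shape: request at most `max_item_count` items per page, and keep passing back the token you received until the server returns none.

```python
# Continuation-token paging pattern, mocked with an in-memory list.
DOCS = list(range(1234))  # stand-in for the document collection

def read_page(continuation=None, max_item_count=500):
    """Fake server read: returns (items, next_continuation_token)."""
    start = continuation or 0
    items = DOCS[start:start + max_item_count]
    nxt = start + max_item_count
    return items, (nxt if nxt < len(DOCS) else None)

all_items, token = [], None
while True:
    items, token = read_page(token)
    all_items.extend(items)
    if token is None:   # no token back => last page reached
        break

print(len(all_items))  # 1234 — every document, 500 at a time
```

With the real SDK the token is an opaque string rather than an offset, which is exactly why you can't skip ahead to an arbitrary page: you can only walk forward, page by page.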
I want to get data from a database, to show on a page. There is a huge amount of rows in the table, so I'm using pages to avoid having to scroll forever.
I have functionality to search for words (across all columns), order by any column, and obviously change the page size and the current page.
I could, in theory, just ask the database for everything (SELECT * FROM myTable), send it to my html view, and work through the data entirely in javascript. The problem is, there is so much data that this is extremely slow using my structure (page controller calls my main logic, which calls a webservice, which calls the database), sometimes waiting up to 20 seconds for the original load of the page. After it's loaded, the javascript is usually fast.
Or, I could do most of that work in the controller, using Linq. I could also do the work in the webservice (it's mine), still in Linq. Or, I could straight away use WHERE, ORDER BY, COUNT, and a bunch of dynamic SQL requests so that I get instantly what I want from the database. But any of those forces me to refresh the page every time one of the parameters changes.
So I'm wondering about performance. For example, which is faster between:
var listObjects = ExecuteSQL("SELECT * FROM myTable");
return listObjects.Where(x => x.field == word).OrderBy(x => x.field);
and
var listObjects = ExecuteSQL("SELECT * FROM myTable WHERE field = :param1 ORDER BY field", word);
return listObjects;
And in what specific situations would using the different methods I've mentioned be better or worse?
No.
You want the work of selecting a block (pageful) of data done on your data server. That's its job; it knows how to do it best.
So, forget the ExecuteSQL. You are pretty much shutting down everything's ability to help you. Try LINQ:
var page = (from m in MyTable
where m.field == param1
orderby m.field
select m)
.Skip((nPage-1)*pageLength).Take(pageLength);
That will generate the exact SQL to tell the Data Server to return just the rows you want.
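To see why this matters, here's a runnable contrast of the two approaches from the question, using SQLite as a stand-in database. The point is not timing but how many rows cross the database boundary; the table, column, and data are invented for illustration.

```python
# Approach 1 pulls every row and filters/sorts in application code;
# approach 2 pushes WHERE/ORDER BY/LIMIT down to the database and
# transfers only the page that will actually be shown.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO myTable (field) VALUES (?)",
    [("word" if i % 100 == 0 else f"other{i}",) for i in range(10_000)])

# Approach 1: SELECT * and do the work client-side.
everything = conn.execute("SELECT * FROM myTable").fetchall()
filtered = sorted((r for r in everything if r[1] == "word"),
                  key=lambda r: r[1])

# Approach 2: let the database filter, sort, and page.
page = conn.execute(
    "SELECT * FROM myTable WHERE field = ? ORDER BY field LIMIT ? OFFSET ?",
    ("word", 10, 0)).fetchall()

print(len(everything), len(filtered), len(page))  # 10000 100 10
```

Approach 1 moves 10,000 rows to get a 10-row page; approach 2 moves 10. That gap is the 20-second page load in the question, and it's exactly what a LINQ provider's Skip/Take translation avoids.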
How do I count the number of rows in a jqGrid?
To clarify, there is not much data involved so the grid is pulling all of its data back from the server in a single query, instead of using pagination.
jQuery("#myGrid").jqGrid('getGridParam', 'records');
Update
Note there are two parameters to determine record count:
records
integer
Readonly property. Gives the number of records returned as a result of a query to the server.
reccount
integer
Readonly property. Determines the exact number of rows in the grid. Do not confuse this with the records parameter. Although in many cases they may be equal, there are cases where they are not. For example, if you define rowNum to be 15, but the request to the server returns 20 records, the records parameter will be 20, but the reccount parameter will be 15 (the grid will have 15 rows, not 20).
$("#grid").getGridParam("reccount");
Readonly property. Returns integer. Determines the exact number of rows in the grid. (And not the number of records fetched).
Here is the code I have so far. It seems like there should be a better way:
jQuery("#myGrid").getDataIDs().length;
How about this?
jQuery("#myGrid tr").length;
Actually, you can take that a step further with the optional context parameter.
jQuery("tr", "#myGrid").length;
Either one will search for every "tr" inside of "#myGrid". However, from my own testing, specifying the context parameter is usually faster.
You could try:
jQuery("#GridId").jqGrid('getDataIDs');
On our web application, the search results are displayed in sortable tables. The user can click on any column and sort the results. The problem is that sometimes the user does a broad search and gets a lot of data back. To make the sorting work, you probably need all the results, which takes a long time. Or I can retrieve a few results at a time, but then sorting won't really work well. What's the best practice for displaying sortable tables that might contain lots of data?
Thanks for all the advice; I will certainly go over these.
We are using an existing JavaScript framework that has the sortable table; "lots" of results means hundreds. The problem is that our users are at a remote site, and much of the delay is network time to send/receive data from the data center. Sorting the data on the database side and sending only one page of results at a time is nice; but when the user clicks a column header, another round trip is made, which always adds 3-4 seconds.
Well, I guess that might be the network team's problem :)
Doing the sorting and paging at the database level is the correct answer. If your query returns 1000 rows but you're only going to show the user 10 of them, there is no need for the other 990 to be sent across the network.
Here is a MySQL example. Say you need 10 rows, 21-30, from the 'people' table (the offset is zero-based, so rows 21-30 start at offset 20):
SELECT * FROM people LIMIT 20, 10
You should be doing paging back on the database server. E.g. on SQL 2005 and SQL 2008 there are paging techniques. I'd suggest looking at paging options for whatever system you're looking at.
What database are you using? There are some good paging options in SQL 2005 and upwards using ROW_NUMBER to do the paging on the server. I found this good one on Christian Darie's blog.
E.g. this procedure, which is used to page products in a category. You just pass in the page number you want, the number of products per page, etc.
CREATE PROCEDURE GetProductsInCategory
(@CategoryID INT,
@DescriptionLength INT,
@PageNumber INT,
@ProductsPerPage INT,
@HowManyProducts INT OUTPUT)
AS
-- declare a new TABLE variable
DECLARE @Products TABLE
(RowNumber INT,
ProductID INT,
Name VARCHAR(50),
Description VARCHAR(5000),
Price MONEY,
Image1FileName VARCHAR(50),
Image2FileName VARCHAR(50),
OnDepartmentPromotion BIT,
OnCatalogPromotion BIT)
-- populate the table variable with the complete list of products
INSERT INTO @Products
SELECT ROW_NUMBER() OVER (ORDER BY Product.ProductID),
Product.ProductID, Name,
SUBSTRING(Description, 1, @DescriptionLength) + '...' AS Description,
Price, Image1FileName, Image2FileName, OnDepartmentPromotion, OnCatalogPromotion
FROM Product INNER JOIN ProductCategory
ON Product.ProductID = ProductCategory.ProductID
WHERE ProductCategory.CategoryID = @CategoryID
-- return the total number of products using an OUTPUT variable
SELECT @HowManyProducts = COUNT(ProductID) FROM @Products
-- extract the requested page of products
SELECT ProductID, Name, Description, Price, Image1FileName,
Image2FileName, OnDepartmentPromotion, OnCatalogPromotion
FROM @Products
WHERE RowNumber > (@PageNumber - 1) * @ProductsPerPage
AND RowNumber <= @PageNumber * @ProductsPerPage
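The core of that procedure is the ROW_NUMBER windowed paging query. Here's the same idea condensed into a single statement with a CTE instead of a table variable, demonstrated on SQLite so it's runnable; the same SQL shape works on SQL Server 2005+, and the table, columns, and data are invented for illustration.

```python
# ROW_NUMBER-based paging: number every row in the desired order,
# then keep only the row numbers that fall inside the requested page.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Product (Name) VALUES (?)",
                 [(f"Product {i}",) for i in range(1, 36)])  # 35 products

page_number, per_page = 3, 10
rows = conn.execute(
    """WITH Numbered AS (
           SELECT ROW_NUMBER() OVER (ORDER BY ProductID) AS RowNumber,
                  ProductID, Name
           FROM Product)
       SELECT ProductID, Name FROM Numbered
       WHERE RowNumber > (? - 1) * ? AND RowNumber <= ? * ?""",
    (page_number, per_page, page_number, per_page),
).fetchall()

print([r[0] for r in rows])  # ProductIDs 21 through 30, i.e. page 3
```

The WHERE clause mirrors the bounds in the stored procedure: rows (PageNumber - 1) * PerPage + 1 through PageNumber * PerPage.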
You could do the sorting on the server. AJAX would eliminate the need for a full refresh, but there'd still be a delay. Besides, databases are generally very fast at sorting.
For these situations I employ techniques on the SQL Server side that not only leverage the database for the sorting, but also use custom paging to return ONLY the specific records needed.
It is a bit of a pain to implement at first, but the performance is amazing afterwards!
How large is "a lot" of data? Hundreds of rows? Thousands?
Sorting can be done painlessly via JavaScript with MochiKit's sortable tables. However, if the data takes a long time to sort (most likely a second or two [or three!]), then you may want to give the user some visual cue that something is happening and the page didn't just freeze. For example, tint the screen (a la Lightbox) and display a "sorting" animation or text.