I have around 25M documents in my cluster, and I need to read 1M documents at a time without any specific selection criterion. I don't have access to the keys, so I need to create a view that emits documents until a counter reaches 1M.
I have written a Map function in which I am trying to keep a static counter, but JavaScript doesn't support static variables. I am not sure how to do this operation. The map function below is meant to return just 1000 documents, and it is full of errors. Can someone help me with this functionality?
function (doc, meta) {
    // Intended behaviour: emit only until the counter reaches 1000
    var value = incrementor();
    if (value < 1000) {
        emit(meta.id, null);
    }
}
function incrementor() {
    // Emulate a static variable by storing the counter as a property
    // of the function itself
    if (typeof incrementor.counter == 'undefined') {
        incrementor.counter = 0;
    }
    return ++incrementor.counter;
}
Reading a subset of documents with views can be done using pagination: http://blog.couchbase.com/pagination-couchbase
The map function is called once for every mutation stored in the bucket, so a counter-based approach like this does not make sense. If you want to split your index, you have to do it based on the content of the documents. But you should really use pagination, which can also be achieved with N1QL, by the way.
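For illustration, a minimal keyset-pagination sketch against the view REST API (port 8092); the host, bucket, design-document and view names are placeholders, and a view that simply does emit(meta.id, null) is assumed (Node 18+ for the built-in fetch):
const BASE = "http://localhost:8092/mybucket/_design/docs/_view/by_id";
const PAGE_SIZE = 1000;

async function readPage(startDocId) {
    // Ask for one extra row so we know where the next page starts
    const params = new URLSearchParams({ limit: String(PAGE_SIZE + 1) });
    if (startDocId) {
        // Resume exactly where the previous page ended; unlike skip,
        // this stays fast however deep into the index you are
        params.set("startkey", JSON.stringify(startDocId));
        params.set("startkey_docid", startDocId);
    }
    const res = await fetch(`${BASE}?${params}`);
    const body = await res.json();
    // The extra row, if present, is the first row of the next page
    const next = body.rows.length > PAGE_SIZE ? body.rows[PAGE_SIZE].id : null;
    return { rows: body.rows.slice(0, PAGE_SIZE), next };
}
Calling readPage in a loop, feeding each next value back in, collects the 1M documents in batches. The N1QL equivalent would be keyset pagination on META().id with a LIMIT clause.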
I want to get all the data from a DynamoDB table in Node.js. This is my code:
const READ = async (payload) => {
  const params = {
    TableName: payload.TableName,
  };
  let scanResults = [];
  let items;
  // Keep scanning until DynamoDB stops returning a LastEvaluatedKey,
  // i.e. until the whole table has been read
  do {
    items = await dbClient.scan(params).promise();
    items.Items.forEach((item) => scanResults.push(item));
    params.ExclusiveStartKey = items.LastEvaluatedKey;
  } while (typeof items.LastEvaluatedKey !== "undefined");
  return scanResults;
};
I implemented this and it is working fine, but our code review tool is flagging it as unoptimized or as causing a memory leak, and I just cannot figure out why. I have read elsewhere that DynamoDB's Scan API is not the most efficient way to get all the data in Node. Is there something else I am missing to optimize this code?
DO THIS ONLY IF YOUR DATA SIZE IS VERY SMALL (fewer than 100 items, or less than 1 MB of data, is what I prefer; in that case you don't need a do-while loop at all).
Think about the following scenario: what happens when more and more items are added to the DynamoDB table in the future? This code returns all of the data and puts it into the scanResults variable, which is exactly what impacts memory. Also, the DynamoDB Scan operation is expensive, in terms of both memory and cost.
It's perfectly fine to use the Scan operation if the data set is very small. Otherwise, go with pagination (I always prefer this); a sketch follows below. If there are thousands of items, who is going to look at all of them in a single shot? So use pagination instead.
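For illustration, here is a minimal page-at-a-time sketch, assuming the same AWS SDK v2 DocumentClient (dbClient) as in the question; the helper name and page size are mine:
// The caller passes the previous page's LastEvaluatedKey back in to get
// the next page, so no more than one page is ever held in memory
const readPage = async (tableName, pageSize, startKey) => {
  const params = {
    TableName: tableName,
    Limit: pageSize,                // cap items returned per call
    ExclusiveStartKey: startKey,    // undefined on the first call
  };
  const page = await dbClient.scan(params).promise();
  return {
    items: page.Items,
    nextKey: page.LastEvaluatedKey, // undefined once the table is exhausted
  };
};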
Let's take another scenario: if your requirement is to retrieve all the data for analytics or aggregation, then it is better to store the aggregate upfront as an item (in the same or a different DynamoDB table), or to use a proper analytics database.
If your requirement is something else, please elaborate in the question.
Is there a way to get Google Optimize to return specific data values or even JSON objects for use with experiment callbacks?
We are using Google Optimize without the visual editor; instead, we simply want it to indicate which layout pattern to request from a separate API we set up a long time ago.
function gtag() { dataLayer.push(arguments); }

function implementExperimentA(value) {
    if (value == '0') {
        // Provide code for visitors in the original.
    } else if (value == '1') {
        // Provide code for visitors in the first variant.
    } else if (value == '2') {
        // Provide code for visitors in the second variant.
    }
    // ...
}
gtag('event', 'optimize.callback', {
name: '<experiment_id_A>',
callback: implementExperimentA
});
This code example is everywhere and is basically what I want to use, but instead of checking value == '1' and so on, I want to be able to use value as a request parameter:
$.get(url, {layoutId: value});
LayoutIds are not integers, however; they are unique strings. So again, my question: is there a way to get Google Optimize to return specific data values or even JSON objects for use with experiment callbacks? Or do I need to map all the experiment indexes to their corresponding API parameter values within my JavaScript code?
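Since the callback only ever receives the variant index as a string, the usual workaround is a lookup table in your own code. A minimal sketch, where the layout ID strings are made-up placeholders for your real API values:
// Variant indexes reported by Optimize, mapped to the API's layout IDs
// (placeholder strings; substitute your real ones)
var LAYOUTS_A = {
    '0': 'layout-original',
    '1': 'layout-alpha',
    '2': 'layout-beta'
};

function implementExperimentA(value) {
    var layoutId = LAYOUTS_A[value];
    if (layoutId) {
        $.get(url, { layoutId: layoutId });
    }
}

gtag('event', 'optimize.callback', {
    name: '<experiment_id_A>',
    callback: implementExperimentA
});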
I'm trying to get a list of all code snippets of a GitLab (not GitHub!) project. The GitLab API lets me collect at most 100 snippets per call, but provides Link headers for the other pages (next, prev, first and last; see this part of the GitLab API).
What is the best way to perform asynchronous jQuery.get calls to all the pages and then pass all results in a single callback function?
Neither of the two ideas I have come up with seem very attractive:
Make a dummy call for the first page to get the total number of pages (X-Total-Pages in getResponseHeader), then generate all the jQuery.gets and pass them to jQuery.when. Not too bad, but the data from that first call is wasted.
Run the jQuery.gets in a while loop and have each of them store its data in some global variable before launching the callback on that variable. The disadvantages are that the calls are not run in parallel and that the global variable does not look like a clean solution.
Since I think this should be a fairly common problem, I was hoping there would be a clean solution somewhere.
EDIT
Here is an implementation of the first idea, to illustrate what I am asking for: namely, how do I avoid the first $.getJSON call, whose results are not used?
function getAllSnippets(callback) {
    // Make a dummy call just to learn the number of pages
    var data = {"private_token": "my_private_token", "per_page": 100, "page": 1};
    $.getJSON("https://myserver/api/snippets", data)
        .then(function(data, textStatus, jqXHR) {
            // The header is a string, so parse it before doing arithmetic
            var numPages = parseInt(jqXHR.getResponseHeader('X-Total-Pages'), 10);
            console.log(numPages + ' pages in total');
            // Generate queries for each page (the dummy call's data is discarded)
            var promises = [];
            for (var iPage = 1; iPage <= numPages; iPage++) {
                var pageData = {"private_token": "my_private_token", "per_page": 100, "page": iPage};
                promises.push($.getJSON("https://myserver/api/snippets", pageData));
            }
            // Collect and merge the results of all calls
            Promise.all(promises).then(function(results) {
                var answers = [];
                for (var iPage = 0; iPage < numPages; iPage++) {
                    answers = $.merge(answers, results[iPage]);
                }
                callback(answers);
            }, function(err) {
                alert("Failed to collect code snippets: " + err);
            });
        });
}
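One way to avoid wasting that first call, sketched under the same assumptions as the code above: keep page 1's rows and use its X-Total-Pages header to fan out the remaining pages in parallel.
function getAllSnippets(callback) {
    var base = {"private_token": "my_private_token", "per_page": 100};
    // Fetch page 1 and keep its rows; its X-Total-Pages header tells us
    // how many further pages to request in parallel
    $.getJSON("https://myserver/api/snippets", $.extend({page: 1}, base))
        .then(function(firstPage, textStatus, jqXHR) {
            var numPages = parseInt(jqXHR.getResponseHeader('X-Total-Pages'), 10);
            var promises = [];
            for (var iPage = 2; iPage <= numPages; iPage++) {
                promises.push($.getJSON("https://myserver/api/snippets",
                                        $.extend({page: iPage}, base)));
            }
            Promise.all(promises).then(function(results) {
                var answers = firstPage.slice();
                results.forEach(function(rows) { $.merge(answers, rows); });
                callback(answers);
            }, function(err) {
                alert("Failed to collect code snippets: " + err);
            });
        });
}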
In an effort to prevent certain objects from being created, I set a condition in that object type's beforeSave cloud function.
However, when two objects are created simultaneously, the condition does not work as intended.
Here is my code:
Parse.Cloud.beforeSave("Entry", function(request, response) {
    var theContest = request.object.get("contest");
    theContest.fetch().then(function(contest) {
        if (contest.get("isFilled") == true) {
            response.error('This contest is full.');
        } else {
            response.success();
        }
    });
});
Basically, I don't want an Entry object to be created if a Contest is full. However, if there is one spot remaining in the Contest and two entries are saved simultaneously, both get added.
I know it is an edge case, but a legitimate concern.
Parse uses MongoDB, a NoSQL database designed to be very scalable, which therefore provides limited synchronisation features. What you really need here is mutual exclusion, which is unfortunately not supported on a Boolean field. However, Parse provides atomic operations for counters and array fields, which you can use to enforce some control.
See http://blog.parse.com/announcements/new-atomic-operations-for-arrays/
and https://parse.com/docs/js/guide#objects-updating-objects
Solved this by using increment and then doing the check in the save callback (instead of fetching the object and checking a Boolean on it).
Looks something like this:
Parse.Cloud.beforeSave("Entry", function(request, response) {
    var theContest = request.object.get("contest");
    // Atomically reserve a spot, then verify the limit on the saved result
    theContest.increment("entries");
    theContest.save().then(function(contest) {
        if (contest.get("entries") > contest.get("maxEntries")) {
            response.error('The contest is full.');
        } else {
            response.success();
        }
    });
});
Following typical REST standards, I broke my resources up into separate endpoints and calls. The main two objects in question here are List and Item (a list has a list of items, as well as some other data associated with it).
So if a user wants to retrieve his lists, he might make a GET request to api/Lists.
Then the user might want to get the items in one of those lists, so he makes a GET to api/ListItems/4, where 4 came from the List.listId retrieved in the previous call.
This is all well and good: the options.complete attribute of $.ajax lets me point to a callback method, so I can streamline these two events.
But things get very messy if I want to get the elements for all the lists in question. For example, let's assume I have a library function called makeGetRequest that takes in the end point and callback function, to make this code cleaner. Simply retrieving 3 elements the naive way results in this:
var success1 = function(elements) {
    var success2 = function(elements) {
        makeGetRequest("api/ListItems/3", finalSuccess);
    };
    makeGetRequest("api/ListItems/2", success2);
};
makeGetRequest("api/ListItems/1", success1);
Disgusting! This is the kind of thing we get smacked across the wrists for in Programming 101 and pointed to loops. But how can you do this with a loop, without relying on external storage?
for (var i of values) {
    makeGetRequest("api/ListItems/" + i, successFunction);
}
function successFunction(items) {
    // I am called once per i, each time with only ONE list's worth of items!
}
And even with storage, I would have to know when all the calls had finished and retrieved their data, and then invoke some master function that takes all the collected data and does something with it.
Is there a practice for handling this? This must have been solved many times before...
Try using a stack of endpoint parameters:
var params = [];
var results = [];

params.push({endpoint: "api/ListItems/1"});
params.push({endpoint: "api/ListItems/2"});
params.push({endpoint: "api/ListItems/3"});
params.push({endpoint: "api/ListItems/4"});
Then you can make it recursive in your success handler:
function getResources(endPoint) {
    var options = {}; // Ajax options
    options.success = function (data) {
        // Record this endpoint's result before deciding whether to
        // continue, so the final page is not dropped
        results.push({endpoint: endPoint, data: data});
        if (params.length > 0) {
            getResources(params.shift().endpoint);
        }
        else {
            theMasterFunction(results);
        }
    };
    // $.ajax(url, settings) honours options.success; $.get(url, data)
    // would treat the settings object as query data
    $.ajax(endPoint, options);
}
And you can start it with a single call like this:
getResources(params.shift().endpoint);
Edit:
To keep everything self-contained and out of the global scope, you can wrap it in a function and provide a callback:
function downloadResources(callback) {
    var endpoints = [];
    var results = [];

    endpoints.push({endpoint: "api/ListItems/1"});
    endpoints.push({endpoint: "api/ListItems/2"});
    endpoints.push({endpoint: "api/ListItems/3"});
    endpoints.push({endpoint: "api/ListItems/4"});

    function getResources(endPoint) {
        var options = {}; // Ajax options
        options.success = function (data) {
            // Record this endpoint's result before deciding whether to continue
            results.push({endpoint: endPoint, data: data});
            if (endpoints.length > 0) {
                getResources(endpoints.shift().endpoint);
            }
            else {
                callback(results);
            }
        };
        $.ajax(endPoint, options);
    }

    getResources(endpoints.shift().endpoint);
}
In use:
downloadResources(function(data) {
// Do stuff with your data set
});
dmck's answer is probably your best bet. However, another option is to add a bulk list endpoint, so that your API supports requests like api/ListItems/?id=1&id=2&id=3.
You could also do an API search endpoint, if that fits your personal aesthetic more.
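If you do add such a bulk endpoint, note that jQuery serializes array parameters as id[]=1&id[]=2&id[]=3 by default; here is a sketch (the endpoint itself is hypothetical) using traditional serialization to get the id=1&id=2&id=3 form:
// Request several lists' items in one call; traditional: true makes
// jQuery serialize the array as id=1&id=2&id=3
$.ajax({
    url: "api/ListItems/",
    data: {id: [1, 2, 3]},
    traditional: true,
    success: function (allItems) {
        // One response containing the items of every requested list
    }
});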