I want to execute a JavaScript file via mongos to insert data into my sharded cluster. In addition to that I want to add a dynamic variable and the NULL value.
I would log in (manually) to the shell with
mongo hostip:port/admin my_script.js
My js looks like:
var amount = 1000000;
var x = 1;
var doc = '';
for (i = 0; i < amount; i++)
{
    doc = { a: '1', b: '2', c: 'text', d: 'x', e: 'NULL' };
    db.mycol.insert(doc);
    x = x + 1;
}
(Rather "x" I could just use "i")
does "d" writes the value of "x" or just the letter "x"?
does "e" writes the text "Null" or the .. let's say "database NULL"
Is the way I do that procedure correctly? (Concerning how I connect to mongos / the sharding set)
best regards
EDIT:
And very important - how can I figure out the time the MongoDB sharded cluster needs to store all the data? And to balance it?
Edit 2nd:
Hi Ross,
I have a sharded cluster that consists of two shards (two replica sets). At the moment I'm testing, and therefore I use the loop counter as the shard key.
Is there a way to check the time within the javascript?
Update:
So measuring the time that is needed to store the data is equivalent to measuring the time the JavaScript takes to execute? (Or the time the mongo shell isn't accessible because it is executing?)
Is that assumption acceptable for measuring the query response time?
(Where do I have to store the JavaScript file?)
You don't need to keep multiple counters, as you are already incrementing i on each iteration of the for loop. Since you want the values and not strings, use i for the value of d and null instead of the string "NULL" - here's the cleaned-up loop:
var amount = 1000000;
for (var i = 1; i <= amount; i++) {
    doc = { a: '1', b: '2', c: 'text', d: i, e: null };
    db.mycol.insert(doc);
}
Regarding how long it takes to store / balance your data - that depends on a few factors.
Firstly, what is your shard key? Is it a random value or an increasing value (like a timestamp)? A random pattern for shard keys helps ensure an even distribution of writes, and if you know the ranges of the shard key you could pre-split the collection's chunks to try to ensure that it stays balanced when loading data. If the shard key is increasing, like a timestamp, then most likely one shard will become hot: new documents will always land at the top end of the range and that shard will have to split chunks and migrate the data to the other shards.
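As a rough sketch of pre-splitting against mongos, assuming the database is called test (an assumption) and the loop counter d from the cleaned-up loop above is the shard key:
// run in the mongo shell connected to mongos; "test" is an assumed database name
sh.enableSharding("test");
sh.shardCollection("test.mycol", { d: 1 });

// with d running from 1 to 1,000,000, pre-create a few chunk boundaries so
// both shards receive writes from the start instead of waiting for splits
sh.splitAt("test.mycol", { d: 250000 });
sh.splitAt("test.mycol", { d: 500000 });
sh.splitAt("test.mycol", { d: 750000 });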
At MongoDB UK there were a couple of good presentations about sharding: Overview of sharding and Sharding best practices.
Update:
Regarding how long it will take for the shards to become balanced - this depends on the load on your machines. Balancing is a lightweight process, so it should be considered a background operation. It's important to note that even with a sharded system, as soon as the data is written through mongos it is accessible for querying. So if a shard becomes imbalanced during a data load, the data is still accessible - it may just take time to rebalance the shard, depending on the load on the shard and on how much new data is still arriving, meaning chunks need to be split before migrating.
Update2
The inserts through mongos are synchronous, so the time it takes to run the script is the time it took to apply the inserts. There are other options about the durability of writes using getLastError - essentially how long you block while the write is being written. The shell calls getLastError() transparently, but the default for most language drivers is to be asynchronous and not wait for a server response.
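To answer the earlier question about checking the time within the JavaScript itself: since the shell waits for each insert, you can simply wrap the loop with Date objects - a minimal sketch based on the cleaned-up loop above:
var amount = 1000000;
var start = new Date();
for (var i = 1; i <= amount; i++) {
    db.mycol.insert({ a: '1', b: '2', c: 'text', d: i, e: null });
}
db.getLastError();                              // wait for the last write to be acknowledged
print("elapsed ms: " + (new Date() - start));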
Where to store the javascript file? Well, that's up to you - it's your application code. Most users will write an application in their preferred language and use the driver to call MongoDB.
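For example, here is a minimal sketch of the same load using the callback-style Node.js driver instead of a shell script; the connection string (host, port, database name) and the batch size are assumptions, not something from your setup:
var MongoClient = require('mongodb').MongoClient;

// connect through mongos; replace hostip:port and the database name with your own
MongoClient.connect('mongodb://hostip:port/test', function (err, db) {
    if (err) throw err;

    var amount = 1000000;
    var batchSize = 1000;   // insert in batches instead of one document at a time
    var inserted = 0;

    function insertNextBatch() {
        if (inserted >= amount) {
            db.close();
            return;
        }
        var batch = [];
        var upper = Math.min(inserted + batchSize, amount);
        for (var i = inserted + 1; i <= upper; i++) {
            batch.push({ a: '1', b: '2', c: 'text', d: i, e: null });
        }
        db.collection('mycol').insertMany(batch, function (err) {
            if (err) throw err;
            inserted += batch.length;
            insertNextBatch();
        });
    }

    insertNextBatch();
});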
I am trying to adapt a script I already have to run using .csv data input. When the script is run without the .csv, it runs perfectly for any configuration I choose to use. When it runs using the .csv, whatever scenario is in the first row will run perfectly, but everything from there on will fail. The reason for the failure is that some of my variables are being reused from the first thread, and I don't know how to stop this from happening.
This is what my script looks like:
HTTP Request - GET ${url} (url is declared in the CSV data input, and changes each run)
-> postprocessor that extracts Variable_1, Variable_2 and Variable_3
Sampler1
-> JSR223 preprocessor: creates payloadSampler1 using javascript, example:
var payloadSampler1 = { };
payloadSampler1.age = vars.get("Variable_2");
payloadSampler1.birthDate = "1980-01-01";
payloadSampler1.phone = {};
payloadSampler1.phone.number = "555-555-5555";
vars.put("payloadSampler1", JSON.stringify(payloadSampler1));
Sampler2
-> JSR223 preprocessor: creates payloadSampler2 using javascript (same as above but for different values)
Sampler3
-> JSR223 preprocessor: creates payloadSampler3 using javascript (same as above but for different values)
Sampler4
-> JSR223 preprocessor: creates payloadSampler4 using javascript (same as above but for different values)
HTTP Request - POST ${url}/${Variable_1}/submit
-> JSR223 preprocessor: creates payloadSubmit using javascript, and mix and matching the results from the above samplers - like so:
var payloadSubmit = { };
if (vars.get("someVar") != "value" && vars.get("someVar") != "value2" && vars.get("differentVar") != "true") {
payloadSubmit.ageInfo = [${payloadSampler1}];
}
if (vars.get("someVar2") != "true") {
payloadSubmit.paymentInfo = [${payloadSampler2}];
}
payloadSubmit.emailInfo = [${payloadSampler3}];
payloadSubmit.country = vars.get("Variable_3");
vars.put("payloadSubmit", JSON.stringify(payloadSubmit));
-> BodyData as shown in the screenshot:
I have a Debug PostProcessor to see the values of all these variables I am creating. For the first iteration of my script, everything is perfect. For the second one, however, the Debug PostProcessor shows the values for all payloadSamplers and all the Variables correctly changed to match the new row data (from the csv), but the final variable, payloadSubmit, just reuses whatever the values were for the first thread iteration.
Example:
Debug PostProcessor at the end of first iteration shows:
Variable_1=ABC
Variable_2=DEF
Variable_3=GHI
payloadSampler1={"age":"18","email":null,"name":{"firstName":"Charles"}},{"age":"38","email":null}}
payloadSampler2={"paymentChoice":{"cardType":"CreditCard","cardSubType":"VI"}},"amount":"9.99","currency":"USD"}
payloadSampler3={"email":"tes#email.com"}
payloadSubmit={"ageInfo":[{"age":"18","email":null,"name":{"firstName":"Charles"}},{"age":"38","email":null}],"paymentInfo":[{"paymentChoice":{"cardType":"CreditCard","cardSubType":"VI"}},"amount":"9.99","currency":"USD"],"emailInfo":[{"email":"tes#email.com"}],"country":"GHI"}
But at the end of the 2nd iteration it shows:
Variable_1=123
Variable_2=456
Variable_3=789
payloadSampler1={"age":"95","email":null,"name":{"firstName":"Sam"}},{"age":"12","email":null}}
payloadSampler2={"paymentChoice":{"cardType":"CreditCard","cardSubType":"DC"}},"amount":"19.99","currency":"USD"}
payloadSampler3={"email":"tes2#email.com"}
payloadSubmit={"ageInfo":[{"age":"18","email":null,"name":{"firstName":"Charles"}},{"age":"38","email":null}],"paymentInfo":[{"paymentChoice":{"cardType":"CreditCard","cardSubType":"VI"}},"amount":"9.99","currency":"USD"],"emailInfo":[{"email":"tes#email.com"}],"country":"USA"}
I can also see that the final HTTP Request is indeed sending the old values.
My very limited understanding is that because I am invoking the variables like "${payloadSampler1}", it will use the value that was set for that variable the first time the sampler was run (back in the 1st thread iteration). These are the things I have tried:
If I use vars.get("payloadSubmit") in the body of an HTTP Sampler, I get an error, so that is not an option. If I use vars.get("payloadSampler1") in the Samplers that create the variables, extra escape characters are added, which breaks my JSON. I have tried adding a counter to the end of the variable name and having that counter increase on each thread iteration, but the result is the same: all the variables and samplers other than the last one have updated values, but the last one always reuses the variables from the first thread iteration.
I also tried to use ${__javaScript(vars.get("payloadSubmit_"+vars.get("ThreadIteration")))}, but the results are always the same.
And I have also tried using the ${__counter(,)} element, but if I set it to TRUE, it will always be 1 for each thread iteration, and if I set it to FALSE, it starts at 2 (I am assuming this is because I use a counter in another sampler within this thread - but even after removing that counter this still happens).
I am obviously doing something (or many things) wrong.
If anyone can spot what my mistakes are, I would really appreciate hearing your thoughts. Or even being pointed to some resource I can read for an approach I can use for this. My knowledge of both javascript and jmeter is not great, so I am always open to learn more and correct my mistakes.
Finally, thanks a lot for reading through this wall of text and trying to make sense of it.
It's hard to tell where exactly your problem is without seeing the values of these someVar and payload variables. Most probably something cannot be parsed as valid JSON, so on the 2nd iteration your last JSR223 PreProcessor fails to run to the end and, as a result, your payloadSubmit variable value doesn't get updated. Take a closer look at the JMeter GUI: there is a yellow triangle with an exclamation sign which indicates the number of errors in your scripts, and it opens the JMeter Log Viewer on click.
If there is a red number next to the triangle - obviously you have a problem, and you will need to check the jmeter.log file for the details.
Since JMeter 3.1 it is recommended to use the Groovy language for any form of scripting, mainly because Groovy has higher performance compared to the other scripting options. Check out the Parsing and producing JSON guide to learn more about how to work with JSON data in Groovy.
I have a test.txt file that has two columns of data (x's and y's), eg:
#test.txt
1 23
2 234
4 52
43 5
3 35
And a python program that reads in these values and stores them in x and y as so:
#test.py
# Read the file.
f = open('test.txt', 'r')
# read the whole file into a single variable, which is a list of every row of the file.
lines = f.readlines()
f.close()
# initialize some variables to be lists:
x = []
y = []
# scan the rows of the file stored in lines, and put the values into some variables:
for line in lines:
    p = line.split()
    x.append(float(p[0]))
    y.append(float(p[1]))
I want to take these values stored in x and y and transfer them to two similar arrays in a javascript program to display them as a graph. How can I transfer between python and javascript?
Your question is slightly vague. There are two possible ways your question can be interpreted, here are the corresponding answers:
Your python code needs to transfer the data to some javascript running in the browser or a node.js application. In order to do this, the Python half of your application would need to store the data in a database and expose it via some sort of API, which the javascript half could consume. To go a little fancier, you could set up a connection between the two web servers (e.g. socket.io) and have the Python half send the arrays over the connection as they're created.
This is what I believe you're trying to do based on the code you've posted: Preprocess some data in Python, and then pass it over to some other javascript piece of the puzzle where there's no real-time aspect to it.
In this case, you could simply write the arrays to JSON, and parse that in Javascript. To write to a file, you could do:
import json

data = {'x': x, 'y': y}

# To write to a file:
with open("output.json", "w") as f:
    json.dump(data, f)

# To print out the JSON string (which you could then hardcode into the JS)
print(json.dumps(data))
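On the JavaScript side, a minimal sketch of reading the arrays back, assuming output.json ends up served from the same place as the page:
// fetch the file written by the Python script and rebuild the two arrays
fetch('output.json')
    .then(function (response) { return response.json(); })
    .then(function (data) {
        var x = data.x;   // [1, 2, 4, 43, 3]
        var y = data.y;   // [23, 234, 52, 5, 35]
        // hand x and y to whatever charting code you are using
        console.log(x, y);
    });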
It's unclear why you want to transfer this, but you can write it out into a js file simply by formatting the list into var someList = [[x1,y1], [x2,y2]].
It would be easier to build the string up within the for line in lines loop, but in case you do have an actual need to maintain separate x and y lists:
zippedList = list(list(zipped) for zipped in zip(xList,yList))
print('var someList = {};'.format(zippedList))
which gives you
var someList = [[1, 23], [2, 234], [4, 52], [43, 5], [3, 35]];
Write this out to a js file as needed, or append it to an existing html file:
with open('index.html', 'a') as f:
    f.write('<script> var transferredList={}; </script>'.format(zippedList))
Because you mention "display" together with JavaScript, I somewhat assume that your goal is to have JavaScript in a web browser displaying the data that your python script reads from the file in the backend. Transferring from backend python to frontend JavaScript is a piece of cake, because both languages "speak" JSON.
In the backend, you will do something like
import json
response = json.dumps(response_object)
and in the frontend, you can right away use
var responseObject = JSON.parse(responseString);
It is also quite clear how the frontend will call the backend, namely using ajax, usually wrapped in a handy library like jQuery.
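A minimal sketch of that frontend call, assuming the backend exposes the JSON at a /data endpoint (the endpoint name is an assumption):
// jQuery hides the XMLHttpRequest plumbing and parses the JSON for you
$.getJSON('/data', function (responseObject) {
    var x = responseObject.x;
    var y = responseObject.y;
    // display x and y however you like
    console.log(x, y);
});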
The field opens up when it comes to making your python script run in a web server. A good starting point for this would be the Python How-To for using python in the web. You will have to consult your web hosting service, too, because most hosters have clear guidance on whether they admit CGI or, for example, FastCGI. Once this is clear, maybe you want to take a look at the Flask micro-framework or something similar, which offers many of the services you will need right out of the box.
I have mentioned all the buzzwords here to enable your own investigation ;)
I didn't parse the values into the x and y arrays here, but you can do that if you know at least a bit of JavaScript.
var Request = new XMLHttpRequest(); // request object
Request.open("GET", "test.txt", true);
/*
 * Open the request;
 * it takes 2 required arguments, the third (async) is optional.
 * For an async request, handle the response in onreadystatechange (or onload).
 */
Request.onreadystatechange = function() {
    if (this.readyState === 4 && this.status === 200) { // request finished and status is okay
        var Lines = this.responseText.split("\n");
        alert(Lines[0]); // will alert the first line
    }
    /*
     * this.status >= 500 means server issues or network problems
     * this.status === 404 means file not found
     */
};
Request.send(); // actually fire the request
I am using a web service that serves travel-related data from third party sources; this data is converted to JSON and is used to formulate the output based on a user's search criteria.
If the web service subscribes to multiple third party service providers, the application receives thousands of potential results for some searches. Some of the JSON files created for these search results are as high as 2-4 MB, which causes considerable delay when attempting to load the JSON results.
The whole JSON result set is required for further sorting and filtering operations on the search results by the users. Moving the sort and filtering operations to the back-end is not a possibility for now.
For small and medium JSON result sets the current implementation works out well, but large JSON result sets cause performance degradation.
How could I phase out the JSON loading process into multiple steps to achieve an improved user experience even with very large JSON result sets?
Any leads on how I can overcome this problem would be much appreciated.
I did find a way to solve this issue, so I thought I might as well post it for others who may find this answer useful.
The web pagination mechanism will automatically improve the responsiveness of the system and the user experience, and may reduce clutter on the page. Unless the returned result set is guaranteed to be very small, any web application with search capabilities must have pagination. For instance, if the result set is fewer than 30 rows, pagination may be optional. However, if it's bigger than 100 rows, pagination is highly recommended, and if it's bigger than 500 rows, pagination is practically required. There are a lot of different ways to implement the pagination algorithm. Depending on the method used, both performance and the user experience will be affected.
Some alternatives for phasing out the JSON loading process into multiple steps
If the JSON is large, you could always contemplate a "paging" metaphor, where you download the first x records, and then make subsequent requests for the next "pages" of server data. Given that additional server records can presumably be added between requests, we will have to think carefully about that implementation, but it's probably easy to do.
Another way of making the JSON more efficient, is to limit the data returned in the initial request. For example, the first request could return just identifiers or header information for all or a reasonable subset of the records, and then you could have subsequent requests for additional details (e.g. the user goes to a detail screen).
A particular permutation of the prior point would be if the JSON is currently including any large data elements (e.g. Base64-encoded binary data such as images or sound files). Getting those out of the initial response will address many issues. And if you could return URLs for this binary data, rather than the data itself, that would definitely lend itself to lazy loading (as well as letting you download the binary information directly rather than a Base64 encoding, which is roughly 33% larger). And if you're dealing with images, you might also want to contemplate the idea of thumbnails vs. large images, handling the latter, in particular, in a lazy-loading model.
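As a rough illustration of that last point, assuming each record in the JSON carries thumbUrl and fullUrl fields (both names are hypothetical) instead of embedded Base64 data:
// record is one item from the JSON result set
var img = document.createElement('img');
img.src = record.thumbUrl;                  // cheap thumbnail up front
img.addEventListener('click', function () {
    img.src = record.fullUrl;               // the large image is only downloaded on demand
});
document.body.appendChild(img);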
If adopting the first approach.....
If adopting the first approach, which seems more feasible given the current development scenario, you first need to change your back-end search result processing code so that, in response to a request, it delivers only n records, starting at a particular record number. You probably also need to return the total number of records on the server so you can give the user some visual cues as to how much there is to scroll through.
Second, you need to update your web app to support this model. Thus, in your UI, as you're scrolling through the results, you probably want your user interface to respond to scrolling events by showing either
(a) actual results if you've already fetched it; or
(b) some UI that visually indicates that the record in question is being retrieved and then asynchronously request the information from the server.
You can continue downloading and parsing the JSON in the background while the initial data is presented to the user. This will simultaneously yield the benefit that most people associate with "lazy loading" (i.e. you don't wait for everything before presenting the results to the user) and "eager loading" (i.e. it will be downloading the next records so they're already ready for the user when they go there).
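A minimal client-side sketch of that combined lazy/eager loading, assuming a hypothetical /search endpoint that accepts start and count parameters and returns { total, results }; renderNewRows and enableSortingAndFiltering stand in for your own UI code:
var pageSize = 50;
var allResults = [];

function fetchPage(start) {
    fetch('/search?q=hotels&start=' + start + '&count=' + pageSize)
        .then(function (response) { return response.json(); })
        .then(function (page) {
            allResults = allResults.concat(page.results);
            renderNewRows(page.results);          // show what we have so far
            if (allResults.length < page.total) {
                fetchPage(start + pageSize);      // eagerly pull the next page in the background
            } else {
                enableSortingAndFiltering();      // full result set received
            }
        });
}

fetchPage(0);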
General implementation steps to follow for the first approach
Figure out how many total rows of data there are once the merge process is complete.
Retrieve the first n records.
Send requests for subsequent blocks of n records and repeat this process in the background until the full result set is received.
If the user scrolls to a record, show it if you've already got it; if not, provide a visual cue that you're fetching that data asynchronously.
When the data comes in, merge the JSON and update the UI, if appropriate.
If by any chance filters and sorting options are applied on the web end, they should be disabled until the full result set is received.
General implementation steps to follow for generating page links
After choosing the best approach to retrieve the result set and traverse it, the application needs to create the actual page links for users to click. Below is a generic function pseudo-code, which should work with any server-side technology. This logic should reside in your application, and will work with both database-driven and middle-tier pagination algorithms.
The function takes three (3) parameters and returns HTML representing page links. The parameters are the query text, the starting row number, and the total number of rows in the result set. The algorithm is clever enough to generate appropriate links based on where the user is in the navigation path.
Note: I set the default of 50 rows per page, and a page window to 10. This means that only 10 (or fewer) page links will be visible to the user.
private String getSearchLinks(String query, int start, int total) {
    // assuming that the initial page = 1 and start is 0 for the first page
    String result = "";
    if (total == 0) {
        return ""; // no links
    }
    int pageSize = 50; // number of rows per page
    int window = 10;   // page window - number of visible page links
    int pages = (int) Math.ceil((double) total / pageSize);
    result = "Pages:";
    // numeric value of the current page, e.g. if start is 51: 51/50 = 1, 1 + 1 = 2
    int currentPage = (start / pageSize) + 1;
    int leftLinkCount = ((currentPage - 1) > (window / 2))
            ? (window / 2 - 1) : (currentPage - 1);
    int pageNo = currentPage - leftLinkCount;
    if (pageNo > 1) {
        // show the first page and a "previous" marker if there are more links on the left
        result += "1 .. ";
        result += "«";
    }
    for (int i = 0; i < window - 1; i++) {
        if (pageNo > pages) {
            break;
        } else if (pageNo == currentPage) {
            result += " [" + pageNo + "] ";  // current page is not a link
        } else {
            result += " " + pageNo + " ";    // build the actual link for pageNo here, using query
        }
        pageNo++;
    } // end for
    if ((pageNo - 1) < pages) {
        result += "»";
    }
    result += " Showing " + ((start > total) ? total + 1 : start + 1)
            + " - " + (((start + pageSize) > total) ? total : (start + pageSize))
            + " of Total: " + total;
    return result;
}
This logic does not care how the viewable portion of the result set is generated. Whether it is on the database side or on the application server's side, all this algorithm needs is the "total" number of results (which can be cached after the first retrieval) and the indicator ("start") containing which row number was the first on the last page the user was looking at. This algorithm also shows the first-page link if the user is on a page beyond the initial page window (for example, page 20), and correctly handles result sets that don't have enough rows for 10 pages (for example, only 5 pages).
The main for loop generates the links and correctly computes the "start" parameter for the next page. The query text and total are always the same values.
Points to note
In a portal where internal and external data is used, there would be an unavoidable performance overhead, because the system has to wait until all external hotel information is fully received, merged and sorted with the local hotel details. Only then would we be able to know the exact result count.
Deciding how many additional requests are required would be somewhat simplified if a standard result count per page is used.
This change would provide the user with a quick and uniform page load time for all search operations, and the page would be populated with the most significant result block of the search in the first instance. Additional page requests could be handled through asynchronous requests.
Not only the hotel count but also the content sent needs to be checked and optimized for performance gains. In some instances, for similar hotel counts, the generated JSON object size varies considerably depending on the third party suppliers subscribed to.
Once the successive JSON messages are received, these JSON blocks need to be merged. Prior to the merging, the user may need to be notified that new results are available and asked whether he/she wishes to merge the new results or keep working with the existing primary result set.
Useful resources:
Implementing search result pagination in a web application
How big is too big for JSON?
I have a database with roughly 1.2M names. I'm using Twitter's typeahead.js to remotely fetch the autocomplete suggestions when you type someone's name. In my local environment this takes roughly 1-2 seconds for the results to appear after you stop typing (the autocomplete doesn't appear while you are typing), and 2-5+ seconds on the deployed app on Heroku (using only 1 dyno).
I'm wondering if the reason why it only shows the suggestions after you stop typing (and with a few seconds of delay) is that my code isn't optimized?
The script on the page:
<script type="text/javascript">
    $(document).ready(function() {
        $("#navPersonSearch").typeahead({
            name: 'people',
            remote: 'name_autocomplete/?q=%QUERY'
        })
        .keydown(function(e) {
            if (e.keyCode === 13) {
                $("form").trigger('submit');
            }
        });
    });
</script>
The keydown snippet is because without it my form doesn't submit for some reason when pushing enter.
my django view:
def name_autocomplete(request):
    query = request.GET.get('q', '')
    if len(query) > 0:
        results = Person.objects.filter(short__istartswith=query)
        result_list = []
        for item in results:
            result_list.append(item.short)
    else:
        result_list = []
    response_text = json.dumps(result_list, separators=(',', ':'))
    return HttpResponse(response_text, content_type="application/json")
The short field in my Person model is also indexed. Is there a way to improve the performance of my typeahead?
I don't think this is directly related to Django, but I may be wrong. I can offer some generic advice for this kind of situation:
(My money is on #4 or #5 below).
1) What is an average "ping" from your machine to Heroku? If it's far, that's a little bit of extra overhead. Not much, though - certainly not much when compared to the 8-9 seconds you are referring to. The penalty will be larger with https, mind you.
2) Check the values of rateLimitFn and rateLimitWait in your remote dataset. Are they the defaults?
3) In all likelihood, the problem is database/dataset related. The first thing to check is how long it takes you to establish a connection to the database (do you use a connection pool?).
4) Second thing: how long does it take to run the query? My bet is on this point or the next. Add debug prints, or use New Relic (even the free plan is OK). Have a look at the generated query, make sure it is indexed, and have your DB "explain" the execution plan for such a query to confirm it actually uses the index.
5) Third thing: are the results large? If, for example, you specify "J" as the query, I imagine there will be lots of answers. Just getting them and streaming them to the client will take time. In such cases:
5.1) Specify a minLength for your dataset. Make it at least 3, if not 4.
5.2) Limit the result set that your DB query returns. Make it return no more than 10, say.
6) I am no Django expert, but make sure the way you use your model in Django doesn't make it load the entire table into memory first. Just sayin'.
HTH.
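For points 5.1 and 5.2, here is a minimal sketch of the dataset options (option names as in typeahead.js 0.9.x - check your version), combined with capping the queryset on the server side to roughly the same number of rows:
$("#navPersonSearch").typeahead({
    name: 'people',
    minLength: 3,                               // don't hit the server for 1-2 characters
    limit: 10,                                  // render at most 10 suggestions
    remote: 'name_autocomplete/?q=%QUERY'
});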
results = Person.objects.filter(short__istartswith=query)
result_list = []
for item in results:
result_list.append(item.short)
Probably not the only cause of your slowness, but this is horrible from a performance point of view: never loop over a django queryset. To assemble a list from a django queryset you should always use values_list. In this specific case:
results = Person.objects.filter(short__istartswith=query)
result_list = results.values_list('short', flat=True)
This way you are getting the single field you need straight from the db instead of: getting all the table row, creating a Person instance from it and finally reading the single attribute from it.
Nitzan covered a lot of the main points that would improve performance, but unlike him I think this might be directly related to Django (or at least, the server side).
A quick way to test this would be to update your name_autocomplete method to simply return 10 random generated strings in the format that Typeahead expects. (The reason we want them random is so that Typeahead's caching doesn't skew any results).
What I suspect you will see is that Typeahead is now running pretty quick and you should start seeing results appear as soon as your minLength of string has been typed.
If that is the case then we will need to look into what could be slowing the query down. My Python skills are non-existent, so I can't help you there, sorry!
If that isn't the case then I would maybe consider doing some logging of when $('#navPersonSearch') calls typeahead:initialized and typeahead:opened to see if they bring up anything odd.
You can use django haystack, and your server side code would be roughly like:
def autocomplete(request):
    sqs = SearchQuerySet().filter(content_auto=request.GET.get('q', ''))[:5]  # or how many names you need
    suggestions = [result.first_name for result in sqs]
    # you have to configure typeahead how to process returned data, this is a simple example
    data = json.dumps({'q': suggestions})
    return HttpResponse(data, content_type='application/json')
Just a quick question about saving an apps state using local storage. I'm about to start work on an iOS web app and I'm wondering if there may be any advantages or disadvantage to either of these models. Also, is there any major performance hit to saving every tiny change of the app state into local storage?
Number 1
Save the entire app state object as an JSON string to a single local storage key value pair.
var appstate = {
    string: 'string of text',
    somebool: true,
    someint: 16
};

localStorage.setItem('appState', JSON.stringify(appstate));
Number 2
Save each variable of the app state to its own key-value pair in local storage.
var appstate = {
    string: 'string of text',
    somebool: true,
    someint: 16
};

localStorage.setItem('string', appstate.string);
localStorage.setItem('bool', appstate.somebool);
localStorage.setItem('int', appstate.someint);
The only reason I would think it could be more efficient to store values separately would be if you anticipate values changing independently of each other. If they are stored separately you could refresh one set of values without touching another, and therefore have better handling of value expiration. If, for example, somebool changes frequently but the rest does not, it might be better to store it separately. Group together data with similar expiration and volatility.
Otherwise, if you just want to save the entire application state, I would think that a single string would be fine.
Consider reads vs. writes (changes). For frequent reads, it doesn't really matter because JavaScript objects are hashes with constant (O(1)) response time (see Javascript big-O property access performance). For writes, as nullability says, there is a difference. For the single-object design, frequent writes could get slower if you end up with a large number of (or large-sized) properties in the object.
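A small sketch of that trade-off, using the keys from the question:
// Number 2: a volatile value can be refreshed without touching anything else
localStorage.setItem('bool', 'false');

// Number 1: any change means re-reading and re-serialising the whole state object
var state = JSON.parse(localStorage.getItem('appState') || '{}');
state.somebool = false;
localStorage.setItem('appState', JSON.stringify(state));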