I'm building a system in Ruby that uses WebSockets to notify JS clients of changes relevant to the models and collections each client is currently viewing. I would like the JS client to periodically send registration messages over the WebSocket telling the server which models it is viewing, along with the collections (or collection subsets specified by a query).
So in order to make this work, the API hosting the WebSocket server will need to test if a query matches a document that has been updated/created. I would like to do this without sending a query to Mongo, and I found a solution in the C driver that would work on the (mongo) client side: http://api.mongodb.org/c/current/mongoc_matcher_new.html
http://api.mongodb.org/c/current/matcher.html
Unfortunately I didn't see a way of calling this method through the Ruby drivers. Any clue how I might be able to use the mongoc_matcher_new function in Ruby? Or does anyone have a better suggestion to improve the architecture of this solution to only send applicable updates to JS clients?
I don't think that you're going to be able to do it without querying back to Mongo. The standard way of doing this is by tailing the oplog, but for updates the oplog is only going to give you the database/collection, the _id, and the changed fields. So I don't think you would be able to support an arbitrary query with just the oplog; you would need to fetch the document in order to determine a match.
I would suggest that you take a look at how Meteor does this. This blog post gives an overview of their approach. There is also a wiki page with more specifics on OplogObserveDriver.
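To make this concrete, here is a minimal sketch (not production code) of tailing the oplog with the Ruby mongo driver and re-fetching the full document on updates. It assumes a replica set (so local.oplog.rs exists); the database and collection names are placeholders:

require 'mongo'

client = Mongo::Client.new(['localhost:27017'], database: 'local')
oplog  = client['oplog.rs']

# Tailable + await cursor: blocks and waits for new oplog entries.
oplog.find({ 'ns' => 'mydb.things' }, cursor_type: :tailable_await).each do |entry|
  case entry['op']
  when 'i'                        # insert: the full document is in 'o'
    doc = entry['o']
  when 'u'                        # update: 'o2' only carries the _id, so fetch the document
    doc = client.use('mydb')[:things].find(_id: entry['o2']['_id']).first
  end
  # ...test doc against each registered client query and push matches over the WebSocket...
end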
I eventually used FFI to run the code I needed:
MongoC.test_query_match(document_string, json_query_string)
C code:
#include <bcon.h>
#include <mongoc.h>
#include <stdio.h>
#include <string.h>

int test_query_match(char *document, char *query) {
    mongoc_matcher_t *matcher;
    bson_t *d;
    bson_t *q;
    bson_error_t doc_parse_error;
    bson_error_t query_parse_error;
    bson_error_t matcher_error;

    d = bson_new_from_json((const uint8_t *) document, strlen(document), &doc_parse_error);
    q = bson_new_from_json((const uint8_t *) query, strlen(query), &query_parse_error);

    /* bail out if either JSON string failed to parse */
    if (!d || !q) {
        if (d) bson_destroy(d);
        if (q) bson_destroy(q);
        return 0;
    }

    matcher = mongoc_matcher_new(q, &matcher_error);
    if (!matcher) {
        bson_destroy(q);
        bson_destroy(d);
        return 0;
    }

    int match = mongoc_matcher_match(matcher, d);

    bson_destroy(q);
    bson_destroy(d);
    mongoc_matcher_destroy(matcher);

    return match;
}
The Ruby FFI code:
require 'ffi'

module MongoC
  extend FFI::Library
  ffi_lib 'c'
  ffi_lib File.dirname(__FILE__) + '/mongoc/mongoc.so'
  attach_function :test_query_match, [:string, :string], :int
end
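With that in place, deciding whether a freshly created/updated document still matches a client's registered query is a single call. The document and query below are just illustrative JSON strings:

document = '{ "_id": 1, "status": "active", "score": 42 }'
query    = '{ "status": "active", "score": { "$gt": 10 } }'

if MongoC.test_query_match(document, query) == 1
  # push the change to every WebSocket client registered for this query
end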
Related
Node.js is a perfect match for our web project, but there are a few computational tasks for which we would prefer Python. We also already have Python code for them.
We are highly concerned about speed. What is the most elegant way to call a Python "worker" from node.js in an asynchronous, non-blocking way?
This sounds like a scenario where zeroMQ would be a good fit. It's a messaging framework that's similar to using TCP or Unix sockets, but it's much more robust (http://zguide.zeromq.org/py:all)
There's a library that uses zeroMQ to provide an RPC framework that works pretty well. It's called zeroRPC (http://www.zerorpc.io/). Here's the hello world.
Python "Hello x" server:
import zerorpc

class HelloRPC(object):
    '''pass the method a name, it replies "Hello name!"'''
    def hello(self, name):
        return "Hello, {0}!".format(name)

def main():
    s = zerorpc.Server(HelloRPC())
    s.bind("tcp://*:4242")
    s.run()

if __name__ == "__main__":
    main()
And the node.js client:
var zerorpc = require("zerorpc");

var client = new zerorpc.Client();
client.connect("tcp://127.0.0.1:4242");

// calls the method on the python object
client.invoke("hello", "World", function(error, reply, streaming) {
    if (error) {
        console.log("ERROR: ", error);
    } else {
        console.log(reply);
    }
});
Or vice-versa, node.js server:
var zerorpc = require("zerorpc");

var server = new zerorpc.Server({
    hello: function(name, reply) {
        reply(null, "Hello, " + name, false);
    }
});

server.bind("tcp://0.0.0.0:4242");
And the Python client:

import zerorpc
import sys

c = zerorpc.Client()
c.connect("tcp://127.0.0.1:4242")

name = sys.argv[1] if len(sys.argv) > 1 else "dude"
print(c.hello(name))
For communication between node.js and a Python server, I would use Unix sockets if both processes run on the same machine and TCP/IP sockets otherwise. For the marshalling protocol I would take JSON or protocol buffers. If threaded Python turns out to be a bottleneck, consider using Twisted Python, which provides the same event-driven concurrency as node.js does.
If you feel adventurous, learn Clojure (ClojureScript, clojure-py) and you'll get the same language that runs and interoperates with existing code on the JVM, JavaScript (node.js included), the CLR and Python. And you get a superb marshalling protocol by simply using Clojure data structures.
If you arrange to have your Python worker in a separate process (either long-running server-type process or a spawned child on demand), your communication with it will be asynchronous on the node.js side. UNIX/TCP sockets and stdin/out/err communication are inherently async in node.
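As a rough sketch of the spawned-child variant, here is the node.js side talking newline-delimited JSON to a Python worker over stdin/stdout. The file name worker.py and the one-JSON-object-per-line protocol are just illustrative assumptions:

var spawn = require('child_process').spawn;
var readline = require('readline');

// long-lived Python worker; node keeps running while it computes
var worker = spawn('python', ['worker.py']);
var rl = readline.createInterface({ input: worker.stdout });

// every line the worker prints is treated as one JSON result
rl.on('line', function (line) {
    var result = JSON.parse(line);
    console.log('result from python:', result);
});

// send one JSON job per line; the write does not block the event loop
worker.stdin.write(JSON.stringify({ task: 'square', value: 12 }) + '\n');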
I've had a lot of success using thoonk.js along with thoonk.py. Thoonk leverages Redis (an in-memory key-value store) to give you feed (think publish/subscribe), queue and job patterns for communication.
Why is this better than Unix sockets or direct TCP sockets? Overall performance may be decreased a little, but Thoonk provides a really simple API that spares you from dealing with a socket manually. Thoonk also makes it trivial to implement a distributed computing model, since you can scale out your Python workers simply by spinning up new instances and connecting them to the same Redis server.
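Thoonk's own feed/queue/job API is beyond the scope of this answer, but the underlying Redis queue pattern looks roughly like this on the Python worker side (plain redis-py rather than Thoonk; the queue names and job format are made up for illustration). The node.js side pushes JSON job strings onto the jobs list with any Redis client, and you can run as many copies of this worker as you need against the same Redis server:

import json
import redis

r = redis.StrictRedis(host='localhost', port=6379)

while True:
    # BLPOP blocks until a job is pushed onto the 'jobs' list (e.g. by node via RPUSH)
    _queue, raw = r.blpop('jobs')
    job = json.loads(raw)
    result = sum(job['values'])          # stand-in for the real computation
    r.lpush('results:' + str(job['id']), json.dumps(result))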
I'd also consider Apache Thrift (http://thrift.apache.org/).
It can bridge between several programming languages, is highly efficient, and has support for async or sync calls. See the full list of features here: http://thrift.apache.org/docs/features/
The multi-language support can be useful for future plans; for example, if you later want to do part of the computational task in C++, it's very easy to add it to the mix using Thrift.
I'd recommend using a work queue such as the excellent Gearman, which will give you a great way to dispatch background jobs and asynchronously get their results once they're processed.
The advantage of this, used heavily at Digg (among many others), is that it provides a strong, scalable and robust way to let workers in any language speak with clients in any language.
Update 2019
There are several ways to achieve this; here is the list in increasing order of complexity:

1. Python Shell: you write streams to the Python console and it writes back to you
2. Redis Pub/Sub: you can have a channel listening in Python while your node.js publisher pushes data
3. WebSocket connection, where Node acts as the client and Python acts as the server, or vice-versa
4. API connection, with Express/Flask/Tornado etc. running separately and exposing an API endpoint for the other side to query
Approach 1: Python Shell (the simplest approach)
source.js file
const ps = require('python-shell')

// very important to add -u option since our python script runs infinitely
var options = {
    pythonPath: '/Users/zup/.local/share/virtualenvs/python_shell_test-TJN5lQez/bin/python',
    pythonOptions: ['-u'], // get print results in real-time
    // make sure you use an absolute path for scriptPath
    scriptPath: "./subscriber/",
    // args: ['value1', 'value2', 'value3'],
    mode: 'json'
};

const shell = new ps.PythonShell("destination.py", options);

function generateArray() {
    const list = []
    for (let i = 0; i < 1000; i++) {
        list.push(Math.random() * 1000)
    }
    return list
}

setInterval(() => {
    shell.send(generateArray())
}, 1000);

shell.on("message", message => {
    console.log(message);
})
destination.py file
import datetime
import sys
import time
import numpy
import talib
import timeit
import json
import logging

logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

size = 1000
p = 100
o = numpy.random.random(size)
h = numpy.random.random(size)
l = numpy.random.random(size)
c = numpy.random.random(size)
v = numpy.random.random(size)


def get_indicators(values):
    # Return the RSI of the values sent from node.js
    numpy_values = numpy.array(values, dtype=numpy.double)
    return talib.func.RSI(numpy_values, 14)


for line in sys.stdin:
    values = json.loads(line)
    print(get_indicators(values))
    # Without this step the output may not be immediately available in node
    sys.stdout.flush()
Notes: Make a folder called subscriber at the same level as the source.js file and put destination.py inside it. Don't forget to change the pythonPath option to point at your own virtualenv.
I've been searching around the internet for a way to define a query in JavaScript and pass it to PHP, letting PHP set up a MySQL connection, execute the query, and return the results JSON-encoded.
However, my concern is with the security of this method, since users could tamper with the queries and do things you don't want them to do, or request data you do not want them to see.
Question
In an application/plugin like this, what kind of security measures would you suggest to prevent users from requesting information I don't want them to?
Edit
The end result of my plugin will be something like
var data = Querier({
    table: "mytable",
    columns: ["column1", "column2", "column3"],
    where: "column2='blablabla'",
    limit: "10"
});
I'm going to let that function make an AJAX request and execute a query in PHP using the above data. I would like to know what security risks this throws up and how to prevent them.
It's unclear from your question whether you're allowing users to type queries that will be run against your database, or whether your code running in the browser is creating them (i.e., not the user).
If it's the user: You'd have to really trust them, since they can (and probably will) destroy your database.
If it's your code running in the browser that's creating them: Don't do that. Instead, have client-side code send data to the server, and formulate the queries on the server using full precautions to prevent SQL Injection (parameterized queries, etc.).
Re your update:
I can see at least a couple issues:
Here's a risk right here:
where: "column2='blablabla'"
Now, suppose I decide to get my hands on that before it gets sent to the server and change it to:
where: "column2=');DROP TABLE Stuff; --"
You can't send a complete WHERE clause to the server, because you can't trust it. This is the point of parameterized queries:
Instead, specify the columns by name, and on the PHP side be sure you're handling the parameter values correctly (more here).
var data = Querier({
    table: "mytable",
    columns: ["column1", "column2", "column3"],
    where: {
        column2: {
            op: '=',
            value: 'blablabla'
        }
    },
    limit: "10"
});
Now you can build your query without blindly trusting the text from the client; you'll need to do thorough validation of column names, operators, etc.
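On the PHP side that might look roughly like the sketch below, using PDO. The whitelists, table/column names and request shape are illustrative assumptions that mirror the Querier call above; the key point is that identifiers are validated against a whitelist and only the value travels as a bound parameter:

<?php
// Whitelists: only these identifiers may ever reach the SQL string.
$allowedTables  = ['mytable'];
$allowedColumns = ['column1', 'column2', 'column3'];
$allowedOps     = ['=', '<', '>', 'LIKE'];

$request = json_decode(file_get_contents('php://input'), true);

if (!in_array($request['table'], $allowedTables, true)) {
    http_response_code(400);
    exit;
}
$columns = array_intersect($request['columns'], $allowedColumns);
$field   = 'column2';   // in real code: validated against $allowedColumns like the table name
$op      = '=';         // in real code: validated against $allowedOps
$value   = $request['where']['column2']['value'];   // the only untrusted free text

$pdo  = new PDO('mysql:host=localhost;dbname=mydb', 'reader', 'secret');
$sql  = 'SELECT ' . implode(',', $columns) . " FROM {$request['table']} WHERE $field $op ? LIMIT 10";
$stmt = $pdo->prepare($sql);
$stmt->execute([$value]);   // the value is bound, never concatenated into SQL

echo json_encode($stmt->fetchAll(PDO::FETCH_ASSOC));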
Exposing information about your schema to the entire world is giving up information for free. Security is an onion, and one of the outer layers of that onion is obscurity. It's not remotely sufficient unto itself, but it's a starting point. So don't let your client code (and therefore anyone reading it) know what your table names and column names are. Consider using server-side name mapping, etc.
Depending on how you intend to do this, you might have a hole bigger than the one made in this economy, or no hole at all.
If you are going to write the query on the client side and send it to PHP, I would create a user with only select, insert, delete and update privileges, without permission to access any other database.
Ignore this if you use SQLite.
I advise against this!
If you build the query on the server side, just send the server the data you want!
I would change the code into something like this:
var link = QuerierLink('sql.php'); // filename to use for the query
var data = Querier('users', link); // locks access to only this table

data.select({
    columns: ['id', 'name', 'email'],
    where: [
        {id: {'>': 5}},
        {name: {'like': '%david%'}}
    ],
    limit: 10
});
Which, on server-side, would generate the query:
select `id`,`name`,`email` from `db`.`users` where `id`>5 and `name` like '%david%' limit 10
This would be a lot better to use.
With prepared statements, you use:
select `id`,`name`,`email` from `db`.`users` where `id`>:id and `name` like :name limit 10
Passing to PDO, pseudo-code:
$query = 'select `id`,`name`,`email` from `'.$database_name.'`.`users` where `id`>:id and `name` like :name limit 10';
$stmt = $PDO->prepare($query);
$stmt->execute(array(
    'id'   => 5,
    'name' => '%david%'
));
$result = $stmt->fetchAll();
This is the preferred way, since you have more control over what is passed.
Also, set the exact database name along with the name of the table, so you prevent users from accessing stuff in other tables/databases.
Other databases include information_schema, which has every single piece of information about your entire database, including the user list and restrictions.
Ignore this for SQLite.
If you are going to use MySQL/MariaDB/other, you should disable all file read/write privileges.
You really don't want anyone writing files onto your server! Especially not into any location they wish.
The risk: you hand attackers a playground to do whatever they wish! This is a massive hole.
Solution: Disable FILE privileges or limit the access to a directory where you block external access using .htaccess, using the option --secure-file-priv or the system variable @@secure_file_priv.
If you use SQLite, just create a .sqlite(3) file, based on a template file, for each connecting client. Then delete the file when the user closes the connection, or sweep every n minutes for files older than x.
The risk: Filling your disk with .sqlite files.
Solution: Clear the files sooner or use a ramdisk with a cron job.
I've wanted to implement something like this a long time ago and it was a good way to exercise my mind.
Maybe I'll implement it like this!
Introducing easy JavaScript data access
So you want to rapidly prototype a really cool Web 2.0 JavaScript application, but you don't want to spend all your time writing the wiring code to get to the database? Traditionally, to get data all the way from the database to the front end, you need to write a class for each table in the database with all the create, read, update, and delete (CRUD) methods. Then you need to put some marshalling code atop that to provide an access layer to the front end. Then you put JavaScript libraries on top of that to access the back end. What a pain!
This article presents an alternative method in which you use a single database class to wrap multiple database tables. A single driver script connects the front end to the back end, and another wrapper class on the front end gives you access to all the tables you need.
Example/Usage
// Sample functions to update authors
function updateAuthorsTable() {
    dbw.getAll( function(data) {
        $('#authors').html('<table id="authors"><tr><td>ID</td><td>Author</td></tr></table>');
        $(data).each( function( ind, author ) {
            $('#authors tr:last').after('<tr><td>'+author.id+'</td><td>'+author.name+'</td></tr>');
        });
    });
}

$(document).ready(function() {
    dbw = new DbWrapper();
    dbw.table = 'authors';
    updateAuthorsTable();

    $('#addbutton').click( function() {
        dbw.insertObject( { name: $('#authorname').val() },
            function(data) {
                updateAuthorsTable();
            });
    });
});
I think this is exactly what you're looking for. This way you won't have to build it yourself.
The most important thing is to be careful about the rights you grant to the MySQL user used for this kind of operation.
For instance, you don't want them to be able to DROP a database, nor to execute a request such as:
LOAD DATA LOCAL INFILE '/etc/passwd' INTO TABLE test FIELDS TERMINATED BY '\n';
You have to limit the operations allowed to this MySQL user, and the tables they can access.
Access to the whole database:

grant select on database_name.*
to 'user_name'@'localhost' identified by 'password';

Access to a table:

grant select on database_name.table_name
to 'user_name'@'localhost' identified by 'password';
Then... what else? This should prevent unwanted SQL injection from updating/modifying tables or accessing other tables/databases, at least as long as SELECT on a specific table/database is the only privilege you grant to this user.
But it won't prevent a user from launching a silly, badly-performing request which might eat all your CPU.
var data = Querier({
    table: "mytable, mytable9, mytable11, mytable12",
    columns: ["mytable.column1", "count(distinct mytable11.column2)",
              "SUM(mytable9.column3)"],
    where: "column8 IN (SELECT column7 FROM mytable2 " +
           "WHERE column4 IN (SELECT column5 FROM mytable3))",
    limit: "500000"
});
You have to run some checks on the data passed if you don't want your MySQL server to be brought down.
I am using Breeze to query a table of customers. I have to implement a very complex query, so I've decided to pass a parameter to the method and let the server build the query. The problem is that when I use Breeze's take method, the list of customers returned to the client is in a different order from the one produced on the server.
I have made some tests, and the order only changes when I use Breeze's take. This is a bit of my code on the server and on the client:
//CLIENT
function (searchText, resultArrayObservable, inlineCountObservable) {
    query = new breeze.EntityQuery("CustomersTextSearch")
        .skip(0)
        .take(30)
        .inlineCount(true)
        .withParameters({ '': searchText });

    return manager.executeQuery(query).then(function (data) {
        // The data results are not in the same order as the server returns.
        inlineCountObservable(data.inlineCount);
        resultArrayObservable(customerDto.mapToCustomerDtos(data.results));
    });
}
//SERVER ASP.NET WEB API
[HttpGet]
public IQueryable<Customer> CustomersTextSearch(string textSearch = "")
{
    // Here, customers has the right order.
    var customers = _breezeMmUow.Customers.GetBySearchText(textSearch, CentreId);
    return customers;
}
Maybe it's not a bug; maybe I am doing something incorrect. Can somebody help me?
-------------EDIT-------------
I have found in the Breeze release notes that the problem was supposedly fixed in 1.3.2 ("Fix for Breeze/EF bug involving a single query with 'expand', 'orderBy', and 'take' performing incorrect ordering."), but I have the latest version and it is still not working well with take.
This was a bug, and has been fixed in Breeze v 1.3.6, available now.
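As a general note (and as a workaround on older versions), skip/take paging is only deterministic when the query carries an explicit orderBy, since SQL gives no guaranteed row order otherwise. So something like this on the client is worth having anyway (the sort property name is illustrative):

query = new breeze.EntityQuery("CustomersTextSearch")
    .orderBy("companyName")   // any stable sort key; without it the paged order is undefined
    .skip(0)
    .take(30)
    .inlineCount(true)
    .withParameters({ '': searchText });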
I have a web service that returns a JSON object when it is queried and a match is found. An example of a successful return is below:
{"terms":[{"term":{"termName":"Focus Puller","definition":"A focus puller or 1st assistant camera..."}}]}
If the query does not produce a match it returns:
Errant query: SELECT termName, definition FROM terms WHERE termID = xxx
Now, when I access this through my Win 8 Metro app, I parse the JSON object using the following code to get a JS object:
var searchTerm = JSON.parse(Result.responseText)
I then have code that processes searchTerm and binds the returned values to the app page controls. If I enter a query that finds a match in the DB, everything works great.
What I can't work out is a way of validating a bad query. I want to test the value returned by var searchTerm = JSON.parse(Result.responseText) and continue doing what I'm doing now if it is a successful result, but handle the result differently on failure. What check should I make to test this? I am happy to implement additional validation either in my app or in the web service; any advice is appreciated.
Thanks!
There are a couple of different ways to approach this.
One approach would be to utilize the HTTP response headers to relay information about the query (i.e. HTTP 200 status for a found record, 404 for a record that is not found, 400 for a bad request, etc.). You could then inspect the response code to determine what you need to do. The pro of this approach is that this would not require any change to the response message format. The con might be that you then have to modify the headers being returned. This is more typical of the approach used with true RESTful services.
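On the client, that could look roughly like this, assuming the request is made with WinJS.xhr (which rejects the promise for non-2xx statuses); the two handler functions are placeholders for your existing code:

WinJS.xhr({ url: serviceUrl }).then(
    function (result) {
        // 2xx: a match was found, parse and bind as before
        var searchTerm = JSON.parse(result.responseText);
        bindTermToPage(searchTerm);
    },
    function (request) {
        // 404 / 400 / ...: no match or a bad query
        if (request.status === 404) {
            showNoMatchMessage();
        }
    }
);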
Another approach might be to return success/error messaging as part of the structured JSON response. Such that your JSON might look like:
{
    "result": "found",
    "message": {
        "terms": [
            { "term": { "termName": "Focus Puller", "definition": "A focus puller or 1st assistant camera..." } }
        ]
    }
}
You could obviously change the value of result in the data to return an error and place the error message in message.
The pros here are that you don't have to worry about header modification and that your returned data will always be parseable via JSON.parse(). The con is that you now have extra verbosity in your response messaging.
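With that shape, the client-side check becomes a simple branch on the result field (the two helper functions are placeholders for your existing binding and error-display code):

var response = JSON.parse(Result.responseText);

if (response.result === "found") {
    bindTermToPage(response.message.terms);   // existing success path
} else {
    showErrorMessage(response.message);       // handle the "no match" / error case
}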
OK, I'm finally at my wits' end. I have an XMPP server (Openfire) running and am trying to connect via JavaScript using JSJaC. The strange thing is that I can establish a connection for some users, but not for all. I can reproduce the following behavior: create two accounts (username/password), namely r/pwd and rr/pwd, with the result:
r/pwd works
rr/pwd doesn't work.
So far, every account with a user name consisting of only one character works. This is strange enough. On the other hand, old accounts, e.g., alice/a, work. The whole connection problem is quite new, and I cannot trace it to any changes I've made.
And to make my confusion complete: with any instant messenger supporting XMPP, all accounts work, including, e.g., rr/pwd. So I assume the error must be somewhere in my JavaScript code. Here's the relevant snippet:
...
oArgs = new Object();
oArgs.domain = this.server;
oArgs.resource = this.resource;
oArgs.username = "r";
oArgs.pass = "pwd";
this.connection.connect(oArgs);
The code above works, but if I set oArgs.username = "rr", it fails.
I would be grateful for any hints. I'm quite sure it must be something really stupid I'm missing here.
Christian
Adding oArgs.authtype = 'nonsasl' to the argument list when creating the XMPP connection using JSJaC solved my problem. I haven't tried Joe's command to alter the SASL settings in Openfire; I'm scared of ruining my running system :).
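For completeness, this is the connect snippet from the question with just that one addition; everything else stays the same:

oArgs = new Object();
oArgs.domain = this.server;
oArgs.resource = this.resource;
oArgs.username = "rr";
oArgs.pass = "pwd";
oArgs.authtype = 'nonsasl';   // fall back to legacy (non-SASL) authentication
this.connection.connect(oArgs);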