Overview
The setup spans two separate Azure instances: Node.js servers run on the first and connect to a single Redis server running on the second, with Node.js trying to keep an active connection to Redis. The node_redis module is used to store and retrieve data in Redis, and the socket.io-emitter module is used to allow the different servers to send messages based on the collective connectivity of clients.
Problem
After the initial connection is established, the connection freezes after some (sporadic) amount of time and finally crashes, with an ETIMEDOUT error being thrown from all servers.
What I have tried
i. Added socket_keepalive first, and then together with socket_initialdelay:
const redis = require('redis');
let options = {socket_keepalive : true, socket_initialdelay : 200000};
global.client = redis.createClient(port, host, options);
ii. Tried initialising the socket.io-emitter module with a new Redis client created via node_redis itself, but the notifications stopped working after that, so I reverted to the original setup.
This approach stopped the individual notifications to devices:
let options = {socket_keepalive : true, socket_initialdelay : 200000};
let redis_socket = require('redis');
let pub = redis_socket.createClient(port, host, options);
let ioz = require('socket.io-emitter')(pub);
*Obviously, the timeout issue still exists with the working method.
iii. On the Redis server, the timeout config is set to 0 and tcp-keepalive was 300 seconds; we tried changing tcp-keepalive to 0 as well, but the problem persists.
It definitely breaks my head because the same piece of code works in another environment, yet what is causing this here is still a mystery. After investigating a bit more, I realised that the connected clients (visible with the MONITOR command) drop after some time, and hence the ETIMEDOUT error is thrown.
The same Redis machine is also used for caching, and that is working without any issue.
Looks like you might be hitting the TCP idle timeout of 4 minutes.
According to the self-documented config for Redis 3.2, the value for tcp-keepalive has to be non-zero for it to work. So you might want to set a value like 120 (240 / 2) instead and try again.
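For reference, a minimal sketch of applying that from the Node side, assuming node_redis v2.x (where send_command forwards arbitrary commands) and that the server permits CONFIG SET at runtime; the permanent equivalent is tcp-keepalive 120 in redis.conf:
const redis = require('redis');

const host = 'your-redis-host'; // placeholder
const port = 6379;

const client = redis.createClient(port, host, { socket_keepalive: true });

// Ask Redis to send TCP keepalive probes every 120 seconds, well under
// the ~4-minute idle timeout.
client.send_command('CONFIG', ['SET', 'tcp-keepalive', '120'], (err, reply) => {
    if (err) {
        console.error('Failed to set tcp-keepalive:', err);
    } else {
        console.log('tcp-keepalive set:', reply); // expect "OK"
    }
});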
Related
So I have a Swift client, a Node.js server, and am using socket.io. I have an issue where, when the user passively changes from WiFi to LTE while connected to the server (if they turn off WiFi manually it works fine), for some reason they don't reconnect to the server (they just hit a ping timeout). I've tried increasing the ping timeout to 50 seconds with no effect. My users interact with each other while connected to the same room, so this is a big issue.
My connection code on the client-side looks like this:
var socket: SocketIOClient?
fileprivate var manager: SocketManager?
func establishConnection(_ completion: (() -> Void)? = nil) {
    let socketUrlString: String = serverURL
    self.manager = SocketManager(socketURL: URL(string: socketUrlString)!, config: [.forceWebsockets(true), .log(false), .reconnects(true), .extraHeaders(["id": myDatabaseID])])
    self.socket = manager?.defaultSocket
    self.socket?.connect()
    //self.socket?.on events go here
}
On the server side, my connection code looks like this:
const io = require('socket.io')(http, {
    pingTimeout: 10000
});

io.on('connection', onConnection);

function onConnection(socket){
    let headerDatabaseID = socket.handshake.headers.id
    //in the for loop below, I essentially disconnect any socket that has the same database ID as the one that just connected
    //(in this case, when a client is in a room with other clients and then switches from WiFi to LTE, the client reconnects
    //to the socket server and this removes the old connection from the room it was in)
    for (let [id, connectedSocket] of io.sockets.sockets) {
        if (connectedSocket.databaseID == headerDatabaseID && socket.id != id) {
            connectedSocket.disconnect()
            break
        }
    }
    //socket.on() events here
}
My issue is this: how do I go about reconnecting the client when it makes the passive network switch (WiFi -> LTE or vice versa)? I thought that just adding .reconnects(true) would work, but for some reason it isn't...
Please let me know if I can be more detailed/helpful or if you'd like to see other code.
I believe the solution to your problem can be either simple or complex; that depends on your requirements. I assume that each chat room has its own ID.
If you store that ID in memory on the device, when the user reconnects, you can have the socket reconnect to the room ID you had last and they will re-join that room. This is insecure.
If rooms are protected and not public, someone may be able to connect to a room that they are not allowed in if they know or can guess the room ID. To solve that problem, you'd need to implement some sort of authentication or a server-side database that keeps track of that sort of thing.
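As a rough sketch of the simple (insecure) variant, assuming io is the socket.io server from the question and that the client re-emits a hypothetical 'rejoin' event with the room ID it stored before the drop, the server side could look like this:
// Server-side sketch (hypothetical event names): re-join the last room after a reconnect.
io.on('connection', (socket) => {
    socket.on('rejoin', (lastRoomID) => {
        // WARNING: no authorisation here; anyone who knows or guesses the ID can join.
        socket.join(lastRoomID);
        socket.emit('rejoined', lastRoomID);
    });
});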
Considering the behavior varies based on whether the handoff is manual or passive, it sounds like the issue is on the iOS client. I notice that you are using sockets; it seems to be some sort of custom sockets package, right? Is there a reason for using this? URLSession is a higher-level implementation and it manages things like handoff.
There is something called Wi-Fi Assist, developed by Apple, to manage handoff. It is part of the OS and manages this internally. According to Apple: "Using URLSession and the Network framework already gives us the new WiFi assist benefits." This was released in iOS 9, in September 2015.
But if you are using some other kind of sockets, whatever this "SocketIOClient" is, especially packages developed prior to September 2015, you are probably bypassing Wi-Fi Assist. The latest version of the Socket.IO client I see was written in 2015, and it appears support for this package was discontinued when iOS 9 came out.
When the user manually changes the connection, this manually prompts the OS to tear down and re-establish the connection, whereas with a passive handoff it normally relies on Wi-Fi Assist.
You could try to programmatically tear down and re-establish the connection when you detect that a passive handoff has occurred, but I wouldn't recommend this... for starters, it will make your code much messier. It will probably degrade the user experience. But worse, this may not be the only problem you run into using this outdated Socket.IO package. There's really no telling what kind of maintenance problems you will wind up with. Better to just refactor your code to use the up-to-date networking mechanisms provided by iOS.
If .reconnects(true) isn't working, you can try to manually take care of the problem with Apple's Reachability. This may make it easier - it's the Reachability functionality "re-written in Swift with closures."
In your case, you might use it as such:
let reachability = try! Reachability()

reachability.whenReachable = { reachability in
    if reachability.connection == .wifi {
        print("Reachable via WiFi")
        self.socket?.disconnect()
        self.establishConnection() //this is your method defined in the question
    } else {
        print("Reachable via Cellular")
        self.socket?.disconnect()
        self.establishConnection() //this is your method defined in the question
    }
}
reachability.whenUnreachable = { _ in
    print("Not reachable")
}

do {
    try reachability.startNotifier()
} catch {
    print("Unable to start notifier")
}
I am using the node_redis library as the client for a micro-service messaging client that I am writing. Clients get messages from their application in their outbox that they need to send to other services. Everything is working great, but I am trying to build some resilience into the part of the application that uses the Redis client to communicate with the redis-server.
My idea is that the Redis client-server connection status should be highly available to the clients. I mean that if a connection goes down, I would like to know within a second instead of the default timeout of 300 seconds. Currently I am using the free Redis Labs tier hosted on AWS, but I should be moving this to run in its own container on my Kubernetes cluster.
I need to know the state of client connections in the network because I would like to avoid sending messages when the network conditions are not right, rather than relying on error handling to deal with this sort of event. Knowing how often and when these high-latency events occur will also help me diagnose and improve my network and my microservices.
Note: I wanted to set the connect_timeout value in the client options but this is listed as deprecated.
Something like this?
var redis = require('redis');
var reconnectAfter = 15000;
var client;

var connect = function () {
    var c = redis.createClient();
    // The error handler has to be attached to each new client, otherwise an
    // unhandled 'error' event will crash the process.
    c.on('error', function (err) {
        console.log((new Date()) + " Redis: disconnect - " + err);
        c.end(true);                          // stop this client's own retry loop
        setTimeout(connect, reconnectAfter);  // reconnect after 15 seconds
    });
    client = c;
};

connect();
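A hedged alternative, assuming node_redis 2.6 or later: use the built-in retry_strategy together with the 'ready' and 'end' events to track connection state, plus an optional periodic PING as a liveness probe (the one-second interval here is just an illustrative choice to match the "know within a second" requirement):
var redis = require('redis');

var client = redis.createClient({
    // retry_strategy is called by node_redis on every lost connection.
    retry_strategy: function (options) {
        // Retry within at most 1 second instead of waiting for a long TCP timeout.
        return Math.min(options.attempt * 100, 1000);
    }
});

var connected = false;
client.on('ready', function () { connected = true; });
client.on('end', function () { connected = false; });
client.on('error', function (err) { console.error('Redis error:', err.message); });

// Optional liveness probe: a PING every second surfaces a dead connection quickly.
setInterval(function () {
    client.ping(function (err) {
        if (err) { connected = false; }
    });
}, 1000);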
So, I have an Express Node.js server that is making a connection with another app via an upgraded WebSocket URI for a data feed. If this app goes down, then obviously the WebSocket connection gets closed. I need to reconnect to this URI once the app comes back online.
My first approach was to use a while loop in the socket.onclose function to keep attempting to make the re-connection once the app comes back online, but this didn't seem to work as planned. My code looks like this:
socket.onclose = function(){
    while(socket.readyState != 1){
        try{
            socket = new WebSocket("URI");
            console.log("connection status: " + socket.readyState);
        }
        catch(err) {
            //send message to console
        }
    }
};
This approach keeps giving me a socket.readyState of 0, even after the app that the URI points to is back online.
Another approach I took was to use the JavaScript setTimeout function to attempt to make the connection using an exponential backoff algorithm. Using this approach, my code in the socket.onclose function looks like this:
socket.onclose = function(){
    var time = generateInterval(reconnAttempts); //generateInterval generates the random time based on the exponential backoff algorithm
    setTimeout(function(){
        reconnAttempts++; //another attempt so increment reconnAttempts
        socket = new WebSocket("URI");
    }, time);
};
The problem with this attempt is that if the app is still offline when the socket connection is attempted, I get the following error, for obvious reasons, and the node script terminates:
events.js:85
throw er; // Unhandled 'error' event
Error: connect ECONNREFUSED
at exports._errnoException (util.js:746:11)
at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1010:19)
I also began using the forever node module to ensure that my node script is always running and to make sure it gets restarted after an unexpected exit. Even though I'm using forever, after a few restarts, forever just stops the script anyway.
I am basically just looking for a way to make my NodeJS server more robust and automatically re-connect with another server that may have gone down for some reason, instead of having to manually restart the node script.
Am I completely off base with my attempts? I am a noob when it comes to NodeJS so it may even be something stupid that I'm overlooking, but I have been researching this for a day or so now and all of my attempts don't seem to work as planned.
Any suggestions would be greatly appreciated! Thanks!
A few suggestions:
1) Start using the domain module, which prevents your app from terminating unexpectedly, i.e. your app will run inside the domain (the run method of the domain). You can implement some alert mechanism, such as email or SMS, which will notify you when any error occurs.
2) Start using socket.io for WebSocket communication; it automatically handles reconnection. Socket.io uses a keep-alive heartbeat and continuously polls the server.
3) Start using pm2 instead of forever. pm2 allows clustering for your app, which improves performance.
I think this may improve your app's performance, stability and robustness.
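On the ECONNREFUSED crash specifically: the script terminates because the WebSocket client's 'error' event has no listener, and a try/catch around the constructor does not catch asynchronous connection errors. A minimal sketch of the backoff approach with an error handler attached, assuming the ws package on the Node side (the URI and the backoff numbers are placeholders):
const WebSocket = require('ws'); // assumption: the ws package is used for the client connection

let reconnAttempts = 0;

function connect() {
    const socket = new WebSocket('ws://example.com/feed'); // placeholder URI

    socket.on('open', () => {
        reconnAttempts = 0; // reset the backoff once connected again
    });

    // Without this handler, ECONNREFUSED is emitted as an unhandled 'error'
    // event and the process exits.
    socket.on('error', (err) => {
        console.error('WebSocket error:', err.message);
    });

    socket.on('close', () => {
        const time = Math.min(1000 * Math.pow(2, reconnAttempts), 30000); // capped exponential backoff
        reconnAttempts++;
        setTimeout(connect, time);
    });
}

connect();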
In my web app's client-side script I'm using the OrientDB JavaScript API (orientdb-api.js). When the script initializes, I run this code:
var orientdb = new ODatabase("http://localhost:2480/testapp");
var orientdbinfo = orientdb.open('root', 'admin');
This works fine and I can do all the various queries etc, as long as I don't wait more than 15 seconds between them. If I do, I get "error 401 (Unauthorised)" returned.
I know for a fact that this is the socket connection timing out. The timeframe matches the 15000ms timeout setting in the config. Also, as a test I've built a little button that calls the orientdb.open method above and reopens the connection. After I hit that button I can access the DB again.
Currently the queries and commands are being called directly in my script as I trigger actions from my web UI. Am I just being lazy and am I actually supposed to wrap every query in a function that tests the connection first and re-initializes if it is closed, or is there something I'm missing? If the former, what is an elegant way of coding that? If the latter, what am I missing?
To get around this I'm running a setInterval function that opens a new socket every 14 seconds. That will get me through my testing for sure, but I realise it's a hack.
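For reference, the keep-alive hack described above is roughly this (a sketch only, with the 14-second interval chosen to stay under the 15000 ms socket timeout):
// Workaround sketch: re-open the connection every 14 seconds so the
// server-side 15000 ms socket timeout is never reached.
var keepAlive = setInterval(function () {
    orientdb.open('root', 'admin'); // re-authenticates and refreshes the session
}, 14000);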
When you start the OrientDB server, it creates two sockets: 2424 (binary) and 2480 (HTTP).
Because OrientJS uses the binary protocol, you need to connect to port 2424.
Try:
var orientdb = new ODatabase("http://localhost:2424/testapp");
var orientdbinfo = orientdb.open('root', 'admin');
And the socket should stay open (longer).
I am setting up a simple Node.js REST service to interface with Elasticsearch, using the official JavaScript client. I'm running this code locally, but the cluster is located remotely. When I go through the browser, with the _head plugin, I can connect to ES and query with no problem. However, doing so via the JavaScript client times out all requests. I set up the Elasticsearch object, but sending any request to it simply doesn't work. I don't think it's a network issue, because I can access ES through the browser. This is how I request something, a very basic get:
var elasticsearch = require("elasticsearch");

var es = new elasticsearch.Client({
    host: "https://my-address:9200/", // also tried without protocol part and trailing slashes
    log: "error",
    sniffOnStart: true
});

es.get({
    index: "things",
    type: "someThing",
    id: "42"
}).then(doSomeStuff, handleStuffFailed);
This fails with a simple error message: Error: Request Timeout after 30000ms.
Am I missing something here? I've read through the client docs, and this seems like the basic "hello world" for the client.
Try extending the requestTimeout parameter when instantiating the ES Client.
client = new elasticsearch.Client({
    host: 'http://localhost:9200',
    requestTimeout: 600000
});
I had a long-running process which took just under 10 minutes. By making the requestTimeout value 600000 (10 mins), the process could complete without timing out.
We also had this issue on QBox because of sniffOnStart.
Try with this config:
var es = new elasticsearch.Client({
    host: "my-address:9200",
    log: "trace",
    sniffOnStart: true
});
You'll see that the IPs of the added nodes are the private IPs.
On our side, we decided to disable sniffing and manually add the array of public node host addresses, like this:
var es = new elasticsearch.Client({
    hosts: ["my-address1:9200", "my-address2:9200", "my-address3:9200"],
    log: "error"
});
Regarding timeouts in Elasticsearch, you need to differentiate between two types of timeouts:
Initialization timeouts: set when you instantiate the ES client: requestTimeout and pingTimeout, both of which default to 30000 ms. Read more in the configuration section of the Elasticsearch client documentation.
Operation-based timeouts: many operations, such as bulk, create, delete, and index, can set a timeout too. Say you have a huge bulk body to insert; you can set an operation-based timeout for just that request (a sketch follows below).
You should know that operation-based timeouts override the requestTimeout set at initialization.
Check here regarding this issue: https://github.com/elastic/elasticsearch-js/issues/186
I guess we need to use the requestTimeout variable as mentioned above.
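For example, a minimal sketch with the legacy elasticsearch-js client, assuming a large bulk insert that needs more than the client's default timeout (the index name, document body, and the 300000 ms value are placeholders):
var elasticsearch = require("elasticsearch");

var es = new elasticsearch.Client({
    host: "my-address:9200",  // placeholder address
    requestTimeout: 30000     // client-wide default
});

// Placeholder bulk body: action/document pairs.
var bulkBody = [
    { index: { _index: "things", _type: "someThing", _id: "42" } },
    { answer: 42 }
];

es.bulk({
    body: bulkBody,
    timeout: "5m",          // server-side operation timeout
    requestTimeout: 300000  // per-request override of the client's 30000 ms default
}).then(
    function (resp) { console.log("bulk done, errors:", resp.errors); },
    function (err) { console.error("bulk failed:", err); }
);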
Check the following items if you see:
Discover: Request Timeout after 30000ms
1) Make sure Elasticsearch CPU/memory is not choking. If there is a lot of data for the query window, it is possible that the request times out within 30000 ms.
2) Increase the timeout for Kibana in kibana.yml (elasticsearch.requestTimeout: 120000), then restart the Kibana service.
3) Decrease the amount of data loaded by the Kibana dashboard: change the discover:sampleSize value under Management > Advanced Settings accordingly.
4) If there is any load balancer in between, increase the timeout settings on the LB as well.
If you're running more than one node per server, try locking down the number of processors that each JVM gets access to. We had this problem, and doing this fixed it. We think one node was using too many system resources, which caused the other node on the same server to be slow to respond to the master node when queried for status.